Reasoning Models
Our research on reasoning models focuses on creating AI training algorithms that enable models to reason in a deeper and more generalised way.
Current RL training for LLMs relies on verifiable rewards (for example, checking a maths answer or running unit tests on code), which limits models' ability to reason in non-verifiable domains that are fuzzier, more open-ended, and more linguistic.
Enabling deeper reasoning in these settings is crucial for real-world domains such as linguistic logic, law, and science.
Our reward assignment strategy extends reward attribution to every textual domain, including non-verifiable ones.
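As a minimal, purely illustrative sketch of this gap (not a description of our reward assignment strategy), the snippet below contrasts a verifiable reward, which can be checked against a ground truth, with a hypothetical scoring function for open-ended text; the function names, rubric, and keyword heuristic are assumptions made for illustration only.

```python
# Illustrative sketch only: a verifiable reward can be computed by checking
# an answer against ground truth, while open-ended text needs a softer
# scoring signal. Names and heuristics here are hypothetical placeholders.

def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Binary reward: 1.0 when the answer matches a checkable ground truth."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0


def non_verifiable_reward(response: str, rubric: dict) -> float:
    """Hypothetical scalar reward for open-ended text, standing in for a
    learned reward model or judge where no single ground truth exists."""
    score = 0.0
    for criterion, weight in rubric.items():
        # Toy heuristic: a real system would judge quality, not keywords.
        if criterion.lower() in response.lower():
            score += weight
    return min(score, 1.0)


if __name__ == "__main__":
    # Verifiable domain: a maths answer can be checked exactly.
    print(verifiable_reward("42", "42"))  # 1.0

    # Non-verifiable domain: a legal argument has no single correct string,
    # so the reward must come from a broader judgement of the text.
    rubric = {"precedent": 0.5, "statute": 0.5}
    print(non_verifiable_reward("The brief cites precedent and statute.", rubric))  # 1.0
```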
Deeper Reasoning
Creating AI training algorithms that enable models to reason in a deeper and more generalised way
Beyond Verifiable Rewards
Moving beyond current RL training limitations to handle fuzzy, open-ended, and linguistic domains
Real-World Applications
Enabling deeper reasoning for linguistic logic, law, and science domains
Universal Rewards
Extending reward attribution to every textual domain, including non-verifiable ones