Reasoning Models
Our research on reasoning models focuses on creating AI training algorithms that enable models to reason in a deeper and more generalised way.
Current RL training for LLMs relies on verifiable rewards (for example, checking a maths answer or running unit tests on code), which limits models' ability to reason in non-verifiable domains that are fuzzier, more open-ended, and more linguistic.
Enabling deeper reasoning in these settings is crucial for real-world domains such as linguistic logic, law, and science.
Our reward assignment strategy extends reward attribution to every textual domain, including non-verifiable ones.
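As a minimal, purely illustrative sketch of this gap (not a description of our reward assignment strategy), the snippet below contrasts a verifiable reward, which can be checked against a ground truth, with a hypothetical scoring function for open-ended text; the function names, rubric, and keyword heuristic are assumptions made for illustration only.

```python
# Illustrative sketch only: a verifiable reward can be computed by checking
# an answer against ground truth, while open-ended text needs a softer
# scoring signal. Names and heuristics here are hypothetical placeholders.

def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Binary reward: 1.0 when the answer matches a checkable ground truth."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0


def non_verifiable_reward(response: str, rubric: dict) -> float:
    """Hypothetical scalar reward for open-ended text, standing in for a
    learned reward model or judge where no single ground truth exists."""
    score = 0.0
    for criterion, weight in rubric.items():
        # Toy heuristic: a real system would judge quality, not keywords.
        if criterion.lower() in response.lower():
            score += weight
    return min(score, 1.0)


if __name__ == "__main__":
    # Verifiable domain: a maths answer can be checked exactly.
    print(verifiable_reward("42", "42"))  # 1.0

    # Non-verifiable domain: a legal argument has no single correct string,
    # so the reward must come from a broader judgement of the text.
    rubric = {"precedent": 0.5, "statute": 0.5}
    print(non_verifiable_reward("The brief cites precedent and statute.", rubric))  # 1.0
```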
Deeper Reasoning
Creating AI training algorithms that enable models to reason in a deeper and more generalised way
Beyond Verifiable Rewards
Moving beyond current RL training limitations to handle fuzzy, open-ended, and linguistic domains
Real-World Applications
Enabling deeper reasoning for linguistic logic, law, and science domains
Universal Rewards
Extending reward attribution to every textual domain, including non-verifiable ones