We are an applied reinforcement learning research lab dedicated to bridging the gap between theoretical advances and practical applications. Our work focuses on developing robust, scalable RL systems that can operate effectively in complex, real-world environments.
While AI applications have seen dramatic advances, key gaps remain. Current AI systems excel at narrow, well-defined tasks, but struggle with complex, multi-level objectives that unfold over extended time horizons. This fundamental limitation constrains their practical utility in real-world scenarios, where a human always has to check the work done by AI. This is clearly evident in the domains where AI took off, code generation and search: both have fast feedback loops and errors that are easy to check.
Humans need systems that can independently handle uncertainty, backtrack or try different approaches, and maintain coherent, goal-oriented behavior for hours. In other words, humans need reliable AI agents they can confidently delegate tasks to, not a copilot they have to handhold at every step. This is what we are building at Clio AI.
Building accurate agents for industrial automation with RL. We are deploying AI agents that are dependable, especially in high-stakes industries where mistakes are costly. Our work focuses on building agents that understand how to deal with messy, ambiguous inputs, then reflect, backtrack, and verify their outputs, all in a single RL run.
Scaling test-time compute beyond current limits to solve multi-step tasks. Real-world tasks often have sparse rewards and require huge amounts of data to get right. Our research works on extending current paradigms (roughly two hours at most for o3) to automate real-world tasks without human intervention.
Engineering collaborative AI teams with multi-turn RL, a focus for every big lab. We are studying how agents coordinate while working toward a shared objective in dynamic, real-world environments.
Designing adaptive RL systems that learn from experience and improve through a feedback loop. Our primary work here is this paper, which adapts a model to any domain at 10% of the compute, enabling teams to apply RL to domain-specific problems with training that completes over a weekend.
We are working on automating compliance for life sciences. Our state-of-the-art agent acts as an FDA digital twin, generating your INDs, NDAs, and 510(k)s as soon as lab notes are updated. We estimate that teams can extend patent life and get to market almost a year faster using this agent.
Here is a quick demo of the setup and how it works. If this is interesting, we would love to understand your use case in detail. Please get in touch at ankit@clioapp.ai.
We are researchers, engineers, and builders who have worked in AI and GTM teams at companies like Tokopedia (Indonesian super app), Tiki (Vietnamese Amazon), Icertis (contract intelligence), O9 Solutions (enterprise AI unicorn), Urban Company, and Plivo. We built an intent-based AI travel chatbot in 2017 and implemented semantic search (RAG) at Tokopedia in 2020.
You can check out our latest research here.
For research collaborations, technical discussions, or other queries:
Email: ankit@clioapp.ai