Akhil Theerthala
Key Contributions:
- FinForge Framework: A hybrid pipeline integrating manual/programmatic corpus construction with rigorous LM-based synthesis.
- FinForge-5k Dataset: A new snapshot benchmark comprising over 5,000 human-validated Q&A pairs across 11 financial subdomains, derived from a curated corpus of 100,000 verified documents (143M tokens).
- Benchmarking Results: Evaluation of state-of-the-art open and closed-source models reveals significant variance in financial reasoning capabilities, with leading models achieving approximately 80% accuracy.
Huge thanks to my co-authors @glennmatlin, Anant Gupta, Anirudh JM, Rayan Castilla, and Yi Mei Ng for this collaboration.
You can read the paper: FinForge: Semi-Synthetic Financial Benchmark Generation (2601.06747)
Diversity Vs Density: A data strategy comparison for fine-tuning VLMs
In just one year, the Cosmos ecosystem has grown rapidly:
- Cosmos Reason and Cosmos Predict have surpassed 2 MILLION downloads each on @HuggingFace, topping physical AI leaderboards.
- Cosmos Transfer is enabling adaptation across domains and tasks.
- Cosmos Cookbook is the go-to hub for recipes from developers and partners like Uber and IntBot.
Thank you to our amazing developer community for making this possible. Here's to pushing the boundaries of world foundation models together!
Read the Cosmos Cookbook: https://nvda.ws/4qevli8
Explore Models & Datasets: https://huggingface.co/collections/nvidia/nvidia-cosmos-2
I have always wanted to run an ablation study on this, and recently I got the chance to do exactly that. Why? In applied domains like robotics, manufacturing, or banking, we rarely have the luxury of internet-scale, diverse image datasets. We are often "Data Poor" in diversity but "Data Rich" in depth.
The takeaway? Density is efficient for facts but dangerous for reasoning (logical collapse) unless you have larger-scale data.
More details:
https://huggingface.co/blog/Akhil-Theerthala/diversity-density-for-vision-language-models
Great.
For point 1: indeed, that was a mistake on my part. I had run the validations but forgot to add them here. I originally had a section in "The Setup" that reported Qwen3-8B-Thinking as the baseline, which scored ~52.8% on the validation set. I thought I had replaced that section with the updated figure, but I missed that the figure itself was never updated. I will go back and update the figure to include the Qwen3-8B-Thinking baselines.
Breaking the validation results down by other categories is an interesting idea too, and I will add that. I would start with a deeper analysis of question_format vs. validation performance, then consider adding breakdowns by other fields such as the target_object category, orientation, etc.
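A per-category breakdown like the one described above could be sketched in Python as follows. This is a minimal, hypothetical sketch: it assumes validation results are stored as a list of dicts with a `question_format` field (and optionally `target_object`) plus a boolean `correct` flag. The function name `accuracy_by_field` and the toy records are illustrative only, not taken from the blog post or its results.

```python
from collections import defaultdict


def accuracy_by_field(records, field):
    """Group validation records by `field` and return {value: accuracy}.

    Hypothetical record layout: each record is a dict containing the
    grouping field and a boolean `correct` flag for the model's answer.
    """
    totals = defaultdict(int)  # number of records per category
    hits = defaultdict(int)    # number of correct answers per category
    for rec in records:
        key = rec[field]
        totals[key] += 1
        hits[key] += int(rec["correct"])
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical toy records, not results from the blog post.
records = [
    {"question_format": "multiple_choice", "target_object": "box", "correct": True},
    {"question_format": "multiple_choice", "target_object": "cup", "correct": False},
    {"question_format": "open_ended", "target_object": "box", "correct": True},
]

print(accuracy_by_field(records, "question_format"))
# {'multiple_choice': 0.5, 'open_ended': 1.0}
```

Because the grouping field is a parameter, the same helper would cover later breakdowns (e.g. `accuracy_by_field(records, "target_object")`) without new code.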
Hey, thanks a lot for the amazing response!
I plan to expand on this. I chose this as a checkpoint for two reasons. First, at the current data scale, this was the best I could do within my budget. Second, I wanted to test the waters with smaller-scale data first. I trained these until the losses started diverging, to get a good estimate.
I also need better teacher models: currently, the chain-of-thought traces for the training set are generated by GPT-5-mini. I would like to follow OpenThoughts and use Qwen3-32B or the Qwen3-Next family of models to generate better traces.
Additionally, I want to add more VL models such as Molmo2-8B, Gemma 3 12B, and Kimi-VL. But again, that scope creep had to be controlled.
As for training on different datasets after each epoch, that can be done. However, it is out of scope for the main question, since the goal was to see whether the density approach could help or whether we should strictly prefer diversity.
The collection of artefacts for the blog post can be found here:
https://hf.co/collections/Akhil-Theerthala/density-vs-diversity-blogpost