Haitao Mi
haitaominlp
AI & ML interests
Large Language Models
Recent Activity
upvoted a paper about 1 month ago
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
upvoted a paper 3 months ago
The End of Manual Decoding: Towards Truly End-to-End Language Models
upvoted a paper 3 months ago
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values