Co-Reward is a self-supervised reinforcement learning method for LLM reasoning, which leverages contrastive agreement between original and rephrased q
AI & ML interests
Trustworthy Machine Learning and Reasoning
Organization Card
The Trustworthy Machine Learning and Reasoning (TMLR) Group is a machine learning research group that works in a mixed online and offline format, with members located in several cities, including Hong Kong, Melbourne, Shanghai, Nottingham, and Sydney. We share a vision for the future of ML technology: building trustworthy learning and reasoning algorithms, theories, and systems.
Models (30)

TMLR-Group-HF/Entropy-Qwen2.5-7B • 8B • 13
TMLR-Group-HF/Entropy-Llama-3.2-3B-Instruct • 4B • 16
TMLR-Group-HF/Self-Certainty-Qwen3-1.7B-Base • 2B • 17
TMLR-Group-HF/Entropy-Qwen3-1.7B-Base • 2B • 20
TMLR-Group-HF/Majority-Voting-Qwen3-1.7B-Base • 2B • 17
TMLR-Group-HF/GT-Qwen3-1.7B-Base • 2B • 23
TMLR-Group-HF/Self-Certainty-Llama-3.2-3B-Instruct • 4B • 19
TMLR-Group-HF/GT-Llama-3.2-3B-Instruct • 4B • 18
TMLR-Group-HF/Majority-Voting-Llama-3.2-3B-Instruct • 4B • 20
TMLR-Group-HF/Majority-Voting-Qwen2.5-3B • 3B • 20 • 1