Junlin Zhou

jlzhou

AI & ML interests

None yet

Recent Activity

Organizations

TableGPT

jlzhou's activity

upvoted an article 1 day ago
upvoted an article 4 days ago

From Files to Chunks: Improving Hugging Face Storage Efficiency

reacted to Kseniase's post with 👍 11 days ago
9 types of "Chain-of-..." approaches:

Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking down complex problems into step-by-step logical sequences. It continues to prove its effectiveness, especially in top-performing reasoning models. However, there are other similar methods that expand on CoT and can be used for different purposes. Here are 9 of them:

1. Chain-of-Action-Thought (COAT) -> Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (2502.02508)
Helps the model decide when to keep thinking, double-check its work, or try a different approach, using special guiding tokens.

2. Chain of Draft (CoD) -> Chain of Draft: Thinking Faster by Writing Less (2502.18600)
Helps the model generate short but meaningful reasoning steps, cutting costs and speeding up processing.

3. Chain-of-Agents -> Chain of Agents: Large Language Models Collaborating on Long-Context Tasks (2406.02818)
Uses multi-agent collaboration: worker agents process parts of the text in a structured chain, and a manager agent summarizes the results.

4. Chain-of-RAG -> https://huggingface.co/papers/2501.14342
Creates retrieval chains instead of retrieving all information at once, and can dynamically adjust its search process and parameters such as the number of steps.

5. Chain-of-Shot Prompting (CoS) -> CoS: Chain-of-Shot Prompting for Long Video Understanding (2502.06428)
Helps models pick the frames crucial for understanding a video, using a binary video summary and a video co-reasoning module.

6. Chain of Hindsight (CoH) -> Chain of Hindsight Aligns Language Models with Feedback (2302.02676)
Converts all feedback into sequences to fine-tune the model and refine its outputs.

7. Chain-of-Note (CoN) -> Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (2311.09210)
Generates sequential reading notes for each retrieved document to assess relevance before integrating info into the final answer

8. Chain of Diagnosis (CoD) -> CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis (2407.13301)
Transforms the diagnostic process into a step-by-step diagnostic chain.

9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok
Enhances LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors.
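
To make the base technique concrete, here is a minimal sketch of zero-shot CoT prompting. This is my own illustration, not code from any of the papers above: the `build_cot_prompt` and `extract_answer` helpers are hypothetical, and the model completion is mocked rather than produced by a real API call. Only the prompt construction and answer parsing are the point.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a zero-shot chain-of-thought prompt.

    The trailing cue nudges the model to emit intermediate
    reasoning steps before stating its final answer.
    """
    return (
        f"Question: {question}\n"
        "Let's think step by step, then state the final answer "
        "on a line starting with 'Answer:'."
    )


def extract_answer(completion: str) -> str:
    """Pull the final answer line out of a CoT-style completion."""
    for line in completion.splitlines():
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return completion.strip()  # fall back to the raw text


# Example with a mocked completion (no API call is made):
prompt = build_cot_prompt("If a train covers 120 km in 2 hours, what is its speed?")
mock_completion = "120 km / 2 h = 60 km/h.\nAnswer: 60 km/h"
print(extract_answer(mock_completion))  # prints: 60 km/h
```

In practice you would send `prompt` to whatever chat or completion endpoint you use and run `extract_answer` on the returned text; the variants in the list above mostly change what goes between the question and the final answer.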

I'm glad you found it helpful!

Yes, this is planned. I was originally planning to write an article about training with the training operator, but now I'm wondering if I should skip that and focus on training with the new trainer instead.

PS: Kubeflow is migrating their training component from v1 (Kubeflow Training Operator) to v2 (Kubeflow Trainer).

reacted to schuler's post with 👍 about 1 month ago
📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.

The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.
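
For intuition on where such savings come from, here is a back-of-the-envelope sketch (my own illustration, not the report's code): a pointwise (1×1) convolution mapping `c_in` to `c_out` channels has `c_in * c_out` weights, while splitting it into `g` groups gives `(c_in/g) * (c_out/g) * g = c_in * c_out / g`. The channel sizes and group count below are arbitrary examples, and the paper's 77% figure comes from its full architecture, not from this one layer.

```python
def pointwise_params(c_in: int, c_out: int, groups: int = 1, bias: bool = True) -> int:
    """Parameter count of a 1x1 (pointwise) convolution with grouping.

    Each of the `groups` groups maps c_in/groups input channels to
    c_out/groups output channels, so grouping divides the weight
    count by `groups`. Bias adds one parameter per output channel.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    weights = (c_in // groups) * (c_out // groups) * groups
    return weights + (c_out if bias else 0)


dense = pointwise_params(1024, 1024, groups=1, bias=False)    # 1,048,576 weights
grouped = pointwise_params(1024, 1024, groups=4, bias=False)  # 262,144 weights
print(f"saved {1 - grouped / dense:.0%} of the weights")      # prints: saved 75% of the weights
```

The trade-off is that each group only sees a slice of the input channels, which is why grouped designs usually add some channel-mixing mechanism to recover cross-group information flow.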

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
upvoted an article about 1 month ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

By NormalUhr
New activity in jlzhou/Qwen2.5-3B-Infinity-Instruct-0625 about 1 month ago

Adding Evaluation Results

#1 opened about 1 month ago by jlzhou