AK

akhaliq

_akhaliq

AI & ML interests

None yet

Recent Activity

updated a Space about 3 hours ago

akhaliq/anychat

upvoted a collection about 9 hours ago

commented on a paper about 23 hours ago

Atla Selene Mini: A General Purpose Evaluation Model

Organizations

akhaliq's activity

updated a Space about 3 hours ago

Running on CPU Upgrade

Anychat

upvoted a collection about 9 hours ago

Tulu 3 Models

All models released with Tulu 3: state-of-the-art open post-training recipes. • 10 items • Updated 1 day ago • 55 upvotes

commented on 2 papers about 23 hours ago

Atla Selene Mini: A General Purpose Evaluation Model

Paper • 2501.17195 • Published 3 days ago • 24 upvotes

Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation

Paper • 2501.17749 • Published 1 day ago • 8 upvotes

commented on a paper about 24 hours ago

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

Paper • 2501.16937 • Published 3 days ago • 3 upvotes

commented on 5 papers 2 days ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published 2 days ago • 48 upvotes

Optimizing Large Language Model Training Using FP4 Quantization

Paper • 2501.17116 • Published 2 days ago • 23 upvotes

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published 2 days ago • 15 upvotes

Open Problems in Mechanistic Interpretability

Paper • 2501.16496 • Published 3 days ago • 11 upvotes

Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

Paper • 2501.16372 • Published 8 days ago • 5 upvotes

liked 2 Spaces 2 days ago

Running on A100

YuE

Qwen2.5 Max Demo

upvoted an article 2 days ago

Article

Welcome to Inference Providers on the Hub 🔥

3 days ago • 171 upvotes

commented on 6 papers 3 days ago

iFormer: Integrating ConvNet and Transformer for Mobile Application

Paper • 2501.15369 • Published 5 days ago • 9 upvotes

Feasible Learning

Paper • 2501.14912 • Published 6 days ago • 4 upvotes

Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity

Paper • 2501.16295 • Published 3 days ago • 5 upvotes

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published 3 days ago • 20 upvotes

Qwen2.5-1M Technical Report

Paper • 2501.15383 • Published 5 days ago • 41 upvotes

Baichuan-Omni-1.5 Technical Report

Paper • 2501.15368 • Published 5 days ago • 45 upvotes

liked a Space 3 days ago

Qwen2.5 VL 72B Instruct