Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 9 days ago • 134
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 330
Article MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era By MiniMax-AI • Jan 15 • 41
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Paper • 2411.14343 • Published Nov 21, 2024 • 7
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • Updated Oct 25, 2024 • 116k • 2.02k
Interesting, but how does this approach generalize to arbitrary user query / document domains? Would you need to train a separate network for each domain / dataset?