Hiring 💼

Nitish Pandey

nitishpandey04

AI & ML interests

LLMs, Translation

Recent Activity

upvoted an article 7 days ago

Article

Deriving the PPO Loss from First Principles

9 days ago • 31

updated a collection 22 days ago

Classic Reinforcement Learning

solved classic rl environments • 2 items • Updated 22 days ago

updated a model 22 days ago

nitishpandey04/CarRacing-v3

Reinforcement Learning • Updated 22 days ago

published a model 22 days ago

nitishpandey04/CarRacing-v3

Reinforcement Learning • Updated 22 days ago

updated a model about 1 month ago

nitishpandey04/CartPole-v1

Reinforcement Learning • Updated Nov 30, 2025

updated a collection about 1 month ago

Classic Reinforcement Learning

solved classic rl environments • 2 items • Updated 22 days ago

published a model about 1 month ago

nitishpandey04/CartPole-v1

Reinforcement Learning • Updated Nov 30, 2025

updated a collection 3 months ago

Reading List

34 items • Updated Sep 29, 2025

upvoted an article 3 months ago

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Apr 16, 2025 • 58

commented on Prefill and Decode for Concurrent Requests - Optimizing LLM Performance 3 months ago

yup

liked a Space 4 months ago

The Ultra-Scale Playbook

The ultimate guide to training LLM on large GPU Clusters

updated a collection 4 months ago

Reading List

34 items • Updated Sep 29, 2025

upvoted a paper 4 months ago

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7, 2025 • 180

updated a collection 5 months ago

Reading List

34 items • Updated Sep 29, 2025

upvoted 3 papers 6 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 627

AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Paper • 2507.05687 • Published Jul 8, 2025 • 27

SingLoRA: Low Rank Adaptation Using a Single Matrix

Paper • 2507.05566 • Published Jul 8, 2025 • 113