Rui Yang PRO

Ray2333

https://yangrui2015.github.io

YangRui2015

AI & ML interests

Deep Reinforcement Learning

commented 2 papers 3 months ago

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

Paper • 2510.25897 • Published Oct 29, 2025 • 16

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Paper • 2510.12693 • Published Oct 14, 2025 • 27

commented a paper 8 months ago

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

Paper • 2506.00123 • Published May 30, 2025 • 35

New activity in microsoft/GUI-Actor-Verifier-2B 8 months ago

Update README.md

#1 opened 8 months ago by

commented a paper 8 months ago

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Paper • 2505.24846 • Published May 30, 2025 • 15

commented a paper 9 months ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5, 2025 • 25

New activity in Ray2333/GRM-Llama3.2-3B-rewardmodel-ft 9 months ago

Bug in readme implementation

#3 opened 9 months ago by

New activity in microsoft/Magma-8B 11 months ago

generation_args in the example

#10 opened 11 months ago by

New activity in EmbodiedBench/EB-Manipulation 11 months ago

Add dataset card

#1 opened 11 months ago by

New activity in Ray2333/Gemma-2B-rewardmodel-baseline 11 months ago

trained dataset and fine-tuned method

#1 opened 11 months ago by

commented 2 papers 11 months ago

Rethinking Diverse Human Preference Learning through Principal Component Analysis

Paper • 2502.13131 • Published Feb 18, 2025 • 37

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Paper • 2502.09560 • Published Feb 13, 2025 • 35

New activity in Ray2333/GRM-Llama3.2-3B-rewardmodel-ft 12 months ago

Update default tokenization behavior to "longest" in README

#2 opened 12 months ago by

New activity in Ray2333/GRM-Llama3.2-3B-rewardmodel-ft about 1 year ago

Model Size

#1 opened about 1 year ago by

commented a paper about 1 year ago

DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models

Paper • 2411.00836 • Published Oct 29, 2024 • 15

New activity in Ray2333/GRM-llama3-8B-sftreg about 1 year ago

Adding `safetensors` variant of this model

#3 opened about 1 year ago by

New activity in Ray2333/GRM-llama3-8B-sftreg over 1 year ago

Abnormally Large Memory Footprint?

#2 opened over 1 year ago by

Some weights of the model checkpoint at Ray2333/GRM-llama3-8B-sftreg were not used when initializing

#1 opened over 1 year ago by

New activity in Ray2333/gpt2-large-harmless-reward_model over 1 year ago

Load failed:There is no "pytorch_model.bin", how to load the model?

#3 opened over 1 year ago by

a bug when loading model

#2 opened over 1 year ago by