Learning Smooth Reward Models with Temporal Difference for LLM RL and Inference

Dan Zhang
zd21

AI & ML interests
None yet

Recent Activity
Updated a dataset about 1 month ago: zd21/DataSciBench
Updated a collection about 2 months ago: TDRM
Updated a model about 2 months ago: zd21/DeepSeek-TD1-PRM

Organizations
None yet