Please add more details to the README
Would you consider adding more info? It could be like your tweets on X, so I can draw more attention to this work.
Hello!
Of course, I'll continue to improve the README!
Currently, I'm working alone on:
- Optimizing the inference engine (focusing on Triton flash attention)
- Building an hxa079 fine-tuning framework using RWKV-LM-RLHF (SFT, offline RL, etc.)
- Implementing tests in llama.cpp (it's very difficult :( head sizes 96 and 128 still aren't working well)
- Coding HF-compatible inference code (a rough sketch of the intended usage is below)
and I'm behind in some areas, so please stay tuned :)
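For the HF-compatible path, here is a minimal sketch of how inference might look once that code lands, assuming a standard transformers interface loaded with `trust_remote_code`; the model id below is a placeholder, not a published checkpoint.

```python
# Sketch only: assumes the checkpoint ships custom modeling code for the
# RWKV hybrid architecture. The repo id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/hxa079-placeholder"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # loads the custom hybrid modeling code from the repo
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Explain the RWKV hybrid attention design in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```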
README updated :)
Since this is the first MoE-RWKV hybrid model, could we create a technical report for this project? Also, could we run some benchmarks to evaluate its performance? Especially since we have new methods for fine-tuning and RL.
Sorry to bring this up, but it could make this work more meaningful and impactful.
Sorry, this is *not* an MoE model; it's just dense.
That said, a Qwen3-30B-A3B-based model has also been trained, but due to limitations in the MoE implementation of RWKV-Infer, its inference speed is very slow. I plan to release it after resolving this issue. If you don't mind the slow inference speed, I can release it.