The pass@1 of deepseek-v3.1-base on the LiveCodeBench (LCB) benchmark
#57 by wjmcat
Description
I previously tested the pass@1 of deepseek-v3-base, and the score was 30.39 (easy/medium/hard: 84.54 / 24.82 / 4.92). However, when I test the pass@1 of deepseek-v3.1-base now, the score is only 19.72 (easy/medium/hard: 60.85 / 12.85 / 2.21).
What is your internal evaluation result for deepseek-v3.1-base on LiveCodeBench?
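To be clear about the metric being compared: by pass@1 I mean the usual per-problem pass rate, averaged over all problems and also reported per difficulty. Below is a minimal sketch of that aggregation, not the LiveCodeBench implementation. The function names and the toy results are hypothetical, and it assumes the overall score is the mean over all problems rather than the mean of the three difficulty scores.

```python
from collections import defaultdict
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Standard unbiased pass@k estimate for one problem.

    n = samples generated, c = samples that passed all tests.
    For k = 1 this reduces to c / n.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def summarize(results, k=1):
    """Aggregate (difficulty, n_samples, n_correct) tuples into percentages."""
    by_difficulty = defaultdict(list)
    for difficulty, n, c in results:
        by_difficulty[difficulty].append(pass_at_k(n, c, k))

    # Per-difficulty score: mean over the problems in that bucket.
    summary = {d: 100.0 * sum(v) / len(v) for d, v in by_difficulty.items()}

    # Overall score: mean over all problems (assumption), so buckets with
    # more problems carry more weight.
    all_scores = [s for v in by_difficulty.values() for s in v]
    summary["overall"] = 100.0 * sum(all_scores) / len(all_scores)
    return summary


# Made-up results for illustration only, not real benchmark data.
toy_results = [
    ("easy", 10, 9),
    ("easy", 10, 7),
    ("medium", 10, 2),
    ("hard", 10, 0),
]
print(summarize(toy_results))
```

Under this assumption the overall number is weighted by how many problems fall into each difficulty, so it does not have to equal the plain average of the easy/medium/hard scores.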
Reproduction
I followed the latest code of LiveCodeBench (https://github.com/LiveCodeBench/LiveCodeBench) to evaluate the deepseek-v3.1-base model.
If my evaluation score is wrong, could you share your reproduction scripts? Looking forward to hearing from you.