pass@1 of deepseek-v3.1-base on the LiveCodeBench (LCB) benchmark

#57
by wjmcat - opened

Description

I previously tested the pass@1 of deepseek-v3-base and got a score of 30.39 (easy / medium / hard: 84.54 / 24.82 / 4.92). However, when I test the pass@1 of deepseek-v3.1-base now, the score is only 19.72 (easy / medium / hard: 60.85 / 12.85 / 2.21).
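To make clear which metric I mean, here is a minimal sketch of how I understand pass@1, assuming the standard unbiased pass@k estimator averaged over problems. The function names `pass_at_k` and `mean_pass_at_k` are my own, for illustration only; they are not taken from the LiveCodeBench code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, as a percentage.

    results: one (n, c) pair per problem (hypothetical input format).
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 this reduces to the fraction of passing samples per problem (c / n), averaged over all problems in the set.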

What is your internal evaluation result for deepseek-v3.1-base on LiveCodeBench?

Reproduction

I used the latest LiveCodeBench code (https://github.com/LiveCodeBench/LiveCodeBench) to evaluate the deepseek-v3.1-base model.

If my evaluation score is wrong, could you share your reproduction scripts? Looking forward to hearing from you.
