A natural way to evaluate generated code is to check whether it passes unit tests. This is the idea behind the [pass@k](https://huggingface.co/metrics/code_eval) metric, a popular evaluation framework for code generation models, usually reported on the [HumanEval](https://huggingface.co/datasets/openai_humaneval) dataset introduced in the [Codex paper](https://arxiv.org/pdf/2107.03374v2.pdf). The dataset includes 164 handwritten programming problems. In the pass@k metric, k code samples are generated per problem, a problem is considered solved if any sample passes the unit tests, and the total fraction of problems solved is reported.
In most papers, 200 candidate program completions are sampled per problem, and pass@1, pass@10, and pass@100 are computed from them using an unbiased estimator. Table 1 below shows the HumanEval scores of CodeParrot, InCoder, PolyCoder, CodeGen and Codex (not open-source).
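
The evaluation loop described above can be sketched in Python as follows. This is a minimal sketch, not the official human-eval harness or the `code_eval` metric implementation. The problem dictionary mirrors the HumanEval fields `prompt`, `test` (which defines a `check(candidate)` function) and `entry_point`, while `toy_problem`, `passes_tests` and `evaluate` are hypothetical names used only for illustration. `pass_at_k` implements the unbiased estimator from the Codex paper, 1 - C(n - c, k) / C(n, k), where n is the number of samples for a problem and c the number that pass; it is written as a running product to avoid huge binomial coefficients. Note that `exec` runs model output with no sandbox or timeout, which a real evaluation must add.

```python
from typing import Dict, List


def passes_tests(problem: Dict[str, str], completion: str) -> bool:
    """Run prompt + completion against the problem's unit tests.

    WARNING: exec() runs untrusted model output in this process with no
    sandbox or timeout; a real harness must isolate it.
    """
    program = (
        problem["prompt"] + completion + "\n"
        + problem["test"] + "\n"
        + f"check({problem['entry_point']})\n"
    )
    try:
        exec(program, {})  # a failed assert or any other error means failure
        return True
    except Exception:
        return False


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed, k: sample budget.
    The ratio of binomials equals prod_{i=n-c+1}^{n} (1 - k / i).
    """
    if n - c < k:  # every size-k subset contains at least one correct sample
        return 1.0
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail


def evaluate(problems: List[Dict[str, str]], completions: List[List[str]], k: int) -> float:
    """Average the per-problem pass@k estimates over the benchmark."""
    scores = []
    for problem, samples in zip(problems, completions):
        c = sum(passes_tests(problem, s) for s in samples)
        scores.append(pass_at_k(len(samples), c, k))
    return sum(scores) / len(scores)


# Hypothetical toy problem in the HumanEval field layout.
toy_problem = {
    "prompt": "def add(a, b):\n",
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n",
    "entry_point": "add",
}
samples = ["    return a + b\n", "    return a - b\n", "    return a * b\n", "    return b + a\n"]

print(evaluate([toy_problem], [samples], k=1))
print(evaluate([toy_problem], [samples], k=2))
```

For the toy problem, two of the four samples pass, so pass@1 is 2/4 = 0.5 and pass@2 is 1 - C(2, 2) / C(4, 2) = 5/6. With n = 200 samples, the same function yields pass@1, pass@10 and pass@100 from a single batch of generations.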

<div align="center">

| Model | pass@1 | pass@10 | pass@100 |
|-------|--------|---------|---------|
|CodeParrot (110M) | 3.80% | 6.57% | 12.78% | 
|CodeParrot (1.5B) | 3.58% | 8.03% | 14.96% |
|||||
|InCoder (6.7B) | 15.2% | 27.8% | 47.00% |
|||||
|PolyCoder (160M)| 2.13% | 3.35% | 4.88% |
|PolyCoder (400M)| 2.96% | 5.29% | 11.59% |
|PolyCoder (2.7B)| 5.59% | 9.84% | 17.68% |
|||||
|CodeGen-Mono (350M)| 12.76% | 23.11% | 35.19% |
|CodeGen-Mono (2.7B)| 23.70% | 36.64% | 57.01% |
|CodeGen-Mono (6.1B)| 26.13% | 42.29% | 65.82% |
|CodeGen-Mono (16.1B)| **29.28%** | **49.86%** | **75.00%** |
|||||
|Codex (25M)| 3.21% | 7.1% | 12.89% |
|Codex (300M)| 13.17%| 20.37% | 36.27% |
|Codex (12B)| 28.81%| 46.81% | 72.31% |  

</div>

To make the comparison easier to see, we plot pass@100 against model size for the models above.
<p align="center">
    <img src="https://huggingface.co/datasets/loubnabnl/repo-images/resolve/main/[email protected]" alt="drawing" width="550"/>
</p>