|
---
language: code
datasets:
- code_search_net
- Fraser/python-lines
tags:
- python
- code
- masked-lm
widget:
- text: "assert 6 == sum([i for i in range(<mask>)])"
---

# roberta_python

# Details
|
This is a RoBERTa-base model trained on the Python portion of [CodeSearchNet](https://github.com/github/CodeSearchNet); it reached a dev perplexity of 3.296.
|
|
|
This model was used as the enumerative solver baseline detailed in the [Programming Puzzles paper](https://arxiv.org/abs/2106.05784).
|
|
|
See also the [Python Programming Puzzles (P3) Repository](https://github.com/microsoft/PythonProgrammingPuzzles) for more details. |
|
|
|
# Usage |
|
|
|
You can either load the model and fine-tune it further for a target task (as done for the puzzle solver), or experiment directly with mask filling, as in the following example:
|
|
|
```python |
|
# AutoModelWithLMHead is deprecated; AutoModelForMaskedLM is the
# masked-LM head class appropriate for RoBERTa checkpoints.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("tals/roberta_python")
model = AutoModelForMaskedLM.from_pretrained("tals/roberta_python")
|
|
|
demo = pipeline("fill-mask", model=model, tokenizer=tokenizer) |
|
|
|
code = """sum = 0
for i in range(<mask>):
    sum += i
assert sum == 6
"""
|
demo(code) |
|
``` |
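The fill-mask pipeline returns a list of candidate completions, each a dict with `score`, `token`, `token_str`, and `sequence` keys, sorted by score. As a minimal sketch of post-processing that output, the snippet below picks the highest-scoring fill; the `candidates` list is a hand-written stand-in for real pipeline output (the scores and token ids are made up for illustration), so it runs without downloading the model weights:

```python
# Stand-in for the list of dicts a transformers fill-mask pipeline returns.
# The scores and token ids here are illustrative, not real model output.
candidates = [
    {"score": 0.65, "token": 306, "token_str": "4", "sequence": "for i in range(4):"},
    {"score": 0.20, "token": 401, "token_str": "3", "sequence": "for i in range(3):"},
    {"score": 0.05, "token": 245, "token_str": "7", "sequence": "for i in range(7):"},
]

def best_fill(predictions):
    """Return the token string of the highest-scoring candidate."""
    return max(predictions, key=lambda p: p["score"])["token_str"]

print(best_fill(candidates))  # prints 4, the fill that makes the assert pass
```

Substituting the top candidate back into the snippet gives `range(4)`, which indeed satisfies `sum == 6` (0 + 1 + 2 + 3).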
|
|
|
# BibTeX entry and citation info |
|
|
|
```bibtex |
|
@inproceedings{ |
|
schuster2021programming, |
|
title={Programming Puzzles}, |
|
author={Tal Schuster and Ashwin Kalyan and Alex Polozov and Adam Tauman Kalai}, |
|
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)}, |
|
year={2021}, |
|
url={https://openreview.net/forum?id=fe_hCc4RBrg} |
|
} |
|
``` |
|
|