# roberta_python
---
language: code
datasets:
- code_search_net
- Fraser/python-lines
tags:
- python
- code
- masked-lm
widget:
- text: "assert 6 == sum([i for i in range(<mask>)])"
---
# Details
This is a RoBERTa-base model trained on the Python part of [CodeSearchNet](https://github.com/github/CodeSearchNet). It reached a dev perplexity of 3.296.
This model was used for the Programming Puzzles enumerative solver baseline detailed in the [Programming Puzzles paper](https://arxiv.org/abs/2106.05784).
See also the [Python Programming Puzzles (P3) Repository](https://github.com/microsoft/PythonProgrammingPuzzles) for more details.
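
To put the reported perplexity in context, here is a minimal sketch of how perplexity relates to the mean cross-entropy loss. It assumes the common convention that perplexity is the exponential of the mean negative log-likelihood (in nats) over the evaluated tokens; the card does not state exactly how the figure was computed, and the function names are illustrative.

```python
import math


def perplexity_from_loss(mean_nll: float) -> float:
    """Perplexity under the assumed convention: exp(mean negative log-likelihood)."""
    return math.exp(mean_nll)


def loss_from_perplexity(ppl: float) -> float:
    """Inverse mapping: the mean loss (in nats) implied by a perplexity value."""
    return math.log(ppl)


reported_ppl = 3.296  # dev perplexity quoted in this card
implied_loss = loss_from_perplexity(reported_ppl)
print(f"implied mean loss: {implied_loss:.3f} nats")
```

Under that assumption, a perplexity of 3.296 corresponds to a mean loss of roughly 1.19 nats per predicted token.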
# Usage
You can either load the model and further fine-tune it for a target task (as done for the puzzle solver), or you can experiment with mask-filling directly with this model as in the following example:
```python
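from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# AutoModelWithLMHead is deprecated; AutoModelForMaskedLM is the masked-LM equivalent.
tokenizer = AutoTokenizer.from_pretrained("tals/roberta_python")
model = AutoModelForMaskedLM.from_pretrained("tals/roberta_python")
demo = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Snippet with one <mask> token for the model to fill in.
code = """sum= 0
for i in range(<mask>):
    sum += i
assert sum == 6
"""
demo(code)  # returns a list of candidate fills, each with a score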
```
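
The fill-mask pipeline returns a list of dictionaries, each with a `token_str` and a `score`. As a rough sketch of how such candidates could be checked, the helper below substitutes each candidate into the template and keeps those whose program runs without raising. The helper name, the variable names, and the hand-written predictions (including their scores) are hypothetical stand-ins, not real model output and not the solver from the paper.

```python
from typing import Dict, List

MASK = "<mask>"


def passing_fills(template: str, predictions: List[Dict]) -> List[str]:
    """Return candidate tokens whose filled-in program executes without error."""
    passing = []
    for pred in predictions:
        token = pred["token_str"].strip()  # RoBERTa tokens often carry a leading space
        program = template.replace(MASK, token)
        try:
            # WARNING: exec runs arbitrary code; only use on trusted input or in a sandbox.
            exec(program, {})
        except Exception:
            continue  # failed assertion, NameError, SyntaxError, etc.
        passing.append(token)
    return passing


template = """total = 0
for i in range(<mask>):
    total += i
assert total == 6
"""

# Hypothetical stand-in for pipeline output; the scores are made up for illustration.
fake_predictions = [
    {"token_str": " 3", "score": 0.40},
    {"token_str": " 4", "score": 0.35},
    {"token_str": " n", "score": 0.05},
]

print(passing_fills(template, fake_predictions))  # ['4']
```

With a real pipeline, `demo(code)` could be passed in place of `fake_predictions`, since its entries carry the same `token_str` key.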
# BibTeX entry and citation info
```bibtex
@inproceedings{
schuster2021programming,
title={Programming Puzzles},
author={Tal Schuster and Ashwin Kalyan and Alex Polozov and Adam Tauman Kalai},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
year={2021},
url={https://openreview.net/forum?id=fe_hCc4RBrg}
}
```