CodeGen-ft-python

Generate Python code from natural language prompts.

Model Details

Model Description

This model is a fine-tuned variant of Salesforce/codegen-350M-mono, specialized for natural language to code generation in Python. It takes natural language instructions (e.g., “check MySQL database connection”) and generates the corresponding Python code snippet. The model was trained on a curated text-to-code dataset containing diverse programming instructions and function-level examples to improve semantic and syntactic accuracy.

  • Developed by: Akshay Bharadwaj

  • Model type: Transformer-based Causal Language Model

  • Language(s) (NLP): English (Prompts) and Python (Code Outputs)

  • License: MIT License

  • Fine-tuned from model: Salesforce/codegen-350M-mono

Uses

Direct Use

The model can be used for:

  • Translating natural language prompts into functional Python code.

  • Assisting in code autocompletion or boilerplate generation.

  • Supporting educational and prototyping environments.

Downstream Use

The model can be integrated into:

  • Developer tools (IDE plugins or assistants); see the pipeline sketch after this list.

  • Chatbots for code assistance or educational coding tutors.

  • LLM pipelines for multi-step reasoning or coding workflows.
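
As an illustration of such an integration, here is a minimal sketch that wraps the model in a transformers text-generation pipeline for an IDE-style assistant. The helper name and generation settings are illustrative, not part of the published model; peft must be installed, since the published weights are a PEFT adapter (see Framework Versions below).

from transformers import pipeline

# Build a text-generation pipeline around the fine-tuned model.
generator = pipeline("text-generation", model="akshaybharadwaj96/nl-code-gen-python")

def suggest_code(instruction: str) -> str:
    # Return the model's completion for a natural language instruction.
    result = generator(instruction, max_new_tokens=128, do_sample=False)
    return result[0]["generated_text"]

print(suggest_code("write a python function to reverse a string"))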

Out-of-Scope Use

  • Generating production-level code without human review.

  • Security-critical or real-time applications (e.g., code execution automation).

  • Generation of malicious or unsafe code.

Bias, Risks, and Limitations

  • The model may produce incomplete or syntactically incorrect code for ambiguous prompts; a simple validation guard is sketched after this list.

  • It can misinterpret vague natural language queries (semantic drift).

  • Potential bias toward common Python idioms and limited handling of rare libraries or APIs.
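
A lightweight mitigation for syntactically invalid output is to parse generated snippets before surfacing them. A minimal sketch using Python's standard ast module (the helper name is illustrative):

import ast

def is_valid_python(code: str) -> bool:
    # Returns True only if the snippet parses as Python source.
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False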

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hub.
model_id = "akshaybharadwaj96/nl-code-gen-python"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a natural language instruction and generate a code snippet.
prompt = "write a python function to check mysql database connection"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
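
Because the published weights are a PEFT adapter on top of Salesforce/codegen-350M-mono (see Framework Versions below), the direct load above requires the peft package to be installed. Alternatively, the adapter can be attached explicitly; a minimal sketch assuming the peft library's PeftModel API:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the fine-tuned adapter weights.
base = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")
model = PeftModel.from_pretrained(base, "akshaybharadwaj96/nl-code-gen-python")
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")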

Training Details

Training Data

The dataset contains paired natural language descriptions and Python function implementations, collected and cleaned from public code repositories and text-to-code benchmarks (e.g., CodeXGLUE). Preprocessing involved deduplication, tokenization, and removal of incomplete code samples.
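
As an illustration of the cleaning steps described above, here is a minimal sketch of deduplication and removal of incomplete samples. The pairs input and helper name are hypothetical; the actual preprocessing pipeline is not published.

import ast

def clean_pairs(pairs):
    # pairs: list of (nl_description, python_code) tuples (hypothetical input).
    seen = set()
    cleaned = []
    for description, code in pairs:
        key = (description.strip(), code.strip())
        if key in seen:
            continue  # deduplication
        seen.add(key)
        try:
            ast.parse(code)  # drop incomplete samples that do not parse
        except SyntaxError:
            continue
        cleaned.append((description, code))
    return cleaned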

Evaluation

Metrics

To compare the base model with the fine-tuned model, we use the following metrics:

Metric         Focus                           Strength
BLEU           Token-level similarity          Measures fluency and lexical accuracy
CodeBLEU       Lexical + syntactic + semantic  Captures holistic code quality
Exact Match    String equality                 Strict correctness measure
Syntax Match   AST structure                   Validates syntactic and logical integrity
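
As an illustration, a minimal sketch of the two stricter metrics using only the standard library. Note that this boolean Syntax Match is a simplification: CodeBLEU's syntax component is a subtree match ratio rather than whole-tree equality, and BLEU/CodeBLEU themselves typically come from dedicated evaluation packages.

import ast

def exact_match(prediction: str, reference: str) -> bool:
    # Strict string equality after trimming surrounding whitespace.
    return prediction.strip() == reference.strip()

def syntax_match(prediction: str, reference: str) -> bool:
    # Compare normalized AST dumps; True when the parse trees coincide.
    try:
        return ast.dump(ast.parse(prediction)) == ast.dump(ast.parse(reference))
    except SyntaxError:
        return False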

Citation

BibTeX:

@misc{akshay2025nlcodegen,
  title={Natural Language to Code Generation (Fine-tuned CodeGen-350M)},
  author={Akshay Bharadwaj},
  year={2025},
  howpublished={\url{https://huggingface.co/akshaybharadwaj96/nl-code-gen-python}}
}

Framework Versions

  • PEFT 0.7.2.dev0