IslamMesabah/CoderAPI

Text2Text Generation

IslamMesabah committed on Mar 28, 2024

Commit 159aa21 · verified · 1 parent: c4b5138

Update README.md

Files changed (1)

README.md +13 -0

@@ -1,3 +1,16 @@
 ---
 license: mit
+datasets:
+- IslamMesabah/CoderAPI_Dataset
+language:
+- en
+metrics:
+- bleu
+- code_eval
+tags:
+- code
+- API
 ---
+### Large Language Models for instructed and effective code generation using Documentation of APIs
+This thesis explores the use of Large Language Models, specifically the 16-billion-parameter Instruct CodeT5+ model, to generate multi-line, ready-to-execute Python code. Rather than relying solely on the knowledge a pre-trained LLM already holds, we use API documentation to improve the correctness of generated code, both for APIs the LLM has seen and for APIs it has not. We apply Retrieval-Augmented Generation: given a user intent expressed in English that targets a specific API, we select the most suitable segments from the relevant API documentation. These user intents and documentation segments are then used for prompt engineering and for fine-tuning the model. We collect a newly synthesized dataset of 938 data points covering 46 distinct APIs. We report improvements in code generation accuracy and utility: a 0.2 increase in ICE score and a 0.33% increase in CodeBLEU. Our experimental evaluation offers insight into the complexities of code generation, including the impact of seen and unseen API documentation on model performance and the effectiveness of prompt engineering strategies. This work underscores the importance of applying natural language processing techniques to real-world software engineering challenges, with implications for automated software development and developer productivity.
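
The retrieve-then-prompt flow described in the README text can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the README does not say which retriever, similarity measure, or prompt template the thesis uses, so this example substitutes a plain bag-of-words cosine similarity, and the `weatherlib` API, its documentation segments, and the prompt layout are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_segments(intent, segments, top_k=2):
    """Return the top_k documentation segments most similar to the intent.

    Assumption: bag-of-words cosine similarity stands in for whatever
    retriever the thesis actually uses.
    """
    query = Counter(tokenize(intent))
    scored = [(cosine_similarity(query, Counter(tokenize(s))), s) for s in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [segment for _, segment in scored[:top_k]]


def build_prompt(intent, retrieved):
    """Combine retrieved segments and the user intent (hypothetical template)."""
    docs = "\n".join(f"- {segment}" for segment in retrieved)
    return (
        f"### API documentation:\n{docs}\n\n"
        f"### Task:\n{intent}\n\n"
        f"### Python code:\n"
    )


# Hypothetical documentation segments for a made-up API named `weatherlib`.
SEGMENTS = [
    "weatherlib.get_forecast(city, days): return the weather forecast for a city.",
    "weatherlib.set_api_key(key): register the API key used for requests.",
    "weatherlib.list_stations(country): list weather stations in a country.",
]

if __name__ == "__main__":
    intent = "Get the weather forecast for Cairo for 3 days"
    print(build_prompt(intent, retrieve_segments(intent, SEGMENTS, top_k=1)))
```

The resulting prompt string would then be passed to the code generation model. In practice a dense embedding retriever would likely replace the token-count similarity used here.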