Model Card for Fine-tuned Phi-3.5-mini-instruct for MCQ Generation

Model Details

Model Description

This model is a fine-tuned version of unsloth/Phi-3.5-mini-instruct (Unsloth's optimized build of microsoft/Phi-3.5-mini-instruct), loaded in 4-bit. It was fine-tuned with Low-Rank Adaptation (LoRA) specifically for generating multiple-choice questions (MCQs) in JSON format from provided context text. The fine-tuning was performed using the accompanying training script.
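
The training script itself is not reproduced in this card. As a rough orientation, a QLoRA fine-tune of this base model with Unsloth typically looks like the minimal sketch below; the rank, alpha, and target-module choices are illustrative assumptions, not values taken from the actual run.

from unsloth import FastLanguageModel

# Load the base model in 4-bit, as in the inference example later in this card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-3.5-mini-instruct",
    max_seq_length = 4096,
    load_in_4bit = True,
)

# Attach LoRA adapters; r / lora_alpha / target_modules here are illustrative assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0.0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
)

# The adapters are then trained with a standard SFT loop (e.g. trl's SFTTrainer)
# on asanchez75/medical_textbooks_mcq and saved with model.save_pretrained(...).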

  • Developed by: Fine-tuned using the accompanying training script; base model by Microsoft; Unsloth optimizations by Unsloth AI.
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [More Information Needed]
  • Model type: Language Model (Phi-3 architecture) fine-tuned with QLoRA.
  • Language(s) (NLP): English
  • License: The base model microsoft/Phi-3.5-mini-instruct is released under the MIT License. The fine-tuned adapters are subject to the base model's license and potentially the license of the training dataset (asanchez75/medical_textbooks_mcq). Unsloth code is typically Apache 2.0. Please check the specific licenses for compliance.
  • Finetuned from model: unsloth/Phi-3.5-mini-instruct (loaded in 4-bit for QLoRA fine-tuning).

Model Sources [optional]

  • Repository: [More Information Needed - Link to where the fine-tuned adapters are hosted, if applicable]
  • Paper [optional]: [Link to Phi-3 Paper, e.g., https://arxiv.org/abs/2404.14219]
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

This model is intended for generating multiple-choice questions (MCQs) in a specific JSON format, given a piece of context text. It requires using the specific prompt structure employed during training (see Preprocessing section). The primary use case involves loading the base unsloth/Phi-3.5-mini-instruct model (in 4-bit) and then applying the saved LoRA adapters using the PEFT library.
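
For reference, a minimal helper that renders this prompt structure (it mirrors the inference prompt used in the "How to Get Started" example later in this card; whether the training prompt was identical should be confirmed against the Preprocessing section):

def build_mcq_prompt(context: str) -> str:
    # Mirrors the inference prompt shown in the "How to Get Started" section below.
    return (
        "<|user|>\n"
        f"Context:\n{context}\n\n"
        "Generate ONE valid multiple-choice question based strictly on the context above. "
        "Output ONLY the valid JSON object representing the question.\n"
        "MCQ JSON:<|end|>\n"
        "<|assistant|>\n"
    )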

Downstream Use [optional]

Could be integrated into educational tools, content creation pipelines for medical training materials, or automated assessment generation systems within the medical domain.

Out-of-Scope Use

  • Generating text in formats other than the targeted MCQ JSON structure.
  • Answering general knowledge questions or performing tasks unrelated to MCQ generation from context.
  • Use in domains significantly different from the medical textbook context used for training (performance may degrade).
  • Use without the specific prompt format defined during training.
  • Generating harmful, biased, or inaccurate content.
  • Any use violating the terms of the base model license or the dataset license.

Bias, Risks, and Limitations

  • Inherited Bias: The model inherits biases present in the base Phi-3 model and the asanchez75/medical_textbooks_mcq training dataset, which is derived from medical literature.
  • Accuracy: Generated MCQs may be factually incorrect, nonsensical, or poorly formulated. The correctness of the identified "correct_option" is not guaranteed.
  • Format Adherence: While trained to output JSON, the model might occasionally fail to produce perfectly valid JSON or might include extraneous text.
  • Domain Specificity: Performance is likely best on medical contexts similar to the training data. Performance on other domains or highly dissimilar medical texts is unknown.
  • Quantization: The use of 4-bit quantization (QLoRA) may slightly impact performance compared to a full-precision model, although Unsloth optimizations aim to minimize this.
  • Context Dependence: Output quality is highly dependent on the clarity and information content of the provided input context.
  • Limited Evaluation: The model was only evaluated qualitatively on one example from the training set within the script. Rigorous evaluation across a dedicated test set was not performed.

Recommendations

  • Verification: Always verify the factual accuracy, grammatical correctness, and appropriateness of generated MCQs before use.
  • Prompting: Use the specific prompt structure detailed in the "Preprocessing" section for optimal results.
  • Testing: Thoroughly test the model's performance on your specific use case and data distribution.
  • Bias Awareness: Be mindful of potential biases inherited from the base model and training data.
  • JSON Parsing: Implement robust JSON parsing with error handling for the model's output; a minimal sketch follows this list.
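
A minimal defensive parser for the JSON-parsing recommendation above, assuming the model's output contains at most one JSON object, possibly wrapped in markdown fences or surrounded by stray text:

import json
import re
from typing import Optional

def parse_mcq_json(generated_text: str) -> Optional[dict]:
    """Best-effort extraction of a single JSON object from model output."""
    text = generated_text.strip()
    # Strip markdown code fences the model sometimes emits.
    text = re.sub(r"^```(?:json)?\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    # Fall back to the outermost brace pair if extra text surrounds the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

# Example: a fenced object is cleaned up and parsed; unparseable output returns None.
print(parse_mcq_json('```json\n{"question": "Example?", "correct_option": "A"}\n```'))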

How to Get Started with the Model

Use the code below to load the 4-bit base model, apply the fine-tuned LoRA adapters, and run inference. Replace "path/to/your/saved/adapters/" with the actual path where you saved the adapter files (adapter_model.safetensors, adapter_config.json, etc.) and the tokenizer (tokenizer.json, etc.).

import torch
from transformers import AutoTokenizer
from unsloth import FastLanguageModel
from peft import PeftModel
import json # For parsing output

# --- Configuration ---
base_model_name = "unsloth/Phi-3.5-mini-instruct"
adapter_path = "path/to/your/saved/adapters/" # <--- CHANGE THIS
max_seq_length = 4096

# --- 1. Load Base Model and Tokenizer (4-bit) ---
print("Loading base model and tokenizer...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = base_model_name,
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True, # Load base in 4-bit
    device_map = "auto",
)
print("Base model loaded in 4-bit.")

# Set padding token if necessary
if tokenizer.pad_token is None:
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token
    else:
        tokenizer.pad_token = tokenizer.convert_ids_to_tokens(tokenizer.pad_token_id)
tokenizer.padding_side = 'right'
print(f"Tokenizer pad token: {tokenizer.pad_token}, ID: {tokenizer.pad_token_id}")

# --- 2. Load LoRA Adapters ---
print(f"Loading LoRA adapters from {adapter_path}...")
# Load adapters onto the base model
model = PeftModel.from_pretrained(model, adapter_path)
print("LoRA adapters loaded.")

# --- 3. Prepare for Inference ---
print("Preparing combined model for inference...")
FastLanguageModel.for_inference(model)
print("Model ready for inference.")

# --- 4. Prepare Inference Prompt ---
test_context = "Human beings are fallible and it is in their nature to make mistakes. An error of omission occurs when a necessary action has not been taken." # Example context
inference_prompt = f"<|user|>\nContext:\n{test_context}\n\nGenerate ONE valid multiple-choice question based strictly on the context above. Output ONLY the valid JSON object representing the question.\nMCQ JSON:<|end|>\n<|assistant|>\n"

inputs = tokenizer(inference_prompt, return_tensors="pt", truncation=True, max_length=max_seq_length).to("cuda")

# --- 5. Generate Output ---
print("Generating MCQ JSON...")
with torch.no_grad():
    outputs = model.generate(
        input_ids = inputs["input_ids"],
        attention_mask = inputs["attention_mask"],  # pass the mask explicitly to avoid padding warnings
        max_new_tokens=512,        # Max length for the generated JSON
        temperature=0.1,           # Low temperature for more deterministic output
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
    )

# Decode the generated part
output_ids = outputs[0][inputs["input_ids"].shape[1]:]
generated_json_part = tokenizer.decode(output_ids, skip_special_tokens=True).strip()

print("\n--- Generated Output ---")
print(generated_json_part)

# --- 6. (Optional) Validate JSON ---
try:
    # Clean up potential markdown fences
    if generated_json_part.startswith("```json"):
        generated_json_part = generated_json_part[len("```json"):].strip()
    if generated_json_part.endswith("```"):
        generated_json_part = generated_json_part[:-len("```")].strip()

    parsed_json = json.loads(generated_json_part)
    print("\nGenerated JSON Parsed Successfully:")
    print(json.dumps(parsed_json, indent=2))
except json.JSONDecodeError as e:
    print(f"\nGenerated output IS NOT valid JSON. Error: {e}")

Example Output

The model aims to generate a valid JSON object structured like the example below. Note that while the training prompt focused on specific keys (question, options, correct_option), the model might also generate related fields like explanation based on patterns learned from the training data.

{
  "question": "What is the maximum duration of a temporary ban from practising as a disciplinary sanction in the medical profession?",
  "option_a": "1 year",
  "option_b": "2 years",
  "option_c": "3 years",
  "option_d": "5 years",
  "correct_option": "C",
  "explanation": "The correct answer is C, which states that the maximum duration of a temporary ban from practising as a disciplinary sanction in the medical profession is 3 years. This information is explicitly stated in the text, which mentions that a temporary ban from practising may be imposed for a maximum of three years. The other options are incorrect because they either underestimate or overestimate the maximum duration of the ban."
}
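
Downstream code should not assume every key above is present or well-formed. A minimal structural check, assuming the option_a through option_d key layout shown in the example (it validates structure only, not the factual correctness of the question):

REQUIRED_KEYS = {"question", "option_a", "option_b", "option_c", "option_d", "correct_option"}

def is_valid_mcq(mcq: dict) -> bool:
    # Structural check only; factual accuracy still needs human review.
    return (
        REQUIRED_KEYS.issubset(mcq)
        and isinstance(mcq.get("correct_option"), str)
        and mcq["correct_option"].strip().upper() in {"A", "B", "C", "D"}
    )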