Model Card for CropSeek-LLM
CropSeek-LLM is a fine-tuned language model designed to provide insights and recommendations for crop optimization. It is based on the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model and has been fine-tuned using the DARJYO/sawotiQ29_crop_optimization dataset. The model is optimized for answering questions related to crop planting, soil conditions, pest control, irrigation, and other agricultural practices.
Model Details
Model Description
CropSeek-LLM is a fine-tuned version of the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model, adapted for crop optimization tasks. It has been trained using LoRA (Low-Rank Adaptation) to efficiently fine-tune the base model on a dataset of crop-related questions and answers. The model is designed to assist farmers, agronomists, and researchers in making informed decisions about crop management.
- Developed by: persadian, DARJYO
- Model type: Causal Language Model (Fine-tuned with LoRA)
- Language(s) (NLP): English
- License: DARJYO License v1.0
- Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Hardware used for training: Tesla T4 GPU
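To give a rough sense of why LoRA makes fine-tuning feasible on a single T4, the sketch below compares the trainable parameters for one weight matrix under full fine-tuning and under a low-rank update. The matrix size and rank are hypothetical values chosen for illustration; this card does not state the rank or target modules that were actually used.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # A is (rank x d_in), B is (d_out x rank)
    return full, lora


# Hypothetical sizes for illustration only (not taken from the model config).
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

With these illustrative numbers the low-rank update trains well under 1% of the parameters of the matrix it adapts, which is the general reason adapter training fits on modest hardware.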
Uses
Direct Use
CropSeek-LLM can be used directly to answer questions related to crop optimization, such as:
- Optimal planting seasons for specific crops.
- Ideal soil conditions for crop growth.
- Natural pest control methods.
- Best irrigation practices.
- Crop rotation strategies.
Downstream Use
CropSeek-LLM can be integrated into agricultural advisory systems, mobile apps, or chatbots to provide real-time recommendations to farmers and agronomists.
Out-of-Scope Use
- Medical Advice: This model is not designed to provide medical or health-related advice.
- Financial Decisions: The model should not be used for financial or investment decisions.
- Non-Agricultural Use: The model is specifically fine-tuned for crop optimization and may not perform well in unrelated domains.
Bias, Risks, and Limitations
- Data Bias: The model is trained on a dataset focused on specific crops and regions. It may not generalize well to all crops or geographical areas.
- Limited Scope: The model is designed for crop optimization and may not provide accurate answers for unrelated topics.
- Ethical Concerns: The model should not replace professional advice from agronomists or agricultural experts.
Recommendations
Users should:
- Verify the model's recommendations with local agricultural experts.
- Be aware of the model's limitations and use it as a supplementary tool, not a replacement for professional advice.
- Report any biases or inaccuracies to the developers for improvement.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the fine-tuned model and its tokenizer
# (device_map="auto" requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained("persadian/CropSeek-LLM", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("persadian/CropSeek-LLM")
# Example inference
input_text = "What is the best planting season for cabbages in South Coast, Durban?"
# Move inputs to the device the model was placed on instead of hard-coding "cuda"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# max_new_tokens limits only the generated text; max_length would also count the prompt
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Training Details
Training Data
The model was fine-tuned on a curated dataset of agricultural texts, including:
- Crop descriptions and classifications.
- Plant disease symptoms and treatments.
- Farming techniques and best practices.
- Regional agricultural guidelines.
Specific dataset used: DARJYO/sawotiQ29_crop_optimization
Training Procedure
Preprocessing
- The dataset was cleaned and preprocessed to remove irrelevant information and ensure consistency.
- Text data was tokenized using the tokenizer associated with the base model.
- Data augmentation techniques, such as synonym replacement and paraphrasing, were applied to improve generalization.
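As an illustration of the synonym-replacement step mentioned above, here is a minimal sketch. The synonym table, the replacement probability, and the function name are all hypothetical; the actual augmentation pipeline used for this model is not documented in this card.

```python
import random
from typing import Optional

# Hypothetical synonym table; a real pipeline would use a curated lexicon.
SYNONYMS = {
    "best": ["optimal", "ideal"],
    "planting": ["sowing"],
    "pests": ["insects"],
}


def synonym_replace(text: str, prob: float = 0.5, seed: Optional[int] = None) -> str:
    """Replace each word that has a known synonym with probability `prob`."""
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


print(synonym_replace("What is the best planting season for cabbages?", prob=1.0, seed=0))
```

Setting `prob=0.0` returns the text unchanged, which makes it easy to switch augmentation off when building evaluation data.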
Training Hyperparameters
- Training regime: Mixed precision (fp16)
- Batch size: 16
- Learning rate: 2e-5
- Epochs: 3
- Optimizer: AdamW
- Weight decay: 0.01
- Warmup steps: 500
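The warmup setting above means the learning rate ramps up over the first 500 optimizer steps before reaching 2e-5. The sketch below shows one common shape, linear warmup followed by a constant rate. The behaviour after warmup is an assumption, since this card does not say which scheduler or decay was used.

```python
def learning_rate(step: int, peak_lr: float = 2e-5, warmup_steps: int = 500) -> float:
    """Linear warmup to peak_lr, then constant (post-warmup shape is assumed)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


for step in (0, 250, 500, 1000):
    print(step, learning_rate(step))
```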
Speeds, Sizes, Times
- Training time: Approximately 10 hours on a T4 GPU.
- Checkpoint size: 1.5 GB
- Throughput: 120 samples/second
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a held-out test set of agricultural queries, including crop identification, disease diagnosis, and farming recommendations.
Dataset: https://huggingface.co/datasets/DARJYO/sawotiQ29_crop_optimization
Factors
Evaluation was disaggregated by:
- Crop type (cereals, fruits, vegetables).
- Disease type (fungal, bacterial, viral).
- Geographic region (tropical, temperate).
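Disaggregated evaluation of this kind amounts to grouping test examples by a factor and computing the metric within each group. The sketch below shows the idea for accuracy. The record fields and values are hypothetical and are not the model's actual evaluation results.

```python
from collections import defaultdict


def accuracy_by(records, factor):
    """Return {factor value: accuracy} over a list of per-example records."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        key = record[factor]
        totals[key] += 1
        hits[key] += int(record["correct"])
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical records for illustration only.
records = [
    {"crop_type": "cereals", "region": "tropical", "correct": True},
    {"crop_type": "cereals", "region": "temperate", "correct": False},
    {"crop_type": "vegetables", "region": "tropical", "correct": True},
    {"crop_type": "vegetables", "region": "temperate", "correct": True},
]

print(accuracy_by(records, "crop_type"))
print(accuracy_by(records, "region"))
```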
Metrics
- Accuracy: 92% on crop identification tasks.
- Precision/Recall/F1-score: Precision: 0.89, Recall: 0.91, F1-score: 0.90
- Latency: Average response time of 0.5 seconds on a T4 GPU.
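The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. A minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.89, 0.91)
print(round(f1, 2))  # 0.9
```

The result rounds to 0.90, which agrees with the figure listed above.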
Results
- The model achieved high accuracy on crop identification and disease diagnosis tasks.
- Performance was slightly lower for region-specific recommendations due to limited training data for certain regions.
Summary
CropSeek-LLM performs well on a wide range of agricultural tasks, making it a useful tool for farmers and agricultural professionals. However, performance may vary for rare crops or region-specific practices.
Model Examination
- The model was examined using interpretability tools such as attention visualization and feature importance analysis.
Key findings include:
- The model relies heavily on symptom descriptions for disease diagnosis.
- Crop-specific keywords play a significant role in crop identification tasks.
Environmental Impact
Carbon emissions were estimated for the training run described below.
- Hardware Type: T4 GPU
- Hours used: 10 hours
- Cloud Provider: Google Colab
- Compute Region: us-central1
- Carbon Emitted: Approximately 0.5 kg CO2eq
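As a rough sanity check on the figures above, the sketch below estimates the energy drawn by the GPU and the grid carbon intensity the reported emissions would imply. It assumes the T4 ran at its full 70 W board power for all 10 hours and ignores CPU, memory, and data-centre overhead, so it is not necessarily how the reported estimate was produced.

```python
GPU_POWER_KW = 0.070   # NVIDIA T4 board power (70 W); assumes full draw throughout
HOURS = 10             # training time reported in this card
REPORTED_KG = 0.5      # carbon emitted, as reported in this card

energy_kwh = GPU_POWER_KW * HOURS
implied_intensity = REPORTED_KG / energy_kwh  # kg CO2eq per kWh

print(f"GPU energy: {energy_kwh:.2f} kWh")
print(f"Implied carbon intensity: {implied_intensity:.2f} kg CO2eq/kWh")
```

Under these assumptions the GPU consumes about 0.7 kWh, so the reported 0.5 kg CO2eq corresponds to roughly 0.71 kg CO2eq per kWh.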
Technical Specifications
Model Architecture and Objective
- Base model architecture: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Objective: Fine-tuned for text generation and classification tasks in the agricultural domain.
Compute Infrastructure
Hardware
- Training hardware: Google Colab with T4 GPU.
Software
- Frameworks: PyTorch, Hugging Face Transformers.
- Libraries: Datasets, Tokenizers, Accelerate.
Citation
BibTeX:
@misc{cropseek-llm,
  author = {persadian~Darshani Persadh, DARJYO},
  title = {CropSeek-LLM: A Fine-Tuned Language Model for Agricultural Applications},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/persadian/CropSeek-LLM}},
}
APA: persadian. Darshani Persadh (2023). CropSeek-LLM: A Fine-Tuned Language Model for Agricultural Applications. Hugging Face. https://huggingface.co/persadian/CropSeek-LLM
Glossary
- Mixed precision: Training using both 16-bit and 32-bit floating-point numbers to improve efficiency.
More Information
For more details, visit the CropSeek-LLM space on Hugging Face.
Model Card Authors
- persadian ~Darshani Persadh
Model Card Contact