# **Classification Models Repository**

Welcome! This repository contains several classification models trained and evaluated on an Arabic news classification task. Below you'll find details about each model, including how it works, how it performed, and its potential use cases.

---

## **Models Overview and Performance**

The repository includes the following models. Each is listed with its accuracy on the evaluation data, meaning the fraction of correctly classified articles (0.97 means 97%):

| Model Name                   | Accuracy |
|------------------------------|----------|
| Logistic Regression Model    | 0.97     |
| Decision Tree Model          | 0.94     |
| Gradient Boosting Classifier | 0.91     |
| Random Forest Classifier     | 0.99     |

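Accuracy is the number of correct predictions divided by the total number of predictions. Here is a minimal illustration of that formula. The labels are hypothetical and do not come from this repository's evaluation data. scikit-learn's `accuracy_score` computes the same quantity.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical true and predicted categories for four articles
y_true = ["sports", "politics", "technology", "sports"]
y_pred = ["sports", "politics", "sports", "sports"]
print(accuracy(y_true, y_pred))  # 3 of 4 correct -> 0.75
```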
---

## **Model Descriptions**

### **1. Logistic Regression Model**
#### Overview:
Logistic Regression is a linear statistical model used for binary or multiclass classification. It computes a weighted sum of the input features and converts it into class probabilities. For two classes it uses the logistic (sigmoid) function. For more than two classes it uses the softmax function, which generalizes the sigmoid, or alternatively a one-vs-rest scheme.

#### Features:
- Fast and computationally efficient.
- Performs well on linearly separable data.
- Provides probabilistic predictions.

#### Use Cases:
- Classifying Arabic news articles into predefined categories (e.g., politics, sports, technology).
- Tasks that require interpretability: the learned coefficients indicate how strongly each feature pushes a prediction toward each class.

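The following minimal pure-Python sketch shows how linear scores become class probabilities. The weights, biases, and three-class setup are hypothetical values chosen for illustration. They are not parameters of the trained model in this repository. The last line checks that with two classes, softmax gives the same value as the sigmoid, up to rounding.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Multiclass generalization: turns one score per class into probabilities summing to 1."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_score(weights, features, bias):
    """The linear part of the model: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical 3-class model (e.g. politics, sports, technology) over 2 features
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b = [0.0, -0.5, -0.1]
x = [1.0, 2.0]

scores = [linear_score(w, x, bias) for w, bias in zip(W, b)]
probs = softmax(scores)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the most likely class
print(scores, probs, predicted)

# With two classes, softmax over [z, 0] equals sigmoid(z)
print(softmax([2.0, 0.0])[0], sigmoid(2.0))
```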
---

### **2. Decision Tree Model**
#### Overview:
The Decision Tree model builds a tree-like structure. Each internal node applies a decision rule to a feature (for example, "is this feature's value at most some threshold?"), and each leaf assigns a class label. It is simple yet effective for many classification tasks.

#### Features:
- Easy to interpret and visualize.
- Handles both numerical and categorical data.
- Prone to overfitting on noisy data unless the tree depth is limited or the tree is pruned.

#### Use Cases:
- Classifying Arabic news articles into different categories.
- Tasks where interpretability is crucial.

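A tree picks its decision rules using a split criterion. A common one is Gini impurity, which is the default criterion in scikit-learn's decision trees. Each candidate split is scored by the weighted impurity of its two child nodes, and the lowest score wins. The sketch below is a simplified, single-feature version of that search. The TF-IDF values and labels are hypothetical. Real implementations differ in details, such as how candidate thresholds are generated.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2, where p_k is the share of class k (0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Try each observed value t as the rule `value <= t` and return the threshold
    with the lowest weighted Gini impurity of the two child nodes."""
    best_t, best_score = None, float("inf")
    n = len(labels)
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must produce two non-empty children
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical feature: TF-IDF weight of a sports-related word in six articles
tfidf = [0.0, 0.05, 0.1, 0.4, 0.6, 0.7]
labels = ["politics", "politics", "technology", "sports", "sports", "sports"]
print(best_threshold(tfidf, labels))
```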
---

### **3. Gradient Boosting Classifier**
#### Overview:
Gradient Boosting is an ensemble learning method that builds weak learners (typically shallow decision trees) sequentially. Each new tree is fitted to the remaining errors of the ensemble built so far, or more precisely to the negative gradient of the loss. The scaled predictions of all trees are then added together to form the final prediction.

#### Features:
- Excellent for handling non-linear relationships.
- Resists overfitting when hyperparameters such as the learning rate, number of trees, and tree depth are tuned properly.
- Can be adapted to imbalanced datasets, for example by weighting samples during training.

#### Use Cases:
- Classifying complex Arabic news articles with nuanced patterns.
- Scenarios requiring high predictive performance.

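The boosting loop can be sketched in a few lines. This minimal version assumes squared loss on a toy one-dimensional regression problem, so the negative gradient is simply the residual. It is not the code behind this repository's model. Classifiers such as scikit-learn's `GradientBoostingClassifier` apply the same idea to the gradient of the log-loss, using deeper trees over many more features.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree (a "stump"): choose the threshold that
    minimizes squared error and predict the mean target on each side.
    Requires at least two distinct values in xs."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # start from a constant prediction (the mean)
    stumps = []

    def predict(x):
        return base + sum(learning_rate * stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual y - F(x)
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Toy 1-D data with a step: small x -> 1.0, large x -> 3.0
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
model = gradient_boost(xs, ys)
print([round(model(x), 3) for x in xs])
```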
---

### **4. Random Forest Classifier**
#### Overview:
Random Forest is a powerful ensemble method that trains many decision trees. Each tree is trained on a bootstrap sample of the training data and considers only a random subset of features at each split. The forest then averages the trees' predictions, which improves accuracy and reduces overfitting.

#### Features:
- High accuracy and robustness to noise.
- Handles large, high-dimensional datasets.
- Reduces overfitting compared to individual decision trees.

#### Use Cases:
- Predicting the category of Arabic news articles.
- Applications requiring feature importance insights.

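The two ingredients described above, bootstrap sampling and averaging, can be sketched in a few lines. This is a simplified illustration with hypothetical per-tree probabilities. It omits tree training and the random feature selection at each split. scikit-learn's `RandomForestClassifier` similarly predicts the class with the highest mean probability across its trees.

```python
import random


def bootstrap_indices(n, rng):
    """Sample n row indices with replacement; each tree is trained on its own sample."""
    return [rng.randrange(n) for _ in range(n)]


def forest_predict(tree_probas, classes):
    """Average the class-probability vectors predicted by each tree and
    return the class with the highest mean probability."""
    n_trees = len(tree_probas)
    mean = [sum(p[k] for p in tree_probas) / n_trees for k in range(len(classes))]
    best = max(range(len(classes)), key=mean.__getitem__)
    return classes[best], mean


rng = random.Random(0)
print(bootstrap_indices(6, rng))  # rows drawn for one hypothetical tree

classes = ["politics", "sports", "technology"]
# Hypothetical probability outputs of three trees for one article
tree_probas = [
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.6, 0.3],
]
label, mean = forest_predict(tree_probas, classes)
print(label, mean)
```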
---

## **How to Use the Models**

All models are saved as `.joblib` files and can be loaded directly into your machine learning pipeline. Below is an example of how to use the **Random Forest Classifier** with Arabic news data:

```python
import joblib

# Load the saved model.
# Assumption: the .joblib file holds a full pipeline (text vectorizer + classifier),
# so it accepts raw text. Otherwise, vectorize the text exactly as during training first.
model = joblib.load("RandomForestClassifier_model.joblib")

# Example input: Arabic news text
# ("The Moroccan national team announced the official lineup that will take part
#  in the upcoming match of the World Cup qualifiers.")
input_data = [
    "أعلن المنتخب الوطني المغربي عن التشكيلة الرسمية التي ستشارك في المباراة القادمة ضمن تصفيات كأس العالم."
]

# predict() returns one label per input text
prediction = model.predict(input_data)
print(f"Predicted class: {prediction[0]}")
```