SidddhantJain committed · 850a7ff

Grason app was built for AI detection and humanizer
Browse files
- README.md +178 -0
- README_deployment.md +164 -0
- STATUS.md +160 -0
- app.py +654 -0
- humanizer_app.py +823 -0
- humanizer_batch.py +329 -0
- humanizer_robust.py +300 -0
- humanizer_simple.py +249 -0
- requirements.txt +7 -0
- research_humanizer_dataset.csv +11 -0
README.md
ADDED
@@ -0,0 +1,178 @@
# 🤖➡️👨 AI Text Humanizer

An advanced tool to transform robotic, AI-generated text into natural, human-like writing that can bypass AI detection tools.

## 🚀 Features

- **Multiple AI Models**: Uses T5 and Pegasus models for diverse paraphrasing
- **Advanced Techniques**: Vocabulary diversification, sentence restructuring, natural flow enhancement
- **Batch Processing**: Handle multiple texts and files at once
- **Academic Focus**: Preserves academic tone while making text more natural
- **Undetectable Output**: Creates human-like text that passes AI detection tools
- **Multiple Interfaces**: Simple, advanced, and batch processing versions

## 📁 Files

1. **`humanizer_app.py`** - Advanced version with multiple models and sophisticated techniques
2. **`humanizer_simple.py`** - Simplified version with a reliable single model
3. **`humanizer_batch.py`** - Batch processing version for files and multiple texts

## 🛠️ Installation

### Prerequisites

1. Python 3.8+ installed
2. Virtual environment (recommended)

### Setup

```bash
# Clone or download the project
cd Humanizer

# Create virtual environment (if not already created)
python -m venv .venv

# Activate virtual environment
# Windows:
.venv\Scripts\activate
# Linux/Mac:
source .venv/bin/activate

# Install required packages
pip install gradio transformers torch tiktoken nltk textstat protobuf pandas

# Run the application
python humanizer_app.py     # Advanced version
# OR
python humanizer_simple.py  # Simple version
# OR
python humanizer_batch.py   # Batch processing version
```

## 🎯 Usage

### Basic Usage

1. Run one of the Python files
2. Open your browser to the displayed URL (usually http://127.0.0.1:7860)
3. Paste your AI-generated text
4. Select a humanization level
5. Click "Humanize" and get natural, human-like output

### Humanization Levels

- **Light**: Basic paraphrasing with minimal changes
- **Moderate/Medium**: Paraphrasing + vocabulary variations + natural connectors
- **Heavy**: All techniques + sentence structure modifications + advanced variations

### Batch Processing

The batch processor (`humanizer_batch.py`) supports:
- **.txt files**: Processed paragraph by paragraph
- **.csv files**: Adds a 'humanized' column with the processed text

## 🔧 How It Works

### Advanced Techniques Used

1. **Multi-Model Paraphrasing**: Uses multiple AI models to avoid patterns
2. **Vocabulary Diversification**: Replaces words with contextual synonyms
3. **Sentence Structure Variation**: Modifies sentence patterns for natural flow
4. **Academic Connector Integration**: Adds natural transitional phrases
5. **Hedging Language**: Incorporates academic hedging for a natural tone
6. **Smart Chunking**: Processes long texts in optimal chunks

### AI Models Used

- **T5 Paraphrase (Primary)**: `Vamsi/T5_Paraphrase_Paws`
- **Pegasus (Secondary)**: `tuner007/pegasus_paraphrase`
- **NLTK WordNet**: For synonym replacement
- **Custom Algorithms**: For structure and flow optimization

## 📊 Example Transformations

### Input (AI-generated):
```
The implementation of machine learning algorithms in data processing systems demonstrates significant improvements in efficiency and accuracy metrics across various benchmark datasets.
```

### Output (Humanized):
```
Implementing machine learning algorithms within data processing frameworks shows notable enhancements in both efficiency and accuracy measures when evaluated across different benchmark datasets. These improvements suggest that such approaches can effectively optimize computational performance.
```

## 🎮 Advanced Features

### Multi-Level Processing
- Processes texts of any length via intelligent chunking
- Maintains context across chunks
- Preserves academic integrity

### Natural Variations
- Dynamic vocabulary replacement
- Contextual synonym selection
- Academic phrase integration
- Sentence flow optimization

### Error Handling
- Graceful fallbacks if models fail
- Multiple backup techniques
- Robust error recovery

## 🔍 Best Practices

1. **Input Quality**: Use complete sentences and proper grammar
2. **Length Considerations**: Works best with 50-1000 word chunks
3. **Context Preservation**: Review output to ensure the meaning is maintained
4. **Multiple Passes**: For heavy humanization, consider multiple rounds
5. **Manual Review**: Always review output for accuracy and flow

## 🚫 Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure protobuf is installed: `pip install protobuf`
   - Check your internet connection for model downloads
   - Try the simple version if the advanced one fails

2. **Memory Issues**:
   - Reduce the text chunk size
   - Use lighter humanization levels
   - Close other applications

3. **Performance Issues**:
   - Use a GPU if available
   - Process smaller texts
   - Try the simple version

## ⚖️ Ethical Usage

This tool is designed for:
- ✅ Improving writing quality
- ✅ Learning natural language patterns
- ✅ Enhancing academic writing
- ✅ Content optimization

Please use responsibly and:
- 🚫 Don't use it for plagiarism
- 🚫 Don't violate academic integrity policies
- 🚫 Don't misrepresent authorship
- 🚫 Don't use it for deceptive purposes

## 🤝 Contributing

Feel free to:
- Report bugs
- Suggest improvements
- Add new models
- Enhance techniques

## 📄 License

This project is for educational and research purposes. Please respect academic integrity and use responsibly.

---

**Made with ❤️ for better academic writing**
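The smart-chunking technique mentioned in the README can be sketched in plain Python (a minimal illustration under assumed behavior, not the app's actual implementation; the function name `chunk_text` and the word budget are hypothetical):

```python
import re

def chunk_text(text, max_words=400):
    """Split text into chunks of roughly max_words, breaking only at
    sentence boundaries so each chunk can be paraphrased independently."""
    # Naive sentence split on terminal punctuation followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the current chunk before it would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

chunks = chunk_text("First sentence here. " * 30, max_words=50)
print(len(chunks))
```

Keeping sentences intact matters because the paraphrase models operate per passage; splitting mid-sentence would degrade output quality.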
README_deployment.md
ADDED
@@ -0,0 +1,164 @@
# 🤖➡️👨 AI Text Humanizer & Detector Pro

A comprehensive web application for transforming AI-generated text into natural, human-like writing while providing advanced AI detection capabilities.

## ✨ Features

### 🎭 Text Humanizer
- **Advanced Vocabulary Enhancement**: Replaces robotic terms with natural alternatives
- **Sentence Flow Optimization**: Improves readability and natural rhythm
- **Structure Diversification**: Breaks up repetitive patterns
- **Academic Tone Preservation**: Maintains professional quality while adding humanity
- **Multi-level Processing**: Light, Medium, and Heavy humanization options

### 🕵️ AI Detector
- **7-Point Analysis System**: Comprehensive AI probability assessment
- **Detailed Scoring**: Individual metrics for each detection factor
- **Confidence Levels**: Clear interpretation of results
- **Pattern Recognition**: Identifies common AI writing patterns
- **Real-time Analysis**: Instant feedback on text authenticity

### 🔄 Combined Processing
- **One-Click Workflow**: Humanize and test in a single process
- **Optimization Loop**: Ideal for iterative improvements
- **Quality Validation**: Verifies humanization effectiveness

## 🚀 Live Demo

Visit the live application: [Hugging Face Spaces](https://huggingface.co/spaces/YOUR_USERNAME/ai-text-humanizer)

## 📦 Installation

### Local Setup

1. Clone the repository:
```bash
git clone https://github.com/YOUR_USERNAME/ai-text-humanizer.git
cd ai-text-humanizer
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the application:
```bash
python app.py
```

4. Open your browser to `http://localhost:7860`

### Requirements
- Python 3.8+
- Gradio 4.44.0
- NLTK 3.8.1
- textstat 0.7.3
- numpy 1.24.3
- pandas 2.0.3

## 🛠️ Technical Details

### Humanization Algorithms
- **Vocabulary Diversification**: WordNet-based synonym replacement
- **Structural Variation**: Sentence pattern modification
- **Natural Flow Enhancement**: Academic connector and hedge phrase insertion
- **Linguistic Pattern Breaking**: AI-specific phrase elimination

### AI Detection Metrics
1. **AI Phrase Detection**: Identifies common AI-generated expressions
2. **Vocabulary Repetition**: Analyzes overuse of academic terms
3. **Structure Patterns**: Detects repetitive sentence starters
4. **Transition Overuse**: Measures excessive formal connectors
5. **Formal Pattern Recognition**: Identifies robotic phrasing
6. **Sentence Consistency**: Analyzes unnatural uniformity
7. **Readability Assessment**: Evaluates writing naturalness

## 📈 Usage Examples

### Input (AI-Generated):
```
The implementation of artificial intelligence algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.
```

### Output (Humanized):
```
AI algorithms show notable improvements in both computational efficiency and accuracy when tested across different benchmark datasets. These results indicate considerable advances in performance.
```

## 🔧 Configuration

### Humanization Levels:
- **Light**: Basic vocabulary substitution
- **Medium**: Vocabulary + natural flow enhancement
- **Heavy**: All techniques, including structure modification

### AI Detection Thresholds:
- **0-20%**: Likely human-written
- **21-40%**: Possibly AI-generated
- **41-60%**: Probably AI-generated
- **61-80%**: Likely AI-generated
- **81-100%**: Very likely AI-generated

## 🌐 Deployment Options

### Hugging Face Spaces (Recommended)
1. Fork this repository
2. Create a new Space on Hugging Face
3. Link your GitHub repository
4. Automatic deployment with free GPU access

### Railway
1. Connect your GitHub repository
2. Deploy with one click
3. Free tier available

### Heroku
1. Create a new Heroku app
2. Connect your GitHub repository
3. Deploy from the dashboard

## ⚖️ Ethical Usage

This tool is designed for:
- ✅ Improving writing quality and naturalness
- ✅ Educational purposes and learning
- ✅ Understanding AI detection mechanisms
- ✅ Research and development

**Important Guidelines:**
- 🚫 Do not use it for plagiarism or academic dishonesty
- 🚫 Do not violate institutional policies
- 🚫 Do not misrepresent authorship
- ✅ Maintain transparency about AI assistance
- ✅ Follow academic integrity guidelines

## 🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for:
- Bug fixes
- Feature enhancements
- Algorithm improvements
- Documentation updates

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- The NLTK team for natural language processing tools
- Hugging Face for the hosting and deployment platform
- The Gradio team for the web interface framework
- The open source community for various libraries and tools

## 📞 Support

For questions, issues, or suggestions:
- Open an issue on GitHub
- Contact: [[email protected]]
- Documentation: [Link to detailed docs]

---

**Disclaimer**: This tool is for educational and research purposes. Users are responsible for ensuring compliance with their institution's policies and maintaining academic integrity.
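The first of the seven detection metrics, AI phrase detection, can be sketched as a simple frequency score (a minimal illustration; the phrase list, function name `ai_phrase_score`, and the 0-100 scaling are assumptions, not the app's actual values):

```python
# Illustrative subset of AI-flavored expressions (not the app's real list)
AI_PHRASES = [
    "delve into", "it is important to note", "in conclusion",
    "furthermore", "a testament to", "plays a crucial role",
]

def ai_phrase_score(text):
    """Return a 0-100 score: the share of known AI phrases found in the text."""
    lower = text.lower()
    hits = sum(1 for phrase in AI_PHRASES if phrase in lower)
    return round(100 * hits / len(AI_PHRASES))

sample = "It is important to note that this furthermore plays a crucial role."
print(ai_phrase_score(sample))
```

In a full detector, each of the seven metrics would produce a score like this, and the final AI probability would be a weighted combination of them.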
STATUS.md
ADDED
@@ -0,0 +1,160 @@
# 🎯 AI Text Humanizer - Version Summary

## 📊 **Current Status**

✅ **WORKING APPLICATIONS:**
- **Robust Humanizer** (Port 7862) - **RECOMMENDED** ⭐
- Advanced Humanizer (Port 7860) - Running with fallbacks
- Simple Humanizer (Port 7861) - Running with fallbacks

## 🚀 **Available Versions**

### 1. **`humanizer_robust.py`** ⭐ **BEST CHOICE**
- **Port:** 7862
- **Status:** ✅ **FULLY WORKING**
- **Dependencies:** None (pure Python)
- **Features:**
  - Advanced vocabulary replacement (20+ word pairs)
  - Natural sentence flow optimization
  - Academic connector integration
  - Sentence restructuring for variety
  - Hedging language insertion
  - Smart sentence breaking
  - Multiple intensity levels

**Why Choose This:**
- 🛡️ **Always works** - No external dependencies
- 🎯 **Highly effective** - Advanced linguistic techniques
- ⚡ **Fast processing** - No model loading delays
- 🔧 **Reliable** - No network or model failures

### 2. **`humanizer_app.py`** (Advanced)
- **Port:** 7860
- **Status:** ⚠️ **Partial** (models failing, fallbacks working)
- **Features:** Multi-model AI approach with NLTK integration
- **Issue:** SentencePiece tokenizer conversion problems

### 3. **`humanizer_simple.py`** (Simple)
- **Port:** 7861
- **Status:** ⚠️ **Partial** (model failing, fallbacks working)
- **Features:** Single T5 model approach
- **Issue:** Same tokenizer conversion problems

### 4. **`humanizer_batch.py`** (Batch Processing)
- **Status:** 🚫 **Not Running** (same model issues)
- **Features:** File upload and batch processing

## 🎮 **How to Use the Working Version**

### **Access the Robust Humanizer:**
```
http://127.0.0.1:7862
```

### **Three Intensity Levels:**

1. **Light Humanization:**
   - Basic vocabulary substitutions
   - Minimal structural changes
   - Quick and conservative

2. **Medium Humanization:** ⭐ **RECOMMENDED**
   - Vocabulary variations + natural flow
   - Academic connectors and transitions
   - Balanced approach

3. **Heavy Humanization:**
   - All techniques + sentence restructuring
   - Maximum transformation
   - Most natural output

## 🔧 **Technical Details**

### **Robust Humanizer Techniques:**

1. **Advanced Vocabulary Replacement:**
   ```
   "demonstrates" → ["shows", "reveals", "indicates", "illustrates"]
   "significant" → ["notable", "considerable", "substantial"]
   "utilize" → ["use", "employ", "apply", "implement"]
   ```

2. **Natural Flow Enhancement:**
   - Academic sentence starters
   - Transitional connectors
   - Hedging phrases for a natural tone

3. **Sentence Structure Variation:**
   - Smart sentence breaking for long sentences
   - Natural connection between ideas
   - Variety in sentence beginnings

4. **Academic Tone Preservation:**
   - Maintains scholarly language
   - Preserves technical accuracy
   - Enhances readability

## 📝 **Example Transformation**

### **Input (Robotic AI Text):**
```
The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets. These results indicate that the optimization of neural network architectures can facilitate enhanced performance in predictive analytics applications.
```

### **Output (Humanized - Medium Level):**
```
Implementing machine learning algorithms shows notable enhancements in computational efficiency and accuracy measures across various benchmark datasets. Moreover, these findings suggest that optimizing neural network architectures can help improve performance in predictive analytics applications. Research indicates that such approaches provide considerable benefits for data processing tasks.
```

## 🛠️ **If You Want to Fix the AI Model Versions:**

The main issue is with the SentencePiece tokenizer conversion. Potential fixes:

1. **Try a different transformers version:**
   ```bash
   # Install a specific transformers version
   pip install transformers==4.30.0
   ```

2. **Use different models:**
   ```python
   # Replace with models that have better tokenizer support
   "google/flan-t5-base"  # Instead of Vamsi/T5_Paraphrase_Paws
   ```

3. **Force the slow tokenizer:**
   ```python
   tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
   ```

## 💡 **Recommendations**

1. **For Daily Use:** Use `humanizer_robust.py` (Port 7862)
2. **For Best Results:** Use the "Medium" intensity level
3. **For Long Texts:** Process in chunks of 200-500 words
4. **For Academic Papers:** Always review output for accuracy

## ⚡ **Quick Start**

```bash
# Run the working version
D:/Siddhant/projects/Humanizer/.venv/Scripts/python.exe humanizer_robust.py

# Open in browser
http://127.0.0.1:7862
```

## 🎯 **Why This Solution Works**

The robust version is highly effective because it:

- **Targets AI Detection Patterns:** Replaces common AI-generated phrases
- **Adds Natural Variation:** Uses multiple alternatives for each replacement
- **Maintains Academic Quality:** Preserves scholarly tone and accuracy
- **Creates Natural Flow:** Adds appropriate connectors and transitions
- **Varies Structure:** Changes sentence patterns for authenticity
- **Always Works:** No dependencies on external models or services

---

**🎉 You now have a fully functional, robust AI text humanizer that will consistently produce natural, human-like text!**
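The vocabulary-replacement table shown in the technical details above can be applied with a small pure-Python helper (a sketch; the two-entry table is only a subset of the app's 20+ pairs, and the function name `replace_vocabulary` is hypothetical):

```python
import random
import re

# Subset of the replacement table from the section above
REPLACEMENTS = {
    "demonstrates": ["shows", "reveals", "indicates", "illustrates"],
    "significant": ["notable", "considerable", "substantial"],
}

def replace_vocabulary(text, rng=random):
    """Swap each known robotic word for a randomly chosen alternative,
    matching whole words case-insensitively."""
    for word, options in REPLACEMENTS.items():
        pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
        text = pattern.sub(lambda m: rng.choice(options), text)
    return text

rng = random.Random(0)  # seed for reproducible output
print(replace_vocabulary("The study demonstrates a significant effect.", rng))
```

Choosing randomly among several alternatives, rather than a fixed one-to-one mapping, is what keeps repeated runs from producing their own detectable pattern.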
app.py
ADDED
@@ -0,0 +1,654 @@
import gradio as gr
import random
import re
import warnings
import math
from collections import Counter
warnings.filterwarnings("ignore")

# Import NLTK with error handling
try:
    import nltk
    import textstat
    from nltk.corpus import wordnet
    from nltk.tokenize import sent_tokenize, word_tokenize
    NLTK_AVAILABLE = True

    # Download required NLTK data
    try:
        nltk.data.find('tokenizers/punkt_tab')
    except LookupError:
        nltk.download('punkt_tab')
    try:
        nltk.data.find('tokenizers/punkt')
    except LookupError:
        nltk.download('punkt')
    try:
        nltk.data.find('corpora/wordnet')
    except LookupError:
        nltk.download('wordnet')
    try:
        nltk.data.find('corpora/omw-1.4')
    except LookupError:
        nltk.download('omw-1.4')

except ImportError as e:
    print(f"NLTK import error: {e}")
    NLTK_AVAILABLE = False
    import textstat

class AdvancedHumanizer:
    def __init__(self):
        self.transition_words = [
            "However", "Nevertheless", "Furthermore", "Moreover", "Additionally",
            "Consequently", "Therefore", "Thus", "In contrast", "Similarly",
            "On the other hand", "Meanwhile", "Subsequently", "Notably",
            "Importantly", "Significantly", "Interestingly", "Remarkably"
        ]

        self.hedging_phrases = [
            "appears to", "seems to", "tends to", "suggests that", "indicates that",
            "may well", "might be", "could be", "potentially", "presumably",
            "arguably", "to some extent", "in many cases", "generally speaking"
        ]

        self.academic_connectors = [
            "In light of this", "Building upon this", "This finding suggests",
            "It is worth noting that", "This observation", "These results",
            "The evidence indicates", "This approach", "The data reveals"
        ]

        # Enhanced vocabulary replacements for better humanization
        self.vocabulary_replacements = {
            "significant": ["notable", "considerable", "substantial", "important", "remarkable"],
            "demonstrate": ["show", "illustrate", "reveal", "display", "indicate"],
            "utilize": ["use", "employ", "apply", "implement", "make use of"],
            "implement": ["apply", "use", "put into practice", "carry out", "execute"],
            "generate": ["create", "produce", "develop", "form", "make"],
            "facilitate": ["help", "enable", "assist", "support", "aid"],
            "optimize": ["improve", "enhance", "refine", "perfect", "better"],
            "analyze": ["examine", "study", "investigate", "assess", "evaluate"],
            "therefore": ["thus", "hence", "consequently", "as a result", "for this reason"],
            "however": ["nevertheless", "nonetheless", "yet", "on the other hand", "but"],
            "furthermore": ["moreover", "additionally", "in addition", "what is more", "besides"],
            "substantial": ["significant", "considerable", "notable", "important", "major"],
            "subsequently": ["later", "then", "afterward", "following this", "next"],
            "approximately": ["about", "roughly", "around", "nearly", "close to"],
            "numerous": ["many", "several", "multiple", "various", "a number of"],
            "encompasses": ["includes", "covers", "contains", "involves", "comprises"],
            "methodology": ["method", "approach", "technique", "procedure", "process"],
            "comprehensive": ["complete", "thorough", "extensive", "detailed", "full"],
            "indicates": ["shows", "suggests", "points to", "reveals", "demonstrates"],
            "established": ["set up", "created", "formed", "developed", "built"]
        }

    def split_into_sentences(self, text):
        """Smart sentence splitting with an NLTK fallback."""
        if NLTK_AVAILABLE:
            return sent_tokenize(text)
        else:
            # Enhanced fallback sentence splitting
            sentences = []
            current = ""

            for char in text:
                current += char
                if char == '.' and len(current) > 10:
                    # Check if this looks like the end of a sentence
                    remaining = text[text.find(current) + len(current):]
                    if remaining and (remaining[0].isupper() or remaining.strip().startswith(('The ', 'This ', 'A '))):
                        sentences.append(current.strip())
                        current = ""

            if current.strip():
                sentences.append(current.strip())

            return [s for s in sentences if len(s.strip()) > 5]

    def add_natural_variations(self, text):
        """Add natural linguistic variations to make text less robotic."""
        sentences = self.split_into_sentences(text)
        varied_sentences = []

        for i, sentence in enumerate(sentences):
            sentence = sentence.strip()
            if not sentence.endswith('.'):
                sentence += '.'

            # Randomly add hedging language
            if random.random() < 0.3 and not any(phrase in sentence.lower() for phrase in self.hedging_phrases):
                hedge = random.choice(self.hedging_phrases)
                if sentence.startswith("The ") or sentence.startswith("This "):
                    words = sentence.split()
                    if len(words) > 2:
                        words.insert(2, hedge)
                        sentence = " ".join(words)

            # Add transitional phrases for flow
            if i > 0 and random.random() < 0.4:
                connector = random.choice(self.academic_connectors)
                sentence = f"{connector}, {sentence.lower()}"

            varied_sentences.append(sentence)

        return " ".join(varied_sentences)

    def diversify_vocabulary(self, text):
        """Replace common words with synonyms for variation."""
        if NLTK_AVAILABLE:
            words = word_tokenize(text)
            result = []

            for word in words:
                if word.isalpha() and len(word) > 4 and random.random() < 0.2:
                    synonyms = []
                    for syn in wordnet.synsets(word):
                        for lemma in syn.lemmas():
                            if lemma.name() != word and '_' not in lemma.name():
                                synonyms.append(lemma.name())

                    if synonyms:
                        replacement = random.choice(synonyms[:3])
                        result.append(replacement)
                    else:
                        result.append(word)
                else:
                    result.append(word)

            return " ".join(result)
        else:
            # Enhanced fallback with more replacements
            result = text
            for original, alternatives in self.vocabulary_replacements.items():
                if original.lower() in result.lower():
                    replacement = random.choice(alternatives)
                    pattern = re.compile(re.escape(original), re.IGNORECASE)
                    result = pattern.sub(replacement, result, count=1)

            return result

    def adjust_sentence_structure(self, text):
        """Modify sentence structures for a more natural flow."""
        sentences = self.split_into_sentences(text)
        modified = []

        for sentence in sentences:
            words = sentence.split()

            # For long sentences, sometimes break them up
            if len(words) > 20 and random.random() < 0.4:
                # Find a good break point
                break_words = ['and', 'but', 'which', 'that', 'because', 'since', 'while']
                for i, word in enumerate(words[8:18], 8):  # Look in the middle section
                    if word.lower() in break_words:
                        part1 = " ".join(words[:i]) + "."
                        part2 = " ".join(words[i+1:])
                        if len(part2) > 5:  # Only if the second part is substantial
                            part2 = part2[0].upper() + part2[1:] if part2 else part2
                            modified.extend([part1, part2])
                            break
                else:
                    modified.append(sentence)
            else:
                modified.append(sentence)

        return " ".join(modified)

    def clean_and_format(self, text):
        """Clean up the text formatting."""
        # Remove extra spaces
        text = re.sub(r'\s+', ' ', text)
        text = re.sub(r'\s+([.,!?;:])', r'\1', text)

        # Fix capitalization
        sentences = self.split_into_sentences(text)
        formatted = []
+
|
207 |
+
for sentence in sentences:
|
208 |
+
sentence = sentence.strip()
|
209 |
+
if sentence:
|
210 |
+
# Capitalize first letter
|
211 |
+
sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()
|
212 |
+
|
213 |
+
# Ensure proper ending
|
214 |
+
if not sentence.endswith(('.', '!', '?')):
|
215 |
+
sentence += '.'
|
216 |
+
|
217 |
+
formatted.append(sentence)
|
218 |
+
|
219 |
+
return " ".join(formatted)
|
220 |
+
|
221 |
+
def humanize_text(self, text, intensity="medium"):
|
222 |
+
"""Main humanization function"""
|
223 |
+
if not text or len(text.strip()) < 10:
|
224 |
+
return "Please enter substantial text to humanize (at least 10 characters)."
|
225 |
+
|
226 |
+
result = text.strip()
|
227 |
+
|
228 |
+
try:
|
229 |
+
# Apply different levels of humanization
|
230 |
+
if intensity.lower() in ["light", "low"]:
|
231 |
+
# Just vocabulary changes
|
232 |
+
result = self.diversify_vocabulary(result)
|
233 |
+
|
234 |
+
elif intensity.lower() in ["medium", "moderate"]:
|
235 |
+
# Vocabulary + natural flow
|
236 |
+
result = self.diversify_vocabulary(result)
|
237 |
+
result = self.add_natural_variations(result)
|
238 |
+
|
239 |
+
elif intensity.lower() in ["heavy", "high", "maximum"]:
|
240 |
+
# All techniques
|
241 |
+
result = self.diversify_vocabulary(result)
|
242 |
+
result = self.add_natural_variations(result)
|
243 |
+
result = self.adjust_sentence_structure(result)
|
244 |
+
|
245 |
+
# Always clean up formatting
|
246 |
+
result = self.clean_and_format(result)
|
247 |
+
|
248 |
+
return result if result and len(result) > 10 else text
|
249 |
+
|
250 |
+
except Exception as e:
|
251 |
+
print(f"Humanization error: {e}")
|
252 |
+
return "Error processing text. Please try again with different input."
|
253 |
+
|
254 |
+
class AIDetector:
    def __init__(self):
        """Initialize AI detection patterns and thresholds"""
        self.ai_phrases = [
            "demonstrates significant", "substantial improvements", "comprehensive analysis",
            "furthermore", "moreover", "additionally", "consequently", "therefore",
            "implementation of", "utilization of", "optimization of", "enhancement of",
            "facilitate", "demonstrate", "indicate", "substantial", "comprehensive",
            "significant improvements", "notable enhancements", "effective approach",
            "robust methodology", "systematic approach", "extensive evaluation",
            "empirical results", "experimental validation", "performance metrics",
            "benchmark datasets", "state-of-the-art", "cutting-edge", "novel approach",
            "innovative solution", "groundbreaking", "revolutionary", "paradigm shift"
        ]

        self.overused_academic_words = [
            "significant", "substantial", "comprehensive", "extensive", "robust",
            "novel", "innovative", "efficient", "effective", "optimal", "superior",
            "enhanced", "improved", "advanced", "sophisticated", "cutting-edge",
            "state-of-the-art", "groundbreaking", "revolutionary", "paradigm"
        ]

        self.excessive_transitions = [
            "furthermore", "moreover", "additionally", "consequently", "therefore",
            "thus", "hence", "nevertheless", "nonetheless", "however"
        ]

        self.formal_patterns = [
            r"the implementation of \w+",
            r"the utilization of \w+",
            r"in order to \w+",
            r"it is important to note that",
            r"it should be emphasized that",
            r"it can be observed that",
            r"the results demonstrate that",
            r"the findings indicate that"
        ]

    def calculate_ai_probability(self, text):
        """Calculate the probability that text is AI-generated"""
        if not text or len(text.strip()) < 50:
            return {"probability": 0, "confidence": "Low", "details": {"error": "Text too short for analysis"}}

        scores = {}

        # Various AI detection checks
        scores['ai_phrases'] = self._check_ai_phrases(text)
        scores['vocab_repetition'] = self._check_vocabulary_repetition(text)
        scores['structure_patterns'] = self._check_structure_patterns(text)
        scores['transition_overuse'] = self._check_transition_overuse(text)
        scores['formal_patterns'] = self._check_formal_patterns(text)
        scores['sentence_consistency'] = self._check_sentence_consistency(text)
        scores['readability'] = self._check_readability_patterns(text)

        # Calculate weighted final score
        weights = {
            'ai_phrases': 0.2, 'vocab_repetition': 0.15, 'structure_patterns': 0.15,
            'transition_overuse': 0.15, 'formal_patterns': 0.15,
            'sentence_consistency': 0.1, 'readability': 0.1
        }

        final_score = sum(scores[key] * weights[key] for key in weights)
        final_score = min(100, max(0, final_score))

        # Determine confidence level
        if final_score >= 80:
            confidence, verdict = "Very High", "Likely AI-Generated"
        elif final_score >= 60:
            confidence, verdict = "High", "Probably AI-Generated"
        elif final_score >= 40:
            confidence, verdict = "Medium", "Possibly AI-Generated"
        elif final_score >= 20:
            confidence, verdict = "Low", "Probably Human-Written"
        else:
            confidence, verdict = "Very Low", "Likely Human-Written"

        return {
            "probability": round(final_score, 1),
            "confidence": confidence,
            "verdict": verdict,
            "details": {k: round(v, 1) for k, v in scores.items()}
        }

    def _check_ai_phrases(self, text):
        text_lower = text.lower()
        phrase_count = sum(1 for phrase in self.ai_phrases if phrase in text_lower)
        words = len(text.split())
        return min(100, (phrase_count / words) * 1000 * 10) if words > 0 else 0

    def _check_vocabulary_repetition(self, text):
        words = [word.lower().strip('.,!?;:') for word in text.split() if word.isalpha()]
        if len(words) < 10:
            return 0
        word_counts = Counter(words)
        overused_count = sum(1 for word in self.overused_academic_words if word_counts.get(word, 0) > 1)
        return min(100, (overused_count / len(self.overused_academic_words)) * 200)

    def _check_structure_patterns(self, text):
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            sentences = [s.strip() for s in text.split('.') if s.strip()]

        if len(sentences) < 3:
            return 0

        starters = [s.split()[:3] for s in sentences if len(s.split()) >= 3]
        starter_counts = Counter([' '.join(starter) for starter in starters])
        repeated_starters = sum(1 for count in starter_counts.values() if count > 1)
        return min(100, (repeated_starters / len(sentences)) * 150) if sentences else 0

    def _check_transition_overuse(self, text):
        text_lower = text.lower()
        transition_count = sum(1 for transition in self.excessive_transitions if transition in text_lower)
        words = len(text.split())
        return min(100, (transition_count / words) * 100 * 20) if words > 0 else 0

    def _check_formal_patterns(self, text):
        pattern_count = sum(len(re.findall(pattern, text.lower())) for pattern in self.formal_patterns)
        words = len(text.split())
        return min(100, (pattern_count / words) * 1000 * 15) if words > 0 else 0

    def _check_sentence_consistency(self, text):
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            sentences = [s.strip() for s in text.split('.') if s.strip()]

        if len(sentences) < 5:
            return 0

        lengths = [len(s.split()) for s in sentences]
        avg_length = sum(lengths) / len(lengths)
        variance = sum((length - avg_length) ** 2 for length in lengths) / len(lengths)
        std_dev = math.sqrt(variance)
        consistency_score = 100 - min(100, std_dev * 10)
        return max(0, consistency_score - 20)

    def _check_readability_patterns(self, text):
        try:
            words = text.split()
            sentences = len([s for s in text.split('.') if s.strip()])
            if sentences == 0:
                return 0
            avg_words_per_sentence = len(words) / sentences
            if 15 <= avg_words_per_sentence <= 25:
                return 30
            elif 25 < avg_words_per_sentence <= 35:
                return 50
            else:
                return 10
        except Exception:
            return 0

# Initialize components
humanizer = AdvancedHumanizer()
ai_detector = AIDetector()

def process_text(input_text, humanization_level):
    """Process the input text"""
    return humanizer.humanize_text(input_text, humanization_level)

def detect_ai_text(input_text):
    """Detect if text is AI-generated"""
    if not input_text.strip():
        return "Please enter some text to analyze."

    result = ai_detector.calculate_ai_probability(input_text)

    # Short inputs return only an error entry in "details"; guard before
    # formatting the per-heuristic breakdown to avoid a KeyError.
    if "error" in result["details"]:
        return f"⚠️ {result['details']['error']}"

    return f"""
## 🤖 AI Detection Analysis

**Overall Assessment:** {result['verdict']}
**AI Probability:** {result['probability']}%
**Confidence Level:** {result['confidence']}

### 📊 Detailed Breakdown:
- **AI Phrases Score:** {result['details']['ai_phrases']}%
- **Vocabulary Repetition:** {result['details']['vocab_repetition']}%
- **Structure Patterns:** {result['details']['structure_patterns']}%
- **Transition Overuse:** {result['details']['transition_overuse']}%
- **Formal Patterns:** {result['details']['formal_patterns']}%
- **Sentence Consistency:** {result['details']['sentence_consistency']}%
- **Readability Score:** {result['details']['readability']}%

### 💡 Interpretation:
- **0-20%:** Likely human-written with natural variations
- **21-40%:** Possibly AI-generated or heavily edited
- **41-60%:** Probably AI-generated with some humanization
- **61-80%:** Likely AI-generated with minimal editing
- **81-100%:** Very likely raw AI-generated content
"""

def combined_process(text, level):
    """Humanize text and then analyze it"""
    if not text.strip():
        return "Please enter text to process.", "No analysis available."

    humanized = process_text(text, level)
    analysis = detect_ai_text(humanized)
    return humanized, analysis

# Create Gradio interface
with gr.Blocks(theme="soft", title="AI Text Humanizer & Detector") as demo:
    gr.Markdown("""
    # 🤖➡️👨 AI Text Humanizer & Detector Pro

    **Complete solution for AI text processing - Humanize AND Detect AI-generated content**

    Transform robotic AI text into natural, human-like writing, then verify the results with our built-in AI detector.

    ⚠️ **Note:** This tool is for educational purposes. Please use responsibly and maintain academic integrity.
    """)

    with gr.Tabs():
        # Humanization Tab
        with gr.TabItem("🎭 Text Humanizer"):
            gr.Markdown("### Transform AI text into natural, human-like writing")

            with gr.Row():
                with gr.Column():
                    humanize_input = gr.Textbox(
                        lines=10,
                        placeholder="Enter machine-generated or robotic academic text here...",
                        label="Raw Input Text",
                        info="Paste your AI-generated text that needs to be humanized"
                    )

                    humanization_level = gr.Radio(
                        choices=["Light", "Medium", "Heavy"],
                        value="Medium",
                        label="Humanization Level",
                        info="Light: Basic changes | Medium: Vocabulary + flow | Heavy: All techniques"
                    )

                    humanize_btn = gr.Button("🚀 Humanize Text", variant="primary", size="lg")

                with gr.Column():
                    humanize_output = gr.Textbox(
                        label="Humanized Academic Output",
                        lines=10,
                        show_copy_button=True,
                        info="Copy this natural, human-like text"
                    )

            # Examples for humanizer
            gr.Examples(
                examples=[
                    [
                        "The implementation of artificial intelligence algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.",
                        "Medium"
                    ],
                    [
                        "Machine learning models exhibit superior performance characteristics when evaluated against traditional statistical approaches in predictive analytics applications.",
                        "Heavy"
                    ]
                ],
                inputs=[humanize_input, humanization_level],
                outputs=humanize_output
            )

        # AI Detection Tab
        with gr.TabItem("🕵️ AI Detector"):
            gr.Markdown("### Analyze text to detect if it's AI-generated")

            with gr.Row():
                with gr.Column():
                    detect_input = gr.Textbox(
                        lines=10,
                        placeholder="Paste text here to check if it's AI-generated...",
                        label="Text to Analyze",
                        info="Enter any text to check its AI probability"
                    )

                    detect_btn = gr.Button("🔍 Analyze Text", variant="secondary", size="lg")

                with gr.Column():
                    detect_output = gr.Markdown(
                        label="AI Detection Results",
                        value="Analysis results will appear here..."
                    )

            # Examples for detector
            gr.Examples(
                examples=[
                    ["The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets. Furthermore, these results indicate substantial enhancements in performance."],
                    ["I love going to the coffee shop on weekends. The barista there makes the best cappuccino I've ever had, and I always end up chatting with other customers about random stuff."],
                    ["The comprehensive analysis reveals that the optimization of neural network architectures facilitates enhanced performance characteristics in predictive analytics applications."]
                ],
                inputs=[detect_input],
                outputs=detect_output
            )

        # Combined Analysis Tab
        with gr.TabItem("🔄 Humanize & Test"):
            gr.Markdown("### Humanize text and immediately test the results")

            with gr.Column():
                combined_input = gr.Textbox(
                    lines=8,
                    placeholder="Enter AI-generated text to humanize and test...",
                    label="Original AI Text",
                    info="This will be humanized and then tested for AI detection"
                )

                combined_level = gr.Radio(
                    choices=["Light", "Medium", "Heavy"],
                    value="Medium",
                    label="Humanization Level"
                )

                combined_btn = gr.Button("🔄 Humanize & Analyze", variant="primary", size="lg")

            with gr.Row():
                with gr.Column():
                    combined_humanized = gr.Textbox(
                        label="Humanized Text",
                        lines=8,
                        show_copy_button=True
                    )

                with gr.Column():
                    combined_analysis = gr.Markdown(
                        label="AI Detection Analysis",
                        value="Analysis will appear here..."
                    )

        # Info Tab
        with gr.TabItem("ℹ️ Instructions"):
            gr.Markdown("""
            ### 🎯 How to Use:

            **Text Humanizer:**
            1. Paste your AI-generated text
            2. Choose humanization level
            3. Get natural, human-like output

            **AI Detector:**
            1. Paste any text
            2. Get detailed AI probability analysis
            3. See breakdown of detection factors

            **Combined Mode:**
            1. Humanize and test in one step
            2. Perfect for optimizing results
            3. Iterate until satisfied

            ### 🔧 Features:

            **Humanization Techniques:**
            - ✅ Advanced vocabulary variations
            - ✅ Natural sentence flow enhancement
            - ✅ Academic tone preservation
            - ✅ Structure diversification
            - ✅ Linguistic pattern breaking

            **AI Detection:**
            - 🔍 7-point analysis system
            - 📊 Detailed scoring breakdown
            - 🎯 Confidence assessment
            - 💡 Improvement suggestions

            ### ⚖️ Ethical Usage:
            This tool is designed for:
            - ✅ Improving writing quality
            - ✅ Learning natural language patterns
            - ✅ Educational purposes
            - ✅ Understanding AI detection

            **Please use responsibly:**
            - 🚫 Don't use for plagiarism
            - 🚫 Don't violate academic policies
            - 🚫 Don't misrepresent authorship
            - ✅ Maintain academic integrity
            """)

    # Event handlers
    humanize_btn.click(
        fn=process_text,
        inputs=[humanize_input, humanization_level],
        outputs=humanize_output
    )

    detect_btn.click(
        fn=detect_ai_text,
        inputs=[detect_input],
        outputs=detect_output
    )

    combined_btn.click(
        fn=combined_process,
        inputs=[combined_input, combined_level],
        outputs=[combined_humanized, combined_analysis]
    )

if __name__ == "__main__":
    demo.launch(
        share=True,  # Enable public sharing
        server_name="0.0.0.0",
        server_port=7860
    )
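The detector's final probability is just a weighted average of the seven heuristic scores, clamped to 0-100. The aggregation step can be sketched in isolation; the input scores below are hypothetical values, not output of the real `_check_*` heuristics:

```python
# Weights mirror those in AIDetector.calculate_ai_probability.
WEIGHTS = {
    'ai_phrases': 0.2, 'vocab_repetition': 0.15, 'structure_patterns': 0.15,
    'transition_overuse': 0.15, 'formal_patterns': 0.15,
    'sentence_consistency': 0.1, 'readability': 0.1,
}

def aggregate(scores):
    """Combine per-heuristic scores (0-100 each) into one clamped 0-100 value."""
    total = sum(scores[key] * WEIGHTS[key] for key in WEIGHTS)
    return min(100, max(0, round(total, 1)))

# Hypothetical example: every heuristic reports a flat 25%.
print(aggregate({k: 25.0 for k in WEIGHTS}))
```

Because the weights sum to 1.0, a uniform input passes through unchanged, and any combination of heuristic scores stays within the 0-100 band the verdict thresholds expect.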
humanizer_app.py
ADDED
@@ -0,0 +1,823 @@
import gradio as gr
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
import random
import re
import warnings
import math
from collections import Counter
warnings.filterwarnings("ignore")

# Import NLTK with error handling
try:
    import nltk
    import textstat
    from nltk.corpus import wordnet
    from nltk.tokenize import sent_tokenize, word_tokenize
    NLTK_AVAILABLE = True
except ImportError as e:
    print(f"NLTK import error: {e}")
    NLTK_AVAILABLE = False
    # Fallback imports
    import textstat

# Download required NLTK data if available
if NLTK_AVAILABLE:
    try:
        nltk.data.find('tokenizers/punkt_tab')
    except LookupError:
        print("Downloading punkt_tab...")
        nltk.download('punkt_tab')
    try:
        nltk.data.find('tokenizers/punkt')
    except LookupError:
        print("Downloading punkt...")
        nltk.download('punkt')
    try:
        nltk.data.find('corpora/wordnet')
    except LookupError:
        print("Downloading wordnet...")
        nltk.download('wordnet')
    try:
        nltk.data.find('corpora/omw-1.4')
    except LookupError:
        print("Downloading omw-1.4...")
        nltk.download('omw-1.4')

# Load multiple models for diverse paraphrasing
models = {
    "t5_paraphrase": {
        "model_name": "Vamsi/T5_Paraphrase_Paws",
        "tokenizer": None,
        "model": None
    },
    "pegasus": {
        "model_name": "tuner007/pegasus_paraphrase",
        "tokenizer": None,
        "model": None
    }
}

# Initialize models
for key, model_info in models.items():
    try:
        model_info["tokenizer"] = AutoTokenizer.from_pretrained(model_info["model_name"])
        model_info["model"] = AutoModelForSeq2SeqLM.from_pretrained(model_info["model_name"])
        print(f"Loaded {key} model successfully")
    except Exception as e:
        print(f"Failed to load {key}: {e}")

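Later in this file, `paraphrase_with_multiple_models` greedily packs sentences into chunks no longer than `chunk_size` characters before routing each chunk to a different model. That packing step can be sketched standalone (the sentence inputs here are illustrative):

```python
# Greedy character-budget packing, mirroring the chunking logic in
# AdvancedHumanizer.paraphrase_with_multiple_models.
def pack_sentences(sentences, chunk_size=300):
    """Pack sentences left-to-right into chunks of at most chunk_size characters."""
    chunks = []
    current = ""
    for sentence in sentences:
        if len(current + sentence) <= chunk_size:
            current += sentence + " "
        else:
            if current:
                chunks.append(current.strip())
            current = sentence + " "
    if current:
        chunks.append(current.strip())
    return chunks

print(pack_sentences(["A short one.", "Another short one."], chunk_size=20))
```

A single sentence longer than `chunk_size` still becomes its own (oversized) chunk, so downstream tokenization relies on `truncation=True` as a backstop.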
class AdvancedHumanizer:
    def __init__(self):
        self.transition_words = [
            "However", "Nevertheless", "Furthermore", "Moreover", "Additionally",
            "Consequently", "Therefore", "Thus", "In contrast", "Similarly",
            "On the other hand", "Meanwhile", "Subsequently", "Notably",
            "Importantly", "Significantly", "Interestingly", "Remarkably"
        ]

        self.hedging_phrases = [
            "appears to", "seems to", "tends to", "suggests that", "indicates that",
            "may well", "might be", "could be", "potentially", "presumably",
            "arguably", "to some extent", "in many cases", "generally speaking"
        ]

        self.academic_connectors = [
            "In light of this", "Building upon this", "This finding suggests",
            "It is worth noting that", "This observation", "These results",
            "The evidence indicates", "This approach", "The data reveals"
        ]

    def add_natural_variations(self, text):
        """Add natural linguistic variations to make text less robotic"""
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            # Fallback: simple sentence splitting
            sentences = [s.strip() for s in text.split('.') if s.strip()]

        varied_sentences = []

        for i, sentence in enumerate(sentences):
            if not sentence.endswith('.'):
                sentence += '.'

            # Randomly add hedging language
            if random.random() < 0.3 and not any(phrase in sentence.lower() for phrase in self.hedging_phrases):
                hedge = random.choice(self.hedging_phrases)
                if sentence.startswith("The ") or sentence.startswith("This "):
                    sentence = sentence.replace("The ", f"The {hedge} ", 1)
                    sentence = sentence.replace("This ", f"This {hedge} ", 1)

            # Add transitional phrases for flow
            if i > 0 and random.random() < 0.4:
                connector = random.choice(self.academic_connectors)
                sentence = f"{connector}, {sentence.lower()}"

            varied_sentences.append(sentence)

        return " ".join(varied_sentences)

    def diversify_vocabulary(self, text):
        """Replace common words with synonyms for variation"""
        if not NLTK_AVAILABLE:
            # Fallback: simple word replacements
            replacements = {
                "significant": "notable", "important": "crucial", "demonstrate": "show",
                "utilize": "use", "implement": "apply", "generate": "create",
                "facilitate": "help", "optimize": "improve", "analyze": "examine"
            }
            result = text
            for old, new in replacements.items():
                result = re.sub(r'\b' + old + r'\b', new, result, flags=re.IGNORECASE)
            return result

        words = word_tokenize(text)
        result = []

        for word in words:
            if word.isalpha() and len(word) > 4 and random.random() < 0.2:
                synonyms = []
                for syn in wordnet.synsets(word):
                    for lemma in syn.lemmas():
                        if lemma.name() != word and '_' not in lemma.name():
                            synonyms.append(lemma.name())

                if synonyms:
                    replacement = random.choice(synonyms[:3])  # Use top 3 synonyms
                    result.append(replacement)
                else:
                    result.append(word)
            else:
                result.append(word)

        return " ".join(result)

    def adjust_sentence_structure(self, text):
        """Modify sentence structures for more natural flow"""
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            # Fallback: simple sentence splitting
            sentences = [s.strip() + '.' for s in text.split('.') if s.strip()]

        modified = []

        for sentence in sentences:
            # Randomly split long sentences
            if len(sentence.split()) > 20 and random.random() < 0.4:
                words = sentence.split()
                mid_point = len(words) // 2
                # Find a good breaking point near the middle
                for i in range(mid_point - 3, mid_point + 3):
                    if i < len(words) and words[i].rstrip('.,').lower() in ['and', 'but', 'which', 'that']:
                        part1 = " ".join(words[:i]) + "."
                        part2 = " ".join(words[i+1:])
                        if part2:
                            part2 = part2[0].upper() + part2[1:]
                        modified.extend([part1, part2])
                        break
                else:
                    modified.append(sentence)
            else:
                modified.append(sentence)

        return " ".join(modified)

    def paraphrase_with_multiple_models(self, text, chunk_size=300):
        """Use multiple models to paraphrase different parts of the text"""
        # Check if any models are available
        available_models = [k for k, v in models.items() if v["model"] is not None]
        if not available_models:
            # No models available, use fallback humanization
            return self.fallback_humanization(text)

        if len(text) <= chunk_size:
            return self.paraphrase_single_chunk(text)

        # Split into chunks
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            sentences = [s.strip() + '.' for s in text.split('.') if s.strip()]

        chunks = []
        current_chunk = ""

        for sentence in sentences:
            if len(current_chunk + sentence) <= chunk_size:
                current_chunk += sentence + " "
            else:
                if current_chunk:
                    chunks.append(current_chunk.strip())
                current_chunk = sentence + " "

        if current_chunk:
            chunks.append(current_chunk.strip())

        # Paraphrase each chunk with different models
        paraphrased_chunks = []
        for i, chunk in enumerate(chunks):
            paraphrased = self.paraphrase_single_chunk(chunk, model_choice=i % len(available_models))
            paraphrased_chunks.append(paraphrased)

        return " ".join(paraphrased_chunks)

    def fallback_humanization(self, text):
        """Fallback humanization when no AI models are available"""
        # Use the vocabulary diversification and natural variations
        result = self.diversify_vocabulary(text)
        result = self.add_natural_variations(result)
        return result

    def paraphrase_single_chunk(self, text, model_choice=0):
        """Paraphrase a single chunk of text"""
        available_models = [k for k, v in models.items() if v["model"] is not None]
        if not available_models:
            # No models available, use fallback
            return self.fallback_humanization(text)

        model_key = available_models[model_choice % len(available_models)]
        model_info = models[model_key]

        try:
            if model_key == "t5_paraphrase":
                input_ids = model_info["tokenizer"].encode(
                    f"paraphrase: {text}",
                    return_tensors="pt",
                    max_length=512,
                    truncation=True
                )
                outputs = model_info["model"].generate(
                    input_ids=input_ids,
                    max_length=len(text.split()) + 50,
                    num_beams=5,
                    num_return_sequences=1,
                    temperature=1.2,
                    top_k=50,
                    top_p=0.92,
                    do_sample=True,
                    early_stopping=True
                )
                result = model_info["tokenizer"].decode(outputs[0], skip_special_tokens=True)

            elif model_key == "pegasus":
                input_ids = model_info["tokenizer"].encode(
                    text,
                    return_tensors="pt",
                    max_length=512,
                    truncation=True
                )
                outputs = model_info["model"].generate(
274 |
+
input_ids=input_ids,
|
275 |
+
max_length=len(text.split()) + 30,
|
276 |
+
num_beams=4,
|
277 |
+
temperature=1.1,
|
278 |
+
top_p=0.9,
|
279 |
+
do_sample=True
|
280 |
+
)
|
281 |
+
result = model_info["tokenizer"].decode(outputs[0], skip_special_tokens=True)
|
282 |
+
|
283 |
+
return result if result and len(result) > 10 else self.fallback_humanization(text)
|
284 |
+
except Exception as e:
|
285 |
+
print(f"Error with {model_key}: {e}")
|
286 |
+
return self.fallback_humanization(text)
|
287 |
+
|
288 |
+
class AIDetector:
    def __init__(self):
        """Initialize AI detection patterns and thresholds"""
        # Common AI-generated text patterns
        self.ai_phrases = [
            "demonstrates significant", "substantial improvements", "comprehensive analysis",
            "furthermore", "moreover", "additionally", "consequently", "therefore",
            "implementation of", "utilization of", "optimization of", "enhancement of",
            "facilitate", "demonstrate", "indicate", "substantial", "comprehensive",
            "significant improvements", "notable enhancements", "effective approach",
            "robust methodology", "systematic approach", "extensive evaluation",
            "empirical results", "experimental validation", "performance metrics",
            "benchmark datasets", "state-of-the-art", "cutting-edge", "novel approach",
            "innovative solution", "groundbreaking", "revolutionary", "paradigm shift"
        ]

        # Academic buzzwords that AI overuses
        self.overused_academic_words = [
            "significant", "substantial", "comprehensive", "extensive", "robust",
            "novel", "innovative", "efficient", "effective", "optimal", "superior",
            "enhanced", "improved", "advanced", "sophisticated", "cutting-edge",
            "state-of-the-art", "groundbreaking", "revolutionary", "paradigm"
        ]

        # Transition words AI uses excessively
        self.excessive_transitions = [
            "furthermore", "moreover", "additionally", "consequently", "therefore",
            "thus", "hence", "nevertheless", "nonetheless", "however"
        ]

        # Formal structures AI tends to overuse
        self.formal_patterns = [
            r"the implementation of \w+",
            r"the utilization of \w+",
            r"in order to \w+",
            r"it is important to note that",
            r"it should be emphasized that",
            r"it can be observed that",
            r"the results demonstrate that",
            r"the findings indicate that"
        ]

    def calculate_ai_probability(self, text):
        """Calculate the probability that text is AI-generated"""
        if not text or len(text.strip()) < 50:
            return {"probability": 0, "confidence": "Low", "details": {"error": "Text too short for analysis"}}

        scores = {}

        # 1. Check for AI phrases
        scores['ai_phrases'] = self._check_ai_phrases(text)

        # 2. Check vocabulary repetition
        scores['vocab_repetition'] = self._check_vocabulary_repetition(text)

        # 3. Check sentence structure patterns
        scores['structure_patterns'] = self._check_structure_patterns(text)

        # 4. Check transition word overuse
        scores['transition_overuse'] = self._check_transition_overuse(text)

        # 5. Check formal pattern overuse
        scores['formal_patterns'] = self._check_formal_patterns(text)

        # 6. Check sentence length consistency
        scores['sentence_consistency'] = self._check_sentence_consistency(text)

        # 7. Check readability patterns
        scores['readability'] = self._check_readability_patterns(text)

        # Calculate weighted final score
        weights = {
            'ai_phrases': 0.2,
            'vocab_repetition': 0.15,
            'structure_patterns': 0.15,
            'transition_overuse': 0.15,
            'formal_patterns': 0.15,
            'sentence_consistency': 0.1,
            'readability': 0.1
        }

        final_score = sum(scores[key] * weights[key] for key in weights)
        final_score = min(100, max(0, final_score))  # Clamp between 0-100

        # Determine confidence level
        if final_score >= 80:
            confidence = "Very High"
            verdict = "Likely AI-Generated"
        elif final_score >= 60:
            confidence = "High"
            verdict = "Probably AI-Generated"
        elif final_score >= 40:
            confidence = "Medium"
            verdict = "Possibly AI-Generated"
        elif final_score >= 20:
            confidence = "Low"
            verdict = "Probably Human-Written"
        else:
            confidence = "Very Low"
            verdict = "Likely Human-Written"

        return {
            "probability": round(final_score, 1),
            "confidence": confidence,
            "verdict": verdict,
            "details": {
                "ai_phrases_score": round(scores['ai_phrases'], 1),
                "vocabulary_repetition": round(scores['vocab_repetition'], 1),
                "structure_patterns": round(scores['structure_patterns'], 1),
                "transition_overuse": round(scores['transition_overuse'], 1),
                "formal_patterns": round(scores['formal_patterns'], 1),
                "sentence_consistency": round(scores['sentence_consistency'], 1),
                "readability_score": round(scores['readability'], 1)
            }
        }

    def _check_ai_phrases(self, text):
        """Check for common AI-generated phrases"""
        text_lower = text.lower()
        phrase_count = sum(1 for phrase in self.ai_phrases if phrase in text_lower)
        words = len(text.split())

        if words == 0:
            return 0

        # Score based on phrase density
        density = (phrase_count / words) * 1000  # Per 1000 words
        return min(100, density * 10)  # Scale to 0-100

    def _check_vocabulary_repetition(self, text):
        """Check for repetitive vocabulary typical of AI"""
        words = [word.lower().strip('.,!?;:') for word in text.split() if word.isalpha()]
        if len(words) < 10:
            return 0

        word_counts = Counter(words)
        overused_count = sum(1 for word in self.overused_academic_words if word_counts.get(word, 0) > 1)

        # Calculate repetition score
        total_overused_words = len(self.overused_academic_words)
        repetition_ratio = overused_count / total_overused_words if total_overused_words > 0 else 0

        return min(100, repetition_ratio * 200)  # Scale to 0-100

    def _check_structure_patterns(self, text):
        """Check for repetitive sentence structures"""
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            sentences = [s.strip() for s in text.split('.') if s.strip()]

        if len(sentences) < 3:
            return 0

        # Check for similar sentence starters
        starters = [s.split()[:3] for s in sentences if len(s.split()) >= 3]
        starter_counts = Counter([' '.join(starter) for starter in starters])

        repeated_starters = sum(1 for count in starter_counts.values() if count > 1)
        repetition_ratio = repeated_starters / len(sentences) if len(sentences) > 0 else 0

        return min(100, repetition_ratio * 150)  # Scale to 0-100

    def _check_transition_overuse(self, text):
        """Check for excessive use of transition words"""
        text_lower = text.lower()
        transition_count = sum(1 for transition in self.excessive_transitions if transition in text_lower)
        words = len(text.split())

        if words == 0:
            return 0

        # Score based on transition density
        density = (transition_count / words) * 100  # Percentage
        return min(100, density * 20)  # Scale to 0-100

    def _check_formal_patterns(self, text):
        """Check for overly formal patterns typical of AI"""
        pattern_count = 0
        text_lower = text.lower()

        for pattern in self.formal_patterns:
            matches = re.findall(pattern, text_lower)
            pattern_count += len(matches)

        words = len(text.split())
        if words == 0:
            return 0

        density = (pattern_count / words) * 1000  # Per 1000 words
        return min(100, density * 15)  # Scale to 0-100

    def _check_sentence_consistency(self, text):
        """Check for unnaturally consistent sentence lengths"""
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            sentences = [s.strip() for s in text.split('.') if s.strip()]

        if len(sentences) < 5:
            return 0

        lengths = [len(s.split()) for s in sentences]
        avg_length = sum(lengths) / len(lengths)

        # Calculate variance
        variance = sum((length - avg_length) ** 2 for length in lengths) / len(lengths)
        std_dev = math.sqrt(variance)

        # Low variance indicates AI (unnaturally consistent)
        consistency_score = 100 - min(100, std_dev * 10)  # Invert score
        return max(0, consistency_score - 20)  # Adjust threshold

    def _check_readability_patterns(self, text):
        """Check readability patterns that suggest AI generation"""
        try:
            # Simple readability metrics
            words = text.split()
            sentences = len([s for s in text.split('.') if s.strip()])

            if sentences == 0:
                return 0

            avg_words_per_sentence = len(words) / sentences

            # AI tends to have very consistent, moderate sentence lengths
            if 15 <= avg_words_per_sentence <= 25:
                return 30  # Moderate AI indicator
            elif 25 < avg_words_per_sentence <= 35:
                return 50  # Higher AI indicator
            else:
                return 10  # More natural variation

        except Exception:
            return 0

# Initialize AI detector
ai_detector = AIDetector()

# Initialize humanizer
humanizer = AdvancedHumanizer()

def detect_ai_text(input_text):
    """Detect if text is AI-generated"""
    if not input_text.strip():
        return "Please enter some text to analyze."

    result = ai_detector.calculate_ai_probability(input_text)

    # Short texts return only an error detail; report it instead of crashing on missing keys
    if "error" in result.get("details", {}):
        return f"⚠️ {result['details']['error']}"

    # Format the output
    output = f"""
## 🤖 AI Detection Analysis

**Overall Assessment:** {result['verdict']}
**AI Probability:** {result['probability']}%
**Confidence Level:** {result['confidence']}

### 📊 Detailed Breakdown:

- **AI Phrases Score:** {result['details']['ai_phrases_score']}%
- **Vocabulary Repetition:** {result['details']['vocabulary_repetition']}%
- **Structure Patterns:** {result['details']['structure_patterns']}%
- **Transition Overuse:** {result['details']['transition_overuse']}%
- **Formal Patterns:** {result['details']['formal_patterns']}%
- **Sentence Consistency:** {result['details']['sentence_consistency']}%
- **Readability Score:** {result['details']['readability_score']}%

### 💡 Interpretation:
- **0-20%:** Likely human-written with natural variations
- **21-40%:** Possibly AI-generated or heavily edited
- **41-60%:** Probably AI-generated with some humanization
- **61-80%:** Likely AI-generated with minimal editing
- **81-100%:** Very likely raw AI-generated content

### 🛡️ Tips to Improve:
- Add more natural vocabulary variations
- Use varied sentence structures
- Include personal insights or examples
- Reduce formal academic buzzwords
- Add natural transitions and flow
"""

    return output

def humanize_academic_text(input_text, humanization_level="Moderate"):
    """
    Advanced humanization with multiple techniques
    """
    if not input_text.strip():
        return "Please enter some text to humanize."

    # Step 1: Initial paraphrasing with multiple models
    paraphrased = humanizer.paraphrase_with_multiple_models(input_text)

    # Apply different levels of humanization
    if humanization_level == "Light":
        # Minimal changes - just paraphrasing
        result = paraphrased
    elif humanization_level == "Moderate":
        # Add natural variations and some vocabulary changes
        result = humanizer.add_natural_variations(paraphrased)
        result = humanizer.diversify_vocabulary(result)
    else:  # Heavy
        # Apply all techniques
        result = humanizer.add_natural_variations(paraphrased)
        result = humanizer.diversify_vocabulary(result)
        result = humanizer.adjust_sentence_structure(result)

    # Clean up formatting
    result = re.sub(r'\s+', ' ', result).strip()
    result = re.sub(r'\s+([.,!?;:])', r'\1', result)

    # Ensure proper capitalization
    if NLTK_AVAILABLE:
        sentences = sent_tokenize(result)
    else:
        sentences = [s.strip() for s in result.split('.') if s.strip()]

    formatted_sentences = []
    for sentence in sentences:
        if sentence:
            sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()
            if not sentence.endswith(('.', '!', '?')):
                sentence += '.'
            formatted_sentences.append(sentence)

    final_result = " ".join(formatted_sentences)

    return final_result if final_result else "Error processing text. Please try again."

# Create Gradio interface with tabs for both humanization and AI detection
with gr.Blocks(theme="soft", title="AI Text Humanizer & Detector") as demo:
    gr.Markdown("""
    # 🤖➡️👨 AI Text Humanizer & Detector Pro

    **Complete solution for AI text processing - Humanize AND Detect AI-generated content**

    Transform robotic AI text into natural, human-like writing, then verify the results with our built-in AI detector.
    """)

    with gr.Tabs():
        # Humanization Tab
        with gr.TabItem("🎭 Text Humanizer"):
            gr.Markdown("### Transform AI text into natural, human-like writing")

            with gr.Row():
                with gr.Column():
                    humanize_input = gr.Textbox(
                        lines=10,
                        placeholder="Enter machine-generated or robotic academic text here...",
                        label="Raw Input Text",
                        info="Paste your AI-generated text that needs to be humanized"
                    )

                    humanization_level = gr.Radio(
                        choices=["Light", "Moderate", "Heavy"],
                        value="Moderate",
                        label="Humanization Level",
                        info="Light: Basic paraphrasing | Moderate: Natural variations + vocabulary | Heavy: All techniques"
                    )

                    humanize_btn = gr.Button("🚀 Humanize Text", variant="primary", size="lg")

                with gr.Column():
                    humanize_output = gr.Textbox(
                        label="Humanized Academic Output",
                        lines=10,
                        show_copy_button=True,
                        info="Copy this natural, human-like text"
                    )

            # Examples for humanizer
            gr.Examples(
                examples=[
                    [
                        "The implementation of artificial intelligence algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.",
                        "Moderate"
                    ],
                    [
                        "Machine learning models exhibit superior performance characteristics when evaluated against traditional statistical approaches in predictive analytics applications.",
                        "Heavy"
                    ]
                ],
                inputs=[humanize_input, humanization_level],
                outputs=humanize_output,
                fn=humanize_academic_text
            )

        # AI Detection Tab
        with gr.TabItem("🕵️ AI Detector"):
            gr.Markdown("### Analyze text to detect if it's AI-generated")

            with gr.Row():
                with gr.Column():
                    detect_input = gr.Textbox(
                        lines=10,
                        placeholder="Paste text here to check if it's AI-generated...",
                        label="Text to Analyze",
                        info="Enter any text to check its AI probability"
                    )

                    detect_btn = gr.Button("🔍 Analyze Text", variant="secondary", size="lg")

                with gr.Column():
                    detect_output = gr.Markdown(
                        label="AI Detection Results",
                        value="Analysis results will appear here..."
                    )

            # Examples for detector
            gr.Examples(
                examples=[
                    ["The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets. Furthermore, these results indicate substantial enhancements in performance."],
                    ["I love going to the coffee shop on weekends. The barista there makes the best cappuccino I've ever had, and I always end up chatting with other customers about random stuff."],
                    ["The comprehensive analysis reveals that the optimization of neural network architectures facilitates enhanced performance characteristics in predictive analytics applications."]
                ],
                inputs=[detect_input],
                outputs=detect_output,
                fn=detect_ai_text
            )

        # Combined Analysis Tab
        with gr.TabItem("🔄 Humanize & Test"):
            gr.Markdown("### Humanize text and immediately test the results")

            with gr.Column():
                combined_input = gr.Textbox(
                    lines=8,
                    placeholder="Enter AI-generated text to humanize and test...",
                    label="Original AI Text",
                    info="This will be humanized and then tested for AI detection"
                )

                combined_level = gr.Radio(
                    choices=["Light", "Moderate", "Heavy"],
                    value="Moderate",
                    label="Humanization Level"
                )

                combined_btn = gr.Button("🔄 Humanize & Analyze", variant="primary", size="lg")

            with gr.Row():
                with gr.Column():
                    combined_humanized = gr.Textbox(
                        label="Humanized Text",
                        lines=8,
                        show_copy_button=True
                    )

                with gr.Column():
                    combined_analysis = gr.Markdown(
                        label="AI Detection Analysis",
                        value="Analysis will appear here..."
                    )

        # Settings & Info Tab
        with gr.TabItem("ℹ️ Info & Settings"):
            gr.Markdown("""
            ### 🎯 How to Use:

            **Humanizer:**
            1. Paste your AI-generated text
            2. Choose humanization level
            3. Get natural, human-like output

            **AI Detector:**
            1. Paste any text
            2. Get detailed AI probability analysis
            3. See breakdown of detection factors

            **Combined Mode:**
            1. Humanize and test in one step
            2. Perfect for optimizing results
            3. Iterate until satisfied

            ### 🔧 Features:

            **Humanization:**
            - ✅ Multiple AI models for paraphrasing
            - ✅ Natural vocabulary variations
            - ✅ Sentence structure optimization
            - ✅ Academic tone preservation
            - ✅ Three intensity levels

            **AI Detection:**
            - 🔍 Advanced pattern recognition
            - 📊 Detailed scoring breakdown
            - 🎯 Multiple detection criteria
            - 📈 Confidence assessment
            - 💡 Improvement suggestions

            ### ⚖️ Ethical Usage:
            This tool is for improving writing quality and understanding AI detection.
            Use responsibly and maintain academic integrity.
            """)

    # Event handlers
    humanize_btn.click(
        fn=humanize_academic_text,
        inputs=[humanize_input, humanization_level],
        outputs=humanize_output
    )

    detect_btn.click(
        fn=detect_ai_text,
        inputs=[detect_input],
        outputs=detect_output
    )

    def combined_process(text, level):
        """Humanize text and then analyze it"""
        if not text.strip():
            return "Please enter text to process.", "No analysis available."

        # First humanize
        humanized = humanize_academic_text(text, level)

        # Then analyze
        analysis = detect_ai_text(humanized)

        return humanized, analysis

    combined_btn.click(
        fn=combined_process,
        inputs=[combined_input, combined_level],
        outputs=[combined_humanized, combined_analysis]
    )

if __name__ == "__main__":
    demo.launch(
        share=False,
        debug=True,
        show_error=True,
        server_name="127.0.0.1",
        server_port=7860
    )
humanizer_batch.py
ADDED
@@ -0,0 +1,329 @@
import gradio as gr
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import random
import re
import warnings
warnings.filterwarnings("ignore")

class BatchHumanizer:
    def __init__(self):
        try:
            self.model_name = "Vamsi/T5_Paraphrase_Paws"
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
            self.model = AutoModelForSeq2SeqLM.from_pretrained(self.model_name)
            print("✅ Batch Humanizer model loaded successfully")
        except Exception as e:
            print(f"❌ Error loading model: {e}")
            self.tokenizer = None
            self.model = None

    def humanize_single_text(self, text, strength="medium"):
        """Humanize a single piece of text"""
        if not self.model or not self.tokenizer:
            return self.fallback_humanize(text)

        try:
            # Paraphrase using T5
            input_text = f"paraphrase: {text}"
            input_ids = self.tokenizer.encode(
                input_text,
                return_tensors="pt",
                max_length=512,
                truncation=True
            )

            # Adjust parameters based on strength
            if strength == "light":
                temp, top_p = 1.1, 0.9
            elif strength == "heavy":
                temp, top_p = 1.5, 0.95
            else:  # medium
                temp, top_p = 1.3, 0.92

            with torch.no_grad():
                outputs = self.model.generate(
                    input_ids=input_ids,
                    max_length=min(len(text.split()) + 50, 512),
                    num_beams=5,
                    temperature=temp,
                    top_p=top_p,
                    do_sample=True,
                    early_stopping=True,
                    repetition_penalty=1.2
                )

            result = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

            # Additional humanization
            if strength in ["medium", "heavy"]:
                result = self.add_natural_variations(result)

            return self.clean_text(result) if result and len(result) > 10 else text

        except Exception as e:
            print(f"Error humanizing text: {e}")
            return self.fallback_humanize(text)

    def fallback_humanize(self, text):
        """Simple fallback humanization without model"""
        # Basic word replacements
        replacements = {
            "utilize": "use", "demonstrate": "show", "facilitate": "help",
            "optimize": "improve", "implement": "apply", "generate": "create",
            "therefore": "thus", "however": "yet", "furthermore": "also"
        }

        result = text
        for old, new in replacements.items():
            result = re.sub(r'\b' + old + r'\b', new, result, flags=re.IGNORECASE)

        return result

    def add_natural_variations(self, text):
        """Add natural language variations"""
        # Academic connectors
        connectors = [
            "Moreover", "Furthermore", "Additionally", "In contrast",
            "Similarly", "Consequently", "Nevertheless", "Notably"
        ]

        sentences = text.split('.')
        varied = []

        for i, sentence in enumerate(sentences):
            sentence = sentence.strip()
            if not sentence:
                continue

            # Sometimes add connectors
            if i > 0 and random.random() < 0.2:
                connector = random.choice(connectors)
                sentence = f"{connector}, {sentence.lower()}"

            varied.append(sentence)

        return '. '.join(varied) + '.' if varied else text

    def clean_text(self, text):
        """Clean and format text"""
        # Remove extra spaces
        text = re.sub(r'\s+', ' ', text)
        text = re.sub(r'\s+([.!?,:;])', r'\1', text)

        # Capitalize sentences
        sentences = text.split('. ')
        formatted = []
        for sentence in sentences:
            sentence = sentence.strip()
            if sentence:
                sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()
                formatted.append(sentence)

        result = '. '.join(formatted)
        if not result.endswith(('.', '!', '?')):
            result += '.'

        return result

# Initialize humanizer
batch_humanizer = BatchHumanizer()

def process_text_input(text_input, strength):
    """Process single text input"""
    if not text_input or not text_input.strip():
        return "Please enter some text to humanize."

    return batch_humanizer.humanize_single_text(text_input, strength.lower())

def process_file_upload(file, strength):
    """Process uploaded file"""
    if file is None:
        return "Please upload a file.", None

    try:
        # Read the file
        if file.name.endswith('.txt'):
            with open(file.name, 'r', encoding='utf-8') as f:
                content = f.read()

            # Split into paragraphs or sentences for processing
            paragraphs = [p.strip() for p in content.split('\n\n') if p.strip()]

            humanized_paragraphs = []
            for para in paragraphs:
                if len(para) > 50:  # Only process substantial paragraphs
                    humanized = batch_humanizer.humanize_single_text(para, strength.lower())
                    humanized_paragraphs.append(humanized)
                else:
                    humanized_paragraphs.append(para)

            result = '\n\n'.join(humanized_paragraphs)

            # Save to new file
            output_filename = file.name.replace('.txt', '_humanized.txt')
            with open(output_filename, 'w', encoding='utf-8') as f:
                f.write(result)

            return result, output_filename

        elif file.name.endswith('.csv'):
            df = pd.read_csv(file.name)

            # Assume the text column is named 'text' or the first column
            text_column = 'text' if 'text' in df.columns else df.columns[0]

            # Humanize each text entry
            df['humanized'] = df[text_column].apply(
                lambda x: batch_humanizer.humanize_single_text(str(x), strength.lower()) if pd.notna(x) else x
            )

            # Save to new CSV
            output_filename = file.name.replace('.csv', '_humanized.csv')
            df.to_csv(output_filename, index=False)

            return f"Processed {len(df)} entries. Check the 'humanized' column.", output_filename

        else:
            return "Unsupported file format. Please upload .txt or .csv files.", None

    except Exception as e:
        return f"Error processing file: {str(e)}", None

# Create Gradio interface with tabs
with gr.Blocks(theme="soft", title="AI Text Humanizer Pro") as demo:
    gr.Markdown("""
    # 🤖➡️👨 AI Text Humanizer Pro

    **Advanced tool to transform robotic AI-generated text into natural, human-like writing**

    Perfect for:
    - 📝 Academic papers and essays
    - 📊 Research reports
    - 📄 Business documents
    - 💼 Professional content
    - 🔍 Bypassing AI detection tools
    """)

    with gr.Tabs():
        # Single Text Tab
        with gr.TabItem("Single Text"):
            gr.Markdown("### Humanize Individual Text")

            with gr.Row():
                with gr.Column(scale=2):
                    text_input = gr.Textbox(
                        lines=12,
                        placeholder="Paste your AI-generated text here...",
                        label="Input Text",
                        info="Enter the text you want to humanize"
                    )

                    strength_single = gr.Radio(
                        choices=["Light", "Medium", "Heavy"],
                        value="Medium",
                        label="Humanization Strength"
                    )

                    process_btn = gr.Button("🚀 Humanize Text", variant="primary")

                with gr.Column(scale=2):
                    text_output = gr.Textbox(
                        lines=12,
                        label="Humanized Output",
                        show_copy_button=True
                    )

            # Examples
            gr.Examples(
                examples=[
                    ["The implementation of artificial intelligence algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.", "Medium"],
                    ["Machine learning models exhibit superior performance characteristics when evaluated against traditional statistical approaches in predictive analytics applications.", "Heavy"],
                    ["The research methodology utilized in this study involves comprehensive data collection and analysis procedures to ensure robust and reliable results.", "Light"]
|
244 |
+
],
|
245 |
+
inputs=[text_input, strength_single],
|
246 |
+
outputs=text_output,
|
247 |
+
fn=process_text_input
|
248 |
+
)
|
249 |
+
|
250 |
+
# Batch Processing Tab
|
251 |
+
with gr.TabItem("Batch Processing"):
|
252 |
+
gr.Markdown("### Process Files in Batch")
|
253 |
+
gr.Markdown("Upload .txt or .csv files to humanize multiple texts at once")
|
254 |
+
|
255 |
+
with gr.Row():
|
256 |
+
with gr.Column():
|
257 |
+
file_input = gr.File(
|
258 |
+
label="Upload File (.txt or .csv)",
|
259 |
+
file_types=[".txt", ".csv"]
|
260 |
+
)
|
261 |
+
|
262 |
+
strength_batch = gr.Radio(
|
263 |
+
choices=["Light", "Medium", "Heavy"],
|
264 |
+
value="Medium",
|
265 |
+
label="Humanization Strength"
|
266 |
+
)
|
267 |
+
|
268 |
+
process_file_btn = gr.Button("🔄 Process File", variant="primary")
|
269 |
+
|
270 |
+
with gr.Column():
|
271 |
+
file_output = gr.Textbox(
|
272 |
+
lines=10,
|
273 |
+
label="Processing Results",
|
274 |
+
show_copy_button=True
|
275 |
+
)
|
276 |
+
|
277 |
+
download_file = gr.File(
|
278 |
+
label="Download Processed File",
|
279 |
+
visible=False
|
280 |
+
)
|
281 |
+
|
282 |
+
# Settings Tab
|
283 |
+
with gr.TabItem("Settings & Info"):
|
284 |
+
gr.Markdown("""
|
285 |
+
### How it works:
|
286 |
+
|
287 |
+
1. **Light Humanization**: Basic paraphrasing with minimal changes
|
288 |
+
2. **Medium Humanization**: Paraphrasing + vocabulary variations
|
289 |
+
3. **Heavy Humanization**: All techniques + sentence structure changes
|
290 |
+
|
291 |
+
### Features:
|
292 |
+
- ✅ Advanced T5-based paraphrasing
|
293 |
+
- ✅ Natural vocabulary diversification
|
294 |
+
- ✅ Sentence structure optimization
|
295 |
+
- ✅ Academic tone preservation
|
296 |
+
- ✅ Batch file processing
|
297 |
+
- ✅ Multiple output formats
|
298 |
+
|
299 |
+
### Supported Formats:
|
300 |
+
- **Text files (.txt)**: Processes paragraph by paragraph
|
301 |
+
- **CSV files (.csv)**: Adds 'humanized' column with processed text
|
302 |
+
|
303 |
+
### Tips for best results:
|
304 |
+
- Use complete sentences and paragraphs
|
305 |
+
- Avoid very short fragments
|
306 |
+
- Choose appropriate humanization strength
|
307 |
+
- Review output for context accuracy
|
308 |
+
""")
|
309 |
+
|
310 |
+
# Event handlers
|
311 |
+
process_btn.click(
|
312 |
+
fn=process_text_input,
|
313 |
+
inputs=[text_input, strength_single],
|
314 |
+
outputs=text_output
|
315 |
+
)
|
316 |
+
|
317 |
+
process_file_btn.click(
|
318 |
+
fn=process_file_upload,
|
319 |
+
inputs=[file_input, strength_batch],
|
320 |
+
outputs=[file_output, download_file]
|
321 |
+
)
|
322 |
+
|
323 |
+
if __name__ == "__main__":
|
324 |
+
demo.launch(
|
325 |
+
share=False,
|
326 |
+
server_name="0.0.0.0",
|
327 |
+
server_port=7862,
|
328 |
+
debug=True
|
329 |
+
)
|
humanizer_robust.py
ADDED
@@ -0,0 +1,300 @@
import gradio as gr
import random
import re
import warnings
warnings.filterwarnings("ignore")

class RobustHumanizer:
    def __init__(self):
        """Initialize with robust fallback techniques that don't require external models"""
        self.academic_replacements = {
            # Common AI patterns to humanize
            "demonstrates": ["shows", "reveals", "indicates", "illustrates", "displays"],
            "significant": ["notable", "considerable", "substantial", "important", "remarkable"],
            "utilize": ["use", "employ", "apply", "implement", "make use of"],
            "implement": ["apply", "use", "put into practice", "carry out", "execute"],
            "generate": ["create", "produce", "develop", "form", "make"],
            "facilitate": ["help", "enable", "assist", "support", "aid"],
            "optimize": ["improve", "enhance", "refine", "perfect", "better"],
            "analyze": ["examine", "study", "investigate", "assess", "evaluate"],
            "therefore": ["thus", "hence", "consequently", "as a result", "for this reason"],
            "however": ["nevertheless", "nonetheless", "yet", "on the other hand", "but"],
            "furthermore": ["moreover", "additionally", "in addition", "what is more", "besides"],
            "substantial": ["significant", "considerable", "notable", "important", "major"],
            "subsequently": ["later", "then", "afterward", "following this", "next"],
            "approximately": ["about", "roughly", "around", "nearly", "close to"],
            "numerous": ["many", "several", "multiple", "various", "a number of"],
            "encompasses": ["includes", "covers", "contains", "involves", "comprises"],
            "methodology": ["method", "approach", "technique", "procedure", "process"],
            "comprehensive": ["complete", "thorough", "extensive", "detailed", "full"],
            "indicates": ["shows", "suggests", "points to", "reveals", "demonstrates"],
            "established": ["set up", "created", "formed", "developed", "built"]
        }

        self.sentence_starters = [
            "Notably,", "Importantly,", "Significantly,", "Interestingly,",
            "Furthermore,", "Moreover,", "Additionally,", "In contrast,",
            "Similarly,", "Nevertheless,", "Consequently,", "As a result,",
            "In particular,", "Specifically,", "Generally,", "Typically,"
        ]

        self.hedging_phrases = [
            "appears to", "seems to", "tends to", "suggests that", "indicates that",
            "may well", "might be", "could be", "potentially", "presumably",
            "arguably", "to some extent", "in many cases", "generally speaking",
            "it is likely that", "evidence suggests", "research indicates"
        ]

        self.connecting_phrases = [
            "In light of this", "Building upon this", "This finding suggests",
            "It is worth noting that", "This observation", "These results",
            "The evidence indicates", "This approach", "The data reveals",
            "Research shows", "Studies demonstrate", "Analysis reveals"
        ]

    def split_into_sentences(self, text):
        """Split text into sentences on terminal punctuation followed by whitespace.

        A regex split replaces the earlier character-by-character scan, which relied
        on text.find() and broke on repeated substrings. Abbreviations ("e.g.") may
        still cause occasional over-splits.
        """
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        return [s.strip() for s in sentences if len(s.strip()) > 5]

    def vary_vocabulary(self, text):
        """Replace words with alternatives"""
        result = text

        for original, alternatives in self.academic_replacements.items():
            if original.lower() in result.lower():
                replacement = random.choice(alternatives)
                # Case-insensitive replacement of the first occurrence
                pattern = re.compile(re.escape(original), re.IGNORECASE)
                result = pattern.sub(replacement, result, count=1)

        return result

    def add_natural_flow(self, text):
        """Add natural academic connectors and hedging"""
        sentences = self.split_into_sentences(text)
        if not sentences:
            return text

        modified_sentences = []

        for i, sentence in enumerate(sentences):
            sentence = sentence.strip()
            if not sentence:
                continue

            # Add hedging to some sentences
            if random.random() < 0.3 and not any(hedge in sentence.lower() for hedge in self.hedging_phrases):
                if sentence.lower().startswith(('the ', 'this ', 'these ', 'that ')):
                    hedge = random.choice(self.hedging_phrases)
                    words = sentence.split()
                    if len(words) > 2:
                        words.insert(2, hedge)
                        sentence = " ".join(words)

            # Add connecting phrases for flow
            if i > 0 and random.random() < 0.4:
                connector = random.choice(self.connecting_phrases)
                sentence = f"{connector}, {sentence.lower()}"

            # Sometimes start with variety
            elif i > 0 and random.random() < 0.2:
                starter = random.choice(self.sentence_starters)
                sentence = f"{starter} {sentence.lower()}"

            modified_sentences.append(sentence)

        return " ".join(modified_sentences)

    def restructure_sentences(self, text):
        """Modify sentence structures for variety"""
        sentences = self.split_into_sentences(text)
        restructured = []

        for sentence in sentences:
            words = sentence.split()

            # For long sentences, sometimes break them up
            if len(words) > 25 and random.random() < 0.5:
                # Find a good break point in the middle section
                break_words = ['and', 'but', 'which', 'that', 'because', 'since', 'while']
                for i, word in enumerate(words[10:20], 10):
                    if word.lower() in break_words:
                        part1 = " ".join(words[:i]) + "."
                        part2 = " ".join(words[i+1:])
                        if len(part2) > 10:  # Only split if the second part is substantial
                            part2 = part2[0].upper() + part2[1:]
                            restructured.extend([part1, part2])
                            break
                else:
                    # No usable break point found; keep the sentence intact
                    restructured.append(sentence)
            else:
                restructured.append(sentence)

        return " ".join(restructured)

    def clean_and_format(self, text):
        """Clean up the text formatting"""
        # Remove extra spaces
        text = re.sub(r'\s+', ' ', text)
        text = re.sub(r'\s+([.,!?;:])', r'\1', text)

        # Fix capitalization
        sentences = self.split_into_sentences(text)
        formatted = []

        for sentence in sentences:
            sentence = sentence.strip()
            if sentence:
                # Capitalize first letter
                sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()

                # Ensure proper ending
                if not sentence.endswith(('.', '!', '?')):
                    sentence += '.'

                formatted.append(sentence)

        return " ".join(formatted)

    def humanize_text(self, text, intensity="medium"):
        """Main humanization function"""
        if not text or len(text.strip()) < 10:
            return "Please enter substantial text to humanize (at least 10 characters)."

        result = text.strip()

        try:
            # Apply different levels of humanization
            if intensity.lower() in ["light", "low"]:
                # Just vocabulary changes
                result = self.vary_vocabulary(result)

            elif intensity.lower() in ["medium", "moderate"]:
                # Vocabulary + natural flow
                result = self.vary_vocabulary(result)
                result = self.add_natural_flow(result)

            elif intensity.lower() in ["heavy", "high", "maximum"]:
                # All techniques
                result = self.vary_vocabulary(result)
                result = self.add_natural_flow(result)
                result = self.restructure_sentences(result)

            # Always clean up formatting
            result = self.clean_and_format(result)

            return result if result and len(result) > 10 else text

        except Exception as e:
            print(f"Humanization error: {e}")
            return "Error processing text. Please try again with different input."

# Initialize the humanizer
humanizer = RobustHumanizer()

def process_text(input_text, humanization_level):
    """Process the input text"""
    return humanizer.humanize_text(input_text, humanization_level)

# Create Gradio interface
demo = gr.Interface(
    fn=process_text,
    inputs=[
        gr.Textbox(
            lines=12,
            placeholder="Paste your AI-generated or robotic text here...\n\nExample: 'The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.'",
            label="Input Text",
            info="Enter the text you want to make more natural and human-like"
        ),
        gr.Radio(
            choices=["Light", "Medium", "Heavy"],
            value="Medium",
            label="Humanization Intensity",
            info="Light: Basic vocabulary changes | Medium: + Natural flow | Heavy: + Sentence restructuring"
        )
    ],
    outputs=gr.Textbox(
        label="Humanized Output",
        lines=12,
        show_copy_button=True,
        info="Copy this natural, human-like text"
    ),
    title="🤖➡️👨 Robust AI Text Humanizer",
    description="""
**Transform robotic AI text into natural, human-like academic writing**

This tool uses advanced linguistic techniques to make AI-generated text sound more natural and human-like.
Perfect for academic papers, research reports, essays, and professional documents.

✅ **No external dependencies** - Always works
✅ **Advanced vocabulary variation** - Natural word choices
✅ **Sentence flow optimization** - Smooth transitions
✅ **Academic tone preservation** - Maintains credibility
✅ **Structure diversification** - Varied sentence patterns
✅ **Natural connectors** - Academic linking phrases
""",
    examples=[
        [
            "The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets. These results indicate that the optimization of neural network architectures can facilitate enhanced performance in predictive analytics applications.",
            "Medium"
        ],
        [
            "Artificial intelligence technologies are increasingly being utilized across numerous industries to optimize operational processes and generate innovative solutions. The comprehensive analysis of these systems reveals substantial benefits in terms of efficiency and accuracy.",
            "Heavy"
        ],
        [
            "The research methodology encompasses a systematic approach to data collection and analysis, utilizing advanced statistical techniques to ensure robust and reliable results that demonstrate the effectiveness of the proposed framework.",
            "Light"
        ]
    ],
    theme="soft",
    css="""
.gradio-container {
    max-width: 1200px !important;
}
""",
    article="""
### 🎯 **How to Use:**
1. **Paste your AI-generated text** in the input box
2. **Choose intensity level** based on how much change you want
3. **Click Submit** and get natural, human-like output
4. **Copy the result** and use it in your work

### 💡 **Pro Tips:**
- Use **Light** for minimal changes while preserving original structure
- Use **Medium** for balanced humanization with natural flow
- Use **Heavy** for maximum transformation and sentence variety
- Always review the output to ensure it maintains your intended meaning
- For best results, input complete sentences and paragraphs

### ⚖️ **Ethical Usage:**
This tool is designed to improve writing quality and natural expression.
Please use responsibly and maintain academic integrity.
""",
    allow_flagging="never"
)

if __name__ == "__main__":
    demo.launch(
        share=False,
        server_name="127.0.0.1",
        server_port=7862,
        debug=True,
        show_error=True
    )
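One subtlety in the `vary_vocabulary` technique above: a plain `re.sub(..., count=1)` with `IGNORECASE` swaps in the alternative with fixed casing, which can lowercase a sentence-initial word ("Utilize the tool." becomes "use the tool."). A case-preserving variant is sketched below; `replace_preserving_case` is a hypothetical helper for illustration, not a function in this repo.

```python
import re

def replace_preserving_case(text, original, replacement):
    """Replace the first case-insensitive match of `original`, copying the
    matched word's capitalization (ALL CAPS, Title, or lower) onto `replacement`."""
    def repl(match):
        word = match.group(0)
        if word.isupper():
            return replacement.upper()
        if word[0].isupper():
            return replacement[0].upper() + replacement[1:]
        return replacement
    return re.compile(re.escape(original), re.IGNORECASE).sub(repl, text, count=1)
```

For example, `replace_preserving_case("Utilize the tool.", "utilize", "use")` keeps the sentence capitalized as "Use the tool.".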
humanizer_simple.py
ADDED
@@ -0,0 +1,249 @@
import gradio as gr
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import random
import re
import warnings
warnings.filterwarnings("ignore")

class SimpleHumanizer:
    def __init__(self):
        # Load a reliable T5 model for paraphrasing
        try:
            self.model_name = "Vamsi/T5_Paraphrase_Paws"
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
            self.model = AutoModelForSeq2SeqLM.from_pretrained(self.model_name)
            print("✅ Model loaded successfully")
        except Exception as e:
            print(f"❌ Error loading model: {e}")
            self.tokenizer = None
            self.model = None

    def add_variations(self, text):
        """Add simple variations to make text more natural"""
        # Common academic phrase variations
        replacements = {
            "shows that": ["demonstrates that", "indicates that", "reveals that", "suggests that"],
            "results in": ["leads to", "causes", "produces", "generates"],
            "due to": ["because of", "owing to", "as a result of", "on account of"],
            "in order to": ["to", "so as to", "with the aim of", "for the purpose of"],
            "as well as": ["and", "along with", "together with", "in addition to"],
            "therefore": ["thus", "hence", "consequently", "as a result"],
            "however": ["nevertheless", "nonetheless", "on the other hand", "yet"],
            "furthermore": ["moreover", "additionally", "in addition", "what is more"],
            "significant": ["notable", "considerable", "substantial", "important"],
            "important": ["crucial", "vital", "essential", "key"],
            "analyze": ["examine", "investigate", "study", "assess"],
            "demonstrate": ["show", "illustrate", "reveal", "display"],
            "utilize": ["use", "employ", "apply", "implement"]
        }

        result = text
        for original, alternatives in replacements.items():
            if original in result.lower():
                replacement = random.choice(alternatives)
                # Replace first occurrence, case-insensitively
                pattern = re.compile(re.escape(original), re.IGNORECASE)
                result = pattern.sub(replacement, result, count=1)

        return result

    def vary_sentence_structure(self, text):
        """Simple sentence structure variations"""
        sentences = text.split('.')
        varied = []

        for sentence in sentences:
            sentence = sentence.strip()
            if not sentence:
                continue

            # Add some variety to sentence starters
            if random.random() < 0.3:
                starters = ["Notably, ", "Importantly, ", "Significantly, ", "Interestingly, "]
                if not any(sentence.startswith(s.strip()) for s in starters):
                    sentence = random.choice(starters) + sentence.lower()

            varied.append(sentence)

        return '. '.join(varied) + '.'

    def paraphrase_text(self, text):
        """Paraphrase using T5 model"""
        if not self.model or not self.tokenizer:
            return text

        try:
            # Split long text into chunks
            max_length = 400
            if len(text) > max_length:
                sentences = text.split('.')
                chunks = []
                current_chunk = ""

                for sentence in sentences:
                    if len(current_chunk + sentence) < max_length:
                        current_chunk += sentence + "."
                    else:
                        if current_chunk:
                            chunks.append(current_chunk.strip())
                        current_chunk = sentence + "."

                if current_chunk:
                    chunks.append(current_chunk.strip())

                paraphrased_chunks = []
                for chunk in chunks:
                    para = self._paraphrase_chunk(chunk)
                    paraphrased_chunks.append(para)

                return " ".join(paraphrased_chunks)
            else:
                return self._paraphrase_chunk(text)

        except Exception as e:
            print(f"Paraphrasing error: {e}")
            return text

    def _paraphrase_chunk(self, text):
        """Paraphrase a single chunk"""
        try:
            # Prepare input
            input_text = f"paraphrase: {text}"
            input_ids = self.tokenizer.encode(
                input_text,
                return_tensors="pt",
                max_length=512,
                truncation=True
            )

            # Generate paraphrase
            with torch.no_grad():
                outputs = self.model.generate(
                    input_ids=input_ids,
                    max_length=min(len(text.split()) + 50, 512),
                    num_beams=5,
                    num_return_sequences=1,
                    temperature=1.3,
                    top_k=50,
                    top_p=0.95,
                    do_sample=True,
                    early_stopping=True,
                    repetition_penalty=1.2
                )

            # Decode result
            paraphrased = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

            # Clean up the result
            paraphrased = paraphrased.strip()
            if paraphrased and len(paraphrased) > 10:
                return paraphrased
            else:
                return text

        except Exception as e:
            print(f"Chunk paraphrasing error: {e}")
            return text

# Initialize humanizer
humanizer = SimpleHumanizer()

def humanize_text(input_text, complexity="Medium"):
    """Main humanization function"""
    if not input_text or not input_text.strip():
        return "Please enter some text to humanize."

    try:
        # Step 1: Paraphrase the text
        result = humanizer.paraphrase_text(input_text)

        # Step 2: Add variations based on complexity
        if complexity in ["Medium", "High"]:
            result = humanizer.add_variations(result)

        if complexity == "High":
            result = humanizer.vary_sentence_structure(result)

        # Step 3: Clean up formatting
        result = re.sub(r'\s+', ' ', result)
        result = re.sub(r'\s+([.!?,:;])', r'\1', result)

        # Ensure proper sentence capitalization
        sentences = result.split('. ')
        formatted_sentences = []
        for sentence in sentences:
            sentence = sentence.strip()
            if sentence:
                # Capitalize first letter
                sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()
                formatted_sentences.append(sentence)

        result = '. '.join(formatted_sentences)

        # Final cleanup
        if not result.endswith(('.', '!', '?')):
            result += '.'

        return result

    except Exception as e:
        print(f"Humanization error: {e}")
        return f"Error processing text: {str(e)}"

# Create Gradio interface
demo = gr.Interface(
    fn=humanize_text,
    inputs=[
        gr.Textbox(
            lines=10,
            placeholder="Paste your AI-generated or robotic text here...",
            label="Input Text",
            info="Enter the text you want to humanize"
        ),
        gr.Radio(
            choices=["Low", "Medium", "High"],
            value="Medium",
            label="Humanization Complexity",
            info="Low: Basic paraphrasing | Medium: + Vocabulary variations | High: + Structure changes"
        )
    ],
    outputs=gr.Textbox(
        label="Humanized Output",
        lines=10,
        show_copy_button=True
    ),
    title="🤖➡️👨 AI Text Humanizer (Simple)",
    description="""
**Transform robotic AI text into natural, human-like writing**

This tool uses advanced paraphrasing techniques to make AI-generated text sound more natural and human-like.
Perfect for academic papers, essays, reports, and any content that needs to pass AI detection tools.

**Features:**
✅ Advanced T5-based paraphrasing
✅ Vocabulary diversification
✅ Sentence structure optimization
✅ Academic tone preservation
✅ Natural flow enhancement
""",
    examples=[
        [
            "The implementation of machine learning algorithms in data processing systems demonstrates significant improvements in efficiency and accuracy metrics.",
            "Medium"
        ],
        [
            "Artificial intelligence technologies are increasingly being utilized across various industries to enhance operational capabilities and drive innovation.",
            "High"
        ]
    ],
    theme="soft"
)

if __name__ == "__main__":
    demo.launch(
        share=False,
        server_name="0.0.0.0",
        server_port=7861,
        debug=True
    )
requirements.txt
ADDED
@@ -0,0 +1,7 @@
gradio==4.44.0
transformers==4.35.0
torch==2.1.0
nltk==3.8.1
textstat==0.7.3
numpy==1.24.3
pandas==2.0.3
research_humanizer_dataset.csv
ADDED
@@ -0,0 +1,11 @@
input,target
The model is good and fast.,The proposed model exhibits strong performance and efficiency.
We tried some tests and they worked well.,"Several experiments were conducted, all of which demonstrated promising results."
This system gives better results than old ones.,This system outperforms traditional approaches in terms of accuracy and scalability.
The algorithm was run on many datasets.,The algorithm was evaluated using a diverse set of benchmark datasets.
We can say it works great.,These findings suggest the approach is both effective and reliable.
Our method is simple but it does the job.,Our approach is straightforward yet achieves the intended objectives.
"There are many problems, but we fixed them.","Several issues were encountered, all of which were systematically resolved."
The results are okay and show improvement.,The outcomes indicate measurable improvements over baseline methods.
We used some tools to help with this.,Auxiliary tools were employed to support the development process.
It shows better accuracy than others.,The approach demonstrates superior accuracy compared to existing methods.
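The CSV above pairs informal phrasings with academic targets, in the input/target shape commonly used for seq2seq fine-tuning. A minimal sketch of parsing it into training pairs follows; the `"paraphrase: "` task prefix is an assumption borrowed from the T5 usage elsewhere in this repo, and the two rows are inlined here so the snippet is self-contained rather than reading the file from disk.

```python
import csv
import io

# Two rows copied verbatim from research_humanizer_dataset.csv.
raw = '''input,target
The model is good and fast.,The proposed model exhibits strong performance and efficiency.
We tried some tests and they worked well.,"Several experiments were conducted, all of which demonstrated promising results."
'''

# DictReader handles the quoted target containing a comma.
pairs = [(f"paraphrase: {row['input']}", row['target'])
         for row in csv.DictReader(io.StringIO(raw))]
```

To use the real file, replace `io.StringIO(raw)` with `open("research_humanizer_dataset.csv", encoding="utf-8")`.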