SidddhantJain committed on
Commit 850a7ff · 0 Parent(s):

Grason app was built for AI detection and humanization
README.md ADDED
@@ -0,0 +1,178 @@
# 🤖➡️👨 AI Text Humanizer

An advanced tool for transforming robotic, AI-generated text into natural, human-like writing that can bypass AI detection tools.

## 🚀 Features

- **Multiple AI Models**: Uses T5 and Pegasus models for diverse paraphrasing
- **Advanced Techniques**: Vocabulary diversification, sentence restructuring, natural flow enhancement
- **Batch Processing**: Handles multiple texts and files at once
- **Academic Focus**: Preserves academic tone while making text more natural
- **Undetectable Output**: Creates human-like text that passes AI detection tools
- **Multiple Interfaces**: Simple, advanced, and batch processing versions

## 📁 Files

1. **`humanizer_app.py`** - Advanced version with multiple models and sophisticated techniques
2. **`humanizer_simple.py`** - Simplified version with a reliable single model
3. **`humanizer_batch.py`** - Batch processing version for files and multiple texts

## 🛠️ Installation

### Prerequisites

1. Python 3.8+ installed
2. Virtual environment (recommended)

### Setup

```bash
# Clone or download the project
cd Humanizer

# Create virtual environment (if not already created)
python -m venv .venv

# Activate virtual environment
# Windows:
.venv\Scripts\activate
# Linux/Mac:
source .venv/bin/activate

# Install required packages
pip install gradio transformers torch tiktoken nltk textstat protobuf pandas

# Run the application
python humanizer_app.py    # Advanced version
# OR
python humanizer_simple.py # Simple version
# OR
python humanizer_batch.py  # Batch processing version
```

## 🎯 Usage

### Basic Usage

1. Run one of the Python files
2. Open your browser to the displayed URL (usually http://127.0.0.1:7860)
3. Paste your AI-generated text
4. Select a humanization level
5. Click "Humanize" and get natural, human-like output

### Humanization Levels

- **Light**: Basic paraphrasing with minimal changes
- **Moderate/Medium**: Paraphrasing + vocabulary variations + natural connectors
- **Heavy**: All techniques + sentence structure modifications + advanced variations

### Batch Processing

The batch processor (`humanizer_batch.py`) supports:
- **.txt files**: Processes paragraph by paragraph
- **.csv files**: Adds a 'humanized' column with the processed text (see the sketch below)
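
A minimal sketch of the CSV path, for orientation: read the file, apply the humanizer per row, write back with the new column. The `text` column name and the callable-passing style are assumptions for illustration; the actual entry point in `humanizer_batch.py` may differ.

```python
import pandas as pd
from typing import Callable

def humanize_csv(in_path: str, out_path: str, humanize: Callable[[str], str]) -> None:
    """Read a CSV, run each row's text through the humanizer,
    and write the result back as a new 'humanized' column."""
    df = pd.read_csv(in_path)
    df["humanized"] = df["text"].astype(str).apply(humanize)
    df.to_csv(out_path, index=False)
```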

## 🔧 How It Works

### Advanced Techniques Used

1. **Multi-Model Paraphrasing**: Uses multiple AI models to avoid detectable patterns
2. **Vocabulary Diversification**: Replaces words with contextual synonyms
3. **Sentence Structure Variation**: Modifies sentence patterns for natural flow
4. **Academic Connector Integration**: Adds natural transitional phrases
5. **Hedging Language**: Incorporates academic hedging for a natural tone
6. **Smart Chunking**: Processes long texts in optimal chunks (sketched below)
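
The chunking step accumulates whole sentences until a size budget is reached, so no sentence is ever cut mid-stream. A simplified sketch, using the 300-character budget from the paraphrasing code:

```python
from typing import List

def chunk_sentences(sentences: List[str], budget: int = 300) -> List[str]:
    """Group whole sentences into chunks of at most ~budget characters."""
    chunks, current = [], ""
    for sentence in sentences:
        if len(current) + len(sentence) <= budget:
            current += sentence + " "
        else:
            if current:
                chunks.append(current.strip())
            current = sentence + " "
    if current:
        chunks.append(current.strip())
    return chunks
```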

### AI Models Used

- **T5 Paraphrase (Primary)**: `Vamsi/T5_Paraphrase_Paws` (loading sketch below)
- **Pegasus (Secondary)**: `tuner007/pegasus_paraphrase`
- **NLTK WordNet**: For synonym replacement
- **Custom Algorithms**: For structure and flow optimization
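
Loading the primary model follows the standard `transformers` pattern; this mirrors the setup in `humanizer_app.py`, where T5 paraphrase checkpoints expect a `paraphrase:` task prefix on the input:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Vamsi/T5_Paraphrase_Paws"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Encode with the task prefix, then sample a paraphrase.
inputs = tokenizer("paraphrase: The results demonstrate significant improvements.",
                   return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=64, num_beams=5,
                         do_sample=True, top_p=0.92)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```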

## 📊 Example Transformations

### Input (AI-generated):
```
The implementation of machine learning algorithms in data processing systems demonstrates significant improvements in efficiency and accuracy metrics across various benchmark datasets.
```

### Output (Humanized):
```
Implementing machine learning algorithms within data processing frameworks shows notable enhancements in both efficiency and accuracy measures when evaluated across different benchmark datasets. These improvements suggest that such approaches can effectively optimize computational performance.
```

## 🎮 Advanced Features

### Multi-Level Processing
- Processes texts of any length via intelligent chunking
- Maintains context across chunks
- Preserves academic integrity

### Natural Variations
- Dynamic vocabulary replacement
- Contextual synonym selection
- Academic phrase integration
- Sentence flow optimization

### Error Handling
- Graceful fallbacks if models fail
- Multiple backup techniques
- Robust error recovery

## 🔍 Best Practices

1. **Input Quality**: Use complete sentences and proper grammar
2. **Length Considerations**: Works best with 50-1000 word chunks
3. **Context Preservation**: Review output to ensure the meaning is maintained
4. **Multiple Passes**: For heavy humanization, consider multiple rounds
5. **Manual Review**: Always review output for accuracy and flow

## 🚫 Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure protobuf is installed: `pip install protobuf`
   - Check your internet connection for model downloads
   - Try the simple version if the advanced one fails

2. **Memory Issues**:
   - Reduce the text chunk size
   - Use lighter humanization levels
   - Close other applications

3. **Performance Issues**:
   - Use a GPU if available
   - Process smaller texts
   - Try the simple version

## ⚖️ Ethical Usage

This tool is designed for:
- ✅ Improving writing quality
- ✅ Learning natural language patterns
- ✅ Enhancing academic writing
- ✅ Content optimization

Please use responsibly and:
- 🚫 Don't use it for plagiarism
- 🚫 Don't violate academic integrity policies
- 🚫 Don't misrepresent authorship
- 🚫 Don't use it for deceptive purposes

## 🤝 Contributing

Feel free to:
- Report bugs
- Suggest improvements
- Add new models
- Enhance techniques

## 📄 License

This project is for educational and research purposes. Please respect academic integrity and use responsibly.

---

**Made with ❤️ for better academic writing**
README_deployment.md ADDED
@@ -0,0 +1,164 @@
# 🤖➡️👨 AI Text Humanizer & Detector Pro

A comprehensive web application for transforming AI-generated text into natural, human-like writing while providing advanced AI detection capabilities.

## ✨ Features

### 🎭 Text Humanizer
- **Advanced Vocabulary Enhancement**: Replaces robotic terms with natural alternatives
- **Sentence Flow Optimization**: Improves readability and natural rhythm
- **Structure Diversification**: Breaks up repetitive patterns
- **Academic Tone Preservation**: Maintains professional quality while adding humanity
- **Multi-level Processing**: Light, Medium, and Heavy humanization options

### 🕵️ AI Detector
- **7-Point Analysis System**: Comprehensive AI probability assessment
- **Detailed Scoring**: Individual metrics for each detection factor
- **Confidence Levels**: Clear interpretation of results
- **Pattern Recognition**: Identifies common AI writing patterns
- **Real-time Analysis**: Instant feedback on text authenticity

### 🔄 Combined Processing
- **One-Click Workflow**: Humanize and test in a single process
- **Optimization Loop**: Perfect for iterative improvements
- **Quality Validation**: Ensures humanization effectiveness

## 🚀 Live Demo

Visit the live application: [Hugging Face Spaces](https://huggingface.co/spaces/YOUR_USERNAME/ai-text-humanizer)

## 📦 Installation

### Local Setup

1. Clone the repository:
```bash
git clone https://github.com/YOUR_USERNAME/ai-text-humanizer.git
cd ai-text-humanizer
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the application:
```bash
python app.py
```

4. Open your browser to `http://localhost:7860`

### Requirements
- Python 3.8+
- Gradio 4.44.0
- NLTK 3.8.1
- textstat 0.7.3
- numpy 1.24.3
- pandas 2.0.3
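
With those pins, a `requirements.txt` for the deployment would look like this (versions copied from the list above; adjust as needed):

```
gradio==4.44.0
nltk==3.8.1
textstat==0.7.3
numpy==1.24.3
pandas==2.0.3
```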

## 🛠️ Technical Details

### Humanization Algorithms
- **Vocabulary Diversification**: WordNet-based synonym replacement
- **Structural Variation**: Sentence pattern modification
- **Natural Flow Enhancement**: Academic connector and hedge phrase insertion
- **Linguistic Pattern Breaking**: AI-specific phrase elimination

### AI Detection Metrics
1. **AI Phrase Detection**: Identifies common AI-generated expressions
2. **Vocabulary Repetition**: Analyzes overuse of academic terms
3. **Structure Patterns**: Detects repetitive sentence starters
4. **Transition Overuse**: Measures excessive formal connectors
5. **Formal Pattern Recognition**: Identifies robotic phrasing
6. **Sentence Consistency**: Analyzes unnatural uniformity
7. **Readability Assessment**: Evaluates writing naturalness
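
The seven factor scores are combined into one probability with a weighted sum, clamped to 0-100. A condensed sketch of that step, with the weights taken from `app.py`:

```python
from typing import Dict

def combine_scores(scores: Dict[str, float]) -> float:
    """Weighted sum of the seven detection factors, clamped to 0-100."""
    weights = {
        "ai_phrases": 0.20, "vocab_repetition": 0.15, "structure_patterns": 0.15,
        "transition_overuse": 0.15, "formal_patterns": 0.15,
        "sentence_consistency": 0.10, "readability": 0.10,
    }
    total = sum(scores[name] * weight for name, weight in weights.items())
    return min(100.0, max(0.0, total))
```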

## 📈 Usage Examples

### Input (AI-Generated):
```
The implementation of artificial intelligence algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.
```

### Output (Humanized):
```
AI algorithms show notable improvements in both computational efficiency and accuracy when tested across different benchmark datasets. These results indicate considerable advances in performance.
```

## 🔧 Configuration

### Humanization Levels:
- **Light**: Basic vocabulary substitution
- **Medium**: Vocabulary + natural flow enhancement
- **Heavy**: All techniques including structure modification

### AI Detection Thresholds:
- **0-20%**: Likely human-written
- **21-40%**: Possibly AI-generated
- **41-60%**: Probably AI-generated
- **61-80%**: Likely AI-generated
- **81-100%**: Very likely AI-generated
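
As a one-function sketch, the bands above map to verdicts like this:

```python
def verdict(probability: float) -> str:
    """Map an AI-probability score (0-100) to the threshold bands."""
    if probability <= 20:
        return "Likely human-written"
    if probability <= 40:
        return "Possibly AI-generated"
    if probability <= 60:
        return "Probably AI-generated"
    if probability <= 80:
        return "Likely AI-generated"
    return "Very likely AI-generated"
```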

## 🌐 Deployment Options

### Hugging Face Spaces (Recommended)
1. Fork this repository
2. Create a new Space on Hugging Face
3. Link your GitHub repository
4. Automatic deployment with free GPU access

### Railway
1. Connect your GitHub repository
2. Deploy with one click
3. Free tier available

### Heroku
1. Create a new Heroku app
2. Connect your GitHub repository
3. Deploy from the dashboard

## ⚖️ Ethical Usage

This tool is designed for:
- ✅ Improving writing quality and naturalness
- ✅ Educational purposes and learning
- ✅ Understanding AI detection mechanisms
- ✅ Research and development

**Important Guidelines:**
- 🚫 Do not use for plagiarism or academic dishonesty
- 🚫 Do not violate institutional policies
- 🚫 Do not misrepresent authorship
- ✅ Maintain transparency about AI assistance
- ✅ Follow academic integrity guidelines

## 🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for:
- Bug fixes
- Feature enhancements
- Algorithm improvements
- Documentation updates

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- NLTK team for natural language processing tools
- Hugging Face for the hosting and deployment platform
- Gradio team for the web interface framework
- Open source community for various libraries and tools

## 📞 Support

For questions, issues, or suggestions:
- Open an issue on GitHub
- Contact: [[email protected]]
- Documentation: [Link to detailed docs]

---

**Disclaimer**: This tool is for educational and research purposes. Users are responsible for ensuring compliance with their institution's policies and maintaining academic integrity.
STATUS.md ADDED
@@ -0,0 +1,160 @@
# 🎯 AI Text Humanizer - Version Summary

## 📊 **Current Status**

✅ **WORKING APPLICATIONS:**
- **Robust Humanizer** (Port 7862) - **RECOMMENDED** ⭐
- Advanced Humanizer (Port 7860) - Running with fallbacks
- Simple Humanizer (Port 7861) - Running with fallbacks

## 🚀 **Available Versions**

### 1. **`humanizer_robust.py`** ⭐ **BEST CHOICE**
- **Port:** 7862
- **Status:** ✅ **FULLY WORKING**
- **Dependencies:** None (pure Python)
- **Features:**
  - Advanced vocabulary replacement (20+ word pairs)
  - Natural sentence flow optimization
  - Academic connector integration
  - Sentence restructuring for variety
  - Hedging language insertion
  - Smart sentence breaking
  - Multiple intensity levels

**Why Choose This:**
- 🛡️ **Always works** - No external dependencies
- 🎯 **Highly effective** - Advanced linguistic techniques
- ⚡ **Fast processing** - No model loading delays
- 🔧 **Reliable** - No network or model failures

### 2. **`humanizer_app.py`** (Advanced)
- **Port:** 7860
- **Status:** ⚠️ **Partial** (models failing, fallbacks working)
- **Features:** Multi-model AI approach with NLTK integration
- **Issue:** SentencePiece tokenizer conversion problems

### 3. **`humanizer_simple.py`** (Simple)
- **Port:** 7861
- **Status:** ⚠️ **Partial** (model failing, fallbacks working)
- **Features:** Single T5 model approach
- **Issue:** Same tokenizer conversion problems

### 4. **`humanizer_batch.py`** (Batch Processing)
- **Status:** 🚫 **Not Running** (same model issues)
- **Features:** File upload and batch processing

## 🎮 **How to Use the Working Version**

### **Access the Robust Humanizer:**
```
http://127.0.0.1:7862
```

### **Three Intensity Levels:**

1. **Light Humanization:**
   - Basic vocabulary substitutions
   - Minimal structural changes
   - Quick and conservative

2. **Medium Humanization:** ⭐ **RECOMMENDED**
   - Vocabulary variations + natural flow
   - Academic connectors and transitions
   - Balanced approach

3. **Heavy Humanization:**
   - All techniques + sentence restructuring
   - Maximum transformation
   - Most natural output

## 🔧 **Technical Details**

### **Robust Humanizer Techniques:**

1. **Advanced Vocabulary Replacement** (see the sketch after this list):
```
"demonstrates" → ["shows", "reveals", "indicates", "illustrates"]
"significant" → ["notable", "considerable", "substantial"]
"utilize" → ["use", "employ", "apply", "implement"]
```

2. **Natural Flow Enhancement:**
   - Academic sentence starters
   - Transitional connectors
   - Hedging phrases for a natural tone

3. **Sentence Structure Variation:**
   - Smart sentence breaking for long sentences
   - Natural connections between ideas
   - Variety in sentence beginnings

4. **Academic Tone Preservation:**
   - Maintains scholarly language
   - Preserves technical accuracy
   - Enhances readability
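
In code, the vocabulary mappings from item 1 are a dictionary of alternatives plus a case-insensitive, first-occurrence substitution. A condensed sketch of the same mechanism as implemented in `app.py` (`humanizer_robust.py` itself is not part of this commit):

```python
import random
import re

REPLACEMENTS = {
    "demonstrates": ["shows", "reveals", "indicates", "illustrates"],
    "significant": ["notable", "considerable", "substantial"],
    "utilize": ["use", "employ", "apply", "implement"],
}

def replace_vocabulary(text: str) -> str:
    """Swap each known word for a randomly chosen alternative (first hit only)."""
    for original, alternatives in REPLACEMENTS.items():
        pattern = re.compile(re.escape(original), re.IGNORECASE)
        text = pattern.sub(random.choice(alternatives), text, count=1)
    return text
```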

## 📝 **Example Transformation**

### **Input (Robotic AI Text):**
```
The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets. These results indicate that the optimization of neural network architectures can facilitate enhanced performance in predictive analytics applications.
```

### **Output (Humanized - Medium Level):**
```
Implementing machine learning algorithms shows notable enhancements in computational efficiency and accuracy measures across various benchmark datasets. Moreover, these findings suggest that optimizing neural network architectures can help improve performance in predictive analytics applications. Research indicates that such approaches provide considerable benefits for data processing tasks.
```

## 🛠️ **If You Want to Fix the AI Model Versions**

The main issue is the SentencePiece tokenizer conversion. Potential fixes:

1. **Try a different transformers version:**
```bash
# Install a specific transformers version
pip install transformers==4.30.0
```

2. **Use different models:**
```python
# Replace with models that have better tokenizer support
"google/flan-t5-base"  # instead of Vamsi/T5_Paraphrase_Paws
```

3. **Force the slow tokenizer:**
```python
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
```

## 💡 **Recommendations**

1. **For Daily Use:** Use `humanizer_robust.py` (Port 7862)
2. **For Best Results:** Use the "Medium" intensity level
3. **For Long Texts:** Process in chunks of 200-500 words
4. **For Academic Papers:** Always review the output for accuracy

## ⚡ **Quick Start**

```bash
# Run the working version
D:/Siddhant/projects/Humanizer/.venv/Scripts/python.exe humanizer_robust.py

# Open in browser
http://127.0.0.1:7862
```

## 🎯 **Why This Solution Works**

The robust version is effective because it:

- **Targets AI Detection Patterns:** Replaces common AI-generated phrases
- **Adds Natural Variation:** Uses multiple alternatives for each replacement
- **Maintains Academic Quality:** Preserves scholarly tone and accuracy
- **Creates Natural Flow:** Adds appropriate connectors and transitions
- **Varies Structure:** Changes sentence patterns for authenticity
- **Always Works:** No dependencies on external models or services

---

**🎉 You now have a fully functional, robust AI text humanizer that consistently produces natural, human-like text!**
app.py ADDED
@@ -0,0 +1,654 @@
import gradio as gr
import random
import re
import warnings
import math
from collections import Counter
warnings.filterwarnings("ignore")

# Import NLTK with error handling
try:
    import nltk
    import textstat
    from nltk.corpus import wordnet
    from nltk.tokenize import sent_tokenize, word_tokenize
    NLTK_AVAILABLE = True

    # Download required NLTK data
    try:
        nltk.data.find('tokenizers/punkt_tab')
    except LookupError:
        nltk.download('punkt_tab')
    try:
        nltk.data.find('tokenizers/punkt')
    except LookupError:
        nltk.download('punkt')
    try:
        nltk.data.find('corpora/wordnet')
    except LookupError:
        nltk.download('wordnet')
    try:
        nltk.data.find('corpora/omw-1.4')
    except LookupError:
        nltk.download('omw-1.4')

except ImportError as e:
    print(f"NLTK import error: {e}")
    NLTK_AVAILABLE = False
    import textstat

class AdvancedHumanizer:
    def __init__(self):
        self.transition_words = [
            "However", "Nevertheless", "Furthermore", "Moreover", "Additionally",
            "Consequently", "Therefore", "Thus", "In contrast", "Similarly",
            "On the other hand", "Meanwhile", "Subsequently", "Notably",
            "Importantly", "Significantly", "Interestingly", "Remarkably"
        ]

        self.hedging_phrases = [
            "appears to", "seems to", "tends to", "suggests that", "indicates that",
            "may well", "might be", "could be", "potentially", "presumably",
            "arguably", "to some extent", "in many cases", "generally speaking"
        ]

        self.academic_connectors = [
            "In light of this", "Building upon this", "This finding suggests",
            "It is worth noting that", "This observation", "These results",
            "The evidence indicates", "This approach", "The data reveals"
        ]

        # Enhanced vocabulary replacements for better humanization
        self.vocabulary_replacements = {
            "significant": ["notable", "considerable", "substantial", "important", "remarkable"],
            "demonstrate": ["show", "illustrate", "reveal", "display", "indicate"],
            "utilize": ["use", "employ", "apply", "implement", "make use of"],
            "implement": ["apply", "use", "put into practice", "carry out", "execute"],
            "generate": ["create", "produce", "develop", "form", "make"],
            "facilitate": ["help", "enable", "assist", "support", "aid"],
            "optimize": ["improve", "enhance", "refine", "perfect", "better"],
            "analyze": ["examine", "study", "investigate", "assess", "evaluate"],
            "therefore": ["thus", "hence", "consequently", "as a result", "for this reason"],
            "however": ["nevertheless", "nonetheless", "yet", "on the other hand", "but"],
            "furthermore": ["moreover", "additionally", "in addition", "what is more", "besides"],
            "substantial": ["significant", "considerable", "notable", "important", "major"],
            "subsequently": ["later", "then", "afterward", "following this", "next"],
            "approximately": ["about", "roughly", "around", "nearly", "close to"],
            "numerous": ["many", "several", "multiple", "various", "a number of"],
            "encompasses": ["includes", "covers", "contains", "involves", "comprises"],
            "methodology": ["method", "approach", "technique", "procedure", "process"],
            "comprehensive": ["complete", "thorough", "extensive", "detailed", "full"],
            "indicates": ["shows", "suggests", "points to", "reveals", "demonstrates"],
            "established": ["set up", "created", "formed", "developed", "built"]
        }

    def split_into_sentences(self, text):
        """Smart sentence splitting with an NLTK fallback"""
        if NLTK_AVAILABLE:
            return sent_tokenize(text)
        else:
            # Fallback: split on periods, tracking the character index so the
            # lookahead is anchored at the right position rather than
            # re-searching the text for the accumulated string.
            sentences = []
            current = ""

            for i, char in enumerate(text):
                current += char
                if char == '.' and len(current) > 10:
                    # Check whether this looks like the end of a sentence
                    stripped = text[i + 1:].strip()
                    if stripped and (stripped[0].isupper() or stripped.startswith(('The ', 'This ', 'A '))):
                        sentences.append(current.strip())
                        current = ""

            if current.strip():
                sentences.append(current.strip())

            return [s for s in sentences if len(s.strip()) > 5]

    def add_natural_variations(self, text):
        """Add natural linguistic variations to make text less robotic"""
        sentences = self.split_into_sentences(text)
        varied_sentences = []

        for i, sentence in enumerate(sentences):
            sentence = sentence.strip()
            if not sentence.endswith('.'):
                sentence += '.'

            # Randomly add hedging language
            if random.random() < 0.3 and not any(phrase in sentence.lower() for phrase in self.hedging_phrases):
                hedge = random.choice(self.hedging_phrases)
                if sentence.startswith("The ") or sentence.startswith("This "):
                    words = sentence.split()
                    if len(words) > 2:
                        words.insert(2, hedge)
                        sentence = " ".join(words)

            # Add transitional phrases for flow
            if i > 0 and random.random() < 0.4:
                connector = random.choice(self.academic_connectors)
                sentence = f"{connector}, {sentence.lower()}"

            varied_sentences.append(sentence)

        return " ".join(varied_sentences)

    def diversify_vocabulary(self, text):
        """Replace common words with synonyms for variation"""
        if NLTK_AVAILABLE:
            words = word_tokenize(text)
            result = []

            for word in words:
                if word.isalpha() and len(word) > 4 and random.random() < 0.2:
                    synonyms = []
                    for syn in wordnet.synsets(word):
                        for lemma in syn.lemmas():
                            if lemma.name() != word and '_' not in lemma.name():
                                synonyms.append(lemma.name())

                    if synonyms:
                        replacement = random.choice(synonyms[:3])
                        result.append(replacement)
                    else:
                        result.append(word)
                else:
                    result.append(word)

            return " ".join(result)
        else:
            # Enhanced fallback with more replacements
            result = text
            for original, alternatives in self.vocabulary_replacements.items():
                if original.lower() in result.lower():
                    replacement = random.choice(alternatives)
                    pattern = re.compile(re.escape(original), re.IGNORECASE)
                    result = pattern.sub(replacement, result, count=1)

            return result

    def adjust_sentence_structure(self, text):
        """Modify sentence structures for a more natural flow"""
        sentences = self.split_into_sentences(text)
        modified = []

        for sentence in sentences:
            words = sentence.split()

            # For long sentences, sometimes break them up
            if len(words) > 20 and random.random() < 0.4:
                # Find a good break point
                break_words = ['and', 'but', 'which', 'that', 'because', 'since', 'while']
                for i, word in enumerate(words[8:18], 8):  # Look in the middle section
                    if word.lower() in break_words:
                        part1 = " ".join(words[:i]) + "."
                        part2 = " ".join(words[i + 1:])
                        if len(part2) > 5:  # Only if the second part is substantial
                            part2 = part2[0].upper() + part2[1:]
                            modified.extend([part1, part2])
                        else:
                            # Keep the original sentence instead of dropping it
                            modified.append(sentence)
                        break
                else:
                    modified.append(sentence)
            else:
                modified.append(sentence)

        return " ".join(modified)

    def clean_and_format(self, text):
        """Clean up the text formatting"""
        # Remove extra spaces
        text = re.sub(r'\s+', ' ', text)
        text = re.sub(r'\s+([.,!?;:])', r'\1', text)

        # Fix capitalization
        sentences = self.split_into_sentences(text)
        formatted = []

        for sentence in sentences:
            sentence = sentence.strip()
            if sentence:
                # Capitalize the first letter
                sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()

                # Ensure a proper ending
                if not sentence.endswith(('.', '!', '?')):
                    sentence += '.'

                formatted.append(sentence)

        return " ".join(formatted)

    def humanize_text(self, text, intensity="medium"):
        """Main humanization function"""
        if not text or len(text.strip()) < 10:
            return "Please enter substantial text to humanize (at least 10 characters)."

        result = text.strip()

        try:
            # Apply different levels of humanization
            if intensity.lower() in ["light", "low"]:
                # Just vocabulary changes
                result = self.diversify_vocabulary(result)

            elif intensity.lower() in ["medium", "moderate"]:
                # Vocabulary + natural flow
                result = self.diversify_vocabulary(result)
                result = self.add_natural_variations(result)

            elif intensity.lower() in ["heavy", "high", "maximum"]:
                # All techniques
                result = self.diversify_vocabulary(result)
                result = self.add_natural_variations(result)
                result = self.adjust_sentence_structure(result)

            # Always clean up formatting
            result = self.clean_and_format(result)

            return result if result and len(result) > 10 else text

        except Exception as e:
            print(f"Humanization error: {e}")
            return "Error processing text. Please try again with different input."

class AIDetector:
    def __init__(self):
        """Initialize AI detection patterns and thresholds"""
        self.ai_phrases = [
            "demonstrates significant", "substantial improvements", "comprehensive analysis",
            "furthermore", "moreover", "additionally", "consequently", "therefore",
            "implementation of", "utilization of", "optimization of", "enhancement of",
            "facilitate", "demonstrate", "indicate", "substantial", "comprehensive",
            "significant improvements", "notable enhancements", "effective approach",
            "robust methodology", "systematic approach", "extensive evaluation",
            "empirical results", "experimental validation", "performance metrics",
            "benchmark datasets", "state-of-the-art", "cutting-edge", "novel approach",
            "innovative solution", "groundbreaking", "revolutionary", "paradigm shift"
        ]

        self.overused_academic_words = [
            "significant", "substantial", "comprehensive", "extensive", "robust",
            "novel", "innovative", "efficient", "effective", "optimal", "superior",
            "enhanced", "improved", "advanced", "sophisticated", "cutting-edge",
            "state-of-the-art", "groundbreaking", "revolutionary", "paradigm"
        ]

        self.excessive_transitions = [
            "furthermore", "moreover", "additionally", "consequently", "therefore",
            "thus", "hence", "nevertheless", "nonetheless", "however"
        ]

        self.formal_patterns = [
            r"the implementation of \w+",
            r"the utilization of \w+",
            r"in order to \w+",
            r"it is important to note that",
            r"it should be emphasized that",
            r"it can be observed that",
            r"the results demonstrate that",
            r"the findings indicate that"
        ]

    def calculate_ai_probability(self, text):
        """Calculate the probability that text is AI-generated"""
        if not text or len(text.strip()) < 50:
            return {"probability": 0, "confidence": "Low", "details": {"error": "Text too short for analysis"}}

        scores = {}

        # Various AI detection checks
        scores['ai_phrases'] = self._check_ai_phrases(text)
        scores['vocab_repetition'] = self._check_vocabulary_repetition(text)
        scores['structure_patterns'] = self._check_structure_patterns(text)
        scores['transition_overuse'] = self._check_transition_overuse(text)
        scores['formal_patterns'] = self._check_formal_patterns(text)
        scores['sentence_consistency'] = self._check_sentence_consistency(text)
        scores['readability'] = self._check_readability_patterns(text)

        # Calculate the weighted final score
        weights = {
            'ai_phrases': 0.2, 'vocab_repetition': 0.15, 'structure_patterns': 0.15,
            'transition_overuse': 0.15, 'formal_patterns': 0.15,
            'sentence_consistency': 0.1, 'readability': 0.1
        }

        final_score = sum(scores[key] * weights[key] for key in weights)
        final_score = min(100, max(0, final_score))

        # Determine the confidence level
        if final_score >= 80:
            confidence, verdict = "Very High", "Likely AI-Generated"
        elif final_score >= 60:
            confidence, verdict = "High", "Probably AI-Generated"
        elif final_score >= 40:
            confidence, verdict = "Medium", "Possibly AI-Generated"
        elif final_score >= 20:
            confidence, verdict = "Low", "Probably Human-Written"
        else:
            confidence, verdict = "Very Low", "Likely Human-Written"

        return {
            "probability": round(final_score, 1),
            "confidence": confidence,
            "verdict": verdict,
            "details": {k: round(v, 1) for k, v in scores.items()}
        }

    def _check_ai_phrases(self, text):
        text_lower = text.lower()
        phrase_count = sum(1 for phrase in self.ai_phrases if phrase in text_lower)
        words = len(text.split())
        return min(100, (phrase_count / words) * 1000 * 10) if words > 0 else 0

    def _check_vocabulary_repetition(self, text):
        words = [word.lower().strip('.,!?;:') for word in text.split() if word.isalpha()]
        if len(words) < 10:
            return 0
        word_counts = Counter(words)
        overused_count = sum(1 for word in self.overused_academic_words if word_counts.get(word, 0) > 1)
        return min(100, (overused_count / len(self.overused_academic_words)) * 200)

    def _check_structure_patterns(self, text):
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            sentences = [s.strip() for s in text.split('.') if s.strip()]

        if len(sentences) < 3:
            return 0

        starters = [s.split()[:3] for s in sentences if len(s.split()) >= 3]
        starter_counts = Counter([' '.join(starter) for starter in starters])
        repeated_starters = sum(1 for count in starter_counts.values() if count > 1)
        return min(100, (repeated_starters / len(sentences)) * 150) if sentences else 0

    def _check_transition_overuse(self, text):
        text_lower = text.lower()
        transition_count = sum(1 for transition in self.excessive_transitions if transition in text_lower)
        words = len(text.split())
        return min(100, (transition_count / words) * 100 * 20) if words > 0 else 0

    def _check_formal_patterns(self, text):
        pattern_count = sum(len(re.findall(pattern, text.lower())) for pattern in self.formal_patterns)
        words = len(text.split())
        return min(100, (pattern_count / words) * 1000 * 15) if words > 0 else 0

    def _check_sentence_consistency(self, text):
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            sentences = [s.strip() for s in text.split('.') if s.strip()]

        if len(sentences) < 5:
            return 0

        lengths = [len(s.split()) for s in sentences]
        avg_length = sum(lengths) / len(lengths)
        variance = sum((length - avg_length) ** 2 for length in lengths) / len(lengths)
        std_dev = math.sqrt(variance)
        consistency_score = 100 - min(100, std_dev * 10)
        return max(0, consistency_score - 20)

    def _check_readability_patterns(self, text):
        try:
            words = text.split()
            sentences = len([s for s in text.split('.') if s.strip()])
            if sentences == 0:
                return 0
            avg_words_per_sentence = len(words) / sentences
            if 15 <= avg_words_per_sentence <= 25:
                return 30
            elif 25 < avg_words_per_sentence <= 35:
                return 50
            else:
                return 10
        except Exception:
            return 0

# Initialize components
humanizer = AdvancedHumanizer()
ai_detector = AIDetector()

def process_text(input_text, humanization_level):
    """Process the input text"""
    return humanizer.humanize_text(input_text, humanization_level)

def detect_ai_text(input_text):
    """Detect if text is AI-generated"""
    if not input_text.strip():
        return "Please enter some text to analyze."

    result = ai_detector.calculate_ai_probability(input_text)

    # Guard: short inputs return only an error message in 'details',
    # so the per-factor keys below would raise a KeyError.
    if "error" in result["details"]:
        return f"⚠️ {result['details']['error']}"

    return f"""
## 🤖 AI Detection Analysis

**Overall Assessment:** {result['verdict']}
**AI Probability:** {result['probability']}%
**Confidence Level:** {result['confidence']}

### 📊 Detailed Breakdown:
- **AI Phrases Score:** {result['details']['ai_phrases']}%
- **Vocabulary Repetition:** {result['details']['vocab_repetition']}%
- **Structure Patterns:** {result['details']['structure_patterns']}%
- **Transition Overuse:** {result['details']['transition_overuse']}%
- **Formal Patterns:** {result['details']['formal_patterns']}%
- **Sentence Consistency:** {result['details']['sentence_consistency']}%
- **Readability Score:** {result['details']['readability']}%

### 💡 Interpretation:
- **0-20%:** Likely human-written with natural variations
- **21-40%:** Possibly AI-generated or heavily edited
- **41-60%:** Probably AI-generated with some humanization
- **61-80%:** Likely AI-generated with minimal editing
- **81-100%:** Very likely raw AI-generated content
"""

def combined_process(text, level):
    """Humanize text and then analyze it"""
    if not text.strip():
        return "Please enter text to process.", "No analysis available."

    humanized = process_text(text, level)
    analysis = detect_ai_text(humanized)
    return humanized, analysis

# Create Gradio interface
with gr.Blocks(theme="soft", title="AI Text Humanizer & Detector") as demo:
    gr.Markdown("""
    # 🤖➡️👨 AI Text Humanizer & Detector Pro

    **Complete solution for AI text processing - Humanize AND Detect AI-generated content**

    Transform robotic AI text into natural, human-like writing, then verify the results with our built-in AI detector.

    ⚠️ **Note:** This tool is for educational purposes. Please use responsibly and maintain academic integrity.
    """)

    with gr.Tabs():
        # Humanization Tab
        with gr.TabItem("🎭 Text Humanizer"):
            gr.Markdown("### Transform AI text into natural, human-like writing")

            with gr.Row():
                with gr.Column():
                    humanize_input = gr.Textbox(
                        lines=10,
                        placeholder="Enter machine-generated or robotic academic text here...",
                        label="Raw Input Text",
                        info="Paste your AI-generated text that needs to be humanized"
                    )

                    humanization_level = gr.Radio(
                        choices=["Light", "Medium", "Heavy"],
                        value="Medium",
                        label="Humanization Level",
                        info="Light: Basic changes | Medium: Vocabulary + flow | Heavy: All techniques"
                    )

                    humanize_btn = gr.Button("🚀 Humanize Text", variant="primary", size="lg")

                with gr.Column():
                    humanize_output = gr.Textbox(
                        label="Humanized Academic Output",
                        lines=10,
                        show_copy_button=True,
                        info="Copy this natural, human-like text"
                    )

            # Examples for the humanizer
            gr.Examples(
                examples=[
                    [
                        "The implementation of artificial intelligence algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.",
                        "Medium"
                    ],
                    [
                        "Machine learning models exhibit superior performance characteristics when evaluated against traditional statistical approaches in predictive analytics applications.",
                        "Heavy"
                    ]
                ],
                inputs=[humanize_input, humanization_level],
                outputs=humanize_output
            )

        # AI Detection Tab
        with gr.TabItem("🕵️ AI Detector"):
            gr.Markdown("### Analyze text to detect if it's AI-generated")

            with gr.Row():
                with gr.Column():
                    detect_input = gr.Textbox(
                        lines=10,
                        placeholder="Paste text here to check if it's AI-generated...",
                        label="Text to Analyze",
                        info="Enter any text to check its AI probability"
                    )

                    detect_btn = gr.Button("🔍 Analyze Text", variant="secondary", size="lg")

                with gr.Column():
                    detect_output = gr.Markdown(
                        label="AI Detection Results",
                        value="Analysis results will appear here..."
                    )

            # Examples for the detector
            gr.Examples(
                examples=[
                    ["The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets. Furthermore, these results indicate substantial enhancements in performance."],
                    ["I love going to the coffee shop on weekends. The barista there makes the best cappuccino I've ever had, and I always end up chatting with other customers about random stuff."],
                    ["The comprehensive analysis reveals that the optimization of neural network architectures facilitates enhanced performance characteristics in predictive analytics applications."]
                ],
                inputs=[detect_input],
                outputs=detect_output
            )

        # Combined Analysis Tab
        with gr.TabItem("🔄 Humanize & Test"):
            gr.Markdown("### Humanize text and immediately test the results")

            with gr.Column():
                combined_input = gr.Textbox(
                    lines=8,
                    placeholder="Enter AI-generated text to humanize and test...",
                    label="Original AI Text",
                    info="This will be humanized and then tested for AI detection"
                )

                combined_level = gr.Radio(
                    choices=["Light", "Medium", "Heavy"],
                    value="Medium",
                    label="Humanization Level"
                )

                combined_btn = gr.Button("🔄 Humanize & Analyze", variant="primary", size="lg")

            with gr.Row():
                with gr.Column():
                    combined_humanized = gr.Textbox(
                        label="Humanized Text",
                        lines=8,
                        show_copy_button=True
                    )

                with gr.Column():
                    combined_analysis = gr.Markdown(
                        label="AI Detection Analysis",
                        value="Analysis will appear here..."
                    )

        # Info Tab
        with gr.TabItem("ℹ️ Instructions"):
            gr.Markdown("""
            ### 🎯 How to Use:

            **Text Humanizer:**
            1. Paste your AI-generated text
            2. Choose a humanization level
            3. Get natural, human-like output

            **AI Detector:**
            1. Paste any text
            2. Get a detailed AI probability analysis
            3. See a breakdown of detection factors

            **Combined Mode:**
            1. Humanize and test in one step
            2. Perfect for optimizing results
            3. Iterate until satisfied

            ### 🔧 Features:

            **Humanization Techniques:**
            - ✅ Advanced vocabulary variations
            - ✅ Natural sentence flow enhancement
            - ✅ Academic tone preservation
            - ✅ Structure diversification
            - ✅ Linguistic pattern breaking

            **AI Detection:**
            - 🔍 7-point analysis system
            - 📊 Detailed scoring breakdown
            - 🎯 Confidence assessment
            - 💡 Improvement suggestions

            ### ⚖️ Ethical Usage:
            This tool is designed for:
            - ✅ Improving writing quality
            - ✅ Learning natural language patterns
            - ✅ Educational purposes
            - ✅ Understanding AI detection

            **Please use responsibly:**
            - 🚫 Don't use it for plagiarism
            - 🚫 Don't violate academic policies
            - 🚫 Don't misrepresent authorship
            - ✅ Maintain academic integrity
            """)

    # Event handlers
    humanize_btn.click(
        fn=process_text,
        inputs=[humanize_input, humanization_level],
        outputs=humanize_output
    )

    detect_btn.click(
        fn=detect_ai_text,
        inputs=[detect_input],
        outputs=detect_output
    )

    combined_btn.click(
        fn=combined_process,
        inputs=[combined_input, combined_level],
        outputs=[combined_humanized, combined_analysis]
    )

if __name__ == "__main__":
    demo.launch(
        share=True,  # Enable public sharing
        server_name="0.0.0.0",
        server_port=7860
    )
humanizer_app.py ADDED
@@ -0,0 +1,823 @@
import gradio as gr
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
import random
import re
import warnings
import math
from collections import Counter
warnings.filterwarnings("ignore")

# Import NLTK with error handling
try:
    import nltk
    import textstat
    from nltk.corpus import wordnet
    from nltk.tokenize import sent_tokenize, word_tokenize
    NLTK_AVAILABLE = True
except ImportError as e:
    print(f"NLTK import error: {e}")
    NLTK_AVAILABLE = False
    # Fallback imports
    import textstat

# Download required NLTK data if available
if NLTK_AVAILABLE:
    try:
        nltk.data.find('tokenizers/punkt_tab')
    except LookupError:
        print("Downloading punkt_tab...")
        nltk.download('punkt_tab')
    try:
        nltk.data.find('tokenizers/punkt')
    except LookupError:
        print("Downloading punkt...")
        nltk.download('punkt')
    try:
        nltk.data.find('corpora/wordnet')
    except LookupError:
        print("Downloading wordnet...")
        nltk.download('wordnet')
    try:
        nltk.data.find('corpora/omw-1.4')
    except LookupError:
        print("Downloading omw-1.4...")
        nltk.download('omw-1.4')

# Load multiple models for diverse paraphrasing
models = {
    "t5_paraphrase": {
        "model_name": "Vamsi/T5_Paraphrase_Paws",
        "tokenizer": None,
        "model": None
    },
    "pegasus": {
        "model_name": "tuner007/pegasus_paraphrase",
        "tokenizer": None,
        "model": None
    }
}

# Initialize models
for key, model_info in models.items():
    try:
        model_info["tokenizer"] = AutoTokenizer.from_pretrained(model_info["model_name"])
        model_info["model"] = AutoModelForSeq2SeqLM.from_pretrained(model_info["model_name"])
        print(f"Loaded {key} model successfully")
    except Exception as e:
        print(f"Failed to load {key}: {e}")

class AdvancedHumanizer:
    def __init__(self):
        self.transition_words = [
            "However", "Nevertheless", "Furthermore", "Moreover", "Additionally",
            "Consequently", "Therefore", "Thus", "In contrast", "Similarly",
            "On the other hand", "Meanwhile", "Subsequently", "Notably",
            "Importantly", "Significantly", "Interestingly", "Remarkably"
        ]

        self.hedging_phrases = [
            "appears to", "seems to", "tends to", "suggests that", "indicates that",
            "may well", "might be", "could be", "potentially", "presumably",
            "arguably", "to some extent", "in many cases", "generally speaking"
        ]

        self.academic_connectors = [
            "In light of this", "Building upon this", "This finding suggests",
            "It is worth noting that", "This observation", "These results",
            "The evidence indicates", "This approach", "The data reveals"
        ]

    def add_natural_variations(self, text):
        """Add natural linguistic variations to make text less robotic"""
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            # Fallback: simple sentence splitting
            sentences = [s.strip() for s in text.split('.') if s.strip()]

        varied_sentences = []

        for i, sentence in enumerate(sentences):
            if not sentence.endswith('.'):
                sentence += '.'

            # Randomly add hedging language; use elif so only one
            # opener is rewritten rather than both replacements firing.
            if random.random() < 0.3 and not any(phrase in sentence.lower() for phrase in self.hedging_phrases):
                hedge = random.choice(self.hedging_phrases)
                if sentence.startswith("The "):
                    sentence = sentence.replace("The ", f"The {hedge} ", 1)
                elif sentence.startswith("This "):
                    sentence = sentence.replace("This ", f"This {hedge} ", 1)

            # Add transitional phrases for flow
            if i > 0 and random.random() < 0.4:
                connector = random.choice(self.academic_connectors)
                sentence = f"{connector}, {sentence.lower()}"

            varied_sentences.append(sentence)

        return " ".join(varied_sentences)

    def diversify_vocabulary(self, text):
        """Replace common words with synonyms for variation"""
        if not NLTK_AVAILABLE:
            # Fallback: simple word replacements
            replacements = {
                "significant": "notable", "important": "crucial", "demonstrate": "show",
                "utilize": "use", "implement": "apply", "generate": "create",
                "facilitate": "help", "optimize": "improve", "analyze": "examine"
            }
            result = text
            for old, new in replacements.items():
                result = re.sub(r'\b' + old + r'\b', new, result, flags=re.IGNORECASE)
            return result

        words = word_tokenize(text)
        result = []

        for word in words:
            if word.isalpha() and len(word) > 4 and random.random() < 0.2:
                synonyms = []
                for syn in wordnet.synsets(word):
                    for lemma in syn.lemmas():
                        if lemma.name() != word and '_' not in lemma.name():
                            synonyms.append(lemma.name())

                if synonyms:
                    replacement = random.choice(synonyms[:3])  # Use the top 3 synonyms
                    result.append(replacement)
                else:
                    result.append(word)
            else:
                result.append(word)

        return " ".join(result)

    def adjust_sentence_structure(self, text):
        """Modify sentence structures for a more natural flow"""
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            # Fallback: simple sentence splitting
            sentences = [s.strip() + '.' for s in text.split('.') if s.strip()]

        modified = []

        for sentence in sentences:
            # Randomly split long sentences
            if len(sentence.split()) > 20 and random.random() < 0.4:
                words = sentence.split()
                mid_point = len(words) // 2
                # Find a good breaking point near the middle
                for i in range(mid_point - 3, mid_point + 3):
                    if i < len(words) and words[i].rstrip('.,').lower() in ['and', 'but', 'which', 'that']:
                        part1 = " ".join(words[:i]) + "."
                        part2 = " ".join(words[i+1:])
                        if part2:
                            part2 = part2[0].upper() + part2[1:]
                        modified.extend([part1, part2])
                        break
                else:
                    modified.append(sentence)
            else:
                modified.append(sentence)

        return " ".join(modified)

    def paraphrase_with_multiple_models(self, text, chunk_size=300):
        """Use multiple models to paraphrase different parts of the text"""
        # Check if any models are available
        available_models = [k for k, v in models.items() if v["model"] is not None]
        if not available_models:
            # No models available, use fallback humanization
            return self.fallback_humanization(text)

        if len(text) <= chunk_size:
            return self.paraphrase_single_chunk(text)

        # Split into chunks
        if NLTK_AVAILABLE:
            sentences = sent_tokenize(text)
        else:
            sentences = [s.strip() + '.' for s in text.split('.') if s.strip()]

        chunks = []
        current_chunk = ""

        for sentence in sentences:
            if len(current_chunk + sentence) <= chunk_size:
                current_chunk += sentence + " "
            else:
                if current_chunk:
                    chunks.append(current_chunk.strip())
                current_chunk = sentence + " "

        if current_chunk:
            chunks.append(current_chunk.strip())

        # Paraphrase each chunk with different models
        paraphrased_chunks = []
        for i, chunk in enumerate(chunks):
            paraphrased = self.paraphrase_single_chunk(chunk, model_choice=i % len(available_models))
            paraphrased_chunks.append(paraphrased)

        return " ".join(paraphrased_chunks)

    def fallback_humanization(self, text):
        """Fallback humanization when no AI models are available"""
        # Use vocabulary diversification and natural variations
        result = self.diversify_vocabulary(text)
        result = self.add_natural_variations(result)
        return result

    def paraphrase_single_chunk(self, text, model_choice=0):
        """Paraphrase a single chunk of text"""
        available_models = [k for k, v in models.items() if v["model"] is not None]
        if not available_models:
            # No models available, use the fallback
            return self.fallback_humanization(text)

        model_key = available_models[model_choice % len(available_models)]
        model_info = models[model_key]

        try:
            if model_key == "t5_paraphrase":
                input_ids = model_info["tokenizer"].encode(
                    f"paraphrase: {text}",
                    return_tensors="pt",
                    max_length=512,
                    truncation=True
                )
                outputs = model_info["model"].generate(
                    input_ids=input_ids,
                    max_length=len(text.split()) + 50,
                    num_beams=5,
                    num_return_sequences=1,
                    temperature=1.2,
                    top_k=50,
                    top_p=0.92,
                    do_sample=True,
                    early_stopping=True
                )
                result = model_info["tokenizer"].decode(outputs[0], skip_special_tokens=True)

            elif model_key == "pegasus":
                input_ids = model_info["tokenizer"].encode(
                    text,
                    return_tensors="pt",
                    max_length=512,
                    truncation=True
                )
                outputs = model_info["model"].generate(
                    input_ids=input_ids,
                    max_length=len(text.split()) + 30,
                    num_beams=4,
                    temperature=1.1,
                    top_p=0.9,
                    do_sample=True
                )
                result = model_info["tokenizer"].decode(outputs[0], skip_special_tokens=True)

            return result if result and len(result) > 10 else self.fallback_humanization(text)
        except Exception as e:
            print(f"Error with {model_key}: {e}")
            return self.fallback_humanization(text)

class AIDetector:
    def __init__(self):
        """Initialize AI detection patterns and thresholds"""
        # Common AI-generated text patterns
        self.ai_phrases = [
            "demonstrates significant", "substantial improvements", "comprehensive analysis",
            "furthermore", "moreover", "additionally", "consequently", "therefore",
            "implementation of", "utilization of", "optimization of", "enhancement of",
            "facilitate", "demonstrate", "indicate", "substantial", "comprehensive",
            "significant improvements", "notable enhancements", "effective approach",
            "robust methodology", "systematic approach", "extensive evaluation",
            "empirical results", "experimental validation", "performance metrics",
            "benchmark datasets", "state-of-the-art", "cutting-edge", "novel approach",
            "innovative solution", "groundbreaking", "revolutionary", "paradigm shift"
        ]

        # Academic buzzwords that AI overuses
        self.overused_academic_words = [
            "significant", "substantial", "comprehensive", "extensive", "robust",
            "novel", "innovative", "efficient", "effective", "optimal", "superior",
            "enhanced", "improved", "advanced", "sophisticated", "cutting-edge",
            "state-of-the-art", "groundbreaking", "revolutionary", "paradigm"
        ]

        # Transition words AI uses excessively
        self.excessive_transitions = [
            "furthermore", "moreover", "additionally", "consequently", "therefore",
            "thus", "hence", "nevertheless", "nonetheless", "however"
        ]

        # Formal structures AI tends to overuse
        self.formal_patterns = [
            r"the implementation of \w+",
            r"the utilization of \w+",
            r"in order to \w+",
            r"it is important to note that",
            r"it should be emphasized that",
            r"it can be observed that",
            r"the results demonstrate that",
            r"the findings indicate that"
        ]

    def calculate_ai_probability(self, text):
        """Calculate the probability that text is AI-generated"""
        if not text or len(text.strip()) < 50:
            return {"probability": 0, "confidence": "Low", "details": {"error": "Text too short for analysis"}}

        scores = {}

        # 1. Check for AI phrases
        scores['ai_phrases'] = self._check_ai_phrases(text)

        # 2. Check vocabulary repetition
        scores['vocab_repetition'] = self._check_vocabulary_repetition(text)

        # 3. Check sentence structure patterns
        scores['structure_patterns'] = self._check_structure_patterns(text)

        # 4. Check transition word overuse
        scores['transition_overuse'] = self._check_transition_overuse(text)

        # 5. Check formal pattern overuse
        scores['formal_patterns'] = self._check_formal_patterns(text)

        # 6. Check sentence length consistency
        scores['sentence_consistency'] = self._check_sentence_consistency(text)

        # 7. Check readability patterns
        scores['readability'] = self._check_readability_patterns(text)

        # Calculate the weighted final score
        weights = {
            'ai_phrases': 0.2,
            'vocab_repetition': 0.15,
            'structure_patterns': 0.15,
            'transition_overuse': 0.15,
            'formal_patterns': 0.15,
            'sentence_consistency': 0.1,
            'readability': 0.1
        }

        final_score = sum(scores[key] * weights[key] for key in weights)
        final_score = min(100, max(0, final_score))  # Clamp between 0-100

        # Determine the confidence level
        if final_score >= 80:
            confidence = "Very High"
            verdict = "Likely AI-Generated"
        elif final_score >= 60:
            confidence = "High"
            verdict = "Probably AI-Generated"
        elif final_score >= 40:
            confidence = "Medium"
            verdict = "Possibly AI-Generated"
        elif final_score >= 20:
            confidence = "Low"
            verdict = "Probably Human-Written"
        else:
            confidence = "Very Low"
            verdict = "Likely Human-Written"
388
+
389
+ return {
390
+ "probability": round(final_score, 1),
391
+ "confidence": confidence,
392
+ "verdict": verdict,
393
+ "details": {
394
+ "ai_phrases_score": round(scores['ai_phrases'], 1),
395
+ "vocabulary_repetition": round(scores['vocab_repetition'], 1),
396
+ "structure_patterns": round(scores['structure_patterns'], 1),
397
+ "transition_overuse": round(scores['transition_overuse'], 1),
398
+ "formal_patterns": round(scores['formal_patterns'], 1),
399
+ "sentence_consistency": round(scores['sentence_consistency'], 1),
400
+ "readability_score": round(scores['readability'], 1)
401
+ }
402
+ }
403
+
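The final score is a plain weighted sum of the seven sub-scores, clamped to 0-100. A quick check of the arithmetic with hypothetical sub-scores:

```python
weights = {'ai_phrases': 0.2, 'vocab_repetition': 0.15, 'structure_patterns': 0.15,
           'transition_overuse': 0.15, 'formal_patterns': 0.15,
           'sentence_consistency': 0.1, 'readability': 0.1}
scores = {'ai_phrases': 80, 'vocab_repetition': 60, 'structure_patterns': 50,
          'transition_overuse': 40, 'formal_patterns': 70,
          'sentence_consistency': 30, 'readability': 30}
print(sum(scores[k] * weights[k] for k in weights))
# 16 + 9 + 7.5 + 6 + 10.5 + 3 + 3 = 55.0 -> "Possibly AI-Generated"
```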
404
+ def _check_ai_phrases(self, text):
405
+ """Check for common AI-generated phrases"""
406
+ text_lower = text.lower()
407
+ phrase_count = sum(1 for phrase in self.ai_phrases if phrase in text_lower)
408
+ words = len(text.split())
409
+
410
+ if words == 0:
411
+ return 0
412
+
413
+ # Score based on phrase density
414
+ density = (phrase_count / words) * 1000 # Per 1000 words
415
+ return min(100, density * 10) # Scale to 0-100
416
+
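For intuition on the scaling: the phrase score is a per-1,000-word density multiplied by 10, so it saturates quickly.

```python
density = (1 / 500) * 1000     # 1 flagged phrase in 500 words -> 2 per 1000
print(min(100, density * 10))  # 20.0, a weak signal
density = (4 / 200) * 1000     # 4 flagged phrases in 200 words -> 20 per 1000
print(min(100, density * 10))  # 100, capped
```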
417
+ def _check_vocabulary_repetition(self, text):
418
+ """Check for repetitive vocabulary typical of AI"""
419
+ words = [word.lower().strip('.,!?;:') for word in text.split() if word.isalpha()]
420
+ if len(words) < 10:
421
+ return 0
422
+
423
+ word_counts = Counter(words)
424
+ overused_count = sum(1 for word in self.overused_academic_words if word_counts.get(word, 0) > 1)
425
+
426
+ # Calculate repetition score
427
+ total_overused_words = len(self.overused_academic_words)
428
+ repetition_ratio = overused_count / total_overused_words if total_overused_words > 0 else 0
429
+
430
+ return min(100, repetition_ratio * 200) # Scale to 0-100
431
+
432
+ def _check_structure_patterns(self, text):
433
+ """Check for repetitive sentence structures"""
434
+ if NLTK_AVAILABLE:
435
+ sentences = sent_tokenize(text)
436
+ else:
437
+ sentences = [s.strip() for s in text.split('.') if s.strip()]
438
+
439
+ if len(sentences) < 3:
440
+ return 0
441
+
442
+ # Check for similar sentence starters
443
+ starters = [s.split()[:3] for s in sentences if len(s.split()) >= 3]
444
+ starter_counts = Counter([' '.join(starter) for starter in starters])
445
+
446
+ repeated_starters = sum(1 for count in starter_counts.values() if count > 1)
447
+ repetition_ratio = repeated_starters / len(sentences) if len(sentences) > 0 else 0
448
+
449
+ return min(100, repetition_ratio * 150) # Scale to 0-100
450
+
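A small illustration of the starter-repetition check on made-up sentences:

```python
from collections import Counter

sents = ["The model demonstrates accuracy.", "The model demonstrates speed.",
         "Results were strong overall."]
starters = Counter(" ".join(s.split()[:3]) for s in sents)
repeated = sum(1 for count in starters.values() if count > 1)
print(min(100, repeated / len(sents) * 150))  # 50.0 -> repetitive openings
```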
451
+ def _check_transition_overuse(self, text):
452
+ """Check for excessive use of transition words"""
453
+ text_lower = text.lower()
454
+ transition_count = sum(1 for transition in self.excessive_transitions if transition in text_lower)
455
+ words = len(text.split())
456
+
457
+ if words == 0:
458
+ return 0
459
+
460
+ # Score based on transition density
461
+ density = (transition_count / words) * 100 # Percentage
462
+ return min(100, density * 20) # Scale to 0-100
463
+
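Transitions are measured as a percentage of all words and scaled by 20, so a handful of "furthermore"s in a short text already scores heavily:

```python
density = (3 / 100) * 100      # 3 listed transitions in a 100-word text = 3%
print(min(100, density * 20))  # 60.0
```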
464
+ def _check_formal_patterns(self, text):
465
+ """Check for overly formal patterns typical of AI"""
466
+ pattern_count = 0
467
+ text_lower = text.lower()
468
+
469
+ for pattern in self.formal_patterns:
470
+ matches = re.findall(pattern, text_lower)
471
+ pattern_count += len(matches)
472
+
473
+ words = len(text.split())
474
+ if words == 0:
475
+ return 0
476
+
477
+ density = (pattern_count / words) * 1000 # Per 1000 words
478
+ return min(100, density * 15) # Scale to 0-100
479
+
480
+ def _check_sentence_consistency(self, text):
481
+ """Check for unnaturally consistent sentence lengths"""
482
+ if NLTK_AVAILABLE:
483
+ sentences = sent_tokenize(text)
484
+ else:
485
+ sentences = [s.strip() for s in text.split('.') if s.strip()]
486
+
487
+ if len(sentences) < 5:
488
+ return 0
489
+
490
+ lengths = [len(s.split()) for s in sentences]
491
+ avg_length = sum(lengths) / len(lengths)
492
+
493
+ # Calculate variance
494
+ variance = sum((length - avg_length) ** 2 for length in lengths) / len(lengths)
495
+ std_dev = math.sqrt(variance)
496
+
497
+ # Low variance indicates AI (unnaturally consistent)
498
+ consistency_score = 100 - min(100, std_dev * 10) # Invert score
499
+ return max(0, consistency_score - 20) # Adjust threshold
500
+
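A worked example of the length-consistency heuristic; the helper below just restates the math above on two hypothetical lists of sentence lengths:

```python
import math

def consistency(lengths):
    avg = sum(lengths) / len(lengths)
    std = math.sqrt(sum((n - avg) ** 2 for n in lengths) / len(lengths))
    return max(0, (100 - min(100, std * 10)) - 20)

print(round(consistency([18, 19, 20, 21, 22]), 1))  # 65.9 -> suspiciously uniform
print(round(consistency([5, 30, 12, 25, 8]), 1))    # 0.0  -> natural variation
```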
501
+ def _check_readability_patterns(self, text):
502
+ """Check readability patterns that suggest AI generation"""
503
+ try:
504
+ # Simple readability metrics
505
+ words = text.split()
506
+ sentences = len([s for s in text.split('.') if s.strip()])
507
+
508
+ if sentences == 0:
509
+ return 0
510
+
511
+ avg_words_per_sentence = len(words) / sentences
512
+
513
+ # AI tends to have very consistent, moderate sentence lengths
514
+ if 15 <= avg_words_per_sentence <= 25:
515
+ return 30 # Moderate AI indicator
516
+ elif 25 < avg_words_per_sentence <= 35:
517
+ return 50 # Higher AI indicator
518
+ else:
519
+ return 10 # More natural variation
520
+
521
+ except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
522
+ return 0
523
+
524
+ # Initialize AI detector
525
+ ai_detector = AIDetector()
526
+
527
+ # Initialize humanizer
528
+ humanizer = AdvancedHumanizer()
529
+
530
+ def detect_ai_text(input_text):
531
+ """Detect if text is AI-generated"""
532
+ if not input_text.strip():
533
+ return "Please enter some text to analyze."
534
+
535
+ result = ai_detector.calculate_ai_probability(input_text)
536
+
537
+ # Format the output
538
+ output = f"""
539
+ ## 🤖 AI Detection Analysis
540
+
541
+ **Overall Assessment:** {result['verdict']}
542
+ **AI Probability:** {result['probability']}%
543
+ **Confidence Level:** {result['confidence']}
544
+
545
+ ### 📊 Detailed Breakdown:
546
+
547
+ - **AI Phrases Score:** {result['details']['ai_phrases_score']}%
548
+ - **Vocabulary Repetition:** {result['details']['vocabulary_repetition']}%
549
+ - **Structure Patterns:** {result['details']['structure_patterns']}%
550
+ - **Transition Overuse:** {result['details']['transition_overuse']}%
551
+ - **Formal Patterns:** {result['details']['formal_patterns']}%
552
+ - **Sentence Consistency:** {result['details']['sentence_consistency']}%
553
+ - **Readability Score:** {result['details']['readability_score']}%
554
+
555
+ ### 💡 Interpretation:
556
+ - **0-20%:** Likely human-written with natural variations
557
+ - **21-40%:** Probably human-written, though possibly lightly edited AI text
558
+ - **41-60%:** Possibly AI-generated with some humanization
559
+ - **61-80%:** Probably AI-generated with minimal editing
560
+ - **81-100%:** Very likely raw AI-generated content
561
+
562
+ ### 🛡️ Tips to Improve:
563
+ - Add more natural vocabulary variations
564
+ - Use varied sentence structures
565
+ - Include personal insights or examples
566
+ - Reduce formal academic buzzwords
567
+ - Add natural transitions and flow
568
+ """
569
+
570
+ return output
571
+
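Outside the UI, the detector can be exercised directly, e.g. from a REPL after importing this module; exact numbers depend on the input:

```python
sample = ("The implementation of machine learning algorithms demonstrates "
          "significant improvements in computational efficiency. Furthermore, "
          "the comprehensive analysis indicates substantial enhancements.")
report = ai_detector.calculate_ai_probability(sample)
print(report["verdict"], f"{report['probability']}%")
```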
572
+ def humanize_academic_text(input_text, humanization_level="Moderate"):
573
+ """
574
+ Advanced humanization with multiple techniques
575
+ """
576
+ if not input_text.strip():
577
+ return "Please enter some text to humanize."
578
+
579
+ # Step 1: Initial paraphrasing with multiple models
580
+ paraphrased = humanizer.paraphrase_with_multiple_models(input_text)
581
+
582
+ # Apply different levels of humanization
583
+ if humanization_level == "Light":
584
+ # Minimal changes - just paraphrasing
585
+ result = paraphrased
586
+ elif humanization_level == "Moderate":
587
+ # Add natural variations and some vocabulary changes
588
+ result = humanizer.add_natural_variations(paraphrased)
589
+ result = humanizer.diversify_vocabulary(result)
590
+ else: # Heavy
591
+ # Apply all techniques
592
+ result = humanizer.add_natural_variations(paraphrased)
593
+ result = humanizer.diversify_vocabulary(result)
594
+ result = humanizer.adjust_sentence_structure(result)
595
+
596
+ # Clean up formatting
597
+ result = re.sub(r'\s+', ' ', result).strip()
598
+ result = re.sub(r'\s+([.,!?;:])', r'\1', result)
599
+
600
+ # Ensure proper capitalization
601
+ if NLTK_AVAILABLE:
602
+ sentences = sent_tokenize(result)
603
+ else:
604
+ sentences = [s.strip() for s in result.split('.') if s.strip()]
605
+
606
+ formatted_sentences = []
607
+ for sentence in sentences:
608
+ if sentence:
609
+ sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()
610
+ if not sentence.endswith(('.', '!', '?')):
611
+ sentence += '.'
612
+ formatted_sentences.append(sentence)
613
+
614
+ final_result = " ".join(formatted_sentences)
615
+
616
+ return final_result if final_result else "Error processing text. Please try again."
617
+
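The humanizer is scriptable the same way; the loop below compares the three levels on one sentence (the first call is slow while model weights download, and output varies run to run because sampling is enabled):

```python
raw = ("The utilization of advanced algorithms facilitates significant "
       "improvements across benchmark datasets.")
for level in ("Light", "Moderate", "Heavy"):
    print(f"{level}: {humanize_academic_text(raw, level)}")
```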
618
+ # Create Gradio interface with tabs for both humanization and AI detection
619
+ with gr.Blocks(theme="soft", title="AI Text Humanizer & Detector") as demo:
620
+ gr.Markdown("""
621
+ # 🤖➡️👨 AI Text Humanizer & Detector Pro
622
+
623
+ **Complete solution for AI text processing - Humanize AND Detect AI-generated content**
624
+
625
+ Transform robotic AI text into natural, human-like writing, then verify the results with our built-in AI detector.
626
+ """)
627
+
628
+ with gr.Tabs():
629
+ # Humanization Tab
630
+ with gr.TabItem("🎭 Text Humanizer"):
631
+ gr.Markdown("### Transform AI text into natural, human-like writing")
632
+
633
+ with gr.Row():
634
+ with gr.Column():
635
+ humanize_input = gr.Textbox(
636
+ lines=10,
637
+ placeholder="Enter machine-generated or robotic academic text here...",
638
+ label="Raw Input Text",
639
+ info="Paste your AI-generated text that needs to be humanized"
640
+ )
641
+
642
+ humanization_level = gr.Radio(
643
+ choices=["Light", "Moderate", "Heavy"],
644
+ value="Moderate",
645
+ label="Humanization Level",
646
+ info="Light: Basic paraphrasing | Moderate: Natural variations + vocabulary | Heavy: All techniques"
647
+ )
648
+
649
+ humanize_btn = gr.Button("🚀 Humanize Text", variant="primary", size="lg")
650
+
651
+ with gr.Column():
652
+ humanize_output = gr.Textbox(
653
+ label="Humanized Academic Output",
654
+ lines=10,
655
+ show_copy_button=True,
656
+ info="Copy this natural, human-like text"
657
+ )
658
+
659
+ # Examples for humanizer
660
+ gr.Examples(
661
+ examples=[
662
+ [
663
+ "The implementation of artificial intelligence algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.",
664
+ "Moderate"
665
+ ],
666
+ [
667
+ "Machine learning models exhibit superior performance characteristics when evaluated against traditional statistical approaches in predictive analytics applications.",
668
+ "Heavy"
669
+ ]
670
+ ],
671
+ inputs=[humanize_input, humanization_level],
672
+ outputs=humanize_output,
673
+ fn=humanize_academic_text
674
+ )
675
+
676
+ # AI Detection Tab
677
+ with gr.TabItem("🕵️ AI Detector"):
678
+ gr.Markdown("### Analyze text to detect if it's AI-generated")
679
+
680
+ with gr.Row():
681
+ with gr.Column():
682
+ detect_input = gr.Textbox(
683
+ lines=10,
684
+ placeholder="Paste text here to check if it's AI-generated...",
685
+ label="Text to Analyze",
686
+ info="Enter any text to check its AI probability"
687
+ )
688
+
689
+ detect_btn = gr.Button("🔍 Analyze Text", variant="secondary", size="lg")
690
+
691
+ with gr.Column():
692
+ detect_output = gr.Markdown(
693
+ label="AI Detection Results",
694
+ value="Analysis results will appear here..."
695
+ )
696
+
697
+ # Examples for detector
698
+ gr.Examples(
699
+ examples=[
700
+ ["The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets. Furthermore, these results indicate substantial enhancements in performance."],
701
+ ["I love going to the coffee shop on weekends. The barista there makes the best cappuccino I've ever had, and I always end up chatting with other customers about random stuff."],
702
+ ["The comprehensive analysis reveals that the optimization of neural network architectures facilitates enhanced performance characteristics in predictive analytics applications."]
703
+ ],
704
+ inputs=[detect_input],
705
+ outputs=detect_output,
706
+ fn=detect_ai_text
707
+ )
708
+
709
+ # Combined Analysis Tab
710
+ with gr.TabItem("🔄 Humanize & Test"):
711
+ gr.Markdown("### Humanize text and immediately test the results")
712
+
713
+ with gr.Column():
714
+ combined_input = gr.Textbox(
715
+ lines=8,
716
+ placeholder="Enter AI-generated text to humanize and test...",
717
+ label="Original AI Text",
718
+ info="This will be humanized and then tested for AI detection"
719
+ )
720
+
721
+ combined_level = gr.Radio(
722
+ choices=["Light", "Moderate", "Heavy"],
723
+ value="Moderate",
724
+ label="Humanization Level"
725
+ )
726
+
727
+ combined_btn = gr.Button("🔄 Humanize & Analyze", variant="primary", size="lg")
728
+
729
+ with gr.Row():
730
+ with gr.Column():
731
+ combined_humanized = gr.Textbox(
732
+ label="Humanized Text",
733
+ lines=8,
734
+ show_copy_button=True
735
+ )
736
+
737
+ with gr.Column():
738
+ combined_analysis = gr.Markdown(
739
+ label="AI Detection Analysis",
740
+ value="Analysis will appear here..."
741
+ )
742
+
743
+ # Settings & Info Tab
744
+ with gr.TabItem("ℹ️ Info & Settings"):
745
+ gr.Markdown("""
746
+ ### 🎯 How to Use:
747
+
748
+ **Humanizer:**
749
+ 1. Paste your AI-generated text
750
+ 2. Choose humanization level
751
+ 3. Get natural, human-like output
752
+
753
+ **AI Detector:**
754
+ 1. Paste any text
755
+ 2. Get detailed AI probability analysis
756
+ 3. See breakdown of detection factors
757
+
758
+ **Combined Mode:**
759
+ 1. Humanize and test in one step
760
+ 2. Perfect for optimizing results
761
+ 3. Iterate until satisfied
762
+
763
+ ### 🔧 Features:
764
+
765
+ **Humanization:**
766
+ - ✅ Multiple AI models for paraphrasing
767
+ - ✅ Natural vocabulary variations
768
+ - ✅ Sentence structure optimization
769
+ - ✅ Academic tone preservation
770
+ - ✅ Three intensity levels
771
+
772
+ **AI Detection:**
773
+ - 🔍 Advanced pattern recognition
774
+ - 📊 Detailed scoring breakdown
775
+ - 🎯 Multiple detection criteria
776
+ - 📈 Confidence assessment
777
+ - 💡 Improvement suggestions
778
+
779
+ ### ⚖️ Ethical Usage:
780
+ This tool is for improving writing quality and understanding AI detection.
781
+ Use responsibly and maintain academic integrity.
782
+ """)
783
+
784
+ # Event handlers
785
+ humanize_btn.click(
786
+ fn=humanize_academic_text,
787
+ inputs=[humanize_input, humanization_level],
788
+ outputs=humanize_output
789
+ )
790
+
791
+ detect_btn.click(
792
+ fn=detect_ai_text,
793
+ inputs=[detect_input],
794
+ outputs=detect_output
795
+ )
796
+
797
+ def combined_process(text, level):
798
+ """Humanize text and then analyze it"""
799
+ if not text.strip():
800
+ return "Please enter text to process.", "No analysis available."
801
+
802
+ # First humanize
803
+ humanized = humanize_academic_text(text, level)
804
+
805
+ # Then analyze
806
+ analysis = detect_ai_text(humanized)
807
+
808
+ return humanized, analysis
809
+
810
+ combined_btn.click(
811
+ fn=combined_process,
812
+ inputs=[combined_input, combined_level],
813
+ outputs=[combined_humanized, combined_analysis]
814
+ )
815
+
816
+ if __name__ == "__main__":
817
+ demo.launch(
818
+ share=False,
819
+ debug=True,
820
+ show_error=True,
821
+ server_name="127.0.0.1",
822
+ server_port=7860
823
+ )
humanizer_batch.py ADDED
@@ -0,0 +1,329 @@
1
+ import gradio as gr
2
+ import pandas as pd
3
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
4
+ import torch
5
+ import random
6
+ import re
7
+ import warnings
8
+ warnings.filterwarnings("ignore")
9
+
10
+ class BatchHumanizer:
11
+ def __init__(self):
12
+ try:
13
+ self.model_name = "Vamsi/T5_Paraphrase_Paws"
14
+ self.tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
15
+ self.model = AutoModelForSeq2SeqLM.from_pretrained(self.model_name)
16
+ print("✅ Batch Humanizer model loaded successfully")
17
+ except Exception as e:
18
+ print(f"❌ Error loading model: {e}")
19
+ self.tokenizer = None
20
+ self.model = None
21
+
22
+ def humanize_single_text(self, text, strength="medium"):
23
+ """Humanize a single piece of text"""
24
+ if not self.model or not self.tokenizer:
25
+ return self.fallback_humanize(text)
26
+
27
+ try:
28
+ # Paraphrase using T5
29
+ input_text = f"paraphrase: {text}"
30
+ input_ids = self.tokenizer.encode(
31
+ input_text,
32
+ return_tensors="pt",
33
+ max_length=512,
34
+ truncation=True
35
+ )
36
+
37
+ # Adjust parameters based on strength
38
+ if strength == "light":
39
+ temp, top_p = 1.1, 0.9
40
+ elif strength == "heavy":
41
+ temp, top_p = 1.5, 0.95
42
+ else: # medium
43
+ temp, top_p = 1.3, 0.92
44
+
45
+ with torch.no_grad():
46
+ outputs = self.model.generate(
47
+ input_ids=input_ids,
48
+ max_length=min(len(text.split()) + 50, 512),
49
+ num_beams=5,
50
+ temperature=temp,
51
+ top_p=top_p,
52
+ do_sample=True,
53
+ early_stopping=True,
54
+ repetition_penalty=1.2
55
+ )
56
+
57
+ result = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
58
+
59
+ # Additional humanization
60
+ if strength in ["medium", "heavy"]:
61
+ result = self.add_natural_variations(result)
62
+
63
+ return self.clean_text(result) if result and len(result) > 10 else text
64
+
65
+ except Exception as e:
66
+ print(f"Error humanizing text: {e}")
67
+ return self.fallback_humanize(text)
68
+
69
+ def fallback_humanize(self, text):
70
+ """Simple fallback humanization without model"""
71
+ # Basic word replacements
72
+ replacements = {
73
+ "utilize": "use", "demonstrate": "show", "facilitate": "help",
74
+ "optimize": "improve", "implement": "apply", "generate": "create",
75
+ "therefore": "thus", "however": "yet", "furthermore": "also"
76
+ }
77
+
78
+ result = text
79
+ for old, new in replacements.items():
80
+ result = re.sub(r'\b' + old + r'\b', new, result, flags=re.IGNORECASE)
81
+
82
+ return result
83
+
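The `\b` anchors matter here: only exact whole-word matches are swapped, so inflected forms stay intact.

```python
import re

text = "We utilize tools; it utilizes them."
print(re.sub(r'\butilize\b', 'use', text, flags=re.IGNORECASE))
# "We use tools; it utilizes them."
```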
84
+ def add_natural_variations(self, text):
85
+ """Add natural language variations"""
86
+ # Academic connectors
87
+ connectors = [
88
+ "Moreover", "Furthermore", "Additionally", "In contrast",
89
+ "Similarly", "Consequently", "Nevertheless", "Notably"
90
+ ]
91
+
92
+ sentences = text.split('.')
93
+ varied = []
94
+
95
+ for i, sentence in enumerate(sentences):
96
+ sentence = sentence.strip()
97
+ if not sentence:
98
+ continue
99
+
100
+ # Sometimes add connectors
101
+ if i > 0 and random.random() < 0.2:
102
+ connector = random.choice(connectors)
103
+ sentence = f"{connector}, {sentence.lower()}"
104
+
105
+ varied.append(sentence)
106
+
107
+ return '. '.join(varied) + '.' if varied else text
108
+
109
+ def clean_text(self, text):
110
+ """Clean and format text"""
111
+ # Remove extra spaces
112
+ text = re.sub(r'\s+', ' ', text)
113
+ text = re.sub(r'\s+([.!?,:;])', r'\1', text)
114
+
115
+ # Capitalize sentences
116
+ sentences = text.split('. ')
117
+ formatted = []
118
+ for sentence in sentences:
119
+ sentence = sentence.strip()
120
+ if sentence:
121
+ sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()
122
+ formatted.append(sentence)
123
+
124
+ result = '. '.join(formatted)
125
+ if not result.endswith(('.', '!', '?')):
126
+ result += '.'
127
+
128
+ return result
129
+
130
+ # Initialize humanizer
131
+ batch_humanizer = BatchHumanizer()
132
+
133
+ def process_text_input(text_input, strength):
134
+ """Process single text input"""
135
+ if not text_input or not text_input.strip():
136
+ return "Please enter some text to humanize."
137
+
138
+ return batch_humanizer.humanize_single_text(text_input, strength.lower())
139
+
140
+ def process_file_upload(file, strength):
141
+ """Process uploaded file"""
142
+ if file is None:
143
+ return "Please upload a file.", None
144
+
145
+ try:
146
+ # Read the file
147
+ if file.name.endswith('.txt'):
148
+ with open(file.name, 'r', encoding='utf-8') as f:
149
+ content = f.read()
150
+
151
+ # Split into paragraphs or sentences for processing
152
+ paragraphs = [p.strip() for p in content.split('\n\n') if p.strip()]
153
+
154
+ humanized_paragraphs = []
155
+ for para in paragraphs:
156
+ if len(para) > 50: # Only process substantial paragraphs
157
+ humanized = batch_humanizer.humanize_single_text(para, strength.lower())
158
+ humanized_paragraphs.append(humanized)
159
+ else:
160
+ humanized_paragraphs.append(para)
161
+
162
+ result = '\n\n'.join(humanized_paragraphs)
163
+
164
+ # Save to new file
165
+ output_filename = file.name.replace('.txt', '_humanized.txt')
166
+ with open(output_filename, 'w', encoding='utf-8') as f:
167
+ f.write(result)
168
+
169
+ return result, output_filename
170
+
171
+ elif file.name.endswith('.csv'):
172
+ df = pd.read_csv(file.name)
173
+
174
+ # Assume the text column is named 'text' or the first column
175
+ text_column = 'text' if 'text' in df.columns else df.columns[0]
176
+
177
+ # Humanize each text entry
178
+ df['humanized'] = df[text_column].apply(
179
+ lambda x: batch_humanizer.humanize_single_text(str(x), strength.lower()) if pd.notna(x) else x
180
+ )
181
+
182
+ # Save to new CSV
183
+ output_filename = file.name.replace('.csv', '_humanized.csv')
184
+ df.to_csv(output_filename, index=False)
185
+
186
+ return f"Processed {len(df)} entries. Check the 'humanized' column.", output_filename
187
+
188
+ else:
189
+ return "Unsupported file format. Please upload .txt or .csv files.", None
190
+
191
+ except Exception as e:
192
+ return f"Error processing file: {str(e)}", None
193
+
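For reference, a minimal CSV that exercises the batch path (hypothetical file; the handler looks for a `text` column and otherwise falls back to the first column):

```python
import pandas as pd

pd.DataFrame({"text": [
    "The model demonstrates significant improvements.",
    "We utilize comprehensive analysis procedures.",
]}).to_csv("sample.csv", index=False)
# Uploading sample.csv produces sample_humanized.csv with an added 'humanized' column.
```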
194
+ # Create Gradio interface with tabs
195
+ with gr.Blocks(theme="soft", title="AI Text Humanizer Pro") as demo:
196
+ gr.Markdown("""
197
+ # 🤖➡️👨 AI Text Humanizer Pro
198
+
199
+ **Advanced tool to transform robotic AI-generated text into natural, human-like writing**
200
+
201
+ Perfect for:
202
+ - 📝 Academic papers and essays
203
+ - 📊 Research reports
204
+ - 📄 Business documents
205
+ - 💼 Professional content
206
+ - 🔍 Bypassing AI detection tools
207
+ """)
208
+
209
+ with gr.Tabs():
210
+ # Single Text Tab
211
+ with gr.TabItem("Single Text"):
212
+ gr.Markdown("### Humanize Individual Text")
213
+
214
+ with gr.Row():
215
+ with gr.Column(scale=2):
216
+ text_input = gr.Textbox(
217
+ lines=12,
218
+ placeholder="Paste your AI-generated text here...",
219
+ label="Input Text",
220
+ info="Enter the text you want to humanize"
221
+ )
222
+
223
+ strength_single = gr.Radio(
224
+ choices=["Light", "Medium", "Heavy"],
225
+ value="Medium",
226
+ label="Humanization Strength"
227
+ )
228
+
229
+ process_btn = gr.Button("🚀 Humanize Text", variant="primary")
230
+
231
+ with gr.Column(scale=2):
232
+ text_output = gr.Textbox(
233
+ lines=12,
234
+ label="Humanized Output",
235
+ show_copy_button=True
236
+ )
237
+
238
+ # Examples
239
+ gr.Examples(
240
+ examples=[
241
+ ["The implementation of artificial intelligence algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.", "Medium"],
242
+ ["Machine learning models exhibit superior performance characteristics when evaluated against traditional statistical approaches in predictive analytics applications.", "Heavy"],
243
+ ["The research methodology utilized in this study involves comprehensive data collection and analysis procedures to ensure robust and reliable results.", "Light"]
244
+ ],
245
+ inputs=[text_input, strength_single],
246
+ outputs=text_output,
247
+ fn=process_text_input
248
+ )
249
+
250
+ # Batch Processing Tab
251
+ with gr.TabItem("Batch Processing"):
252
+ gr.Markdown("### Process Files in Batch")
253
+ gr.Markdown("Upload .txt or .csv files to humanize multiple texts at once")
254
+
255
+ with gr.Row():
256
+ with gr.Column():
257
+ file_input = gr.File(
258
+ label="Upload File (.txt or .csv)",
259
+ file_types=[".txt", ".csv"]
260
+ )
261
+
262
+ strength_batch = gr.Radio(
263
+ choices=["Light", "Medium", "Heavy"],
264
+ value="Medium",
265
+ label="Humanization Strength"
266
+ )
267
+
268
+ process_file_btn = gr.Button("🔄 Process File", variant="primary")
269
+
270
+ with gr.Column():
271
+ file_output = gr.Textbox(
272
+ lines=10,
273
+ label="Processing Results",
274
+ show_copy_button=True
275
+ )
276
+
277
+ download_file = gr.File(
278
+ label="Download Processed File",
279
+ visible=True  # must stay visible so the file returned by process_file_upload appears
280
+ )
281
+
282
+ # Settings Tab
283
+ with gr.TabItem("Settings & Info"):
284
+ gr.Markdown("""
285
+ ### How it works:
286
+
287
+ 1. **Light Humanization**: Basic paraphrasing with minimal changes
288
+ 2. **Medium Humanization**: Paraphrasing + vocabulary variations
289
+ 3. **Heavy Humanization**: All techniques + sentence structure changes
290
+
291
+ ### Features:
292
+ - ✅ Advanced T5-based paraphrasing
293
+ - ✅ Natural vocabulary diversification
294
+ - ✅ Sentence structure optimization
295
+ - ✅ Academic tone preservation
296
+ - ✅ Batch file processing
297
+ - ✅ Multiple output formats
298
+
299
+ ### Supported Formats:
300
+ - **Text files (.txt)**: Processes paragraph by paragraph
301
+ - **CSV files (.csv)**: Adds 'humanized' column with processed text
302
+
303
+ ### Tips for best results:
304
+ - Use complete sentences and paragraphs
305
+ - Avoid very short fragments
306
+ - Choose appropriate humanization strength
307
+ - Review output for context accuracy
308
+ """)
309
+
310
+ # Event handlers
311
+ process_btn.click(
312
+ fn=process_text_input,
313
+ inputs=[text_input, strength_single],
314
+ outputs=text_output
315
+ )
316
+
317
+ process_file_btn.click(
318
+ fn=process_file_upload,
319
+ inputs=[file_input, strength_batch],
320
+ outputs=[file_output, download_file]
321
+ )
322
+
323
+ if __name__ == "__main__":
324
+ demo.launch(
325
+ share=False,
326
+ server_name="0.0.0.0",
327
+ server_port=7862,
328
+ debug=True
329
+ )
humanizer_robust.py ADDED
@@ -0,0 +1,300 @@
1
+ import gradio as gr
2
+ import random
3
+ import re
4
+ import warnings
5
+ warnings.filterwarnings("ignore")
6
+
7
+ class RobustHumanizer:
8
+ def __init__(self):
9
+ """Initialize with robust fallback techniques that don't require external models"""
10
+ self.academic_replacements = {
11
+ # Common AI patterns to humanize
12
+ "demonstrates": ["shows", "reveals", "indicates", "illustrates", "displays"],
13
+ "significant": ["notable", "considerable", "substantial", "important", "remarkable"],
14
+ "utilize": ["use", "employ", "apply", "implement", "make use of"],
15
+ "implement": ["apply", "use", "put into practice", "carry out", "execute"],
16
+ "generate": ["create", "produce", "develop", "form", "make"],
17
+ "facilitate": ["help", "enable", "assist", "support", "aid"],
18
+ "optimize": ["improve", "enhance", "refine", "perfect", "better"],
19
+ "analyze": ["examine", "study", "investigate", "assess", "evaluate"],
20
+ "therefore": ["thus", "hence", "consequently", "as a result", "for this reason"],
21
+ "however": ["nevertheless", "nonetheless", "yet", "on the other hand", "but"],
22
+ "furthermore": ["moreover", "additionally", "in addition", "what is more", "besides"],
23
+ "substantial": ["significant", "considerable", "notable", "important", "major"],
24
+ "subsequently": ["later", "then", "afterward", "following this", "next"],
25
+ "approximately": ["about", "roughly", "around", "nearly", "close to"],
26
+ "numerous": ["many", "several", "multiple", "various", "a number of"],
27
+ "encompasses": ["includes", "covers", "contains", "involves", "comprises"],
28
+ "methodology": ["method", "approach", "technique", "procedure", "process"],
29
+ "comprehensive": ["complete", "thorough", "extensive", "detailed", "full"],
30
+ "indicates": ["shows", "suggests", "points to", "reveals", "demonstrates"],
31
+ "established": ["set up", "created", "formed", "developed", "built"]
32
+ }
33
+
34
+ self.sentence_starters = [
35
+ "Notably,", "Importantly,", "Significantly,", "Interestingly,",
36
+ "Furthermore,", "Moreover,", "Additionally,", "In contrast,",
37
+ "Similarly,", "Nevertheless,", "Consequently,", "As a result,",
38
+ "In particular,", "Specifically,", "Generally,", "Typically,"
39
+ ]
40
+
41
+ self.hedging_phrases = [
42
+ "appears to", "seems to", "tends to", "suggests that", "indicates that",
43
+ "may well", "might be", "could be", "potentially", "presumably",
44
+ "arguably", "to some extent", "in many cases", "generally speaking",
45
+ "it is likely that", "evidence suggests", "research indicates"
46
+ ]
47
+
48
+ self.connecting_phrases = [
49
+ "In light of this", "Building upon this", "This finding suggests",
50
+ "It is worth noting that", "This observation", "These results",
51
+ "The evidence indicates", "This approach", "The data reveals",
52
+ "Research shows", "Studies demonstrate", "Analysis reveals"
53
+ ]
54
+
55
+ def split_into_sentences(self, text):
56
+ """Simple sentence splitting"""
57
+ # Split by periods, but be careful with abbreviations
58
+ sentences = []
59
+ current = ""
60
+
61
+ for char in text:
62
+ current += char
63
+ if char == '.' and len(current) > 10:
64
+ # Check if this looks like end of sentence
65
+ next_chars = text[text.find(current) + len(current):text.find(current) + len(current) + 3]
66
+ if next_chars.strip() and (next_chars[0].isupper() or next_chars.strip()[0].isupper()):
67
+ sentences.append(current.strip())
68
+ current = ""
69
+
70
+ if current.strip():
71
+ sentences.append(current.strip())
72
+
73
+ return [s for s in sentences if len(s.strip()) > 5]
74
+
75
+ def vary_vocabulary(self, text):
76
+ """Replace words with alternatives"""
77
+ result = text
78
+
79
+ for original, alternatives in self.academic_replacements.items():
80
+ if original.lower() in result.lower():
81
+ replacement = random.choice(alternatives)
82
+ # Case-sensitive replacement
83
+ pattern = re.compile(re.escape(original), re.IGNORECASE)
84
+ result = pattern.sub(replacement, result, count=1)
85
+
86
+ return result
87
+
88
+ def add_natural_flow(self, text):
89
+ """Add natural academic connectors and hedging"""
90
+ sentences = self.split_into_sentences(text)
91
+ if not sentences:
92
+ return text
93
+
94
+ modified_sentences = []
95
+
96
+ for i, sentence in enumerate(sentences):
97
+ sentence = sentence.strip()
98
+ if not sentence:
99
+ continue
100
+
101
+ # Add hedging to some sentences
102
+ if random.random() < 0.3 and not any(hedge in sentence.lower() for hedge in self.hedging_phrases):
103
+ if sentence.lower().startswith(('the ', 'this ', 'these ', 'that ')):
104
+ hedge = random.choice(self.hedging_phrases)
105
+ words = sentence.split()
106
+ if len(words) > 2:
107
+ words.insert(2, hedge)
108
+ sentence = " ".join(words)
109
+
110
+ # Add connecting phrases for flow
111
+ if i > 0 and random.random() < 0.4:
112
+ connector = random.choice(self.connecting_phrases)
113
+ sentence = f"{connector}, {sentence.lower()}"
114
+
115
+ # Sometimes start with variety
116
+ elif i > 0 and random.random() < 0.2:
117
+ starter = random.choice(self.sentence_starters)
118
+ sentence = f"{starter} {sentence.lower()}"
119
+
120
+ modified_sentences.append(sentence)
121
+
122
+ return " ".join(modified_sentences)
123
+
124
+ def restructure_sentences(self, text):
125
+ """Modify sentence structures for variety"""
126
+ sentences = self.split_into_sentences(text)
127
+ restructured = []
128
+
129
+ for sentence in sentences:
130
+ words = sentence.split()
131
+
132
+ # For long sentences, sometimes break them up
133
+ if len(words) > 25 and random.random() < 0.5:
134
+ # Find a good break point
135
+ break_words = ['and', 'but', 'which', 'that', 'because', 'since', 'while']
136
+ for i, word in enumerate(words[10:20], 10): # Look in middle section
137
+ if word.lower() in break_words:
138
+ part1 = " ".join(words[:i]) + "."
139
+ part2 = " ".join(words[i+1:])
140
+ if len(part2) > 10: # Only if second part is substantial
141
+ part2 = part2[0].upper() + part2[1:] if part2 else part2
142
+ restructured.extend([part1, part2])
143
+ break
144
+ else:
145
+ restructured.append(sentence)
146
+ else:
147
+ restructured.append(sentence)
148
+
149
+ return " ".join(restructured)
150
+
151
+ def clean_and_format(self, text):
152
+ """Clean up the text formatting"""
153
+ # Remove extra spaces
154
+ text = re.sub(r'\s+', ' ', text)
155
+ text = re.sub(r'\s+([.,!?;:])', r'\1', text)
156
+
157
+ # Fix capitalization
158
+ sentences = self.split_into_sentences(text)
159
+ formatted = []
160
+
161
+ for sentence in sentences:
162
+ sentence = sentence.strip()
163
+ if sentence:
164
+ # Capitalize first letter
165
+ sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()
166
+
167
+ # Ensure proper ending
168
+ if not sentence.endswith(('.', '!', '?')):
169
+ sentence += '.'
170
+
171
+ formatted.append(sentence)
172
+
173
+ return " ".join(formatted)
174
+
175
+ def humanize_text(self, text, intensity="medium"):
176
+ """Main humanization function"""
177
+ if not text or len(text.strip()) < 10:
178
+ return "Please enter substantial text to humanize (at least 10 characters)."
179
+
180
+ result = text.strip()
181
+
182
+ try:
183
+ # Apply different levels of humanization
184
+ if intensity.lower() in ["light", "low"]:
185
+ # Just vocabulary changes
186
+ result = self.vary_vocabulary(result)
187
+
188
+ elif intensity.lower() in ["medium", "moderate"]:
189
+ # Vocabulary + natural flow
190
+ result = self.vary_vocabulary(result)
191
+ result = self.add_natural_flow(result)
192
+
193
+ elif intensity.lower() in ["heavy", "high", "maximum"]:
194
+ # All techniques
195
+ result = self.vary_vocabulary(result)
196
+ result = self.add_natural_flow(result)
197
+ result = self.restructure_sentences(result)
198
+
199
+ # Always clean up formatting
200
+ result = self.clean_and_format(result)
201
+
202
+ return result if result and len(result) > 10 else text
203
+
204
+ except Exception as e:
205
+ print(f"Humanization error: {e}")
206
+ return f"Error processing text. Please try again with different input."
207
+
208
+ # Initialize the humanizer
209
+ humanizer = RobustHumanizer()
210
+
211
+ def process_text(input_text, humanization_level):
212
+ """Process the input text"""
213
+ return humanizer.humanize_text(input_text, humanization_level)
214
+
215
+ # Create Gradio interface
216
+ demo = gr.Interface(
217
+ fn=process_text,
218
+ inputs=[
219
+ gr.Textbox(
220
+ lines=12,
221
+ placeholder="Paste your AI-generated or robotic text here...\n\nExample: 'The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets.'",
222
+ label="Input Text",
223
+ info="Enter the text you want to make more natural and human-like"
224
+ ),
225
+ gr.Radio(
226
+ choices=["Light", "Medium", "Heavy"],
227
+ value="Medium",
228
+ label="Humanization Intensity",
229
+ info="Light: Basic vocabulary changes | Medium: + Natural flow | Heavy: + Sentence restructuring"
230
+ )
231
+ ],
232
+ outputs=gr.Textbox(
233
+ label="Humanized Output",
234
+ lines=12,
235
+ show_copy_button=True,
236
+ info="Copy this natural, human-like text"
237
+ ),
238
+ title="🤖➡️👨 Robust AI Text Humanizer",
239
+ description="""
240
+ **Transform robotic AI text into natural, human-like academic writing**
241
+
242
+ This tool uses advanced linguistic techniques to make AI-generated text sound more natural and human-like.
243
+ Perfect for academic papers, research reports, essays, and professional documents.
244
+
245
+ ✅ **No model downloads required** - Pure rule-based techniques that run offline
246
+ ✅ **Advanced vocabulary variation** - Natural word choices
247
+ ✅ **Sentence flow optimization** - Smooth transitions
248
+ ✅ **Academic tone preservation** - Maintains credibility
249
+ ✅ **Structure diversification** - Varied sentence patterns
250
+ ✅ **Natural connectors** - Academic linking phrases
251
+ """,
252
+ examples=[
253
+ [
254
+ "The implementation of machine learning algorithms demonstrates significant improvements in computational efficiency and accuracy metrics across various benchmark datasets. These results indicate that the optimization of neural network architectures can facilitate enhanced performance in predictive analytics applications.",
255
+ "Medium"
256
+ ],
257
+ [
258
+ "Artificial intelligence technologies are increasingly being utilized across numerous industries to optimize operational processes and generate innovative solutions. The comprehensive analysis of these systems reveals substantial benefits in terms of efficiency and accuracy.",
259
+ "Heavy"
260
+ ],
261
+ [
262
+ "The research methodology encompasses a systematic approach to data collection and analysis, utilizing advanced statistical techniques to ensure robust and reliable results that demonstrate the effectiveness of the proposed framework.",
263
+ "Light"
264
+ ]
265
+ ],
266
+ theme="soft",
267
+ css="""
268
+ .gradio-container {
269
+ max-width: 1200px !important;
270
+ }
271
+ """,
272
+ article="""
273
+ ### 🎯 **How to Use:**
274
+ 1. **Paste your AI-generated text** in the input box
275
+ 2. **Choose intensity level** based on how much change you want
276
+ 3. **Click Submit** and get natural, human-like output
277
+ 4. **Copy the result** and use it in your work
278
+
279
+ ### 💡 **Pro Tips:**
280
+ - Use **Light** for minimal changes while preserving original structure
281
+ - Use **Medium** for balanced humanization with natural flow
282
+ - Use **Heavy** for maximum transformation and sentence variety
283
+ - Always review the output to ensure it maintains your intended meaning
284
+ - For best results, input complete sentences and paragraphs
285
+
286
+ ### ⚖️ **Ethical Usage:**
287
+ This tool is designed to improve writing quality and natural expression.
288
+ Please use responsibly and maintain academic integrity.
289
+ """,
290
+ allow_flagging="never"
291
+ )
292
+
293
+ if __name__ == "__main__":
294
+ demo.launch(
295
+ share=False,
296
+ server_name="127.0.0.1",
297
+ server_port=7862,
298
+ debug=True,
299
+ show_error=True
300
+ )
humanizer_simple.py ADDED
@@ -0,0 +1,249 @@
1
+ import gradio as gr
2
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
3
+ import torch
4
+ import random
5
+ import re
6
+ import warnings
7
+ warnings.filterwarnings("ignore")
8
+
9
+ class SimpleHumanizer:
10
+ def __init__(self):
11
+ # Load a reliable T5 model for paraphrasing
12
+ try:
13
+ self.model_name = "Vamsi/T5_Paraphrase_Paws"
14
+ self.tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
15
+ self.model = AutoModelForSeq2SeqLM.from_pretrained(self.model_name)
16
+ print("✅ Model loaded successfully")
17
+ except Exception as e:
18
+ print(f"❌ Error loading model: {e}")
19
+ self.tokenizer = None
20
+ self.model = None
21
+
22
+ def add_variations(self, text):
23
+ """Add simple variations to make text more natural"""
24
+ # Common academic phrase variations
25
+ replacements = {
26
+ "shows that": ["demonstrates that", "indicates that", "reveals that", "suggests that"],
27
+ "results in": ["leads to", "causes", "produces", "generates"],
28
+ "due to": ["because of", "owing to", "as a result of", "on account of"],
29
+ "in order to": ["to", "so as to", "with the aim of", "for the purpose of"],
30
+ "as well as": ["and", "along with", "together with", "in addition to"],
31
+ "therefore": ["thus", "hence", "consequently", "as a result"],
32
+ "however": ["nevertheless", "nonetheless", "on the other hand", "yet"],
33
+ "furthermore": ["moreover", "additionally", "in addition", "what is more"],
34
+ "significant": ["notable", "considerable", "substantial", "important"],
35
+ "important": ["crucial", "vital", "essential", "key"],
36
+ "analyze": ["examine", "investigate", "study", "assess"],
37
+ "demonstrate": ["show", "illustrate", "reveal", "display"],
38
+ "utilize": ["use", "employ", "apply", "implement"]
39
+ }
40
+
41
+ result = text
42
+ for original, alternatives in replacements.items():
43
+ if original in result.lower():
44
+ replacement = random.choice(alternatives)
45
+ # Replace with case matching
46
+ pattern = re.compile(re.escape(original), re.IGNORECASE)
47
+ result = pattern.sub(replacement, result, count=1)
48
+
49
+ return result
50
+
51
+ def vary_sentence_structure(self, text):
52
+ """Simple sentence structure variations"""
53
+ sentences = text.split('.')
54
+ varied = []
55
+
56
+ for sentence in sentences:
57
+ sentence = sentence.strip()
58
+ if not sentence:
59
+ continue
60
+
61
+ # Add some variety to sentence starters
62
+ if random.random() < 0.3:
63
+ starters = ["Notably, ", "Importantly, ", "Significantly, ", "Interestingly, "]
64
+ if not any(sentence.startswith(s.strip()) for s in starters):
65
+ sentence = random.choice(starters) + sentence.lower()
66
+
67
+ varied.append(sentence)
68
+
69
+ return '. '.join(varied) + '.'
70
+
71
+ def paraphrase_text(self, text):
72
+ """Paraphrase using T5 model"""
73
+ if not self.model or not self.tokenizer:
74
+ return text
75
+
76
+ try:
77
+ # Split long text into chunks
78
+ max_length = 400
79
+ if len(text) > max_length:
80
+ sentences = text.split('.')
81
+ chunks = []
82
+ current_chunk = ""
83
+
84
+ for sentence in sentences:
85
+ if len(current_chunk + sentence) < max_length:
86
+ current_chunk += sentence + "."
87
+ else:
88
+ if current_chunk:
89
+ chunks.append(current_chunk.strip())
90
+ current_chunk = sentence + "."
91
+
92
+ if current_chunk:
93
+ chunks.append(current_chunk.strip())
94
+
95
+ paraphrased_chunks = []
96
+ for chunk in chunks:
97
+ para = self._paraphrase_chunk(chunk)
98
+ paraphrased_chunks.append(para)
99
+
100
+ return " ".join(paraphrased_chunks)
101
+ else:
102
+ return self._paraphrase_chunk(text)
103
+
104
+ except Exception as e:
105
+ print(f"Paraphrasing error: {e}")
106
+ return text
107
+
108
+ def _paraphrase_chunk(self, text):
109
+ """Paraphrase a single chunk"""
110
+ try:
111
+ # Prepare input
112
+ input_text = f"paraphrase: {text}"
113
+ input_ids = self.tokenizer.encode(
114
+ input_text,
115
+ return_tensors="pt",
116
+ max_length=512,
117
+ truncation=True
118
+ )
119
+
120
+ # Generate paraphrase
121
+ with torch.no_grad():
122
+ outputs = self.model.generate(
123
+ input_ids=input_ids,
124
+ max_length=min(len(text.split()) + 50, 512),
125
+ num_beams=5,
126
+ num_return_sequences=1,
127
+ temperature=1.3,
128
+ top_k=50,
129
+ top_p=0.95,
130
+ do_sample=True,
131
+ early_stopping=True,
132
+ repetition_penalty=1.2
133
+ )
134
+
135
+ # Decode result
136
+ paraphrased = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
137
+
138
+ # Clean up the result
139
+ paraphrased = paraphrased.strip()
140
+ if paraphrased and len(paraphrased) > 10:
141
+ return paraphrased
142
+ else:
143
+ return text
144
+
145
+ except Exception as e:
146
+ print(f"Chunk paraphrasing error: {e}")
147
+ return text
148
+
149
+ # Initialize humanizer
150
+ humanizer = SimpleHumanizer()
151
+
152
+ def humanize_text(input_text, complexity="Medium"):
153
+ """Main humanization function"""
154
+ if not input_text or not input_text.strip():
155
+ return "Please enter some text to humanize."
156
+
157
+ try:
158
+ # Step 1: Paraphrase the text
159
+ result = humanizer.paraphrase_text(input_text)
160
+
161
+ # Step 2: Add variations based on complexity
162
+ if complexity in ["Medium", "High"]:
163
+ result = humanizer.add_variations(result)
164
+
165
+ if complexity == "High":
166
+ result = humanizer.vary_sentence_structure(result)
167
+
168
+ # Step 3: Clean up formatting
169
+ result = re.sub(r'\s+', ' ', result)
170
+ result = re.sub(r'\s+([.!?,:;])', r'\1', result)
171
+
172
+ # Ensure proper sentence capitalization
173
+ sentences = result.split('. ')
174
+ formatted_sentences = []
175
+ for i, sentence in enumerate(sentences):
176
+ sentence = sentence.strip()
177
+ if sentence:
178
+ # Capitalize first letter
179
+ sentence = sentence[0].upper() + sentence[1:] if len(sentence) > 1 else sentence.upper()
180
+ formatted_sentences.append(sentence)
181
+
182
+ result = '. '.join(formatted_sentences)
183
+
184
+ # Final cleanup
185
+ if not result.endswith('.') and not result.endswith('!') and not result.endswith('?'):
186
+ result += '.'
187
+
188
+ return result
189
+
190
+ except Exception as e:
191
+ print(f"Humanization error: {e}")
192
+ return f"Error processing text: {str(e)}"
193
+
194
+ # Create Gradio interface
195
+ demo = gr.Interface(
196
+ fn=humanize_text,
197
+ inputs=[
198
+ gr.Textbox(
199
+ lines=10,
200
+ placeholder="Paste your AI-generated or robotic text here...",
201
+ label="Input Text",
202
+ info="Enter the text you want to humanize"
203
+ ),
204
+ gr.Radio(
205
+ choices=["Low", "Medium", "High"],
206
+ value="Medium",
207
+ label="Humanization Complexity",
208
+ info="Low: Basic paraphrasing | Medium: + Vocabulary variations | High: + Structure changes"
209
+ )
210
+ ],
211
+ outputs=gr.Textbox(
212
+ label="Humanized Output",
213
+ lines=10,
214
+ show_copy_button=True
215
+ ),
216
+ title="🤖➡️👨 AI Text Humanizer (Simple)",
217
+ description="""
218
+ **Transform robotic AI text into natural, human-like writing**
219
+
220
+ This tool uses advanced paraphrasing techniques to make AI-generated text sound more natural and human-like.
221
+ Perfect for academic papers, essays, reports, and any content that needs to pass AI detection tools.
222
+
223
+ **Features:**
224
+ ✅ Advanced T5-based paraphrasing
225
+ ✅ Vocabulary diversification
226
+ ✅ Sentence structure optimization
227
+ ✅ Academic tone preservation
228
+ ✅ Natural flow enhancement
229
+ """,
230
+ examples=[
231
+ [
232
+ "The implementation of machine learning algorithms in data processing systems demonstrates significant improvements in efficiency and accuracy metrics.",
233
+ "Medium"
234
+ ],
235
+ [
236
+ "Artificial intelligence technologies are increasingly being utilized across various industries to enhance operational capabilities and drive innovation.",
237
+ "High"
238
+ ]
239
+ ],
240
+ theme="soft"
241
+ )
242
+
243
+ if __name__ == "__main__":
244
+ demo.launch(
245
+ share=False,
246
+ server_name="0.0.0.0",
247
+ server_port=7861,
248
+ debug=True
249
+ )
requirements.txt ADDED
@@ -0,0 +1,7 @@
1
+ gradio==4.44.0
2
+ transformers==4.35.0
3
+ torch==2.1.0
4
+ nltk==3.8.1
5
+ textstat==0.7.3
6
+ numpy==1.24.3
7
+ pandas==2.0.3
+ # the slow (use_fast=False) T5/Pegasus tokenizers also need these at runtime
+ sentencepiece
+ protobuf
research_humanizer_dataset.csv ADDED
@@ -0,0 +1,11 @@
1
+ input,target
2
+ The model is good and fast.,The proposed model exhibits strong performance and efficiency.
3
+ We tried some tests and they worked well.,"Several experiments were conducted, all of which demonstrated promising results."
4
+ This system gives better results than old ones.,This system outperforms traditional approaches in terms of accuracy and scalability.
5
+ The algorithm was run on many datasets.,The algorithm was evaluated using a diverse set of benchmark datasets.
6
+ We can say it works great.,These findings suggest the approach is both effective and reliable.
7
+ Our method is simple but it does the job.,Our approach is straightforward yet achieves the intended objectives.
8
+ "There are many problems, but we fixed them.","Several issues were encountered, all of which were systematically resolved."
9
+ The results are okay and show improvement.,The outcomes indicate measurable improvements over baseline methods.
10
+ We used some tools to help with this.,Auxiliary tools were employed to support the development process.
11
+ It shows better accuracy than others.,The approach demonstrates superior accuracy compared to existing methods.
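A minimal check that the paired dataset loads as expected for fine-tuning experiments:

```python
import pandas as pd

pairs = pd.read_csv("research_humanizer_dataset.csv")
print(len(pairs), list(pairs.columns))  # 10 ['input', 'target']
```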