File size: 3,171 Bytes
33178bf
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
# πŸ“š Document Q&A System

A powerful document question-answering system built with LlamaIndex and Gradio. Upload your documents and ask questions about them using state-of-the-art AI models.

## Features

πŸ” **Smart Document Processing**: Automatically processes various document formats (PDF, TXT, DOCX, MD, CSV, JSON)

πŸ€– **Multiple AI Models**: Choose from GPT-4o, Claude 3.5 Sonnet, Llama 3.1, Mistral, and more

πŸ“Š **Performance Monitoring**: Track response times and query statistics  

🎯 **Source Attribution**: See which document sections were used to generate answers

βš™οΈ **Customizable Settings**: Adjust temperature, token limits, and retrieval parameters

πŸ”’ **Secure API Key Management**: Use environment variables or direct input

## How to Use

### 1. Upload Documents
- Go to the "Upload Documents" tab
- Select your files (PDF, TXT, DOCX, MD, CSV, JSON)
- Click "Process Documents" to create the searchable index

### 2. Configure Settings
- Add your OpenRouter API key (or set as HF Space secret)
- Choose your preferred AI model
- Adjust parameters like temperature and max tokens

### 3. Ask Questions
- Enter your question in the "Ask Questions" tab
- Click "Ask Question" to get AI-powered answers
- View sources and performance metrics

## API Key Setup

You can provide your OpenRouter API key in two ways:

1. **Direct Input**: Enter it in the "API Key" field in the interface
2. **Environment Variable**: Set `OPENROUTER_API_KEY` as a Hugging Face Space secret

Get your API key from [OpenRouter](https://openrouter.ai/)

## Best Practices for Questions

- 🎯 **Be specific**: "What does the author say about climate change?" vs "Tell me about climate"
- πŸ“š **Ask about concepts**: "What is the main methodology discussed?"
- πŸ” **Use comparative questions**: "How do different studies approach this topic?"
- πŸ“Š **Request analysis**: "What are the key findings presented?"
- πŸ›οΈ **Ask about methodology**: "What research methods are used?"

## Available Models

- **GPT-4o**: Best overall performance, most accurate
- **GPT-4o Mini**: Faster, cost-effective option
- **Claude 3.5 Sonnet**: Excellent reasoning and analysis
- **Claude 3 Haiku**: Fast and efficient
- **Llama 3.1 70B/8B**: Open source, strong performance
- **Mistral Large**: Strong multilingual capabilities
- **Gemini Pro**: Google's advanced model

## Technical Details

Built with:
- **LlamaIndex**: Document indexing and retrieval
- **Gradio**: Web interface
- **OpenRouter**: Multi-model API access
- **HuggingFace Embeddings**: Text vectorization
- **BGE-small-en-v1.5**: Efficient embedding model

## Performance

- Vector-based semantic search for accurate retrieval
- Cached indexing for fast subsequent queries
- Configurable chunk sizes and overlap for optimal results
- Real-time performance monitoring

## Development

To run locally:

```bash
git clone <your-repo>
cd document-qa-system
pip install -r requirements.txt
python app.py
```

## License

This project is open source and available under the MIT License.

## Support

For issues or questions, please check the Help tab in the application or create an issue in the repository.