---
title: MT564AITraining
emoji: 🚀
colorFrom: blue
colorTo: gray
sdk: docker
pinned: false
license: apache-2.0
short_description: MT564Model training

---


# SWIFT-MT564-Assistant

Added MT564 TinyLlama training interface:

- ✓ Created comprehensive training UI with file upload
- ✓ Integrated horoscope harvesting with MT564 training
- ✓ Both systems running in unified application
- ✓ Navigation links connect both functionalities

The application now provides both horoscope data harvesting and MT564 TinyLlama training, each with a complete UI. The MT564 training interface is available from the navigation menu.

## Project Overview

This project creates an AI-powered documentation assistant for financial messaging standards, specifically focused on the SWIFT MT564 message type. It combines web scraping, data processing, TinyLlama fine-tuning, and a user-friendly interface to provide an intelligent assistant for financial messaging professionals.

## Key Components

### 1. Data Collection & Processing

- **Web Scraper**: Extracts structured data from [ISO20022 SWIFT MT564 documentation](https://www.iso20022.org/15022/uhb/finmt564.htm)
- **PDF Parser**: Extracts text and structural information from uploaded SWIFT documentation PDFs
- **Data Formatter**: Converts scraped and parsed data into training examples for the model
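
As a rough illustration of the Data Formatter step, the sketch below turns scraped field records into instruction-response pairs and serializes them as JSON Lines. The record keys (`tag`, `name`, `description`), the question template, and the function names are assumptions made for this example; they are not taken from the project's code.

```python
import json
from typing import Dict, List


def to_training_examples(fields: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn scraped field records into instruction-response pairs."""
    examples = []
    for field in fields:
        tag = field["tag"].strip()
        name = field["name"].strip()
        description = field.get("description", "").strip()
        if not description:
            continue  # skip records that carry no explanatory text
        examples.append({
            "instruction": f"What is field {tag} in a SWIFT MT564 message?",
            "response": f"Field {tag} ({name}): {description}",
        })
    return examples


def to_jsonl(examples: List[Dict[str, str]]) -> str:
    """Serialize examples as JSON Lines, one example per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


# Illustrative records only; real input would come from the scraper or PDF parser.
sample_fields = [
    {"tag": "23G", "name": "Function of the Message",
     "description": "Identifies the function of the message."},
    {"tag": "20C", "name": "Reference", "description": ""},
]

examples = to_training_examples(sample_fields)
print(to_jsonl(examples))
```

Records without a description are dropped rather than padded, so the training set only contains pairs that actually explain something.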

### 2. Model Training Pipeline

- **TinyLlama Integration**: Uses TinyLlama, a compact open-source LLM (about 1.1B parameters), as the base model
- **Fine-tuning Scripts**: Specialized scripts for training on SWIFT message documentation
- **Evaluation Tools**: Methods to test the model's understanding of SWIFT message formats
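
One simple way to test the model's understanding is to check whether its answers mention the terms a correct answer should contain. The sketch below is a minimal keyword-coverage scorer under that assumption; the metric, the test cases, and the `stub_model` function are hypothetical stand-ins and not the project's evaluator.

```python
from typing import Callable, Dict, List


def keyword_coverage(answer: str, expected_keywords: List[str]) -> float:
    """Fraction of expected keywords found in the answer (case-insensitive)."""
    if not expected_keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return hits / len(expected_keywords)


def mean_coverage(cases: List[Dict], answer_fn: Callable[[str], str]) -> float:
    """Average keyword coverage of answer_fn over a list of test cases."""
    if not cases:
        return 0.0
    scores = [
        keyword_coverage(answer_fn(case["question"]), case["keywords"])
        for case in cases
    ]
    return sum(scores) / len(scores)


# Hypothetical test cases for illustration.
cases = [
    {"question": "What is field 23G?", "keywords": ["23G", "function"]},
    {"question": "What is field 22F?", "keywords": ["22F", "indicator"]},
]


def stub_model(question: str) -> str:
    """Stand-in for the fine-tuned model; always returns the same answer."""
    return "Field 23G gives the function of the message."


score = mean_coverage(cases, stub_model)
print(f"mean keyword coverage: {score:.2f}")
```

Keyword coverage is cheap and easy to read, but it only checks for the presence of terms, so it is best treated as a smoke test rather than a measure of correctness.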

### 3. User Interface

- **Web Application**: Flask-based interface for interacting with the model
- **PDF Upload**: Functionality to upload and process SWIFT documentation PDFs
- **Question-Answering System**: Interactive chat interface for asking questions about MT564 and related formats
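
For the PDF upload path, a web application usually validates the file name before writing anything to disk. The following sketch shows one way to do that with the standard library only; the `safe_upload_path` helper and the sanitizing rule are assumptions for illustration, and only the `data/uploaded` directory name comes from the layout described in this README.

```python
import re
from pathlib import Path

UPLOAD_DIR = Path("data/uploaded")  # directory name taken from the project layout
ALLOWED_EXTENSIONS = {".pdf"}       # assumption: only PDFs are accepted


def safe_upload_path(filename: str, upload_dir: Path = UPLOAD_DIR) -> Path:
    """Return a storage path for an uploaded file, or raise ValueError."""
    # Keep only the final path component so "../" cannot escape upload_dir.
    name = Path(filename.replace("\\", "/")).name
    # Replace anything outside a conservative character set.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {filename!r}")
    return upload_dir / name


print(safe_upload_path("../../etc/MT564 guide.PDF"))
```

In a Flask view, a helper like this would be called with the name of the uploaded file before saving it; a `ValueError` can then be turned into an error response.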

## Technical Architecture

```
SWIFT-MT564-Assistant/
├── scrapers/                 # Web scraping components
│   ├── iso20022_scraper.py   # Scraper for ISO20022 website
│   ├── pdf_parser.py         # PDF extraction utilities
│   └── data_processor.py     # Converts raw data to training format
│
├── model/                    # ML model components
│   ├── tinyllama_trainer.py  # Fine-tuning implementation
│   ├── data_formatter.py     # Prepares data for training
│   └── evaluator.py          # Tests model performance
│
├── webapp/                   # Web application
│   ├── app.py                # Flask application
│   ├── templates/            # HTML templates
│   │   ├── index.html        # Main page
│   │   └── result.html       # Results display
│   └── static/               # CSS, JS, and other static files
│
├── data/                     # Data storage
│   ├── raw/                  # Raw scraped data
│   ├── processed/            # Processed training data
│   └── uploaded/             # User-uploaded PDFs
│
├── train_mt564_model.py      # Script to train the model
├── requirements.txt          # Project dependencies
└── README.md                 # Project documentation
```

## How It Works

1. **Data Collection Phase**:
   - The ISO20022 scraper extracts structured data from the SWIFT MT564 documentation
   - The data is processed and converted into a training dataset of instruction-response pairs

2. **Model Training Phase**:
   - TinyLlama is fine-tuned on the specialized SWIFT message format data
   - The model learns the structure, fields, and usage of MT564 messages

3. **User Interaction Phase**:
   - Users upload SWIFT documentation PDFs through the web interface
   - The system extracts and processes the PDF content
   - Users ask questions about SWIFT messages and receive accurate, contextual responses
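
The user interaction phase can be pictured as: split the extracted PDF text into chunks, pick the chunk most relevant to the question, and place it in the prompt sent to the model. This README does not say how context is selected, so the word-overlap scoring, the chunk size, and the prompt wording below are all assumptions for the sake of a small, self-contained sketch.

```python
import re
from typing import List, Set


def split_into_chunks(text: str, max_words: int = 80) -> List[str]:
    """Split extracted text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_chunk(question: str, chunks: List[str]) -> str:
    """Return the chunk sharing the most tokens with the question."""
    if not chunks:
        return ""
    question_tokens = _tokens(question)
    return max(chunks, key=lambda chunk: len(question_tokens & _tokens(chunk)))


def build_prompt(question: str, context: str) -> str:
    """Combine the selected context and the question into one prompt."""
    return (
        "Use the SWIFT documentation excerpt to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative chunks; real chunks would come from the PDF parser.
chunks = [
    "Field 23G identifies the function of the message.",
    "Settlement details are handled elsewhere.",
]
question = "What does field 23G identify?"
prompt = build_prompt(question, best_chunk(question, chunks))
print(prompt)
```

Word overlap is a deliberately simple stand-in; an embedding-based retriever could replace `best_chunk` without changing the rest of the flow.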

## Installation & Setup

### Prerequisites

- Python 3.8+
- PyTorch
- Transformers library
- Flask
- PDF processing libraries

### Installation Steps

```bash
# Clone the repository
git clone <repository-url>
cd SWIFT-MT564-Assistant

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Prepare the MT564 training data
python prepare_mt564_data.py

# Run the web application
python main.py
```

## Usage

### Training the Model

```bash
# Run the scraper to collect data
python scrapers/iso20022_scraper.py

# Process the data
python scrapers/data_processor.py

# Train the model
python train_mt564_model.py
```
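
The three commands above can also be chained from Python so that a failing step stops the run. This is a minimal sketch that assumes the scripts are run from the repository root and take no arguments; `build_commands` and `run_pipeline` are hypothetical helper names, not part of the project.

```python
import subprocess
import sys
from typing import List, Sequence

# Script paths as listed in the usage instructions, in execution order.
PIPELINE_STEPS = (
    "scrapers/iso20022_scraper.py",
    "scrapers/data_processor.py",
    "train_mt564_model.py",
)


def build_commands(
    steps: Sequence[str] = PIPELINE_STEPS,
    python: str = sys.executable,
) -> List[List[str]]:
    """Build one command per pipeline step using the given interpreter."""
    return [[python, step] for step in steps]


def run_pipeline(commands: Sequence[Sequence[str]]) -> None:
    """Run commands in order; stop at the first failing step."""
    for command in commands:
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline(build_commands())` from the repository root would run scraping, processing, and training in sequence.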

### Using the Web Interface

1. Start the Flask application: `python main.py`
2. Open a browser and navigate to: `http://localhost:5000`
3. Upload a SWIFT MT564 documentation PDF
4. Ask questions about the SWIFT message format

## Future Enhancements

- Expand coverage to additional SWIFT message types (MT565, MT566, etc.)
- Implement multi-document reasoning across different SWIFT standards
- Add support for ISO20022 MX message formats
- Develop specialized modules for message validation and conversion