---
license: mit
language:
- en
metrics:
- wer
tags:
- g2p
- grapheme
- phoneme
- text2text
- text-generation-inference
---
# Grapheme to Phoneme (G2P) with Stress
This project provides a grapheme-to-phoneme (G2P) converter. It first looks each word up in the CMU Pronouncing Dictionary. If the word is not in the dictionary, two Transformer-based models generate its phonemes and add stress markers. The output uses the ARPAbet format, and the tool can also convert graphemes into phoneme integer indices.
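
The dictionary-first flow with a model fallback can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `TINY_LEXICON` is a hand-written stand-in for the CMU Pronouncing Dictionary, and `fallback_model` is a hypothetical placeholder for the two Transformer models.

```python
from typing import Callable, Dict, List

# Hand-written stand-in for the CMU Pronouncing Dictionary (assumption:
# the real project loads the full dictionary instead).
TINY_LEXICON: Dict[str, List[str]] = {
    "text": ["T", "EH1", "K", "S", "T"],
    "some": ["S", "AH1", "M"],
}


def fallback_model(word: str) -> List[str]:
    """Hypothetical placeholder for the Transformer models."""
    # Not a real G2P model: it only marks the word as out-of-vocabulary.
    return [f"<oov:{word}>"]


def word_to_phonemes(
    word: str,
    lexicon: Dict[str, List[str]] = TINY_LEXICON,
    fallback: Callable[[str], List[str]] = fallback_model,
) -> List[str]:
    """Return the dictionary entry if present, otherwise call the fallback."""
    key = word.lower()
    if key in lexicon:
        return list(lexicon[key])  # copy so callers cannot mutate the lexicon
    return fallback(key)
```

Passing the lexicon and fallback as parameters keeps the lookup order explicit and makes either part easy to swap out.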
## Features
1. **CMU Pronouncing Dictionary Integration**: First checks the CMU dictionary for phoneme translations.
2. **Transformer-Based Conversion**:
   - **Phoneme Generation**: The first Transformer model converts graphemes into phonemes.
   - **Stress Addition**: The second Transformer model adds stress markers to the phonemes.
3. **ARPAbet Output**: Outputs phonemes in ARPAbet format.
4. **Phoneme Integer Indices**: Converts graphemes to phoneme integer indices.
5. **BPE Tokenizer**: Uses a BPE tokenizer, which improved translation quality.
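
To illustrate stress markers and integer indices, here is a small sketch. In the CMU dictionary's ARPAbet notation, vowels carry a trailing stress digit (`0`, `1`, or `2`). The helper names `strip_stress` and `build_index` are hypothetical, and the index table built here is an assumption for illustration; the project's own symbol-to-index mapping may differ.

```python
from typing import Dict, List


def strip_stress(phonemes: List[str]) -> List[str]:
    """Remove trailing ARPAbet stress digits (0, 1, 2) from each symbol."""
    return [p.rstrip("012") for p in phonemes]


def build_index(symbols: List[str]) -> Dict[str, int]:
    """Assign each distinct symbol an integer, in sorted order."""
    return {s: i for i, s in enumerate(sorted(set(symbols)))}


stressed = ["T", "EH1", "K", "S", "T"]
plain = strip_stress(stressed)
vocab = build_index(plain)  # hypothetical table, not the project's mapping
indices = [vocab[p] for p in plain]
```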
## Installation
1. Clone the repository:
```sh
git clone https://github.com/NikiPshg/Grapheme-to-Phoneme-G2P-with-Stress.git
cd Grapheme-to-Phoneme-G2P-with-Stress
```
2. Install the required dependencies:
```sh
pip install -r requiremenst.txt
```
### Example
```python
from G2P_lexicon import g2p_en_lexicon
# Initialize the G2P converter
g2p = g2p_en_lexicon()
# Convert a sentence to phonemes, without stress markers
text = "text, numbers, and some strange symbols !№;% 21"
phonemes = g2p(text, with_stress=False)
print(phonemes)
# ['T', 'EH', 'K', 'S', 'T', ' ', ',', ' ',
#  'N', 'AH', 'M', 'B', 'ER', 'Z', ' ', ',', ' ',
#  'AE', 'N', 'D', ' ', 'S', 'AH', 'M', ' ',
#  'S', 'T', 'R', 'EY', 'N', 'JH', ' ',
#  'S', 'IH', 'M', 'B', 'AH', 'L', 'Z', ' ',
#  'T', 'W', 'EH', 'N', 'IY', ' ', 'W', 'AH', 'N']
```