---
license: mit
language:
- en
metrics:
- wer
tags:
- g2p
- grapheme
- phoneme
- text2text
- text-generation-inference
---
# Grapheme to Phoneme (G2P) with Stress

This project provides a Grapheme to Phoneme (G2P) conversion tool that first looks up each word in the CMU Pronouncing Dictionary. If a word is not in the dictionary, the tool falls back to two Transformer-based models: one generates the phonemes and the other adds stress markers. The output is in ARPAbet format, and the tool can also convert graphemes into phoneme integer indices.
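The dictionary-first lookup with a model fallback can be sketched as follows. This is a minimal illustration, not the project's implementation: `TOY_LEXICON`, `lookup_or_generate`, and `dummy_generate` are hypothetical names, the two-entry lexicon stands in for the full CMU dictionary, and `dummy_generate` is a placeholder for the two Transformer models.

```python
from typing import Callable, Dict, List

# Hypothetical two-entry lexicon standing in for the CMU Pronouncing Dictionary.
TOY_LEXICON: Dict[str, List[str]] = {
    "text": ["T", "EH1", "K", "S", "T"],
    "numbers": ["N", "AH1", "M", "B", "ER0", "Z"],
}


def lookup_or_generate(
    word: str,
    lexicon: Dict[str, List[str]],
    generate: Callable[[str], List[str]],
) -> List[str]:
    """Return the dictionary pronunciation if present, otherwise call `generate`.

    `generate` is a hypothetical stand-in for the two Transformer models
    (grapheme -> phoneme, then stress addition).
    """
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    return generate(key)


def dummy_generate(word: str) -> List[str]:
    # Placeholder "model": emits one uppercase pseudo-phoneme per letter.
    return [ch.upper() for ch in word]


print(lookup_or_generate("Text", TOY_LEXICON, dummy_generate))  # dictionary hit
print(lookup_or_generate("abc", TOY_LEXICON, dummy_generate))   # model fallback
```

Keeping the fallback as a callable means the (slower) neural models are only invoked for out-of-vocabulary words.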

## Features

1. **CMU Pronouncing Dictionary Integration**: First checks the CMU dictionary for phoneme translations.
2. **Transformer-Based Conversion**:
    - **Phoneme Generation**: The first Transformer model converts graphemes into phonemes.
    - **Stress Addition**: The second Transformer model adds stress markers to the phonemes.
3. **ARPAbet Output**: Outputs phonemes in ARPAbet format.
4. **Phoneme Integer Indices**: Converts graphemes to phoneme integer indices.
5. **BPE Tokenizer**: Uses a byte-pair encoding (BPE) tokenizer, which improved translation quality.
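In ARPAbet, stress is written as a digit on vowel phonemes (0 = no stress, 1 = primary, 2 = secondary), e.g. `EH1` or `ER0`. The sketch below shows how stress can be stripped and how phonemes can be mapped to integer indices. The vocabulary layout (`<pad>`, `<unk>`, and a space token before the 39 ARPAbet phonemes) and the names `VOCAB`, `strip_stress`, and `to_indices` are assumptions for illustration; the project's actual index table may differ.

```python
from typing import Dict, List

# The 39 ARPAbet phonemes used by the CMU Pronouncing Dictionary (no stress digits).
ARPABET = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER", "EY",
    "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW", "OY", "P",
    "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]

# Hypothetical vocabulary: special tokens first, then the phoneme inventory.
VOCAB: List[str] = ["<pad>", "<unk>", " "] + ARPABET
TOKEN_TO_ID: Dict[str, int] = {tok: i for i, tok in enumerate(VOCAB)}


def strip_stress(phoneme: str) -> str:
    # Remove a trailing stress digit (0, 1 or 2) if present: "EH1" -> "EH".
    return phoneme.rstrip("012")


def to_indices(phonemes: List[str]) -> List[int]:
    # Map each phoneme (stress removed) to its ID; unknown tokens map to <unk>.
    unk = TOKEN_TO_ID["<unk>"]
    return [TOKEN_TO_ID.get(strip_stress(p), unk) for p in phonemes]


print(to_indices(["T", "EH1", "K", "S", "T"]))
```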

## Installation

1. Clone the repository:
    ```sh
    git clone https://github.com/NikiPshg/Grapheme-to-Phoneme-G2P-with-Stress.git
    cd Grapheme-to-Phoneme-G2P-with-Stress
    ```

2. Install the required dependencies:
    ```sh
    pip install -r requiremenst.txt
    ```


### Example

```python
from G2P_lexicon import g2p_en_lexicon

# Initialize the G2P converter
g2p = g2p_en_lexicon()

# Convert text to phonemes (without stress markers)
text = "text, numbers, and some strange symbols !№;% 21"
phonemes = g2p(text, with_stress=False)
print(phonemes)
# ['T', 'EH', 'K', 'S', 'T', ' ', ',', ' ',
#  'N', 'AH', 'M', 'B', 'ER', 'Z', ' ', ',', ' ',
#  'AE', 'N', 'D', ' ', 'S', 'AH', 'M', ' ',
#  'S', 'T', 'R', 'EY', 'N', 'JH', ' ',
#  'S', 'IH', 'M', 'B', 'AH', 'L', 'Z', ' ',
#  'T', 'W', 'EH', 'N', 'IY', ' ', 'W', 'AH', 'N']
```