---
license: mit
language:
- en
metrics:
- wer
tags:
- g2p
- grapheme
- phoneme
- text2text
- text-generation-inference
---

# Grapheme to Phoneme (G2P) with Stress

This project provides a Grapheme to Phoneme (G2P) conversion tool. It first looks each word up in the CMU Pronouncing Dictionary. If a word is not in the dictionary, it falls back to two Transformer-based models: one generates the phonemes and the other adds stress markers. The output is in ARPAbet format, and the tool can also convert graphemes into phoneme integer indices.

## Features

1. **CMU Pronouncing Dictionary Integration**: Checks the CMU dictionary first for phoneme translations.
2. **Transformer-Based Conversion**:
   - **Phoneme Generation**: The first Transformer model converts graphemes into phonemes.
   - **Stress Addition**: The second Transformer model adds stress markers to the phonemes.
3. **ARPAbet Output**: Outputs phonemes in ARPAbet format.
4. **Phoneme Integer Indices**: Converts graphemes to phoneme integer indices.
5. **BPE Tokenization**: Uses a BPE tokenizer, which improved translation quality.

## Installation

1. Clone the repository:

    ```sh
    git clone https://github.com/NikiPshg/Grapheme-to-Phoneme-G2P-with-Stress.git
    cd Grapheme-to-Phoneme-G2P-with-Stress
    ```

2. Install the required dependencies:

    ```sh
    pip install -r requiremenst.txt
    ```

### Example

```python
from G2P_lexicon import g2p_en_lexicon

# Initialize the G2P converter
g2p = g2p_en_lexicon()

# Convert a text string to phonemes, without stress markers
text = "text, numbers, and some strange symbols !№;% 21"
phonemes = g2p(text, with_stress=False)
print(phonemes)
# Output (the unsupported symbols are absent and "21" is spelled out):
# ['T', 'EH', 'K', 'S', 'T', ' ', ',', ' ', 'N', 'AH', 'M', 'B', 'ER', 'Z', ' ', ',', ' ',
#  'AE', 'N', 'D', ' ', 'S', 'AH', 'M', ' ', 'S', 'T', 'R', 'EY', 'N', 'JH', ' ',
#  'S', 'IH', 'M', 'B', 'AH', 'L', 'Z', ' ', 'T', 'W', 'EH', 'N', 'IY', ' ', 'W', 'AH', 'N']
```
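
The dictionary-first flow with a model fallback can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `TOY_LEXICON`, `toy_fallback`, `lookup_or_predict`, and `to_indices` are hypothetical names, the two-entry lexicon stands in for the CMU Pronouncing Dictionary, and the fallback is a trivial placeholder where the two Transformer models would run. The integer vocabulary is also an assumption and is unlikely to match the indices the project actually uses.

```python
from typing import Callable, Dict, List

# Hypothetical two-word stand-in for the CMU Pronouncing Dictionary.
# Digits on vowels are ARPAbet stress markers (1 = primary stress).
TOY_LEXICON: Dict[str, List[str]] = {
    "text": ["T", "EH1", "K", "S", "T"],
    "some": ["S", "AH1", "M"],
}


def toy_fallback(word: str) -> List[str]:
    """Placeholder for the Transformer models: one uppercase symbol per letter."""
    return [ch.upper() for ch in word]


def strip_stress(phonemes: List[str]) -> List[str]:
    """Remove trailing stress digits (0, 1, 2) from each phoneme."""
    return [p.rstrip("012") for p in phonemes]


def lookup_or_predict(
    word: str,
    lexicon: Dict[str, List[str]],
    fallback: Callable[[str], List[str]],
    with_stress: bool = True,
) -> List[str]:
    """Return the lexicon entry if present, otherwise call the fallback."""
    phonemes = lexicon.get(word.lower())
    if phonemes is None:
        phonemes = fallback(word)
    return phonemes if with_stress else strip_stress(phonemes)


def to_indices(phonemes: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map phoneme symbols to integer indices using an assumed vocabulary."""
    return [vocab[p] for p in phonemes]


if __name__ == "__main__":
    plain = lookup_or_predict("text", TOY_LEXICON, toy_fallback, with_stress=False)
    print(plain)  # ['T', 'EH', 'K', 'S', 'T']

    # Assumed vocabulary: alphabetical order over the symbols seen so far.
    vocab = {p: i for i, p in enumerate(sorted(set(plain)))}
    print(to_indices(plain, vocab))  # [3, 0, 1, 2, 3]
```

Keeping the fallback as a plain callable means the lookup logic stays the same whether the out-of-dictionary path is a stub, as here, or a pair of trained models.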