ManMenGon committed
Commit 75d5a4f · verified · 1 Parent(s): b1124c9

Upload README.md

  1. README.md +63 -62
README.md CHANGED
@@ -9,86 +9,87 @@ app_file: app.py
  pinned: true
  ---

-
- markdown
- Copiar
- Editar
- emoji: 🧬
- colorFrom: indigo
- colorTo: blue
- sdk_version: "3.50.2"
- app_file: app.py
- pinned: true
-
  # 🧬 GeneForgeLang: Symbolic-to-Sequence & Cross-Modality Biomolecular Design Toolkit

- **GeneForgeLang** is a symbolic, generative language that allows scientists to design and interpret DNA, RNA, and protein sequences with unified syntax and AI support.
+ **GeneForgeLang** is a symbolic and generative language for cross-modality biomolecular design.
+ It enables unified AI-powered workflows to **design, interpret and translate DNA, RNA, and protein sequences** using a compact, human-readable grammar.

- This toolkit enables:
- - Generation of realistic proteins from symbolic design
- - Translation of symbolic phrases across DNA RNA ↔ Protein
- - Structured, human-readable and AI-trainable syntax
- - Semantic equivalence across molecular layers
+ This project provides:
+ - **A symbolic language** spanning all biological layers (genomic, transcriptomic, proteomic)
+ - **Realistic sequence generation** via AI models like ProtGPT2
+ - **Scientific interpretation** of symbolic phrases in natural language
+ - **Cross-modality transcoders** (e.g., DNA → RNA → Protein and vice versa)
+ - **An interactive Gradio-based UI** for easy use and integration

+ ---

- ## 🚀 Features
+ ## 🚀 Key Features

  | Module | Description |
  |----------------------------|-------------|
- | 🧠 Phrase → Protein | Generate realistic protein sequences from symbolic phrases |
- | 🔁 Transcode Across Molecules | Translate GeneForgeLang phrases between DNA, RNA, and Protein |
- | 📚 Universal Grammar | One structure to rule them all: motifs, domains, PTMs, splicing |
- | 🧬 Compact Notation | Prefixes, accents, and structural markers for efficiency |
- | 🧠 AI-Ready Output | Compatible with transformer-based models like ProtGPT2 |
+ | 🧠 Phrase → Sequence | Generate DNA, RNA, or protein from symbolic design |
+ | 🔁 Transcode Phrases | Translate GeneForgeLang phrases across modalities |
+ | 📖 Phrase → Description | Generate scientific English descriptions of symbolic inputs |
+ | 🔄 Sequence → Phrase | Infer functional phrases from real sequences |
+ | 🧬 Mutate Sequence (WIP) | Generate variants for symbolic seeds (under development) |
+ | 📦 Export to FASTA (WIP) | Save generated sequences to .fasta (to be implemented) |
+ | 📊 Analyze Sequence (WIP) | Visualize amino acid composition or base content |

+ ---

  ## 🧪 Example Input Phrases

- ### DNA → RNA
-
- ~d:Prom-Exon1-Intr1-Exon2 :r:Cap5'-Ex1-Ex2-UTR3'
-
- shell
- Copiar
- Editar
-
- ### RNA → Protein
-
- :r:Ex1-Ex2 ↓ ^p:Dom(Kin)-Mot(NLS)
+ ```text
+ ~d:Prom[TATA]-Exon1-Intr1-Exon2
+
+ :r:Cap5'-Ex1-Ex2-UTR3'
+
+ ^p:Dom(Kin)-Mot(NLS)*AcK@147
+ ```

- yaml
- Copiar
- Editar
+ ---

+ ## ▶️ How to Use Locally

- ## ▶️ How to Use
+ 1. Clone this repo:
+ ```bash
+ git clone https://github.com/Fundacion-de-Neurociencias/GeneForgeLang.git
+ cd GeneForgeLang
+ ```

- 1. Clone this repo
  2. Install dependencies:
  ```bash
  pip install -r requirements.txt
- Launch the interface:
+ ```

- bash
- Copiar
- Editar
+ 3. Launch the interface:
+ ```bash
  python app.py
- Navigate to:
-
- cpp
- Copiar
- Editar
- http://127.0.0.1:7860
- 📁 Files
-
- File Description
- app.py Full Gradio app (all tabs)
- semillas.json Seed dictionary
- transcoder.py Script for DNA/RNA/protein conversion
- requirements.txt Python dependencies
- README.md This file
- 🧠 Developed by
- Fundación de Neurociencias
- MIT License
-
- Join us in shaping symbolic bio-AI.
+ ```
+
+ 4. Navigate to:
+ [http://127.0.0.1:7860](http://127.0.0.1:7860)
+
+ ---
+
+ ## 📁 File Structure
+
+ | File | Description |
+ |------------------------------|-------------|
+ | `app.py` | Full Gradio app (4 tabs) |
+ | `semillas.json` | Phrase-to-seed dictionary |
+ | `generate_from_phrase.py` | Symbolic-to-sequence generator |
+ | `describe_phrase.py` | Phrase interpreter to scientific English |
+ | `translate_to_geneforgelang.py` | Sequence-to-symbolic phrase translation |
+ | `transcoder.py` | Modality switcher (DNA ↔ RNA ↔ Protein) |
+ | `requirements.txt` | Python dependencies |
+ | `README.md` | This file |
+
+ ---
+
+ ## 🧠 Developed by
+
+ **Fundación de Neurociencias**
+ Licensed under the MIT License
+
+ > Join us in shaping the future of symbolic bio-AI. Contributions welcome!
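
The example phrases added in this commit share a visible shape: a one-character marker (`~`, `:` or `^`), a modality letter (`d`, `r` or `p`), a colon, and a body of `-`-separated elements. The Python sketch below splits a phrase along those lines. It is only an illustration: the function name `split_phrase`, the regular expression, and the letter-to-modality mapping are assumptions inferred from the examples, not code from the repository, and the real grammar may treat `[...]`, `(...)`, `*` and `@` differently.

```python
import re

# Assumed mapping, inferred from the README's example phrases;
# the actual GeneForgeLang grammar may define more (or different) prefixes.
MODALITIES = {"d": "DNA", "r": "RNA", "p": "Protein"}

# One marker character (~, : or ^), one modality letter, a colon, then the body.
PHRASE_RE = re.compile(r"^(?P<marker>[~:^])(?P<modality>[drp]):(?P<body>.+)$")


def split_phrase(phrase: str) -> tuple[str, list[str]]:
    """Return (modality name, '-'-separated elements) for one phrase.

    Hypothetical helper: modifiers such as [TATA], (Kin) or *AcK@147 are
    left attached to their element and are not interpreted.
    """
    match = PHRASE_RE.match(phrase.strip())
    if match is None:
        raise ValueError(f"unrecognized phrase: {phrase!r}")
    modality = MODALITIES[match.group("modality")]
    elements = match.group("body").split("-")
    return modality, elements


if __name__ == "__main__":
    examples = [
        "~d:Prom[TATA]-Exon1-Intr1-Exon2",
        ":r:Cap5'-Ex1-Ex2-UTR3'",
        "^p:Dom(Kin)-Mot(NLS)*AcK@147",
    ]
    for example in examples:
        print(split_phrase(example))
```

Splitting on `-` alone is deliberately naive. It is enough to tell the three modalities apart, but a real parser would need explicit rules for the bracketed and suffixed modifiers.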