ManMenGon committed
Commit b1124c9 · verified · 1 Parent(s): 1291f55

Upload README.md

Files changed (1)
  1. README.md +94 -106
README.md CHANGED
@@ -1,106 +1,94 @@
1
- Thank you for sharing the current `README.md`. It is fairly well structured, but we can improve it so that it reflects **all the modules actually included** and their practical usefulness, and also strengthen the project's appeal to visitors and potential contributors.
2
-
3
- Below is a revised and expanded version, ready to replace the current content:
4
-
5
- ---
6
-
7
- ```markdown
8
- ---
9
- title: GeneForgeLang
10
- emoji: 🧬
11
- colorFrom: indigo
12
- colorTo: blue
13
- sdk: gradio
14
- sdk_version: "3.50.2"
15
- app_file: app.py
16
- pinned: true
17
- ---
18
-
19
- # 🧬 GeneForgeLang: Symbolic-to-Sequence & Cross-Modality Biomolecular Design Toolkit
20
-
21
- **GeneForgeLang** is a symbolic and generative language for cross-modality biomolecular design.
22
- It enables unified AI-powered workflows to **design, interpret and translate DNA, RNA, and protein sequences** using a compact, human-readable grammar.
23
-
24
- This project provides:
25
- - **A symbolic language** spanning all biological layers (genomic, transcriptomic, proteomic)
26
- - **Realistic sequence generation** via AI models like ProtGPT2
27
- - **Scientific interpretation** of symbolic phrases in natural language
28
- - **Cross-modality transcoders** (e.g., DNA → RNA → Protein and vice versa)
29
- - **An interactive Gradio-based UI** for easy use and integration
30
-
31
- ---
32
-
33
- ## 🚀 Key Features
34
-
35
- | Module | Description |
36
- |----------------------------|-------------|
37
- | 🧠 Phrase → Sequence | Generate DNA, RNA, or protein from symbolic design |
38
- | 🔁 Transcode Phrases | Translate GeneForgeLang phrases across modalities |
39
- | 📖 Phrase Description | Generate scientific English descriptions of symbolic inputs |
40
- | 🔄 Sequence → Phrase | Infer functional phrases from real sequences |
41
- | 🧬 Mutate Sequence (WIP) | Generate variants for symbolic seeds (under development) |
42
- | 📦 Export to FASTA (WIP) | Save generated sequences to .fasta (to be implemented) |
43
- | 📊 Analyze Sequence (WIP) | Visualize amino acid composition or base content |
44
-
45
- ---
46
-
47
- ## 🧪 Example Input Phrases
48
-
49
- ```text
50
- ~d:Prom[TATA]-Exon1-Intr1-Exon2
51
-
52
- :r:Cap5'-Ex1-Ex2-UTR3'
53
-
54
- ^p:Dom(Kin)-Mot(NLS)*AcK@147
55
- ```
56
-
57
- ---
58
-
59
- ## ▶️ How to Use Locally
60
-
61
- 1. Clone this repo:
62
- ```bash
63
- git clone https://github.com/Fundacion-de-Neurociencias/GeneForgeLang.git
64
- cd GeneForgeLang
65
- ```
66
-
67
- 2. Install dependencies:
68
- ```bash
69
- pip install -r requirements.txt
70
- ```
71
-
72
- 3. Launch the interface:
73
- ```bash
74
- python app.py
75
- ```
76
-
77
- 4. Navigate to:
78
- [http://127.0.0.1:7860](http://127.0.0.1:7860)
79
-
80
- ---
81
-
82
- ## 📁 File Structure
83
-
84
- | File | Description |
85
- |------------------------------|-------------|
86
- | `app.py` | Full Gradio app (4 tabs) |
87
- | `semillas.json` | Phrase-to-seed dictionary |
88
- | `generate_from_phrase.py` | Symbolic-to-sequence generator |
89
- | `describe_phrase.py` | Phrase interpreter to scientific English |
90
- | `translate_to_geneforgelang.py` | Sequence-to-symbolic phrase translation |
91
- | `transcoder.py` | Modality switcher (DNA ↔ RNA ↔ Protein) |
92
- | `requirements.txt` | Python dependencies |
93
- | `README.md` | This file |
94
-
95
- ---
96
-
97
- ## 🧠 Developed by
98
-
99
- **Fundación de Neurociencias**
100
- Licensed under the MIT License
101
-
102
- > Join us in shaping the future of symbolic bio-AI. Contributions welcome!
103
-
104
- ```
105
-
106
- ---
 
1
+ ---
2
+ title: GeneForgeLang
3
+ emoji: 🧬
4
+ colorFrom: indigo
5
+ colorTo: blue
6
+ sdk: gradio
7
+ sdk_version: "3.50.2"
8
+ app_file: app.py
9
+ pinned: true
10
+ ---
11
+
23
+ # 🧬 GeneForgeLang: Symbolic-to-Sequence & Cross-Modality Biomolecular Design Toolkit
24
+
25
+ **GeneForgeLang** is a symbolic, generative language that allows scientists to design and interpret DNA, RNA, and protein sequences with unified syntax and AI support.
26
+
27
+ This toolkit enables:
28
+ - Generation of realistic proteins from symbolic design
29
+ - Translation of symbolic phrases across DNA ↔ RNA ↔ Protein
30
+ - Structured, human-readable and AI-trainable syntax
31
+ - Semantic equivalence across molecular layers
32
+
33
+
34
+ ## 🚀 Features
35
+
36
+ | Module | Description |
37
+ |----------------------------|-------------|
38
+ | 🧠 Phrase → Protein | Generate realistic protein sequences from symbolic phrases |
39
+ | 🔁 Transcode Across Molecules | Translate GeneForgeLang phrases between DNA, RNA, and Protein |
40
+ | 📚 Universal Grammar | One structure to rule them all: motifs, domains, PTMs, splicing |
41
+ | 🧬 Compact Notation | Prefixes, accents, and structural markers for efficiency |
42
+ | 🧠 AI-Ready Output | Compatible with transformer-based models like ProtGPT2 |
43
+
44
+
45
+ ## 🧪 Example Input Phrases
46
+
47
+ ### DNA → RNA
48
+
49
+ ~d:Prom-Exon1-Intr1-Exon2 ↓ :r:Cap5'-Ex1-Ex2-UTR3'
50
+
55
+ ### RNA → Protein
56
+
57
+ :r:Ex1-Ex2 ↓ ^p:Dom(Kin)-Mot(NLS)
58
+
63
+
64
+ ## ▶️ How to Use
65
+
66
+ 1. Clone this repo
67
+ 2. Install dependencies:
68
+ ```bash
69
+ pip install -r requirements.txt
70
+ ```
71
+ 3. Launch the interface:
72
+ ```bash
73
+ python app.py
74
+ ```
75
+ 4. Navigate to:
76
+ http://127.0.0.1:7860
77
+
78
+ ## 📁 Files
79
+
80
+ | File | Description |
81
+ |--------------------|-------------|
82
+ | `app.py` | Full Gradio app (all tabs) |
83
+ | `semillas.json` | Seed dictionary |
84
+ | `transcoder.py` | Script for DNA/RNA/protein conversion |
85
+ | `requirements.txt` | Python dependencies |
86
+ | `README.md` | This file |
87
+
88
+ ## 🧠 Developed by
89
+
90
+ **Fundación de Neurociencias**
91
+ MIT License
92
+
93
+ > Join us in shaping symbolic bio-AI.
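
The example phrases in this README suggest a simple shape: a modality prefix (`~d:`, `:r:`, `^p:`) followed by elements joined with `-`. The sketch below is a minimal illustration of that reading, not the project's actual parser or the logic in `transcoder.py`. The prefix-to-modality mapping is inferred from the examples alone, and the names `Phrase`, `parse_phrase`, and `dna_to_rna_phrase` are hypothetical. The DNA to RNA rewrite rule (keep exons, drop everything else, add `Cap5'` and `UTR3'`) is an assumption chosen only to reproduce the single DNA → RNA example shown in the README.

```python
import re
from dataclasses import dataclass

# Assumed mapping, inferred from the README examples only.
PREFIXES = {"~d:": "DNA", ":r:": "RNA", "^p:": "Protein"}


@dataclass
class Phrase:
    modality: str        # "DNA", "RNA", or "Protein"
    elements: list       # element strings in phrase order


def parse_phrase(text: str) -> Phrase:
    """Split a phrase into its modality and its '-'-separated elements."""
    text = text.strip()
    for prefix, modality in PREFIXES.items():
        if text.startswith(prefix):
            body = text[len(prefix):]
            return Phrase(modality, [part for part in body.split("-") if part])
    raise ValueError(f"unknown modality prefix in {text!r}")


def dna_to_rna_phrase(phrase: Phrase) -> str:
    """Toy rewrite (assumed rule): keep exons only, add 5' cap and 3' UTR."""
    if phrase.modality != "DNA":
        raise ValueError("expected a DNA phrase")
    exons = []
    for element in phrase.elements:
        match = re.fullmatch(r"Exon(\d+)", element)
        if match:  # promoters and introns are dropped
            exons.append(f"Ex{match.group(1)}")
    return ":r:" + "-".join(["Cap5'", *exons, "UTR3'"])


if __name__ == "__main__":
    dna = parse_phrase("~d:Prom-Exon1-Intr1-Exon2")
    print(dna.modality, dna.elements)
    print(dna_to_rna_phrase(dna))
```

Under these assumptions, running the script on `~d:Prom-Exon1-Intr1-Exon2` yields `:r:Cap5'-Ex1-Ex2-UTR3'`, matching the README's DNA → RNA example. Splitting on `-` is enough for the phrases shown here, but it would break on any element that itself contains a hyphen.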