Allanatrix committed · Commit 8ab7441 · verified · 1 Parent(s): bc13040

Update README.md

  1. README.md +184 -30
README.md CHANGED
@@ -20,35 +20,189 @@ tags:
  - scientific-reasoning
  ---

- # Model Card for Nexa-Qwen-sci-7B

  ## Model Details
- **Model Description:**
- Nexa-Qwen-sci-7B is a fine-tuned variant of Qwen/Qwen3-1.7B, optimized for scientific research generation tasks such as hypothesis generation, abstract writing, and methodology completion. Fine-tuning was performed using PEFT with LoRA in 4-bit quantized mode via bitsandbytes, with Qwen3's thinking mode enabled for enhanced reasoning.
-
- **Developed by:** Allan (Independent Scientific Intelligence Architect)
- **Shared by:** Allan[](https://huggingface.co/allan-wandia)
- **Model type:** Decoder-only transformer (causal language model)
- **Language(s):** English (scientific domain-specific vocabulary)
- **License:** Apache 2.0
- **Fine-tuned from:** Qwen/Qwen3-1.7B
- **Repository:** https://huggingface.co/allan-wandia/nexa-qwen-sci-7b
-
- ## Training Details
- **Training Data:**
- - Size: 100 million tokens
- - Source: Curated scientific literature (Bio, Physics, QST, Astro)
-
- **Hyperparameters:**
- - Sequence length: 32768
- - Batch size: 1
- - Gradient Accumulation Steps: 64
- - Effective Batch Size: 64
- - Learning rate: 2e-05
- - Epochs: 2
- - LoRA: Enabled (PEFT)
- - Quantization: 4-bit
- - Sampling Parameters: Temperature=0.6, TopP=0.95, TopK=20, MinP=0, Presence Penalty=1.5
-
- **Results:**
- Robust performance in scientific prose tasks with enhanced reasoning capabilities via Qwen3's thinking mode.

  - scientific-reasoning
  ---

+ # Model Card for `Nexa-Qwen-sci-7B`

  ## Model Details
+
+ **Model Description**:
+ `Nexa-Qwen-sci-7B` is a fine-tuned variant of the open-weight `Qwen/Qwen3-1.7B` model, optimized for scientific research generation tasks such as hypothesis generation, abstract writing, and methodology completion. Fine-tuning was performed using the PEFT (Parameter-Efficient Fine-Tuning) library with LoRA in 4-bit quantized mode using the `bitsandbytes` backend. The model leverages Qwen3’s thinking mode (`enable_thinking=True`) for enhanced reasoning capabilities, making it suitable for complex scientific tasks.
+
+ This model is part of the **Nexa Scientific Intelligence (Psi)** series, developed for scalable, automated scientific reasoning and domain-specific text generation.
+
+ ---
+
+ **Developed by**: Allan (Independent Scientific Intelligence Architect)
+ **Funded by**: Self-funded
+ **Shared by**: Allan (https://huggingface.co/Allanatrix)
+ **Model type**: Decoder-only transformer (causal language model)
+ **Language(s)**: English (scientific domain-specific vocabulary)
+ **License**: Apache 2.0 (inherits from base model)
+ **Fine-tuned from**: `Qwen/Qwen3-1.7B`
+ **Repository**: https://huggingface.co/allan-wandia/nexa-qwen-sci-7b
+ **Demo**: Coming soon via Hugging Face Spaces or Lambda inference endpoint.
+
+ ---
+
+ ## Uses
+
+ ### Direct Use
+ - Scientific hypothesis generation
+ - Abstract and method section synthesis
+ - Domain-specific research writing
+ - Semantic completion of structured research prompts
+
+ ### Downstream Use
+ - Fine-tuning or distillation into smaller expert models
+ - Foundation for test-time reasoning agents
+ - Seed model for bootstrapping larger synthetic scientific corpora
+
+ ### Out-of-Scope Use
+ - General conversation or chat use cases
+ - Non-English scientific domains
+ - Legal, financial, or clinical advice generation
+
+ ---
+
+ ## Bias, Risks, and Limitations
+ While the model performs well on structured scientific input, it inherits biases from its base model (`Qwen3-1.7B`) and fine-tuning dataset. Results should be evaluated by domain experts before use in high-stakes settings. It may hallucinate plausible but incorrect facts, especially in low-data areas. The thinking mode may increase latency for simpler tasks but improves reasoning quality.
+
+ ---
+
+ ## Recommendations
+ Users should:
+ - Validate critical outputs against trusted scientific literature
+ - Avoid deploying in clinical or regulatory environments without further evaluation
+ - Consider additional domain fine-tuning for niche fields
+ - Use recommended sampling parameters (Temperature=0.6, TopP=0.95, TopK=20, MinP=0, Presence Penalty=1.5) to avoid endless repetitions in thinking mode
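
The last recommendation relies on a presence penalty to discourage repetition. As a rough illustration of why that helps, the sketch below applies the common additive definition (a flat amount subtracted from the logit of every token that has already appeared). The function name and toy logits are hypothetical, and a given serving framework may implement the penalty differently.

```python
def apply_presence_penalty(logits, generated_ids, penalty=1.5):
    """Subtract a flat penalty from the logit of every token id already generated.

    Unlike a frequency penalty, the amount does not grow with repeat count:
    a token seen once and a token seen ten times are penalized equally.
    """
    seen = set(generated_ids)
    return [
        logit - penalty if token_id in seen else logit
        for token_id, logit in enumerate(logits)
    ]


# Toy vocabulary of four tokens; tokens 0 and 2 were already generated.
logits = [2.0, 1.0, 0.5, 0.0]
print(apply_presence_penalty(logits, generated_ids=[0, 0, 2]))
# [0.5, 1.0, -1.0, 0.0]
```

With the penalty at 1.5, the previously dominant token 0 drops below token 1, which is the mechanism that breaks repetitive loops in long reasoning traces.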
+
+ ---
+
+ ## How to Get Started with the Model
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_name = "allan-wandia/nexa-qwen-sci-7b"
+
+ # Load the tokenizer and model; device_map="auto" places weights on the available device(s)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")
+
+ prompt = "Generate a novel hypothesis in quantum materials research:"
+ messages = [{"role": "user", "content": prompt}]
+ # Build the chat prompt with Qwen3 thinking mode enabled
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=32768,
+     do_sample=True,  # required for the sampling parameters below to take effect
+     temperature=0.6,
+     top_p=0.95,
+     top_k=20,
+     min_p=0,
+     # presence_penalty is not a generate() argument in Transformers;
+     # apply it in a serving framework that supports it
+ )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
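
With thinking mode enabled, the decoded text contains the model's reasoning followed by its final answer. The helper below is a minimal sketch for separating the two. It assumes the reasoning is wrapped in `<think>` and `</think>` tags that survive decoding, which should be checked against the tokenizer in use. The function name and sample string are hypothetical.

```python
def split_thinking(decoded: str, close_tag: str = "</think>"):
    """Return (reasoning, answer) from decoded model output.

    If no closing tag is found, the whole text is treated as the answer.
    """
    head, sep, tail = decoded.rpartition(close_tag)
    if not sep:
        return "", decoded.strip()
    # Drop anything before the opening tag (for example, the echoed prompt).
    reasoning = head.split("<think>", 1)[-1].strip()
    return reasoning, tail.strip()


sample = "<think>\nBand gaps shrink under strain.\n</think>\n\nHypothesis: strain tunes the gap."
reasoning, answer = split_thinking(sample)
print(reasoning)  # Band gaps shrink under strain.
print(answer)     # Hypothesis: strain tunes the gap.
```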
+
+ ## Training Details
+
+ ### Training Data
+ - Size: 100 million tokens sampled from a 500M+ token corpus
+ - Source: Curated scientific literature, abstracts, methodologies, and domain-labeled corpora (Bio, Physics, QST, Astro)
+ - Labeling: Token-level labels auto-generated via the Qwen3 tokenizer with chat template (`enable_thinking=True`)
+
+ ### Preprocessing
+ - Tokenization with sequence truncation to 32,768 tokens (Qwen3’s context length)
+ - Formatted using Qwen3’s chat template with thinking mode enabled
+ - Labeled and batched using CPU; inference dispatched to GPU asynchronously
+
+ ### Training Hyperparameters
+ - Base model: `Qwen/Qwen3-1.7B`
+ - Sequence length: 32768
+ - Batch size: 1 (with gradient accumulation)
+ - Gradient Accumulation Steps: 64
+ - Effective Batch Size: 64
+ - Learning rate: 2e-5
+ - Epochs: 2
+ - LoRA: Enabled (PEFT with RSLoRA)
+ - Quantization: 4-bit via `bitsandbytes`
+ - Optimizer: 8-bit AdamW
+ - Framework: Transformers (≥4.51.0) + PEFT + Accelerate + TRL
+ - Sampling Parameters: Temperature=0.6, TopP=0.95, TopK=20, MinP=0, Presence Penalty=1.5 (applied during inference)
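
The batch-size figures above follow from simple arithmetic, sketched here for readers who want to check them. The step count is only a lower bound: it assumes every sequence is filled to the full 32,768 tokens and that the stated effective batch size already accounts for all devices. Variable names are illustrative.

```python
import math

per_device_batch = 1
grad_accum_steps = 64
seq_len = 32_768
dataset_tokens = 100_000_000
epochs = 2

# Gradients from 64 micro-batches of size 1 are accumulated before each optimizer step.
effective_batch = per_device_batch * grad_accum_steps

# Upper bound on tokens consumed per optimizer step (every sequence at full length).
max_tokens_per_step = effective_batch * seq_len

# Lower bound on optimizer steps; shorter sequences mean more steps.
min_steps_per_epoch = math.ceil(dataset_tokens / max_tokens_per_step)

print(effective_batch)               # 64
print(max_tokens_per_step)           # 2097152
print(min_steps_per_epoch * epochs)  # 96
```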
+
+ ## Evaluation
+
+ ### Testing Data
+ Synthetic scientific prompts across domains (Physics, Biology, and Materials Science)
+
+ ### Evaluation Factors
+ - Hypothesis novelty (entropy score)
+ - Internal scientific consistency (domain-specific rubric)
+ - Reasoning quality (assessed via thinking mode outputs)
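
The card does not define how the entropy score is computed. One plausible reading, shown purely as an assumption, is the Shannon entropy of the word distribution in a generated hypothesis: repetitive text scores low, varied text scores higher. The function name and whitespace tokenization are hypothetical choices, not the author's method.

```python
import math
from collections import Counter


def token_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the whitespace-token distribution of `text`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(token_entropy("a b c d"))  # 2.0
```

Four equally frequent tokens give exactly 2 bits. A real novelty metric would more likely compare outputs across prompts or against a reference corpus.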
+
+ ### Results
+ The model performs robustly in hypothesis generation and scientific prose tasks, with enhanced reasoning capabilities due to Qwen3’s thinking mode. Coherence is high, and novelty depends on prompt diversity. It is well-suited as a distiller or inference agent for synthetic scientific corpora generation.
+
+ ## Environmental Impact
+
+ | Component | Value |
+ |---|---|
+ | Hardware Type | 2× NVIDIA T4 GPUs |
+ | Hours Used | ~7.5 |
+ | Cloud Provider | Kaggle (Google Cloud) |
+ | Compute Region | US |
+ | Carbon Emitted | Estimate pending (likely 1 kg CO2) |
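
Until a measured figure is available, a back-of-the-envelope estimate can be derived from the hardware and hours listed above. The sketch assumes both GPUs draw their full 70 W board power for the whole run and uses a hypothetical grid intensity of 0.4 kg CO2 per kWh. It ignores CPU, cooling, and other datacenter overhead, so it is an order-of-magnitude check only.

```python
def estimate_emissions_kg(gpu_count, gpu_watts, hours, kg_co2_per_kwh):
    """Return (energy in kWh, emissions in kg CO2) for GPUs running at constant power."""
    energy_kwh = gpu_count * gpu_watts * hours / 1000
    return energy_kwh, energy_kwh * kg_co2_per_kwh


energy_kwh, emissions_kg = estimate_emissions_kg(
    gpu_count=2,
    gpu_watts=70,        # NVIDIA T4 board power limit
    hours=7.5,
    kg_co2_per_kwh=0.4,  # hypothetical grid carbon intensity
)
print(f"{energy_kwh:.2f} kWh, {emissions_kg:.2f} kg CO2")
```

Under these assumptions the run uses about 1 kWh, which puts emissions at well under a kilogram and in the same range as the card's rough figure.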
+
+ ## Technical Specifications
+
+ ### Model Architecture
+ - Transformer decoder (Qwen3-1.7B architecture: 28 layers, 16 attention heads for Q, 8 for KV)
+ - LoRA adapters applied to all linear layers with RSLoRA
+ - Quantized with `bitsandbytes` to 4-bit for memory efficiency
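
Having 8 key/value heads serve 16 query heads (grouped-query attention) halves the KV cache compared with one KV head per query head. The sketch below estimates the cache size at the full 32,768-token context. The head dimension of 128 and the fp16 cache are assumptions not stated in this card, and the function name is illustrative.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers the separate key and value tensors in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


GIB = 1024 ** 3
gqa = kv_cache_bytes(num_layers=28, num_kv_heads=8, head_dim=128, seq_len=32_768)
mha = kv_cache_bytes(num_layers=28, num_kv_heads=16, head_dim=128, seq_len=32_768)

print(16 // 8)    # 2 query heads share each KV head
print(gqa / GIB)  # 3.5
print(mha / GIB)  # 7.0
```

On a 16 GB T4, saving 3.5 GiB per full-length sequence is a meaningful share of the memory budget.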
+
+ ### Compute Infrastructure
+ - CPU: Intel i5 8th Gen vPro (batch preprocessing)
+ - GPU: 2× NVIDIA T4 (CUDA 12.1)
+
+ ### Software Stack
+ - PEFT 0.12.0
+ - Transformers 4.51.0
+ - Accelerate
+ - TRL
+ - Torch 2.x
+
+ ## Citation
+ BibTeX:
+
+ @misc{nexa-qwen-sci-7b,
+   title = {Nexa Qwen Sci 7B},
+   author = {Allan Wandia},
+   year = {2025},
+   howpublished = {\url{https://huggingface.co/allan-wandia/nexa-qwen-sci-7b}},
+   note = {Fine-tuned model for scientific generation tasks with Qwen3 thinking mode}
+ }
+
+ ## Model Card Contact
+ For questions, contact Allan via Hugging Face or at
+
+ ## Model Card Authors
+ Allan Wandia (Independent ML Engineer and Systems Architect)
+
+ ## Glossary
+ - LoRA: Low-Rank Adaptation
+ - PEFT: Parameter-Efficient Fine-Tuning
+ - Entropy Score: Metric used to estimate novelty/variation
+ - Safetensors: Secure, fast format for model weights
+ - Thinking Mode: Qwen3’s feature for enhanced reasoning, enabled via `enable_thinking=True`
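
The glossary defines LoRA, while the training setup names its rank-stabilized variant, RSLoRA. The two differ only in how the adapter update is scaled: standard LoRA multiplies it by `alpha / r`, RSLoRA by `alpha / sqrt(r)`. The sketch below compares the two factors. The `alpha` and rank values are illustrative, since the card does not state the ones used.

```python
import math


def lora_scale(alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def rslora_scale(alpha: float, r: int) -> float:
    """Rank-stabilized LoRA scaling factor."""
    return alpha / math.sqrt(r)


alpha = 16  # illustrative value, not taken from the card
for r in (16, 64):
    print(r, lora_scale(alpha, r), rslora_scale(alpha, r))
# 16 1.0 4.0
# 64 0.25 2.0
```

The standard factor shrinks quickly as rank grows, whereas the rank-stabilized factor decays more slowly, which is the motivation for using RSLoRA at higher ranks.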
+
+ ## Links
+ - GitHub repo and notebook: https://github.com/DarkStarStrix/Nexa_Auto