---
license: apache-2.0
datasets:
- pico-lm/pretokenized-dolma
language:
- en
metrics:
- pico-lm/perplexity
pipeline_tag: text-generation
---

# Pico Decoder Large

**pico-decoder-large** is the largest model (570M parameters) in the current `pico-decoder` suite. It is a full-scale research model designed for in-depth interpretability studies of transformer learning. Trained with [`pico-train`](https://github.com/pico-lm) and fully compatible with [`pico-analyze`](https://github.com/pico-lm), it offers rich checkpointing and analytical insight into large-scale language-model behavior.

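For a quick sanity check, the checkpoint can be loaded like any other Hugging Face causal LM. Below is a minimal sketch, assuming the weights are hosted on the Hub as `pico-lm/pico-decoder-large` (the repo id is an assumption) and load through the standard `transformers` auto classes; `trust_remote_code=True` is passed in case the architecture is registered as custom code.

```python
# Minimal loading sketch -- the repo id below is an assumption, not confirmed
# by this card; adjust it to the actual Hub path of the released checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pico-lm/pico-decoder-large"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Generate a short continuation as a smoke test.
inputs = tokenizer("Language models learn", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
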
## Model Details

| Field                 | Value                                  |
|-----------------------|----------------------------------------|
| **Architecture**      | Decoder-only transformer (LLaMA-style) |
| **Parameters**        | 570M                                   |
| **Layers**            | 12                                     |
| **Hidden Size**       | 1536                                   |
| **Feed-Forward Size** | 6144                                   |
| **Attention Heads**   | 12                                     |
| **Key/Value Heads**   | 4                                      |

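For reference, the table corresponds to a LLaMA-style configuration along the following lines. This is only a sketch using `transformers.LlamaConfig`: the actual `pico-train` configuration object and the vocabulary size are not specified in this card.

```python
# Illustrative LLaMA-style configuration matching the table above.
# NOTE: this is a sketch; the vocabulary size is not listed in this card and
# the real pico-train configuration may differ.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=1536,              # Hidden Size
    intermediate_size=6144,        # Feed-Forward Size
    num_hidden_layers=12,          # Layers
    num_attention_heads=12,        # Attention Heads
    num_key_value_heads=4,         # Key/Value Heads (grouped-query attention)
    max_position_embeddings=2048,  # matches the 2048-token training context
)
print(config)
```
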
## Training

- **Dataset**: [`pretokenized-dolma`](https://github.com/pico-lm)
- **Training steps**: 200,000
- **Batch size**: 1024
- **Sequence length**: 2048
- **Optimizer**: AdamW
- **Learning rate schedule**: Linear decay with warmup
- **Compute**: 16 A100-SXM4-80GB GPUs

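Taken together, and assuming the batch size of 1024 counts sequences rather than tokens, these settings imply a total budget of roughly 420B training tokens:

```python
# Back-of-the-envelope token budget for the run described above.
# Assumption: "batch size 1024" means 1024 sequences per optimizer step.
steps = 200_000
batch_size = 1_024   # sequences per step (assumed unit)
seq_len = 2_048      # tokens per sequence

tokens_per_step = batch_size * seq_len   # 2,097,152 tokens per step
total_tokens = steps * tokens_per_step   # ~4.19e11 tokens

print(f"{tokens_per_step:,} tokens/step")
print(f"~{total_tokens / 1e9:.0f}B tokens total")   # ~419B tokens
```
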
## Evaluation and Analysis

This model supports fine-grained analysis with [pico-analyze](https://github.com/pico-lm), which lets researchers study how learning unfolds over the course of training, even at very small scales.

We also evaluate the model's perplexity on the [pico-paloma-tinsy](https://huggingface.co/datasets/pico-lm/pretokenized-paloma-tinsy) dataset.

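A minimal perplexity sketch against that dataset is shown below, under a few assumptions not stated in this card: that the pretokenized data exposes an `input_ids` column, that a `train` split exists, and that the checkpoint loads as in the earlier example.

```python
# Rough perplexity estimate on a pretokenized evaluation set.
# Assumptions: an "input_ids" column, a "train" split, and a model that fits
# on a single device; the repo ids may need adjusting.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "pico-lm/pico-decoder-large", trust_remote_code=True
)
model.eval()

data = load_dataset("pico-lm/pretokenized-paloma-tinsy", split="train")

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for row in data.select(range(100)):         # small subset for illustration
        ids = torch.tensor(row["input_ids"]).unsqueeze(0)
        out = model(input_ids=ids, labels=ids)  # loss = mean next-token NLL
        n = ids.size(1) - 1                     # number of predicted tokens
        total_nll += out.loss.item() * n
        total_tokens += n

print(f"perplexity: {math.exp(total_nll / total_tokens):.2f}")
```
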
## Citation

```bibtex
@software{pico2025,
  author = {Diehl Martinez, Richard},
  title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
  year = {2025},
  url = {https://github.com/pico-lm}
}
```