ben-cohen-datadog committed (verified) · Commit a03a45d · Parent(s): 14e44c6

Add initial draft of model card

Files changed (1): README.md (+119, −3). The previous front-matter stub (`license: apache-2.0`) is replaced by the full model card below.

---
model_id: TOTO-v1-Base
tags:
- time-series-forecasting
- foundation models
- pretrained models
- time series foundation models
- time series
- time-series
- transformers
- forecasting
- safetensors
- apache-2.0
paper: [Link to Paper] # TODO(Anna)
datasets:
# - BOOM [Link to BOOM Dataset] # TODO(Anna)
- GiftEvalPretrain
- Chronos # TODO(Anna) - is there a tag for this?
leaderboards:
- GiftEval (if results are public) # TODO(Anna) - check how to do that
license: apache-2.0 # TODO(Anna) - check if it renders correctly
# TODO(Anna) - check if rendered correctly when uploaded to the Hub
# TODO(Anna) - export to RTF or similar to make it reviewable in Google Docs
---
# {{ model_id | default("TOTO", true) }}

<!-- TODO: Update this section to align with the new abstract of the paper once finalized. -->

TOTO (Time Series Optimized Transformer for Observability) is a time-series foundation model designed for multivariate time series forecasting, with a focus on observability metrics. TOTO leverages new architectural innovations and training recipes that enable it to efficiently handle the high-dimensional, sparse, and non-stationary time series that are hallmarks of the observability domain.

Trained on one trillion time series data points, of which 43% are in-house, real-world observability data, TOTO demonstrates state-of-the-art zero-shot performance on observability-specific tasks as well as top-ranking performance on the multi-domain GiftEval time series forecasting benchmark.

---

![model architecture](figures/architecture.png)

Figure 1: Overview of the {{ model_id | default("TOTO", true) }} model architecture. **A.** Multivariate input time series of `L` steps are scaled using causal patch-based instance normalization, transformed into patch embeddings, and passed through a decoder-only transformer stack. The transformed features are unembedded and passed through a Student-T mixture model (Section: Probabilistic Prediction), which generates probabilistic next-patch predictions. **B.** The patch embedding takes as input a time series of `M` channels by `L` time steps. It divides the time dimension into patches of size `P` and projects these linearly into an embedding space of latent dimension `D`, producing an output of size `M × (L/P) × D` that is fed to the transformer decoder. **C.** The transformer stack contains `F` identical segments. Each segment contains `N` time-wise transformer blocks followed by one channel-wise block.

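To make the patch embedding described in panel **B** concrete, here is a minimal PyTorch sketch using the dimensions named in the caption (`M` channels, `L` steps, patch size `P`, latent dimension `D`); the module name and exact reshaping are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative sketch of panel B: split the time axis into patches of
    length P and project each patch linearly to a D-dimensional embedding."""

    def __init__(self, patch_size: int, embed_dim: int):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Linear(patch_size, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, M channels, L time steps); L must be divisible by P here.
        batch, channels, length = x.shape
        patches = x.reshape(batch, channels, length // self.patch_size, self.patch_size)
        return self.proj(patches)  # (batch, M, L/P, D)

# Example: M=8 channels, L=96 steps, P=16, D=256 -> embeddings of shape (1, 8, 6, 256)
embed = PatchEmbedding(patch_size=16, embed_dim=256)
print(embed(torch.randn(1, 8, 96)).shape)
```

Projecting each patch independently keeps the channel axis untouched at this stage; mixing across channels happens later in the channel-wise attention blocks described in panel **C**.
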
## Key Features - TODO: develop these or remove them
<!-- TODO: Update this section to align with the introduction in the paper once finalized. -->
- **Multi-Variate Time Series Support:** uses **Proportional Factorized Space-Time Attention**, which efficiently groups multivariate features, reducing computational overhead while maintaining high accuracy.
- **Tailored for Observability:** designed around observability metrics: machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
- **Decoder-Only Transformer Architecture:** supports variable context lengths and prediction horizons.
- **Point and Probabilistic Forecasting:** produces both point forecasts and full predictive distributions.
- **Causal Patch-Wise Instance Normalization:** improves forecasting performance and training stability in decoder-only models (see the sketch after this list).
- **Student-T Mixture Model Prediction Head:** produces probabilistic forecasts that model the complex, varied distributions typical of observability data.
- **Extensive Pretraining on Large-Scale Data:** pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
- **High-Dimensional Time Series Support:** efficiently handles datasets with a large number of variables.

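As a rough illustration of the causal patch-wise instance normalization feature above, the sketch below accumulates mean and variance over the current and all earlier patches only, so the scaling never uses future information; the exact formulation in the paper and repository may differ.

```python
import torch

def causal_patchwise_instance_norm(x: torch.Tensor, patch_size: int, eps: float = 1e-5):
    """Sketch: scale each patch using mean/std computed from that patch and all
    earlier patches of the same series (per batch element and channel)."""
    batch, channels, length = x.shape
    patches = x.reshape(batch, channels, length // patch_size, patch_size)

    # Cumulative sums over the patch axis give causal (past-only) statistics.
    counts = torch.arange(1, patches.shape[2] + 1, device=x.device) * patch_size
    cum_sum = patches.sum(dim=-1).cumsum(dim=-1)
    cum_sq = (patches ** 2).sum(dim=-1).cumsum(dim=-1)

    mean = cum_sum / counts                      # (batch, channels, L/P)
    var = cum_sq / counts - mean ** 2
    std = (var.clamp_min(0.0) + eps).sqrt()

    normed = (patches - mean.unsqueeze(-1)) / std.unsqueeze(-1)
    return normed.reshape(batch, channels, length), mean, std

# Example: normalize a (1, 4, 64) series with patches of length 16.
y, mu, sigma = causal_patchwise_instance_norm(torch.randn(1, 4, 64), patch_size=16)
```

In a real pipeline the per-patch statistics would be kept so that the model's predictions can be rescaled back to the original units.
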
### Resources - TODO

- **Paper:** [Link to arxiv paper]
- **Repository:** [Link to github repo]
- **Blog Post:** [Link to Datadog blog post]
- **BOOM:** [Link to BOOM's dataset card]

## Usage

### Installation

```bash
# TODO(Anna) - update these with correct instructions
# Clone the repository
git clone https://github.com/DataDogFutureOpenSource/TOTO.git

# Navigate to the project directory
cd TOTO

# Install the required dependencies
pip install -r requirements.txt
```

### Running Inference

For a step-by-step guide to running inference with TOTO, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDogFutureOpenSource/TOTO/XXX/notebooks/inference_tutorial.ipynb).

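Until the tutorial link above is finalized, the snippet below is a purely hypothetical sketch of what an inference call could look like; the import path, the `Toto` class, the checkpoint id, and the `forecast` method and its arguments are illustrative assumptions rather than the repository's confirmed API, so defer to the notebook for the real interface.

```python
import torch

# Hypothetical import path and class name - the actual API may differ.
from toto.model import Toto

# Hypothetical checkpoint identifier on the Hugging Face Hub.
model = Toto.from_pretrained("Datadog/TOTO-v1-Base")
model.eval()

# Multivariate context: batch of 1, M=7 channels, 512 past time steps.
context = torch.randn(1, 7, 512)

with torch.no_grad():
    # Hypothetical call: request a 96-step horizon and draw samples from the
    # Student-T mixture head for probabilistic forecasts.
    forecast = model.forecast(context, prediction_length=96, num_samples=256)

print(forecast.shape)  # e.g. (256, 1, 7, 96): samples x batch x channels x horizon
```
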
### Usage Recommendations - TODO remove or develop
<!-- TODO: Share best practices, e.g. optimal context length and prediction length. -->

## Training Details - TODO keep or remove?

### Pretraining Data

| Dataset Name      | Link to Dataset Card                     |
|-------------------|------------------------------------------|
| GiftEval Pretrain | [Link to GiftEval Pretrain Dataset Card] |
| Chronos           | [Link to Chronos Dataset Card]           |
| Synthetic         | [Link to Synthetic Dataset Card]         |
| Observability     | [Link to Observability Dataset Card]     |

For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDogFutureOpenSource/TOTO).

### Training Hyperparameters - TODO keep or remove?

The training hyperparameters for TOTO are defined in a YAML configuration file in our GitHub repository; see [toto_config.yaml](https://github.com/DataDogFutureOpenSource/TOTO/blob/main/configs/toto_config.yaml).

## Results - TODO keep or remove?

| Dataset Name | Link to Dataset Card            | CRPS | MASE |
|--------------|---------------------------------|------|------|
| BOOM         | [Link to BOOM Dataset Card]     | TBD  | TBD  |
| GiftEval     | [Link to GiftEval Dataset Card] | TBD  | TBD  |

For more detailed information, please refer to the results section in our [paper](#TODO-Link-to-Paper).
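
For reference, the two reported metrics can be computed roughly as in the sketch below (a simplified, illustrative implementation; use the benchmarks' official evaluation code for comparable numbers). MASE scales the forecast's mean absolute error by the in-sample error of a seasonal-naive baseline, and CRPS is approximated from forecast samples with the energy form E|X - y| - 0.5 * E|X - X'|.

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the MAE of a
    seasonal-naive forecast on the in-sample (training) series."""
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

def crps_from_samples(y_true, samples):
    """Sample-based CRPS approximation, averaged over the horizon."""
    abs_err = np.mean(np.abs(samples - y_true), axis=0)
    spread = np.mean(np.abs(samples[:, None, :] - samples[None, :, :]), axis=(0, 1))
    return np.mean(abs_err - 0.5 * spread)

# Toy example: 24-step horizon, 100 forecast samples.
y_train = np.sin(np.arange(200) / 10.0)
y_true = np.sin(np.arange(200, 224) / 10.0)
samples = y_true + 0.1 * np.random.randn(100, 24)
print(mase(y_true, samples.mean(axis=0), y_train), crps_from_samples(y_true, samples))
```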

## Citation - TODO

If you use TOTO in your research or applications, please cite us using the following:

```bibtex
@article{TOTO-v1-Base-2025,
  title={TOTO: Time Series Optimized Transformer for Observability},
  author={Your Author Names Here},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025},
  url={https://arxiv.org/abs/XXXX.XXXXX}
}
```