ben-cohen-datadog committed (verified) · Commit a03a45d · Parent(s): 14e44c6

Add initial draft of model card

Files changed (1): README.md (+119, −3). The previous front-matter stub (`license: apache-2.0`) is replaced by the full model card below.

---
model_id: TOTO-v1-Base
tags:
- time-series-forecasting
- foundation models
- pretrained models
- time series foundation models
- time series
- time-series
- transformers
- forecasting
- safetensors
- apache-2.0
paper: [Link to Paper] # TODO(Anna)
datasets:
# - BOOM [Link to BOOM Dataset] # TODO(Anna)
- GiftEvalPretrain
- Chronos # TODO(Anna) - is there a tag for this?
leaderboards:
- GiftEval (if results are public) # TODO(Anna) - check how to do that
license: apache-2.0 # TODO(Anna) - check if it renders correctly
# TODO(Anna) - check if rendered correctly when uploaded to the Hub
# TODO(Anna) - export to RTF or similar to make it reviewable in Google Docs
---
# {{ model_id | default("TOTO", true) }}

<!-- TODO: Update this section to align with the new abstract of the paper once finalized. -->

TOTO (Time Series Optimized Transformer for Observability) is a time-series foundation model designed for multivariate time series forecasting, with a focus on observability metrics. TOTO leverages new architectural innovations and training recipes that enable it to efficiently handle the high-dimensional, sparse, and non-stationary time series that are hallmarks of the observability domain.

Trained on one trillion time series data points, of which 43% are in-house, real-world observability data, TOTO demonstrates state-of-the-art zero-shot performance on observability-specific tasks as well as top-ranking performance on the multi-domain GiftEval time series forecasting benchmark.

---

![model architecture](figures/architecture.png)

Figure 1: Overview of the {{ model_id | default("TOTO", true) }} model architecture. **A.** Multivariate input time series of `L` steps are scaled using causal patch-based instance normalization, transformed into patch embeddings, and passed through a decoder-only transformer stack. The transformed features are unembedded and passed through a Student-T mixture model (Section: Probabilistic Prediction), which generates probabilistic next-patch predictions. **B.** The patch embedding takes as input a time series of `M` channels by `L` time steps. It divides the time dimension into patches of size `P` and projects these linearly into an embedding space of latent dimension `D`, producing an output of size `M × (L/P) × D` that is fed to the transformer decoder. **C.** The transformer stack contains `F` identical segments. Each segment contains `N` time-wise transformer blocks followed by one channel-wise block.

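To make the patch embedding described in panel **B** concrete, here is a minimal PyTorch sketch using the dimensions named in the caption (`M` channels, `L` steps, patch size `P`, latent dimension `D`); the module name and exact reshaping are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative sketch of panel B: split the time axis into patches of
    length P and project each patch linearly to a D-dimensional embedding."""

    def __init__(self, patch_size: int, embed_dim: int):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Linear(patch_size, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, M channels, L time steps); L must be divisible by P here.
        batch, channels, length = x.shape
        patches = x.reshape(batch, channels, length // self.patch_size, self.patch_size)
        return self.proj(patches)  # (batch, M, L/P, D)

# Example: M=8 channels, L=96 steps, P=16, D=256 -> embeddings of shape (1, 8, 6, 256)
embed = PatchEmbedding(patch_size=16, embed_dim=256)
print(embed(torch.randn(1, 8, 96)).shape)
```

Projecting each patch independently keeps the channel axis untouched at this stage; mixing across channels happens later in the channel-wise attention blocks described in panel **C**.
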
## Key Features - TODO: develop these or remove them
<!-- TODO: Update this section to align with the introduction in the paper once finalized. -->
- **Multi-Variate Time Series Support:** uses **Proportional Factorized Space-Time Attention**, which efficiently groups multivariate features, reducing computational overhead while maintaining high accuracy.
- **Tailored for Observability:** designed around observability metrics: machine-generated time series collected in near-real-time to monitor and optimize the performance and reliability of modern infrastructure and applications.
- **Decoder-Only Transformer Architecture:** supports variable context lengths and prediction horizons.
- **Point and Probabilistic Forecasting:** produces both point forecasts and full predictive distributions.
- **Causal Patch-Wise Instance Normalization:** improves forecasting performance and training stability in decoder-only models (see the sketch after this list).
- **Student-T Mixture Model Prediction Head:** produces probabilistic forecasts that model the complex, varied distributions typical of observability data.
- **Extensive Pretraining on Large-Scale Data:** pretrained on 5–10× more data than leading time series foundation models, using a combination of synthetic, public, and observability-specific datasets.
- **High-Dimensional Time Series Support:** efficiently handles datasets with a large number of variables.

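As a rough illustration of the causal patch-wise instance normalization feature above, the sketch below accumulates mean and variance over the current and all earlier patches only, so the scaling never uses future information; the exact formulation in the paper and repository may differ.

```python
import torch

def causal_patchwise_instance_norm(x: torch.Tensor, patch_size: int, eps: float = 1e-5):
    """Sketch: scale each patch using mean/std computed from that patch and all
    earlier patches of the same series (per batch element and channel)."""
    batch, channels, length = x.shape
    patches = x.reshape(batch, channels, length // patch_size, patch_size)

    # Cumulative sums over the patch axis give causal (past-only) statistics.
    counts = torch.arange(1, patches.shape[2] + 1, device=x.device) * patch_size
    cum_sum = patches.sum(dim=-1).cumsum(dim=-1)
    cum_sq = (patches ** 2).sum(dim=-1).cumsum(dim=-1)

    mean = cum_sum / counts                      # (batch, channels, L/P)
    var = cum_sq / counts - mean ** 2
    std = (var.clamp_min(0.0) + eps).sqrt()

    normed = (patches - mean.unsqueeze(-1)) / std.unsqueeze(-1)
    return normed.reshape(batch, channels, length), mean, std

# Example: normalize a (1, 4, 64) series with patches of length 16.
y, mu, sigma = causal_patchwise_instance_norm(torch.randn(1, 4, 64), patch_size=16)
```

In a real pipeline the per-patch statistics would be kept so that the model's predictions can be rescaled back to the original units.
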
### Resources - TODO

- **Paper:** [Link to arxiv paper]
- **Repository:** [Link to github repo]
- **Blog Post:** [Link to Datadog blog post]
- **BOOM:** [Link to BOOM's dataset card]

## Usage

### Installation

```bash
# TODO(Anna) - update these with correct instructions
# Clone the repository
git clone https://github.com/DataDogFutureOpenSource/TOTO.git

# Navigate to the project directory
cd TOTO

# Install the required dependencies
pip install -r requirements.txt
```

### Running Inference

For a step-by-step guide to running inference with TOTO, please refer to our [GitHub repository's inference tutorial notebook](https://github.com/DataDogFutureOpenSource/TOTO/XXX/notebooks/inference_tutorial.ipynb).

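Until the tutorial link above is finalized, the snippet below is a purely hypothetical sketch of what an inference call could look like; the import path, the `Toto` class, the checkpoint id, and the `forecast` method and its arguments are illustrative assumptions rather than the repository's confirmed API, so defer to the notebook for the real interface.

```python
import torch

# Hypothetical import path and class name - the actual API may differ.
from toto.model import Toto

# Hypothetical checkpoint identifier on the Hugging Face Hub.
model = Toto.from_pretrained("Datadog/TOTO-v1-Base")
model.eval()

# Multivariate context: batch of 1, M=7 channels, 512 past time steps.
context = torch.randn(1, 7, 512)

with torch.no_grad():
    # Hypothetical call: request a 96-step horizon and draw samples from the
    # Student-T mixture head for probabilistic forecasts.
    forecast = model.forecast(context, prediction_length=96, num_samples=256)

print(forecast.shape)  # e.g. (256, 1, 7, 96): samples x batch x channels x horizon
```
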
### Usage Recommendations - TODO remove or develop
<!-- TODO: Share best practices, e.g. optimal context length and prediction length. -->

## Training Details - TODO keep or remove?

### Pretraining Data

| Dataset Name      | Link to Dataset Card                     |
|-------------------|------------------------------------------|
| GiftEval Pretrain | [Link to GiftEval Pretrain Dataset Card] |
| Chronos           | [Link to Chronos Dataset Card]           |
| Synthetic         | [Link to Synthetic Dataset Card]         |
| Observability     | [Link to Observability Dataset Card]     |

For more details about the pretraining data and preprocessing steps, please refer to the [paper](#TODO-Link-to-Paper) or the [GitHub repository](https://github.com/DataDogFutureOpenSource/TOTO).

### Training Hyperparameters - TODO keep or remove?

The training hyperparameters for TOTO are defined in a YAML configuration file in our GitHub repository; see [toto_config.yaml](https://github.com/DataDogFutureOpenSource/TOTO/blob/main/configs/toto_config.yaml).

## Results - TODO keep or remove?

| Dataset Name | Link to Dataset Card            | CRPS | MASE |
|--------------|---------------------------------|------|------|
| BOOM         | [Link to BOOM Dataset Card]     | TBD  | TBD  |
| GiftEval     | [Link to GiftEval Dataset Card] | TBD  | TBD  |

For more detailed information, please refer to the results section in our [paper](#TODO-Link-to-Paper).
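
For reference, the two reported metrics can be computed roughly as in the sketch below (a simplified, illustrative implementation; use the benchmarks' official evaluation code for comparable numbers). MASE scales the forecast's mean absolute error by the in-sample error of a seasonal-naive baseline, and CRPS is approximated from forecast samples with the energy form E|X - y| - 0.5 * E|X - X'|.

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the MAE of a
    seasonal-naive forecast on the in-sample (training) series."""
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

def crps_from_samples(y_true, samples):
    """Sample-based CRPS approximation, averaged over the horizon."""
    abs_err = np.mean(np.abs(samples - y_true), axis=0)
    spread = np.mean(np.abs(samples[:, None, :] - samples[None, :, :]), axis=(0, 1))
    return np.mean(abs_err - 0.5 * spread)

# Toy example: 24-step horizon, 100 forecast samples.
y_train = np.sin(np.arange(200) / 10.0)
y_true = np.sin(np.arange(200, 224) / 10.0)
samples = y_true + 0.1 * np.random.randn(100, 24)
print(mase(y_true, samples.mean(axis=0), y_train), crps_from_samples(y_true, samples))
```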

## Citation - TODO

If you use TOTO in your research or applications, please cite us using the following:

```bibtex
@article{TOTO-v1-Base-2025,
  title={TOTO: Time Series Optimized Transformer for Observability},
  author={Your Author Names Here},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025},
  url={https://arxiv.org/abs/XXXX.XXXXX}
}
```