---
title: matching_series
tags:
- evaluate
- metric
description: "Matching-based time-series generation metric"
sdk: gradio
sdk_version: 3.50
app_file: app.py
pinned: false
---

# Metric Card for matching_series

## Metric Description
Matching Series is a metric for evaluating time-series generation models. It is based on the idea of matching each generated time-series with the most similar original time-series (and vice versa). The metric computes the Mean Squared Error (MSE) between matched pairs of generated and original instances. The MSE-based scores are greater than or equal to 0, where 0 indicates a perfect generation.
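The matching idea can be illustrated with a minimal NumPy sketch. It assumes that MSE is averaged over all time steps and features and that each instance is matched with its nearest neighbour in the other set; the helpers `pairwise_mse` and `matched_mse_scores` are hypothetical names, and this is not the metric's actual implementation.

```python
import numpy as np


def pairwise_mse(predictions: np.ndarray, references: np.ndarray) -> np.ndarray:
    """(n, seq_len, num_features) x (m, seq_len, num_features) -> (n, m) matrix of MSE(p_i, r_j)."""
    # Broadcast to (n, m, seq_len, num_features), then average over time steps and features.
    diff = predictions[:, None, :, :] - references[None, :, :, :]
    return np.mean(diff**2, axis=(2, 3))


def matched_mse_scores(predictions, references) -> dict:
    """Hypothetical helper: MSE-based scores using nearest-neighbour matching."""
    mse = pairwise_mse(np.asarray(predictions, dtype=float), np.asarray(references, dtype=float))
    # Each generated instance is matched with its closest reference (precision side) ...
    precision_mse = float(mse.min(axis=1).mean())
    # ... and each reference with its closest generated instance (recall side).
    recall_mse = float(mse.min(axis=0).mean())
    total = precision_mse + recall_mse
    # Harmonic mean of the two; defined as 0 when both are 0 (perfect generation).
    f1_mse = 0.0 if total == 0 else 2 * precision_mse * recall_mse / total
    return {"precision_mse": precision_mse, "recall_mse": recall_mse, "f1_mse": f1_mse}


# Usage: a generated set that reproduces the references in a different order still scores 0,
# because nearest-neighbour matching does not depend on the order of the instances.
rng = np.random.default_rng(0)
references = rng.random((10, 100, 3))
predictions = references[::-1].copy()
print(matched_mse_scores(predictions, references))
# {'precision_mse': 0.0, 'recall_mse': 0.0, 'f1_mse': 0.0}
```

Because the matching is order-independent, a shuffled but otherwise perfect generation still receives 0, unlike a score that pairs instances by index.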

## How to Use
At minimum, the metric requires the original time-series (`references`) and the generated time-series (`predictions`) as input. The metric can be used to evaluate the performance of time-series generation models.

```python
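>>> import evaluate
>>> import numpy as np
>>> # The inputs are random, so the exact values below will differ between runs.
>>> num_generation = 100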
>>> num_reference = 10
>>> seq_len = 100
>>> num_features = 10
>>> references = np.random.rand(num_reference, seq_len, num_features)
>>> predictions = np.random.rand(num_generation, seq_len, num_features)
>>> metric = evaluate.load("bowdbeg/matching_series")
>>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
>>> print(results)
{'precision_mse': 0.15642462680824154, 'f1_mse': 0.15423970232736145, 'recall_mse': 0.15211497466247828, 'index_mse': 0.1650527529752939, 'precision_mse_features': [0.14161461272391063, 0.13959801451122986, 0.13494790079336152, 0.13812467072775822, 0.13502155933085397, 0.13773603530687478, 0.13782869677371534, 0.13880373566781345, 0.1347356979110729, 0.1380613227954152], 'f1_mse_features': [0.13200523240237663, 0.1321561699583367, 0.12686344486378406, 0.12979789457435542, 0.12768556637792927, 0.1316950291866994, 0.12937893459231917, 0.13052145628415104, 0.12571029554640592, 0.12686388502130683], 'recall_mse_features': [0.12361708937664843, 0.1254676048318782, 0.11969288602958734, 0.12241798787954035, 0.12110565263179066, 0.12616166677071738, 0.12190537193383513, 0.1231719120998892, 0.1178181328089802, 0.11734651764610313], 'index_mse_features': [0.16728853331521837, 0.1673468681819004, 0.16940025907048203, 0.16828093040638223, 0.17486439883284577, 0.15779474562305962, 0.16255301663470148, 0.16224400164732194, 0.1531092505944622, 0.167645525446565], 'macro_precision_mse': 0.1376472246542006, 'macro_recall_mse': 0.121870482200897, 'macro_f1_mse': 0.12926779088076645, 'macro_index_mse': 0.1650527529752939, 'matching_precision': 0.09, 'matching_recall': 1.0, 'matching_f1': 0.1651376146788991, 'matching_precision_features': [0.1, 0.1, 0.1, 0.1, 0.09, 0.09, 0.1, 0.1, 0.1, 0.1], 'matching_recall_features': [1.0, 1.0, 1.0, 0.7, 0.9, 1.0, 0.9, 1.0, 0.9, 0.8], 'matching_f1_features': [0.18181818181818182, 0.18181818181818182, 0.18181818181818182, 0.175, 0.16363636363636364, 0.1651376146788991, 0.18, 0.18181818181818182, 0.18, 0.17777777777777778], 'macro_matching_precision': 0.098, 'macro_matching_recall': 0.92, 'macro_matching_f1': 0.1768824483365768, 'cuc': 0.1364, 'coverages': [0.10000000000000002, 0.16666666666666666, 0.3, 0.5333333333333333, 0.9], 'macro_cuc': 0.13874, 'macro_coverages': [0.10000000000000002, 0.18000000000000002, 0.31, 0.48, 0.98], 'cuc_features': 
[0.1428, 0.13580000000000003, 0.15250000000000002, 0.14579999999999999, 0.12990000000000002, 0.1364, 0.1459, 0.12330000000000002, 0.13580000000000003, 0.13920000000000002], 'coverages_features': [[0.10000000000000002, 0.16666666666666666, 0.3666666666666667, 0.5, 1.0], [0.10000000000000002, 0.16666666666666666, 0.26666666666666666, 0.43333333333333335, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3666666666666667, 0.6, 1.0], [0.10000000000000002, 0.16666666666666666, 0.3333333333333333, 0.5333333333333333, 1.0], [0.10000000000000002, 0.20000000000000004, 0.26666666666666666, 0.4666666666666666, 0.9], [0.10000000000000002, 0.16666666666666666, 0.30000000000000004, 0.5333333333333333, 0.9], [0.10000000000000002, 0.20000000000000004, 0.3333333333333333, 0.5333333333333333, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.3, 1.0], [0.10000000000000002, 0.16666666666666666, 0.26666666666666666, 0.4333333333333333, 1.0], [0.10000000000000002, 0.16666666666666666, 0.30000000000000004, 0.4666666666666666, 1.0]]}
```

### Inputs
- **predictions**: (list of list of list of float or numpy.ndarray): The generated time-series. The shape of the array should be `(num_generation, seq_len, num_features)`.
- **references**: (list of list of list of float or numpy.ndarray): The original time-series. The shape of the array should be `(num_reference, seq_len, num_features)`.
- **batch_size**: (int, optional): The batch size used when computing the metric. Its effect on memory usage is quadratic. Default is None.
- **cuc_n_calculation**: (int, optional): The number of times the coverage is computed and averaged, since each computation relies on random sampling. Default is 3.
- **cuc_n_samples**: (list of int, optional): The sample sizes at which the coverage is computed. Default is $[2^i ~\text{for}~ i \leq \log_2 n] + [n]$.

### Output Values

Let prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and reference instances be $R = \{r_1, r_2, \ldots, r_m\}$.

- **precision_mse**: (float): For each generated instance, the MSE to its closest reference instance, averaged over all generated instances. Intuitively, this is similar to precision in classification. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{MSE}(p_i, r_j)$.
- **recall_mse**: (float): For each reference instance, the MSE to its closest generated instance, averaged over all reference instances. Intuitively, this is similar to recall in classification. In the equation, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{MSE}(p_i, r_j)$.
- **f1_mse**: (float): Harmonic mean of the precision_mse and recall_mse. This is similar to F1-score in classification.
- **index_mse**: (float): Average of the MSE between the generated instance and the reference instance with the same index. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{MSE}(p_i, r_i)$.
- **precision_mse_features**: (list of float): precision_mse computed individually for each feature.
- **recall_mse_features**: (list of float): recall_mse computed individually for each feature.
- **f1_mse_features**: (list of float): f1_mse computed individually for each feature.
- **index_mse_features**: (list of float): index_mse computed individually for each feature.
- **macro_precision_mse**: (float): Average of the precision_mse_features.
- **macro_recall_mse**: (float): Average of the recall_mse_features.
- **macro_f1_mse**: (float): Average of the f1_mse_features.
- **macro_index_mse**: (float): Average of the index_mse_features.
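- **matching_precision**: (float): Fraction of generated instances that are the closest match of at least one reference instance. In the equation, $\frac{ | \{ \arg\min_{i} \mathrm{MSE}(p_i, r_j) \mid j = 1, \ldots, m \} | }{n}$.
- **matching_recall**: (float): Fraction of reference instances that are the closest match of at least one generated instance. In the equation, $\frac{ | \{ \arg\min_{j} \mathrm{MSE}(p_i, r_j) \mid i = 1, \ldots, n \} | }{m}$.
- **matching_f1**: (float): Harmonic mean of matching_precision and matching_recall.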
- **matching_precision_features**: (list of float): matching_precision computed individually for each feature.
- **matching_recall_features**: (list of float): matching_recall computed individually for each feature.
- **matching_f1_features**: (list of float): matching_f1 computed individually for each feature.
- **macro_matching_precision**: (float): Average of the matching_precision_features.
- **macro_matching_recall**: (float): Average of the matching_recall_features.
- **macro_matching_f1**: (float): Average of the matching_f1_features.
- **coverages**: (list of float): Coverage of the matching instances, computed on random samples of the generated instances whose sizes are given by cuc_n_samples and averaged over cuc_n_calculation repetitions. For a sample $S = \mathrm{sample}(P, \mathrm{n\_sample})$, the coverage is $\frac{ | \{ \arg\min_{i : p_i \in S} \mathrm{MSE}(p_i, r_j) \mid j = 1, \ldots, m \} | }{m}$, giving one value for each $\mathrm{n\_sample} \in \mathrm{cuc\_n\_samples}$.
- **cuc**: (float): A single-number summary of coverages: the normalized area under the coverage curve over the sample sizes in cuc_n_samples.
- **coverages_features**: (list of list of float): coverages computed individually for each feature.
- **cuc_features**: (list of float): cuc computed individually for each feature.
- **macro_coverages**: (list of float): Average of the coverages_features.
- **macro_cuc**: (float): Average of the cuc_features.
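The matching-based outputs, the coverage curve, and the per-feature/macro variants can be sketched in NumPy as follows. This is a hedged illustration rather than the metric's implementation: the helper names are hypothetical, sampling without replacement is an assumption, and the normalization used for cuc is not reproduced.

```python
import numpy as np


def pairwise_mse(predictions, references):
    """(n, T, F) x (m, T, F) -> (n, m) matrix with entry (i, j) = MSE(p_i, r_j)."""
    diff = predictions[:, None] - references[None, :]
    return np.mean(diff**2, axis=(2, 3))


def harmonic_mean(a, b):
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def matching_scores(predictions, references):
    """Hypothetical helper for matching_precision, matching_recall and matching_f1."""
    mse = pairwise_mse(predictions, references)
    n, m = mse.shape
    # Distinct generated instances that are the closest match of at least one reference.
    precision = len(np.unique(mse.argmin(axis=0))) / n
    # Distinct references that are the closest match of at least one generated instance.
    recall = len(np.unique(mse.argmin(axis=1))) / m
    return {"matching_precision": precision, "matching_recall": recall,
            "matching_f1": harmonic_mean(precision, recall)}


def coverages(predictions, references, n_samples, n_calculation=3, seed=None):
    """Hypothetical helper: coverage for each sample size, averaged over n_calculation draws."""
    rng = np.random.default_rng(seed)
    mse = pairwise_mse(predictions, references)
    n, m = mse.shape
    curve = []
    for size in n_samples:
        values = []
        for _ in range(n_calculation):
            # Assumption: generated instances are drawn without replacement.
            subset = rng.choice(n, size=size, replace=False)
            # Distinct sampled instances that are the closest match of some reference.
            values.append(len(np.unique(mse[subset].argmin(axis=0))) / m)
        curve.append(float(np.mean(values)))
    return curve


def macro_average(score_fn, predictions, references, key):
    """Compute score_fn on each feature separately (a *_features list) and average it (macro_*)."""
    per_feature = [score_fn(predictions[..., f:f + 1], references[..., f:f + 1])[key]
                   for f in range(predictions.shape[-1])]
    return per_feature, float(np.mean(per_feature))


# Usage: a "mode-collapsed" generator that copies the first reference four times.
references = np.stack([np.full((5, 2), float(k)) for k in range(4)])
collapsed = np.repeat(references[:1], 4, axis=0)
print(matching_scores(collapsed, references))
# {'matching_precision': 0.25, 'matching_recall': 0.25, 'matching_f1': 0.25}
```

In this collapsed case every generated instance equals a reference exactly, so precision_mse would be 0, while the matching scores and the coverage curve expose that only one reference is reproduced.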

#### Values from Popular Papers
<!-- *Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.* -->

### Examples
<!-- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.* -->

## Limitations and Bias
This metric assumes that each generated time-series should closely match some original time-series. This assumption does not hold in every scenario, so the metric may be unsuitable for evaluating time-series generation models that are not expected to reproduce the original time-series.

## Citation
<!-- *Cite the source where this metric was introduced.* -->

## Further References
<!-- *Add any useful further references.* -->