arrash committed on
Commit f5c5980 · 1 Parent(s): 03ecde4

Update README.md

Files changed (1)
  1. README.md +77 -45
README.md CHANGED
@@ -25,8 +25,10 @@ SOFTWARE.
  # Background
 
  ```text
- The models provided here were created using open source modeling techniques provided in https://github.com/taoshidev/time-series-prediction-subnet (TSPS).
- They were achieved using the runnable/miner_training.py, and tested against existing models and dummy models in runnable/miner_testing.py
  ```
 
  # Build Strategy
@@ -37,21 +39,30 @@ This section outlines the strategy used to build the models.
 
  ## Understanding Dataset Used
  ```text
- The dataset used to build the models can be generated using the runnable/generate_historical_data.py. A lookback period between June 2022 and July 2023 on the 5m interval was used to
- train the model. Through analysis, the reason this dataset was used is because historical data beyond June 2022 provides strongly trending price movement or data movement that is from a
- period where Bitcoin's market cap was too small to be relevant to where Bitcoin is now. Therefore, using more recent data was used which correlates to the current market
- cap and macroeconomic conditions where its uncertain we'll continue to get highly trending Bitcoin data.
-
- Testing data was used between June 2023 and Nov 2023 to determine performance of the models. This was tested using the runnable/miner_testing.py file with a separately
- generated test dataset from runnable/generate_historical_data.py.
  ```
 
  ## Understanding Model Creation
  ```text
- As of now, the TSPS infrastructure only provides close, high, low, and volume. It also provides financial indicators such as RSI, MACD, and Bollinger Bands but they were not
- used for the purposes of training these models.
 
- The models were derived using a variety of windows and iterations through the June 2022 to June 2023 dataset. The strategy to derive the model was the following:
 
  base_mining_model = BaseMiningModel(len(prep_dataset.T)) \
  .set_neurons([[1024, 0]]) \
@@ -61,34 +72,45 @@ base_mining_model = BaseMiningModel(len(prep_dataset.T)) \
  .set_model_dir(f'mining_models/model1.h5')
  base_mining_model.train(prep_dataset, epochs=25)
 
- where an LSTM model was created by using a few or no stacked layers. Most of the v4 models are actually not stacked as they performed better not being stacked for the most part.
- This could very likely change as more feature inputs are added (this is being worked on as part of the open source infra in TSPS). The window size of 100 helped best predict
- the outcome, derived in mining_objects/base_mining_model.py
  ```
 
  ## Understanding Training Decisions
  ```text
- Training the model used the previous 601 rows of data as an input. This is because 500 rows were used to batch, and we are looking to predict 100 rows into the future (the challenge
- presented in the Time Series Prediction Subnet). Measures were taken to ensure all data was trained on in the training data.
 
- Each set of 601 rows was trained on 25 times, inside another loop which iterated on the entirety of the dataset from 6/22 to 6/23 50 times. This provided the model the ability
- to get granular with details yet not overfit to any single set of rows at once. Therefore, a multi-layered looping infrastructure was used to derive the models.
 
  for x in range(50):
- for i in range(25):
- train_model()
  ```
 
  ## Strategy to Predict
  ```text
- The strategy to predict 100 closes of data into the future was to use a 1 step methodology of predicting 1 step at 100 intervals into the future and connect the information by
- generating a line from the last close to the prediction 100 closes into the future. By doing so, the model could learn to predict a single step rather than all 100 where loss
- could continue to increase with each misstep.
  ```
 
  # Model V5
  ```text
- Recommendations on how to perform better than V4 and what Model V5 will look like are outlined below:
 
  1. Concentrate on more difficult moves
  2. Get more granular data (1m)
@@ -97,37 +119,47 @@ Recommendations on how to perform better than V4 and what Model V5 will look lik
 
  -- Concentrate on more difficult moves
 
- The Time Series Prediction Subnet will reward models that are capable of predicting more "difficult" movements in the market more than those that are less difficult. Therefore,
- taking a strategy to train your model on larger movements or bigger magnitude movements would be a good consideration. Some additional details on how difficulty is calculated
- will be released soon but it is a combination of the magnitude of the movement with the std dev of the movement in the predicted interval.
 
  -- Get more granular data (1m)
 
- With these larger magnitude movements, a strategy to get more granular with the data would be recommended. Using 1m data to train rather than 5m would help the models better
- predict information.
 
  -- Get more data sources
 
- Beyond using financial market indicators like RSI, MACD, and Bollinger Bands the TSPS open source infra will gather information for miners to help train.
 
- The TSPS infrastructure will be adding data scrapers and using those data scrapers automatically gather information for you. The following pieces of information will be gathered & accessible
- through the open source infra:
 
- Bitcoin open interest
- Bitcoin OHLCV data
- Bitcoin funding rate
- DXY OHLCV data
- Gold OHLCV data
- S&P 500 OHLCV data
- Bitcoin dominance
- Historical news data (sentiment analysis)
 
- Using this information will provide models with information they can use to better predict prices as markets correlate in movement and Bitcoin responds to other markets.
 
  -- Use more predicted steps
 
- Rather than only predicting a single step at the 100th predicted close in the future predict more steps. This can be achieved by training multiple models, for example
- 10 models each at 10 closes into the future (10, 20, 30, 40, 50, 60, 70, 80, 90, 100), or by using a multi-step model with 10 steps.
- Both will achieve more granularity when it comes to predictions and therefore can achieve a much greater RMSE score.
  ```
 

  # Background
 
  ```text
+ The models provided here were created using open source modeling techniques
+ provided in https://github.com/taoshidev/time-series-prediction-subnet (TSPS).
+ They were trained with runnable/miner_training.py and tested against existing
+ models and dummy models with runnable/miner_testing.py.
  ```
 
  # Build Strategy
 
 
  ## Understanding Dataset Used
  ```text
+ The dataset used to build the models can be generated with
+ runnable/generate_historical_data.py. The models were trained on 5m-interval data
+ from a lookback period between June 2022 and July 2023. This period was chosen
+ because, based on analysis, historical data from before June 2022 either shows
+ strongly trending price movement or comes from a period when Bitcoin's market cap
+ was too small to be relevant to where Bitcoin is now. More recent data was
+ therefore used, since it reflects the current market cap and macroeconomic
+ conditions, under which it is uncertain whether Bitcoin will continue to produce
+ strongly trending data.
+
+ Model performance was tested on data from June 2023 to Nov 2023, using
+ runnable/miner_testing.py with a separately generated test dataset from
+ runnable/generate_historical_data.py.
  ```
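The date-range selection described in "Understanding Dataset Used" can be sketched in a few lines of Python. This is a minimal sketch, assuming each row is a hypothetical `(timestamp, close)` tuple; the actual output format of runnable/generate_historical_data.py is not shown in this README, and the helper names `select_range` and `expected_rows` are illustrative.

```python
from datetime import datetime, timedelta

# Hypothetical row layout: (timestamp, close). The real files written by
# runnable/generate_historical_data.py may store more columns or another format.
Row = tuple[datetime, float]

FIVE_MINUTES = timedelta(minutes=5)


def select_range(rows: list[Row], start: datetime, end: datetime) -> list[Row]:
    """Keep rows whose timestamp falls in the half-open interval [start, end)."""
    return [row for row in rows if start <= row[0] < end]


def expected_rows(start: datetime, end: datetime,
                  step: timedelta = FIVE_MINUTES) -> int:
    """Number of candles a gap-free series sampled every `step` has in [start, end)."""
    return (end - start) // step


# One year of 5m candles (June 2022 to June 2023) is 365 days * 288 candles/day.
print(expected_rows(datetime(2022, 6, 1), datetime(2023, 6, 1)))  # 105120
```

Comparing `len(select_range(...))` with `expected_rows(...)` for the same window is a quick way to spot missing candles before training. The half-open interval keeps adjacent windows, such as a training range and the test range that follows it, from sharing a boundary candle.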
 
  ## Understanding Model Creation
  ```text
+ At the time of writing, the TSPS infrastructure provides only close, high, low,
+ and volume data. It also provides financial indicators such as RSI, MACD, and
+ Bollinger Bands, but these were not used to train these models.
 
+ The models were derived using a variety of windows and iterations through the
+ June 2022 to June 2023 dataset. Each model was derived as follows:
 
  base_mining_model = BaseMiningModel(len(prep_dataset.T)) \
  .set_neurons([[1024, 0]]) \
 
  .set_model_dir(f'mining_models/model1.h5')
  base_mining_model.train(prep_dataset, epochs=25)
 
+ This creates an LSTM model with few or no stacked layers. Most of the V4 models
+ are not stacked, because the unstacked variants generally performed better. This
+ may well change as more input features are added (work on this is underway in
+ the TSPS open source infrastructure). A window size of 100 produced the best
+ predictions (see mining_objects/base_mining_model.py).
  ```
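The window size of 100 can be illustrated with a sliding-window helper. This is a hypothetical sketch, not the code in mining_objects/base_mining_model.py (whose internals are not shown here); it assumes each row is a list of feature values such as `[close, high, low, volume]`. It also shows why `len(prep_dataset.T)` is a natural argument for `BaseMiningModel`: assuming `prep_dataset` is a 2-D array with one row per candle, its transpose has one row per feature, so its length is the feature count.

```python
# Hypothetical sketch; not taken from mining_objects/base_mining_model.py.
Features = list[float]  # one candle, e.g. [close, high, low, volume] (assumed)


def make_windows(rows: list[Features], window: int = 100) -> list[list[Features]]:
    """Return every run of `window` consecutive rows, advancing one row at a time."""
    if window <= 0:
        raise ValueError("window must be positive")
    # range() is empty when there are fewer rows than the window size.
    return [rows[i:i + window] for i in range(len(rows) - window + 1)]


def num_features(rows: list[Features]) -> int:
    """Column count; the list analogue of len(prep_dataset.T) for a 2-D array."""
    return len(rows[0]) if rows else 0


candles = [[100.0 + i, 101.0 + i, 99.0 + i, 10.0] for i in range(250)]
windows = make_windows(candles)
print(len(windows), len(windows[0]), num_features(candles))  # 151 100 4
```

Each window is a 100 × 4 block, the (timesteps, features) layout an LSTM layer consumes for one sample.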
 
  ## Understanding Training Decisions
  ```text
+ Each training input was the previous 601 rows of data. This is because 500 rows
+ were used as the batch, and the goal is to predict 100 rows into the future (the
+ challenge presented by the Time Series Prediction Subnet). Measures were taken to
+ ensure that every row of the training data was trained on.
 
+ Each set of 601 rows was trained on 25 times, inside an outer loop that iterated
+ over the entire June 2022 to June 2023 dataset 50 times. This let the model learn
+ fine-grained detail without overfitting to any single set of rows. The models
+ were therefore derived with a nested training loop:
 
  for x in range(50):
+     for i in range(25):
+         train_model()
  ```
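The nested loop in "Understanding Training Decisions" leaves out the iteration over 601-row sets that the prose describes. A minimal sketch with that loop made explicit is below; `train_on` is a hypothetical callback standing in for one training pass (for example, one epoch of a `BaseMiningModel`), and shifting the final set back so that it ends at the last row is only an assumed way of making sure all data is trained on.

```python
from typing import Callable, Sequence

SET_SIZE = 601  # rows per training set, as described above
PASSES = 50     # outer passes over the full dataset
EPOCHS = 25     # training repetitions per 601-row set


def set_starts(n_rows: int, size: int = SET_SIZE) -> list[int]:
    """Start offsets of consecutive `size`-row sets covering all n_rows rows.

    Sets do not overlap, except that a final set is shifted back to end at the
    last row when n_rows is not a multiple of `size` (an assumed coverage rule).
    """
    if n_rows < size:
        return []
    starts = list(range(0, n_rows - size + 1, size))
    if starts[-1] + size < n_rows:
        starts.append(n_rows - size)
    return starts


def train(rows: Sequence, train_on: Callable[[Sequence], None],
          passes: int = PASSES, epochs: int = EPOCHS) -> None:
    for _ in range(passes):                  # for x in range(50)
        for start in set_starts(len(rows)):  # every 601-row set in the dataset
            chunk = rows[start:start + SET_SIZE]
            for _ in range(epochs):          # for i in range(25)
                train_on(chunk)              # train_model()


print(set_starts(1500))  # [0, 601, 899]
```

Keeping the 25 repetitions innermost mirrors `train(prep_dataset, epochs=25)` in the model-creation snippet, while the outer 50 passes revisit every set.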
 
  ## Strategy to Predict
  ```text
+ To predict 100 closes into the future, a single-step approach was used: the model
+ predicts only the close 100 intervals ahead, and the closes in between are filled
+ in by drawing a straight line from the last known close to that prediction. This
+ way the model learns to predict a single step rather than all 100, where the
+ error could compound with each misstep.
  ```
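The straight-line fill in "Strategy to Predict" is linear interpolation between two points. A minimal sketch, assuming the model outputs a single predicted close for 100 intervals ahead; `interpolate_path` is an illustrative name, not a TSPS function.

```python
def interpolate_path(last_close: float, predicted_close: float,
                     steps: int = 100) -> list[float]:
    """Straight line from the last known close to the close `steps` intervals ahead.

    Element k - 1 is the estimate for the close k intervals ahead, so the list
    starts one step after last_close and ends at predicted_close (up to
    floating-point rounding).
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    delta = (predicted_close - last_close) / steps
    return [last_close + delta * k for k in range(1, steps + 1)]


path = interpolate_path(100.0, 200.0)
print(len(path), path[0], path[49], path[-1])  # 100 101.0 150.0 200.0
```

Only the endpoint is learned; the other 99 values add no information beyond it, which is the limitation the "Use more predicted steps" recommendation in Model V5 addresses.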
 
  # Model V5
  ```text
+ Recommendations on how to outperform V4, and what Model V5 will look like, are
+ outlined below:
 
  1. Concentrate on more difficult moves
  2. Get more granular data (1m)
 
 
  -- Concentrate on more difficult moves
 
+ The Time Series Prediction Subnet will reward models that can predict more
+ "difficult" market movements more highly than models that only predict less
+ difficult ones. Training your model on larger-magnitude movements is therefore
+ worth considering. Details on how difficulty is calculated will be released
+ soon, but it combines the magnitude of the movement with the standard deviation
+ of the movement in the predicted interval.
 
  -- Get more granular data (1m)
 
+ Given these larger-magnitude movements, training on more granular data is
+ recommended. Using 1m data rather than 5m data would help the models predict
+ these movements more accurately.
 
  -- Get more data sources
 
+ Beyond financial market indicators like RSI, MACD, and Bollinger Bands, the TSPS
+ open source infrastructure will gather additional data to help miners train.
 
+ The TSPS infrastructure will add data scrapers that automatically gather the
+ following information, which will be accessible through the open source
+ infrastructure:
 
+ - Bitcoin open interest
+ - Bitcoin OHLCV data
+ - Bitcoin funding rate
+ - DXY OHLCV data
+ - Gold OHLCV data
+ - S&P 500 OHLCV data
+ - Bitcoin dominance
+ - Historical news data (sentiment analysis)
 
+ This information will give models more to predict prices with, since markets
+ move in correlation and Bitcoin responds to movements in other markets.
 
  -- Use more predicted steps
 
+ Rather than predicting only a single step at the 100th close into the future,
+ predict more steps. This can be done by training multiple models, for example
+ 10 models at 10-close intervals (10, 20, 30, 40, 50, 60, 70, 80, 90, and 100
+ closes ahead), or by using a multi-step model with 10 steps. Both approaches give
+ more granular predictions and can therefore achieve a much better RMSE score.
+
  ```
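The two ingredients of difficulty named in "Concentrate on more difficult moves" can be computed per prediction interval. Because the README states that the exact formula has not been released yet, this sketch only computes the components and does not combine them. The definitions used here (relative change from first to last close for magnitude, population standard deviation of close-to-close changes for the spread) are assumptions, as are the helper names and the `min_magnitude` threshold used to keep larger moves for training.

```python
import statistics


def movement_components(closes: list[float]) -> tuple[float, float]:
    """Return (magnitude, std_dev) for the closes in one prediction interval.

    magnitude: |last - first| / first (assumed definition).
    std_dev:   population std dev of close-to-close changes (assumed definition).
    """
    if len(closes) < 2:
        raise ValueError("need at least two closes")
    magnitude = abs(closes[-1] - closes[0]) / closes[0]
    changes = [b - a for a, b in zip(closes, closes[1:])]
    return magnitude, statistics.pstdev(changes)


def select_large_moves(intervals: list[list[float]],
                       min_magnitude: float) -> list[list[float]]:
    """Keep only intervals whose magnitude meets a (hypothetical) threshold."""
    return [c for c in intervals if movement_components(c)[0] >= min_magnitude]


flat = [100.0] * 101
rally = [100.0 + i for i in range(101)]
print(movement_components(flat), movement_components(rally))  # (0.0, 0.0) (1.0, 0.0)
```

A steady rally has a large magnitude but zero spread under these definitions, while a choppy interval with the same net change has a non-zero spread, so keeping both numbers carries more information than filtering on magnitude alone.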
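The multi-model option in "Use more predicted steps" yields predicted closes at several horizons, which can be joined with piecewise-linear interpolation: the same straight-line idea as the single-step strategy, applied between each pair of predictions. This sketch assumes the predictions arrive as a hypothetical `{horizon: close}` mapping and scores a path with plain RMSE over all 100 closes; the subnet's actual scoring may differ, and both function names are illustrative.

```python
import math


def piecewise_path(last_close: float, anchors: dict[int, float],
                   steps: int = 100) -> list[float]:
    """Linearly interpolate from last_close through predicted closes at given horizons.

    anchors maps horizon (1..steps) -> predicted close and must include `steps`,
    e.g. {10: ..., 20: ..., ..., 100: ...} for ten models at 10-close intervals.
    """
    if not anchors or min(anchors) < 1 or max(anchors) != steps:
        raise ValueError("horizons must lie in 1..steps and include steps")
    points = [(0, last_close)] + sorted(anchors.items())
    path = []
    for (h0, v0), (h1, v1) in zip(points, points[1:]):
        for k in range(h0 + 1, h1 + 1):
            path.append(v0 + (v1 - v0) * (k - h0) / (h1 - h0))
    return path


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root-mean-square error; lower is better."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and equal in length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Actual path: rises 50 points over 50 closes, then falls back to where it began.
actual = [100.0 + k if k <= 50 else 200.0 - k for k in range(1, 101)]
single = piecewise_path(100.0, {100: actual[-1]})  # one model at 100 closes
multi = piecewise_path(100.0, {h: actual[h - 1] for h in range(10, 101, 10)})
print(rmse(single, actual) > rmse(multi, actual))  # True
```

Because the anchors in this example are the true closes, the comparison isolates the error introduced by drawing a single straight line across a path that reverses; with real model outputs, both paths would also carry prediction error.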