# Background

```text
The models provided here were created using open source modeling techniques
provided in https://github.com/taoshidev/time-series-prediction-subnet (TSPS).
They were achieved using runnable/miner_training.py and tested against
existing models and dummy models in runnable/miner_testing.py.
```

# Build Strategy

This section outlines the strategy used to build the models.

## Understanding Dataset Used
```text
The dataset used to build the models can be generated using
runnable/generate_historical_data.py. A lookback period between June 2022 and
July 2023 on the 5m interval was used to train the model. Through analysis,
this dataset was chosen because historical data from before June 2022 shows
strongly trending price movement, or movement from a period when Bitcoin's
market cap was too small to be relevant to where Bitcoin is now.

Therefore, more recent data was used, which correlates to the current market
cap and macroeconomic conditions, where it's uncertain we'll continue to get
highly trending Bitcoin data.

Testing data between June 2023 and Nov 2023 was used to determine the
performance of the models. This was done using runnable/miner_testing.py with a
separately generated test dataset from runnable/generate_historical_data.py.
```
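A rough sketch of such a train/test split follows. The array contents and the index-based split are illustrative assumptions only; the real datasets are generated by date range with runnable/generate_historical_data.py.

```python
import numpy as np

# Hypothetical stand-in for generated candle data: rows are 5m candles ordered
# oldest -> newest, columns [close, high, low, volume].
rng = np.random.default_rng(0)
dataset = rng.random((120_000, 4))

# In practice the June 2022 - July 2023 training range and the June 2023 -
# Nov 2023 testing range are selected by timestamp; here we split by row
# index purely for illustration.
split = int(len(dataset) * 0.8)
train_data, test_data = dataset[:split], dataset[split:]
print(train_data.shape, test_data.shape)
```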

## Understanding Model Creation
```text
As of now, the TSPS infrastructure only provides close, high, low, and volume.
It also provides financial indicators such as RSI, MACD, and Bollinger Bands,
but they were not used for the purposes of training these models.

The models were derived using a variety of windows and iterations through the
June 2022 to June 2023 dataset. The strategy to derive the models was the
following:

base_mining_model = BaseMiningModel(len(prep_dataset.T)) \
    .set_neurons([[1024, 0]]) \
    .set_model_dir(f'mining_models/model1.h5')
base_mining_model.train(prep_dataset, epochs=25)

where an LSTM model was created using few or no stacked layers. Most of the
v4 models are actually not stacked, as they performed better unstacked for the
most part. This could very likely change as more feature inputs are added (this
is being worked on as part of the open source infra in TSPS). A window size of
100 best predicted the outcome, as derived in
mining_objects/base_mining_model.py.
```
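The window-size-100 idea can be sketched independently of BaseMiningModel (whose real implementation lives in mining_objects/base_mining_model.py). The `make_windows` helper below is hypothetical, not TSPS code:

```python
import numpy as np

def make_windows(series: np.ndarray, window_size: int = 100) -> np.ndarray:
    """Slice an ordered series into overlapping windows of `window_size` rows."""
    n = len(series) - window_size + 1
    return np.stack([series[i:i + window_size] for i in range(n)])

closes = np.arange(250, dtype=float)  # stand-in for a column of close prices
windows = make_windows(closes)
print(windows.shape)  # (151, 100)
```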

## Understanding Training Decisions
```text
Training the model used the previous 601 rows of data as input. This is because
500 rows were used to batch, and we are looking to predict 100 rows into the
future (the challenge presented in the Time Series Prediction Subnet). Measures
were taken to ensure all of the training data was trained on.

Each set of 601 rows was trained on 25 times, inside another loop which
iterated over the entirety of the dataset from 6/22 to 6/23 50 times. This gave
the model the ability to get granular with details yet not overfit to any
single set of rows at once. Therefore, a multi-layered looping structure was
used to derive the models:

for x in range(50):
    for i in range(25):
        train_model()
```
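Walking the dataset in 601-row slices can be sketched as follows. The slice layout is an assumption based on the description above (500 batching rows plus the 100-close horizon); the real batching lives inside BaseMiningModel and is not reproduced here:

```python
import numpy as np

rows = np.arange(3005)  # stand-in for prepared training rows
CHUNK = 601  # 601 rows per slice, per the training description above

# Step through the dataset chunk by chunk so every row is eventually trained on.
slices = [rows[i:i + CHUNK] for i in range(0, len(rows) - CHUNK + 1, CHUNK)]
print(len(slices), len(slices[-1]))
```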

## Strategy to Predict
```text
The strategy to predict 100 closes into the future was to use a 1-step
methodology: predict a single step at 100 intervals into the future, then
connect the information by generating a line from the last close to the
prediction 100 closes out. By doing so, the model could learn to predict a
single step rather than all 100, where loss could continue to increase with
each misstep.
```
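Connecting the last known close to the single predicted value is plain linear interpolation. A minimal sketch (the price values are made up):

```python
import numpy as np

last_close = 42_000.0       # most recent known close
predicted_100th = 43_500.0  # the model's single prediction, 100 closes ahead

# Draw a straight line from the last close to the 100th predicted close,
# yielding 100 evenly spaced predicted closes.
predicted_path = np.linspace(last_close, predicted_100th, num=101)[1:]
print(len(predicted_path), predicted_path[-1])  # 100 43500.0
```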

# Model V5
```text
Recommendations on how to perform better than V4 and what Model V5 will look
like are outlined below:

1. Concentrate on more difficult moves
2. Get more granular data (1m)
3. Get more data sources
4. Use more predicted steps

- Concentrate on more difficult moves

The Time Series Prediction Subnet will reward models that are capable of
predicting more "difficult" movements in the market over those that are less
difficult. Therefore, a strategy of training your model on larger or
bigger-magnitude movements would be a good consideration. Additional details on
how difficulty is calculated will be released soon, but it is a combination of
the magnitude of the movement and the std dev of the movement over the
predicted interval.

- Get more granular data (1m)

With these larger magnitude movements, a strategy to get more granular with the
data is recommended. Using 1m data to train rather than 5m would help the
models better predict information.

- Get more data sources

Beyond using financial market indicators like RSI, MACD, and Bollinger Bands,
the TSPS open source infra will gather information for miners to help train.

The TSPS infrastructure will be adding data scrapers and using them to
automatically gather information for you. The following pieces of information
will be gathered and accessible through the open source infra:

- Bitcoin open interest
- Bitcoin OHLCV data
- Bitcoin funding rate
- DXY OHLCV data
- Gold OHLCV data
- S&P 500 OHLCV data
- Bitcoin dominance
- Historical news data (sentiment analysis)

Using this information will provide models with information they can use to
better predict prices, as markets correlate in movement and Bitcoin responds to
other markets.

- Use more predicted steps

Rather than only predicting a single step at the 100th predicted close in the
future, predict more steps. This can be achieved by training multiple models,
for example, 10 models each predicting 10 closes further into the future
(10, 20, 30, 40, 50, 60, 70, 80, 90, 100), or by using a multi-step model with
10 steps. Both will achieve more granularity when it comes to predictions and
can therefore achieve a much better RMSE score.
```
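The multiple-models idea under "Use more predicted steps" could be sketched as below. Nothing here is TSPS API; the drifting lambdas are placeholders standing in for ten separately trained models:

```python
# One predictor per horizon: 10, 20, ..., 100 closes ahead.
horizons = list(range(10, 101, 10))

# Dummy "models" that drift upward with the horizon, to show the output shape.
models = {h: (lambda h: lambda close: close * (1 + 0.0001 * h))(h)
          for h in horizons}

def predict_path(last_close: float) -> list[float]:
    """One predicted close per horizon, nearest first."""
    return [models[h](last_close) for h in horizons]

path = predict_path(42_000.0)
print(len(path))  # 10
```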