Update Readme.md

README.md
# Title

"Bitcoin Forum analysis underscores enduring and substantial carbon footprint of Bitcoin"

- Cyrille Grumbach, ETH Zurich, Switzerland ([email protected])
- Didier Sornette, Southern University of Science and Technology, China ([email protected])

# Folders

Analyzing Bitcointalk.org with Large Language Models

File main.ipynb is used to scrape the forum.
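
For orientation, a minimal sketch of what the scraping step can look like with requests and BeautifulSoup is shown below; the board id, URL pattern, and selectors are illustrative assumptions, not the exact logic of main.ipynb.

```python
# Hypothetical sketch of the forum scraping loop, NOT the exact code in main.ipynb.
import time

import requests
from bs4 import BeautifulSoup

# Assumed URL pattern: bitcointalk board pages are paginated by topic offset.
BOARD_URL = "https://bitcointalk.org/index.php?board=14.{offset}"

def fetch_board_page(offset: int) -> str:
    """Download one page of the thread list and return its raw HTML."""
    resp = requests.get(BOARD_URL.format(offset=offset), timeout=30)
    resp.raise_for_status()
    return resp.text

def extract_topic_links(html: str) -> list[str]:
    """Collect links to individual threads from a board page."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True) if "topic=" in a["href"]]

for offset in range(0, 120, 40):  # first three pages only, for illustration
    links = extract_topic_links(fetch_board_page(offset))
    time.sleep(1)  # stay polite to the forum server
```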

## hardwarelist

Includes the following:
- pmaxv1 folder: contains the maximum hardware efficiency for each date alongside some manually added updates originally made by Cyrille.
- get_hwd_asicminervalue.js and get_hwd_bitcoinwiki.js: Scripts that can be pasted into the browser console at the URLs listed within the files; they extract the hardware efficiency tables
- hardware_asicminervalue.txt and hardware_bitcoinwiki.txt: The raw output from the above scripts
- 1_cleanup_hardware_table.ipynb: Used to clean up the raw output, to create hardware_asicminervalue.csv and hardware_bitcoinwiki.csv
- 2_merge_tables.ipynb: Merges the two tables into hardware_merged.csv
- 3_paper_list.ipynb: Creates four things: 1) the hardware table in the appendix; 2) the pmaxv2.csv file, which uses the hardware_merged.csv file to create an improved table with the maximum hardware efficiency for each date; 3) the pmax evolution table for the paper; 4) the paper_list.csv file, which is later used to create an Excel sheet
- 4_create_pmaxv3.ipynb: Creates the pmaxv3.csv file, which takes the date-wise maximum of the pmaxv1.csv and pmaxv2.csv files (see the sketch after this list)
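
A rough pandas sketch of that maximum step, assuming columns named date and efficiency (the actual column names in the CSVs may differ):

```python
# Hypothetical sketch of 4_create_pmaxv3.ipynb; column names are assumptions.
import pandas as pd

pmax_v1 = pd.read_csv("pmaxv1.csv", parse_dates=["date"])
pmax_v2 = pd.read_csv("pmaxv2.csv", parse_dates=["date"])

# Align the two tables on date, then keep the higher efficiency of the two.
merged = pmax_v1.merge(pmax_v2, on="date", how="outer", suffixes=("_v1", "_v2"))
merged["efficiency"] = merged[["efficiency_v1", "efficiency_v2"]].max(axis=1)

merged[["date", "efficiency"]].sort_values("date").to_csv("pmaxv3.csv", index=False)
```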
## bitcoinforum

### 1_forum_dataset

Contains the raw HTML from the forum and code to parse it and combine it into data frames.

### 2_train_set_creation

Combines the forum sections into one, truncates long threads, passes a random sample to GPT-4 to get the training set for Mistral 7B, and also creates the inputs that will be given to Mistral 7B after training.
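
The GPT-4 call for building the training set might look roughly like this; the model name, prompt, and output handling are assumptions, not the exact setup of the notebook:

```python
# Hypothetical sketch of labeling sampled threads with GPT-4 via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def label_thread(thread_text: str) -> str:
    """Ask GPT-4 for a structured label for one (truncated) forum thread."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Extract the mining hardware efficiency discussed in this thread."},
            {"role": "user", "content": thread_text},
        ],
    )
    return response.choices[0].message.content

# The resulting (thread, label) pairs would then be written out as the
# fine-tuning dataset for Mistral 7B.
```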

### 3_training

Trains Mistral 7B using LoRA on the dataset generated earlier and saves the merged model.
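
With the Unsloth installation described below, a LoRA fine-tune is typically set up along these lines; the model identifier, hyperparameters, and save step are placeholders, not the values used in 3_training:

```python
# Hypothetical sketch of a LoRA fine-tune with Unsloth; hyperparameters are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-7B-v0.1",
    max_seq_length=4096,
    load_in_4bit=True,  # fits on a 24 GB GPU
)

# Attach LoRA adapters; only these low-rank matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Training itself would use a standard trainer (e.g. trl's SFTTrainer) on the
# GPT-4-generated dataset, after which the LoRA weights are merged and saved:
# model.save_pretrained_merged("mistral-7b-forum", tokenizer)
```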

### 4_inference

Includes the following files:

Averages the forum efficiency on a monthly basis, then merges it with the Bitcoin price, hashrate, coins per block, and maximum hardware efficiency to create monthly_stuff.csv

monthly_stuff.csv contains columns: date, price, hashrate, coins_per_block, efficiency, max_efficiency
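
A compact pandas sketch of that aggregation, assuming per-post efficiency estimates with a date column (file and column names are illustrative):

```python
# Hypothetical sketch of building monthly_stuff.csv; file/column names are assumed.
import pandas as pd

forum = pd.read_csv("forum_efficiency.csv", parse_dates=["date"])  # per-post estimates
market = pd.read_csv("market_data.csv", parse_dates=["date"])      # monthly price, hashrate, coins_per_block
pmax = pd.read_csv("pmaxv3.csv", parse_dates=["date"])             # max hardware efficiency

# Average the forum efficiency estimates month by month.
monthly = (
    forum.set_index("date")["efficiency"]
    .resample("MS")  # month-start frequency
    .mean()
    .reset_index()
)

# Bring the hardware ceiling onto the same monthly grid.
pmax_monthly = (
    pmax.set_index("date")["efficiency"]
    .resample("MS")
    .max()
    .rename("max_efficiency")
    .reset_index()
)

# Merge with price, hashrate, and coins per block, then write the result.
out = monthly.merge(market, on="date").merge(pmax_monthly, on="date")
out.to_csv("monthly_stuff.csv", index=False)
```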

## plots

Includes the following:

- carbon-comparison folder: Contains the 17 sources used to create the carbon comparison table
- carbonintensity.html: Cambridge's table for the yearly gCO2e/kWh values, found at https://ccaf.io/cbnsi/cbeci/ghg/methodology
- appendix2.ipynb: Creates all plots from appendix 2

# System requirements

Running Mistral 7B training or inference requires an NVIDIA GPU with at least 24 GB of VRAM (a Runpod instance also works).

Everything else can be run on a normal desktop/laptop computer with Python 3.10 installed.

# Operating system

Code unrelated to training or inference of Mistral 7B has been tested on Windows 10.

Code for Mistral 7B training and inference has been tested on Runpod instances.

# Installation guide for software dependencies

For the code unrelated to training or inference of Mistral 7B, use the packages listed in requirements.txt

## Installation guide for Mistral 7B training and inference

Set up a Runpod instance with the axolotl docker image, then install Unsloth using the instructions at https://github.com/unslothai/unsloth

Also install SGLang for inference.

## Typical install time on a "normal" desktop computer

For the code unrelated to training or inference of Mistral 7B, the install time is around 5 minutes.

For Mistral 7B training and inference, the install time is around 1 hour.

Run the code in the order listed in the Folders section above.

Note: three files normally take a long time to run. I have included a DEMO_MODE constant at the top of each of them; when turned on, the files run on a tiny subset of the data (see the sketch after this list). The original runtimes are as follows:

- The scraper takes over 12 hours to run.
- The process of creating the training set for Mistral 7B takes around 3 hours and costs about $10 in OpenAI credits.
- The process of mapping the hardware names to those of the efficiency table takes around 3 hours and also costs about $10 in OpenAI credits.

All other files can be run in a few minutes.
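
The DEMO_MODE switch is just a guard at the top of each long-running notebook, roughly like this (the input file and sample size are illustrative):

```python
# Hypothetical sketch of the DEMO_MODE guard used by the long-running notebooks.
import pandas as pd

DEMO_MODE = True  # set to False to reproduce the full runs

threads = pd.read_csv("threads.csv")  # assumed input
if DEMO_MODE:
    # Work on a tiny, fixed subset so the notebook finishes in minutes.
    threads = threads.sample(n=100, random_state=0)
```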

## Expected output

You should re-obtain the CSV files already in the folders and the plots used in the paper.

## Expected run time for demo on a "normal" desktop computer

The expected run time to run every notebook on a "normal" desktop computer is around ...
## Instructions for use on custom data

The code is designed only to analyze the mining section of bitcointalk.org.

**Acknowledgments**

This work was partially supported by the National Natural Science Foundation of China (Grant No. T2350710802 and No. U2039202), Shenzhen Science and Technology Innovation Commission Project (Grants No. GJHZ20210705141805017 and No. K23405006), and the Center for Computational Science and Engineering at Southern University of Science and Technology. The authors acknowledge T. Laborie for excellent research assistance and Y. Cui and M. von Krosigk for helpful comments. Any errors are our own.