cgrumbach committed
Commit c454553 (verified) · Parent(s): 3327b57

Update Readme.md

Files changed (1): README.md (+30 -31)

README.md CHANGED
# Title

"Bitcoin Forum analysis underscores enduring and substantial carbon footprint of Bitcoin"

- Cyrille Grumbach, ETH Zurich, Switzerland ([email protected])
- Didier Sornette, Southern University of Science and Technology, China ([email protected])

# Folders

File main.ipynb is used to scrape the forum.
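
A minimal sketch of the kind of scraping loop main.ipynb performs, assuming the requests package; the board id, paging scheme, and file names here are illustrative guesses rather than the notebook's actual code:

```python
# Hypothetical sketch of a bitcointalk.org board scraper (not the repository's actual code).
import time
import requests

BOARD_URL = "https://bitcointalk.org/index.php?board=14"  # assumed id of the mining board

def fetch_board_page(offset: int) -> str:
    """Download one board index page and return its raw HTML."""
    resp = requests.get(f"{BOARD_URL}.{offset}", timeout=30)
    resp.raise_for_status()
    return resp.text

# Save the raw HTML so it can be parsed later (see bitcoinforum/1_forum_dataset).
for page, offset in enumerate(range(0, 200, 40)):
    html = fetch_board_page(offset)
    with open(f"board_page_{page}.html", "w", encoding="utf-8") as f:
        f.write(html)
    time.sleep(1)  # stay polite; the full scrape takes over 12 hours
```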
 
 
## hardwarelist

Includes the following:

- pmaxv1 folder: Contains the maximum hardware efficiency for each date, alongside some manually added updates originally made by Cyrille.
- get_hwd_asicminervalue.js and get_hwd_bitcoinwiki.js: Scripts that can be pasted into the browser console at the URLs listed within the files; used to extract the hardware efficiency table.
- hardware_asicminervalue.txt and hardware_bitcoinwiki.txt: The raw output from the above scripts.
- 1_cleanup_hardware_table.ipynb: Cleans up the raw output to create hardware_asicminervalue.csv and hardware_bitcoinwiki.csv.
- 2_merge_tables.ipynb: Merges the two tables into hardware_merged.csv.
- 3_paper_list.ipynb: Creates four things. 1: The hardware table in the appendix. 2: The pmaxv2.csv file, which uses the hardware_merged.csv file to create an improved table with the maximum hardware efficiency for each date. 3: The pmax evolution table for the paper. 4: The paper_list.csv file, which is used to create an Excel sheet later.
- 4_create_pmaxv3.ipynb: Creates the pmaxv3.csv file, which takes the maximum of the pmaxv1.csv and pmaxv2.csv values for each date (a short pandas sketch of the merge and max steps follows this list).
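
A minimal pandas sketch of the merge and per-date maximum steps (2_merge_tables.ipynb and 4_create_pmaxv3.ipynb); the column names date and efficiency are assumed for illustration and may differ from the actual CSVs:

```python
# Hypothetical sketch of merging the two hardware tables and taking the per-date maximum.
import pandas as pd

asic = pd.read_csv("hardware_asicminervalue.csv")
wiki = pd.read_csv("hardware_bitcoinwiki.csv")

# Stack the two sources into one table (roughly what hardware_merged.csv contains).
merged = pd.concat([asic, wiki], ignore_index=True)
merged.to_csv("hardware_merged.csv", index=False)

# pmaxv3: for each date, keep the larger of the pmaxv1 and pmaxv2 efficiency values.
pmax1 = pd.read_csv("pmaxv1.csv", parse_dates=["date"])
pmax2 = pd.read_csv("pmaxv2.csv", parse_dates=["date"])
pmax3 = (
    pd.merge(pmax1, pmax2, on="date", how="outer", suffixes=("_v1", "_v2"))
    .sort_values("date")
)
pmax3["efficiency"] = pmax3[["efficiency_v1", "efficiency_v2"]].max(axis=1)
pmax3[["date", "efficiency"]].to_csv("pmaxv3.csv", index=False)
```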
 
 
## bitcoinforum

### 1_forum_dataset

Contains the raw HTML from the forum and the code to parse it and combine it into data frames.
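
A small sketch of how the saved forum HTML might be parsed into a data frame, assuming BeautifulSoup and pandas; the tag and class names used here are guesses, not the selectors the repository's parser actually targets:

```python
# Hypothetical sketch of turning saved forum HTML into a pandas DataFrame.
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

rows = []
for html_file in Path(".").glob("board_page_*.html"):
    soup = BeautifulSoup(html_file.read_text(encoding="utf-8"), "html.parser")
    # The "post" class is an assumption; the real markup may differ.
    for post in soup.find_all("div", class_="post"):
        rows.append({"file": html_file.name, "text": post.get_text(" ", strip=True)})

posts_df = pd.DataFrame(rows)
posts_df.to_csv("forum_posts.csv", index=False)  # one combined table for later steps
```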
 
### 2_train_set_creation

Combines the forum sections into one, truncates long threads, passes a random sample to GPT-4 to get the training set for Mistral 7B, and also creates the inputs that will be given to Mistral 7B after training.
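
One way this labelling step could look with the OpenAI Python client; the prompt, model name, sample size, and JSONL output format are assumptions for illustration, not the notebook's actual setup:

```python
# Hypothetical sketch of building a Mistral 7B training set from GPT-4 outputs.
import json
import random

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
threads = json.load(open("threads.json", encoding="utf-8"))  # assumed input file

sample = random.sample(threads, 100)
with open("train_set.jsonl", "w", encoding="utf-8") as out:
    for thread in sample:
        text = thread["text"][:8000]  # truncate long threads
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Extract the mining hardware discussed in this thread."},
                {"role": "user", "content": text},
            ],
        )
        out.write(json.dumps({"input": text, "output": reply.choices[0].message.content}) + "\n")
```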
 
### 3_training

Trains Mistral 7B using LoRA on the dataset generated earlier and saves the merged model.
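
The actual notebooks use Unsloth on a Runpod instance (see the installation section below); purely as a generic illustration, the LoRA wrap-and-merge pattern looks roughly like this with Hugging Face transformers and peft, with placeholder hyperparameters:

```python
# Hypothetical LoRA fine-tuning outline (the repository's notebooks use Unsloth instead).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# ... train on train_set.jsonl with a supervised fine-tuning loop ...

# Merge the LoRA adapters back into the base weights and save the merged model.
merged = model.merge_and_unload()
merged.save_pretrained("mistral7b-merged")
tokenizer.save_pretrained("mistral7b-merged")
```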
 
### 4_inference
 
 
Averages the forum efficiency on a monthly basis, then merges it with the Bitcoin price, hashrate, coins per block, and maximum hardware efficiency to create monthly_stuff.csv.

monthly_stuff.csv contains the columns: date, price, hashrate, coins_per_block, efficiency, max_efficiency
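
A condensed pandas sketch of that monthly aggregation and merge; the per-post efficiency file and the join keys are assumptions, but the output columns match monthly_stuff.csv as listed above:

```python
# Hypothetical sketch of building monthly_stuff.csv.
import pandas as pd

# Assumed input: one efficiency estimate per forum post, with a date column.
posts = pd.read_csv("forum_efficiency.csv", parse_dates=["date"])
monthly_eff = (
    posts.set_index("date")["efficiency"].resample("MS").mean().rename("efficiency")
)

# Assumed monthly series for price, hashrate, coins per block, and max hardware efficiency.
market = pd.read_csv("monthly_market.csv", parse_dates=["date"]).set_index("date")

monthly = market.join(monthly_eff, how="inner").reset_index()
monthly = monthly[["date", "price", "hashrate", "coins_per_block", "efficiency", "max_efficiency"]]
monthly.to_csv("monthly_stuff.csv", index=False)
```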
 
## plots

Includes the following:

- carbon-comparison folder: Contains the 17 sources used to create the carbon comparison table.
- carbonintensity.html: Cambridge's table of the yearly gCO2e/kWh values, found at https://ccaf.io/cbnsi/cbeci/ghg/methodology
- appendix2.ipynb: Creates all plots from appendix 2 (a small plotting sketch follows this list).
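
As a simple illustration of the kind of figure appendix2.ipynb produces, the monthly series above can be plotted directly with matplotlib; the styling and file name are guesses, not the notebook's:

```python
# Hypothetical sketch of plotting forum efficiency against the hardware maximum.
import matplotlib.pyplot as plt
import pandas as pd

monthly = pd.read_csv("monthly_stuff.csv", parse_dates=["date"])

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monthly["date"], monthly["efficiency"], label="Forum efficiency")
ax.plot(monthly["date"], monthly["max_efficiency"], label="Max hardware efficiency")
ax.set_yscale("log")
ax.set_xlabel("Date")
ax.set_ylabel("Efficiency")
ax.legend()
fig.savefig("efficiency_vs_max.png", dpi=150)
```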
 
 
 
 
# System requirements

Running Mistral 7B training or inference requires an NVIDIA GPU with at least 24GB of VRAM (it can also be a Runpod instance).

Everything else can be run on a normal desktop/laptop computer with Python 3.10 installed.
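
A quick way to confirm a GPU meets the 24GB requirement above, assuming PyTorch is installed (purely illustrative; not part of the repository):

```python
# Hypothetical check that the available NVIDIA GPU has enough VRAM for Mistral 7B work.
import torch

assert torch.cuda.is_available(), "An NVIDIA GPU is required for Mistral 7B training/inference."
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU VRAM: {vram_gb:.1f} GB")
assert vram_gb >= 24, "At least 24GB of VRAM is required."
```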
 
# Operating system

Code unrelated to training or inference of Mistral 7B has been tested on Windows 10.

Code for Mistral 7B training and inference has been tested on Runpod instances.
 
# Installation guide for software dependencies

For the code unrelated to training or inference of Mistral 7B, use the packages listed in requirements.txt.
 
## Installation guide for Mistral 7B training and inference

Set up a Runpod instance with the Axolotl Docker image, then install Unsloth using the instructions at https://github.com/unslothai/unsloth

Also, install SGLang for inference.
 
## Typical install time on a "normal" desktop computer

For the code unrelated to training or inference of Mistral 7B, the install time is around 5 minutes.

For Mistral 7B training and inference, the install time is around 1 hour.
 
 
Run the code in the order listed in the folders section above.

Note: Three files normally take a long time to run. I have included a DEMO_MODE constant at the top of each of them (a short sketch of the pattern follows the list below); when turned on, the files run on a tiny subset of the data. The original runtimes are as follows:
 
- The scraper takes over 12 hours to run.
- The process of creating the training set for Mistral 7B takes around 3 hours and costs about $10 of OpenAI credits.
- The process of mapping the hardware names to those of the efficiency table takes around 3 hours and also costs about $10 of OpenAI credits.
 
All other files can be run in a few minutes.
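
The DEMO_MODE switch mentioned above is simply a flag checked near the top of each notebook; a minimal sketch of the pattern (the subset size and data here are illustrative):

```python
# Hypothetical sketch of the DEMO_MODE pattern used to shorten long runs.
DEMO_MODE = True  # set to False to reproduce the full (multi-hour) runs

thread_ids = list(range(10_000))  # stand-in for the full list of forum threads

if DEMO_MODE:
    thread_ids = thread_ids[:50]  # tiny subset so the notebook finishes in minutes

print(f"Processing {len(thread_ids)} threads")
```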
 
## Expected output

You should re-obtain the CSV files already in the folders and the plots used in the paper.
 
## Expected run time for demo on a "normal" desktop computer
 
## Instructions for use on custom data

The code is designed only to analyze the mining section of bitcointalk.org.

**Acknowledgments**

This work was partially supported by the National Natural Science Foundation of China (Grant No. T2350710802 and No. U2039202), Shenzhen Science and Technology Innovation Commission Project (Grants No. GJHZ20210705141805017 and No. K23405006), and the Center for Computational Science and Engineering at Southern University of Science and Technology. The authors acknowledge T. Laborie for excellent research assistance and Y. Cui and M. von Krosigk for helpful comments. Any errors are our own.