cgrumbach committed
Commit c454553 (verified) · Parent(s): 3327b57

Update Readme.md

Files changed (1): README.md (+30 -31)

README.md CHANGED
# Title

"Bitcoin Forum analysis underscores enduring and substantial carbon footprint of Bitcoin"

- Cyrille Grumbach, ETH Zurich, Switzerland ([email protected])
- Didier Sornette, Southern University of Science and Technology, China ([email protected])

# Folders

File main.ipynb is used to scrape the forum.
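
A minimal sketch of the kind of scraping loop main.ipynb performs, assuming the requests package; the board id, paging scheme, and file names here are illustrative guesses rather than the notebook's actual code:

```python
# Hypothetical sketch of a bitcointalk.org board scraper (not the repository's actual code).
import time
import requests

BOARD_URL = "https://bitcointalk.org/index.php?board=14"  # assumed id of the mining board

def fetch_board_page(offset: int) -> str:
    """Download one board index page and return its raw HTML."""
    resp = requests.get(f"{BOARD_URL}.{offset}", timeout=30)
    resp.raise_for_status()
    return resp.text

# Save the raw HTML so it can be parsed later (see bitcoinforum/1_forum_dataset).
for page, offset in enumerate(range(0, 200, 40)):
    html = fetch_board_page(offset)
    with open(f"board_page_{page}.html", "w", encoding="utf-8") as f:
        f.write(html)
    time.sleep(1)  # stay polite; the full scrape takes over 12 hours
```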
 
 
## hardwarelist

Includes the following:

- pmaxv1 folder: Contains the maximum hardware efficiency for each date, alongside some manually added updates originally made by Cyrille.
- get_hwd_asicminervalue.js and get_hwd_bitcoinwiki.js: Scripts that can be pasted into the browser console at the URLs listed within the files; used to extract the hardware efficiency table.
- hardware_asicminervalue.txt and hardware_bitcoinwiki.txt: The raw output from the above scripts.
- 1_cleanup_hardware_table.ipynb: Cleans up the raw output to create hardware_asicminervalue.csv and hardware_bitcoinwiki.csv.
- 2_merge_tables.ipynb: Merges the two tables into hardware_merged.csv.
- 3_paper_list.ipynb: Creates four things. 1: The hardware table in the appendix. 2: The pmaxv2.csv file, which uses the hardware_merged.csv file to create an improved table with the maximum hardware efficiency for each date. 3: The pmax evolution table for the paper. 4: The paper_list.csv file, which is used to create an Excel sheet later.
- 4_create_pmaxv3.ipynb: Creates the pmaxv3.csv file, which takes the maximum of the pmaxv1.csv and pmaxv2.csv values for each date (a short pandas sketch of the merge and max steps follows this list).
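
A minimal pandas sketch of the merge and per-date maximum steps (2_merge_tables.ipynb and 4_create_pmaxv3.ipynb); the column names date and efficiency are assumed for illustration and may differ from the actual CSVs:

```python
# Hypothetical sketch of merging the two hardware tables and taking the per-date maximum.
import pandas as pd

asic = pd.read_csv("hardware_asicminervalue.csv")
wiki = pd.read_csv("hardware_bitcoinwiki.csv")

# Stack the two sources into one table (roughly what hardware_merged.csv contains).
merged = pd.concat([asic, wiki], ignore_index=True)
merged.to_csv("hardware_merged.csv", index=False)

# pmaxv3: for each date, keep the larger of the pmaxv1 and pmaxv2 efficiency values.
pmax1 = pd.read_csv("pmaxv1.csv", parse_dates=["date"])
pmax2 = pd.read_csv("pmaxv2.csv", parse_dates=["date"])
pmax3 = (
    pd.merge(pmax1, pmax2, on="date", how="outer", suffixes=("_v1", "_v2"))
    .sort_values("date")
)
pmax3["efficiency"] = pmax3[["efficiency_v1", "efficiency_v2"]].max(axis=1)
pmax3[["date", "efficiency"]].to_csv("pmaxv3.csv", index=False)
```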
 
 
## bitcoinforum

### 1_forum_dataset

Contains the raw HTML from the forum and the code to parse it and combine it into data frames.
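
A small sketch of how the saved forum HTML might be parsed into a data frame, assuming BeautifulSoup and pandas; the tag and class names used here are guesses, not the selectors the repository's parser actually targets:

```python
# Hypothetical sketch of turning saved forum HTML into a pandas DataFrame.
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

rows = []
for html_file in Path(".").glob("board_page_*.html"):
    soup = BeautifulSoup(html_file.read_text(encoding="utf-8"), "html.parser")
    # The "post" class is an assumption; the real markup may differ.
    for post in soup.find_all("div", class_="post"):
        rows.append({"file": html_file.name, "text": post.get_text(" ", strip=True)})

posts_df = pd.DataFrame(rows)
posts_df.to_csv("forum_posts.csv", index=False)  # one combined table for later steps
```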
 
### 2_train_set_creation

Combines the forum sections into one, truncates long threads, passes a random sample to GPT-4 to get the training set for Mistral 7B, and also creates the inputs that will be given to Mistral 7B after training.
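
One way this labelling step could look with the OpenAI Python client; the prompt, model name, sample size, and JSONL output format are assumptions for illustration, not the notebook's actual setup:

```python
# Hypothetical sketch of building a Mistral 7B training set from GPT-4 outputs.
import json
import random

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
threads = json.load(open("threads.json", encoding="utf-8"))  # assumed input file

sample = random.sample(threads, 100)
with open("train_set.jsonl", "w", encoding="utf-8") as out:
    for thread in sample:
        text = thread["text"][:8000]  # truncate long threads
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Extract the mining hardware discussed in this thread."},
                {"role": "user", "content": text},
            ],
        )
        out.write(json.dumps({"input": text, "output": reply.choices[0].message.content}) + "\n")
```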
 
### 3_training

Trains Mistral 7B using LoRA on the dataset generated earlier and saves the merged model.
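
The actual notebooks use Unsloth on a Runpod instance (see the installation section below); purely as a generic illustration, the LoRA wrap-and-merge pattern looks roughly like this with Hugging Face transformers and peft, with placeholder hyperparameters:

```python
# Hypothetical LoRA fine-tuning outline (the repository's notebooks use Unsloth instead).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# ... train on train_set.jsonl with a supervised fine-tuning loop ...

# Merge the LoRA adapters back into the base weights and save the merged model.
merged = model.merge_and_unload()
merged.save_pretrained("mistral7b-merged")
tokenizer.save_pretrained("mistral7b-merged")
```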
 
### 4_inference
 
 
Averages the forum efficiency on a monthly basis, then merges it with the Bitcoin price, hashrate, coins per block, and maximum hardware efficiency to create monthly_stuff.csv.

monthly_stuff.csv contains the columns: date, price, hashrate, coins_per_block, efficiency, max_efficiency
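
A condensed pandas sketch of that monthly aggregation and merge; the per-post efficiency file and the join keys are assumptions, but the output columns match monthly_stuff.csv as listed above:

```python
# Hypothetical sketch of building monthly_stuff.csv.
import pandas as pd

# Assumed input: one efficiency estimate per forum post, with a date column.
posts = pd.read_csv("forum_efficiency.csv", parse_dates=["date"])
monthly_eff = (
    posts.set_index("date")["efficiency"].resample("MS").mean().rename("efficiency")
)

# Assumed monthly series for price, hashrate, coins per block, and max hardware efficiency.
market = pd.read_csv("monthly_market.csv", parse_dates=["date"]).set_index("date")

monthly = market.join(monthly_eff, how="inner").reset_index()
monthly = monthly[["date", "price", "hashrate", "coins_per_block", "efficiency", "max_efficiency"]]
monthly.to_csv("monthly_stuff.csv", index=False)
```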
 
## plots

Includes the following:

- carbon-comparison folder: Contains the 17 sources used to create the carbon comparison table.
- carbonintensity.html: Cambridge's table of the yearly gCO2e/kWh values, found at https://ccaf.io/cbnsi/cbeci/ghg/methodology
- appendix2.ipynb: Creates all plots from appendix 2 (a small plotting sketch follows this list).
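
As a simple illustration of the kind of figure appendix2.ipynb produces, the monthly series above can be plotted directly with matplotlib; the styling and file name are guesses, not the notebook's:

```python
# Hypothetical sketch of plotting forum efficiency against the hardware maximum.
import matplotlib.pyplot as plt
import pandas as pd

monthly = pd.read_csv("monthly_stuff.csv", parse_dates=["date"])

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monthly["date"], monthly["efficiency"], label="Forum efficiency")
ax.plot(monthly["date"], monthly["max_efficiency"], label="Max hardware efficiency")
ax.set_yscale("log")
ax.set_xlabel("Date")
ax.set_ylabel("Efficiency")
ax.legend()
fig.savefig("efficiency_vs_max.png", dpi=150)
```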
 
 
 
 
# System requirements

Running Mistral 7B training or inference requires an NVIDIA GPU with at least 24GB of VRAM (it can also be a Runpod instance).

Everything else can be run on a normal desktop/laptop computer with Python 3.10 installed.
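
A quick way to confirm a GPU meets the 24GB requirement above, assuming PyTorch is installed (purely illustrative; not part of the repository):

```python
# Hypothetical check that the available NVIDIA GPU has enough VRAM for Mistral 7B work.
import torch

assert torch.cuda.is_available(), "An NVIDIA GPU is required for Mistral 7B training/inference."
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU VRAM: {vram_gb:.1f} GB")
assert vram_gb >= 24, "At least 24GB of VRAM is required."
```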
 
# Operating system

Code unrelated to training or inference of Mistral 7B has been tested on Windows 10.

Code for Mistral 7B training and inference has been tested on Runpod instances.
 
# Installation guide for software dependencies

For the code unrelated to training or inference of Mistral 7B, use the packages listed in requirements.txt.
 
## Installation guide for Mistral 7B training and inference

Set up a Runpod instance with the Axolotl Docker image, then install Unsloth using the instructions at https://github.com/unslothai/unsloth

Also, install SGLang for inference.
 
## Typical install time on a "normal" desktop computer

For the code unrelated to training or inference of Mistral 7B, the install time is around 5 minutes.

For Mistral 7B training and inference, the install time is around 1 hour.
 
 
Run the code in the order listed in the folders section above.

Note: Three files normally take a long time to run. I have included a DEMO_MODE constant at the top of each of them (a short sketch of the pattern follows the list below); when turned on, the files run on a tiny subset of the data. The original runtimes are as follows:
 
- The scraper takes over 12 hours to run.
- The process of creating the training set for Mistral 7B takes around 3 hours and costs about $10 of OpenAI credits.
- The process of mapping the hardware names to those of the efficiency table takes around 3 hours and also costs about $10 of OpenAI credits.
 
All other files can be run in a few minutes.
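
The DEMO_MODE switch mentioned above is simply a flag checked near the top of each notebook; a minimal sketch of the pattern (the subset size and data here are illustrative):

```python
# Hypothetical sketch of the DEMO_MODE pattern used to shorten long runs.
DEMO_MODE = True  # set to False to reproduce the full (multi-hour) runs

thread_ids = list(range(10_000))  # stand-in for the full list of forum threads

if DEMO_MODE:
    thread_ids = thread_ids[:50]  # tiny subset so the notebook finishes in minutes

print(f"Processing {len(thread_ids)} threads")
```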
 
## Expected output

You should re-obtain the CSV files already in the folders and the plots used in the paper.
 
## Expected run time for demo on a "normal" desktop computer
 
## Instructions for use on custom data

The code is designed only to analyze the mining section of bitcointalk.org.

**Acknowledgments**

This work was partially supported by the National Natural Science Foundation of China (Grant No. T2350710802 and No. U2039202), Shenzhen Science and Technology Innovation Commission Project (Grants No. GJHZ20210705141805017 and No. K23405006), and the Center for Computational Science and Engineering at Southern University of Science and Technology. The authors acknowledge T. Laborie for excellent research assistance and Y. Cui and M. von Krosigk for helpful comments. Any errors are our own.