BriLLM committed on
Commit 6941262 (verified)
1 Parent(s): db57b4a

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -1,5 +1,7 @@
# BriLLM: Brain-inspired Large Language Model

+ Our github repo: https://github.com/brillm05/BriLLM0.5
+
## Overview
This work introduces the first brain-inspired large language model (BriLLM). It is a generative language model that is neither a Transformer nor a GPT, and it is not a conventional machine learning model controlled only at its input and output. The model is based on the Signal Fully-connected flowing (SiFu) definition of the neural network over a directed graph, and every node in the model's graph is interpretable, whereas traditional machine learning models offer only limited interpretability at the input and output ends.

@@ -32,15 +34,15 @@ We select a vocabulary of 4,000 tokens consisting of the most frequently used Ch

## Implementation Details
BriLLM is implemented using PyTorch.
- It uses sinusoidal positional encoding, GeLU as the activation function, cross-entropy loss for next-token prediction, and an embedding size of $d_{model} = 32$.
- We used the AdamW optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$.
- The model size is about $512 + 4000 * 4000 * (32 * 32 + 32) \approx 16B$.
+ It uses sinusoidal positional encoding, GeLU as the activation function, cross-entropy loss for next-token prediction, and an embedding size of d_model = 32.
+ We used the AdamW optimizer with beta_1 = 0.9, beta_2 = 0.999 and epsilon = 1e-8.
+ The model size is about 512 + 4000 * 4000 * (32 * 32 + 32) ≈ 16B.
We trained our models on one machine with 8 NVIDIA A800 GPUs for 1.5k steps.
![](./figs/fig4.png)


## Complexity
- $n$ is the sequence length, $v$ is the vocabulary size, and $d$ is the representation dimension. The computational complexity is $O(n \cdot v \cdot d^2)$.
+ n is the sequence length, v is the vocabulary size, and d is the representation dimension. The computational complexity is O(nvd^2).


## Case Study
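
The diff above contains no code, but the model-size formula it quotes (4000 × 4000 node pairs, each contributing 32 × 32 + 32 parameters) suggests one natural reading of the SiFu graph: every directed edge between two vocabulary nodes carries its own d × d weight matrix and d-dimensional bias, and a d-dimensional signal is pushed along an edge through that matrix and a GeLU. The PyTorch sketch below is only that reading, not the authors' implementation; the class and attribute names (`SiFuEdgeSketch`, `edge_weight`, `edge_bias`, `propagate`) are hypothetical.

```python
import torch
import torch.nn as nn


class SiFuEdgeSketch(nn.Module):
    """Hypothetical reading of the SiFu graph described in the README:
    each directed edge (src -> dst) between vocabulary nodes owns a
    (dim x dim) weight matrix and a dim-sized bias, which is what makes
    the parameter count scale as v * v * (d*d + d)."""

    def __init__(self, num_nodes: int, dim: int = 32):
        super().__init__()
        # One weight matrix and one bias vector per directed edge.
        self.edge_weight = nn.Parameter(0.02 * torch.randn(num_nodes, num_nodes, dim, dim))
        self.edge_bias = nn.Parameter(torch.zeros(num_nodes, num_nodes, dim))
        self.act = nn.GELU()  # GeLU activation, as stated in the README

    def propagate(self, signal: torch.Tensor, src: int, dst: int) -> torch.Tensor:
        """Push a dim-sized signal along the edge src -> dst."""
        return self.act(self.edge_weight[src, dst] @ signal + self.edge_bias[src, dst])


# Toy usage with a tiny graph; the full model would use num_nodes = 4000,
# i.e. roughly 16.9 billion parameters, far too large to allocate densely here.
graph = SiFuEdgeSketch(num_nodes=16, dim=32)
signal = torch.randn(32)
print(graph.propagate(signal, src=3, dst=7).shape)  # torch.Size([32])
```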
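
As a quick sanity check of the "≈ 16B" figure in the diff, the arithmetic works out as below. Splitting the per-edge term into a d × d matrix plus a d-dimensional bias is my reading of the formula, and the 512 term is not explained in the excerpt; only the overall expression is taken from the README.

```python
# Model-size formula quoted in the README: 512 + 4000 * 4000 * (32 * 32 + 32)
v, d = 4000, 32                 # vocabulary size and embedding size d_model
per_edge = d * d + d            # 1056 parameters per directed edge (my reading: matrix + bias)
total = 512 + v * v * per_edge  # the 512 term is not explained in the excerpt
print(f"{total:,}")             # 16,896,000,512
```

So "≈ 16B" is a slight round-down of about 16.9 × 10⁹ parameters.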
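
The O(n·v·d²) complexity stated in the Complexity section is consistent with the same reading: if producing each of the n output tokens requires scoring all v candidate next nodes, and scoring one candidate applies a d × d edge matrix to a d-dimensional signal (about d² multiply-adds), the per-sequence cost is n·v·d². The numbers below are only an illustration; the sequence length n = 512 is an assumed example, not a value given in the excerpt.

```python
# Back-of-the-envelope cost of generating one sequence under the O(n * v * d^2) estimate.
n, v, d = 512, 4000, 32   # n is an assumed example; v and d come from the README
per_step = v * d * d      # score every candidate edge with a d x d matrix
total_ops = n * per_step
print(f"{total_ops:.2e}") # ~2.10e+09 multiply-adds for a 512-token sequence
```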