mebubo committed
Commit 777b898 · 1 Parent(s): dc63063
Files changed (1): README.md +19 -1
README.md CHANGED
@@ -1,4 +1,15 @@
-# GPTed blog post part 1
+---
+title: Gpted
+emoji: 🏃
+colorFrom: pink
+colorTo: gray
+sdk: docker
+pinned: false
+---
+
+# GPTed blog post
+
+## Part 1
 
 ![](img/GPTed.jpeg)
 
@@ -184,5 +195,12 @@ def test_expander_zero_budget():
 
 They are based on a non-LLM expander that uses a hardcoded list of possible expansions, so they are very easy to write, straightforward to interpret, and run very fast.
 
+### Limitations of the decoder-only approach
+
+The main limitation of using decoder-only models such as GPT or Llama for this task is their unidirectional attention: the model never sees the context to the right of a word. This is especially problematic at the start of the text, where the first tokens get very little context, so the probabilities we get from the model are not very useful. The obvious solution is to use a model with bidirectional attention, such as BERT. This will be covered in part 2 of the post.
+
 ### Other possibilities / ideas
 - Instead of using a local model, investigate using an API of a provider that exposes logprobs, e.g. Replicate
+
+
+## Part 2
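To make the hardcoded-list expander mentioned in the hunk above concrete, here is a minimal sketch of what such an expander and its zero-budget test could look like. The `Expansion` and `HardcodedExpander` names and the budget semantics are assumptions for illustration, not the repository's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Expansion:
    text: str
    cost: float  # assumed: budget consumed by accepting this expansion

class HardcodedExpander:
    """Expander backed by a fixed table instead of an LLM:
    deterministic, easy to interpret, and fast to run in tests."""

    TABLE = {
        "img": [Expansion("image", 1.0), Expansion("imagine", 2.0)],
        "pos": [Expansion("position", 1.0), Expansion("possible", 1.5)],
    }

    def expand(self, word: str, budget: float) -> list[Expansion]:
        # Keep only expansions whose cost fits within the budget.
        return [e for e in self.TABLE.get(word, []) if e.cost <= budget]

def test_expander_zero_budget():
    # With a zero budget nothing can be afforded, so nothing is expanded.
    assert HardcodedExpander().expand("img", budget=0.0) == []
```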
 
 
 
 
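To illustrate the unidirectional-attention limitation described in the hunk above, here is a sketch that scores tokens with a decoder-only model (left context only) and with a masked LM (context from both sides), using Hugging Face transformers. The model choices (`gpt2`, `bert-base-uncased`) and the masked position are illustrative assumptions, not the project's configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

text = "The quick brown fox jumps over the lazy dog"

# Decoder-only (GPT-2): each token's logprob conditions only on the tokens
# to its left, so the first tokens are scored with almost no context.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
ids = gpt_tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    logits = gpt(ids).logits
logprobs = torch.log_softmax(logits, dim=-1)
# Output position i predicts token i+1 given tokens 0..i.
per_token = logprobs[0, :-1].gather(1, ids[0, 1:, None]).squeeze(1)

# Encoder (BERT): mask one position and score it with context from both sides.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
bids = bert_tok(text, return_tensors="pt").input_ids
pos = 1  # first real token ([CLS] sits at index 0); arbitrary for the demo
masked = bids.clone()
masked[0, pos] = bert_tok.mask_token_id
with torch.no_grad():
    blogits = bert(masked).logits
score = torch.log_softmax(blogits[0, pos], dim=-1)[bids[0, pos]]
```

In the causal pass, the scores for the earliest tokens rest on almost no context; the masked-LM score conditions on everything around the masked position, which is what part 2 explores.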