README.md
CHANGED
@@ -1,4 +1,15 @@
-
+---
+title: Gpted
+emoji: 🏃
+colorFrom: pink
+colorTo: gray
+sdk: docker
+pinned: false
+---
+
+# GPTed blog post
+
+## Part 1
 
 
 
@@ -184,5 +195,12 @@ def test_expander_zero_budget():
 
 They are based on a non-LLM expander backed by a hardcoded list of possible expansions, so they are very easy to write, straightforward to interpret, and run very fast.
 
+### Limitations of the decoder-only approach
+
+The main limitation of using decoder-only models like GPT or Llama for this task is their unidirectional attention: the model never uses the context to the right of a word. This is especially problematic at the start of the text, where the first tokens get very little context, so the probabilities we get from the model are not very useful. The obvious solution is to use a model with bidirectional attention, such as BERT (see the sketch below); this will be covered in part 2 of the post.
+
 ### Other potential possibilities / ideas
 - Instead of using a local model, investigate using an API of a provider that exposes logprobs, e.g. Replicate
+
+
+## Part 2
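
As a rough illustration of the limitation described above: a decoder-only model can only score a token from the text to its left, while a masked language model such as BERT sees both sides. The sketch below assumes the Hugging Face `transformers` library and uses `gpt2` and `bert-base-uncased` as stand-in checkpoints with a made-up sentence; it is not the project's actual code.

```python
# Minimal sketch (placeholder models and sentence, not the project's real setup)
# contrasting left-only scoring with a causal LM against bidirectional scoring
# with a masked LM.
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

text = "The the cat sat on the mat."

# Decoder-only (GPT-2): each token is scored from its left context only.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2").eval()
ids = gpt_tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    logits = gpt(ids).logits
# Logits at position i predict token i+1, so shift by one.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
causal_scores = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
# The very first token gets no score at all, and early tokens are scored
# from almost no context.

# Encoder (BERT): mask one token and score it using context from both sides.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()
enc = bert_tok(text, return_tensors="pt")
pos = 2  # index of the second "the" (index 0 is [CLS])
original_id = enc.input_ids[0, pos].item()
masked_ids = enc.input_ids.clone()
masked_ids[0, pos] = bert_tok.mask_token_id
with torch.no_grad():
    mlm_logits = bert(input_ids=masked_ids, attention_mask=enc.attention_mask).logits
bidirectional_score = torch.log_softmax(mlm_logits[0, pos], dim=-1)[original_id]

print("causal per-token log-probs:", causal_scores[0].tolist())
print("BERT log-prob for the masked word:", bidirectional_score.item())
```

With the causal model, the early tokens (including the duplicated "the") are judged from almost no context, while BERT can use the whole sentence to score them; that is the gap the BERT-based approach in part 2 is meant to address.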