| Since the hash can be a bit random, several hash functions are used in practice | |
| (determined by a n_rounds parameter) and then are averaged together. | |
| Local attention | |
| Longformer uses local attention: often, the local context (e.g., what are the two tokens to the | |
| left and right?) is enough to take action for a given token. |