Attention mechanisms allow models to dynamically focus on specific parts of their input when performing a task. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail; now it's time to summarize the other existing types of attention.
Here is a list of 15 types of attention mechanisms used in AI models:
3. Self-attention -> Attention Is All You Need (1706.03762) Each element in the sequence "looks" at the other elements and "decides" how much to borrow from each of them when building its new representation (see the first sketch below).
5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762) Multiple attention "heads" are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values (see the second sketch below).
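To make item 3 concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The weight matrices, sizes, and the `self_attention` helper name are illustrative assumptions, not code from the paper:

```python
import numpy as np
from scipy.special import softmax

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.
    X: (seq_len, d_model); W_q/W_k/W_v: (d_model, d_k) learned projections."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how much each token "looks" at every other token
    weights = softmax(scores, axis=-1)         # each row is a distribution over the sequence
    return weights @ V                         # each token's new representation is a weighted mix of values

# toy example with random weights (sizes are illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # 4 tokens, d_model = 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```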
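And a matching sketch for item 5: multi-head attention runs the same computation on several lower-dimensional slices in parallel, then concatenates the heads and applies an output projection. Again, the function name, weight shapes, and head count are assumptions for illustration:

```python
import numpy as np
from scipy.special import softmax

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    """Run n_heads scaled dot-product attentions in parallel on
    lower-dimensional slices, concatenate, and project back.
    X: (seq_len, d_model); W_q/W_k/W_v/W_o: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    def split(M):                              # (seq, d_model) -> (heads, seq, d_head)
        return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # per-head similarity scores
    heads = softmax(scores, axis=-1) @ V                 # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                  # final output projection

# toy example: 4 tokens, d_model = 8, 2 heads (sizes are illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads=2).shape)  # (4, 8)
```

Each head gets its own slice of the projections, so the heads can learn to attend to different aspects of the sequence while keeping the total compute close to a single full-width attention.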