Jayan Kesavan

jayan12k

AI & ML interests

Young developer. Interested in advancing every aspect of artificial intelligence and making new discoveries while being fully open source. Currently working on an advanced language model from scratch.

Recent Activity

liked a model 1 day ago

HuggingFaceH4/zephyr-7b-beta

liked a Space 1 day ago

unity/ML-Agents-SoccerTwos

liked a Space 1 day ago

mrm8488/FlappyBirds

Organizations

None yet

jayan12k's activity

liked a model 1 day ago

HuggingFaceH4/zephyr-7b-beta

Text Generation • Updated Oct 16, 2024 • 673k • 1.67k

liked 2 Spaces 1 day ago

SoccerTwos

⚽

Play a 2v2 soccer game in your browser

FlappyBirds

Simulate neuroevolution to train flappy birds

liked a dataset 1 day ago

bigcode/starcoderdata

Viewer • Updated May 16, 2023 • 207M • 4.62k • 422

upvoted a paper 1 day ago

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published 29 days ago • 30

liked a dataset 1 day ago

allenai/c4

Viewer • Updated Jan 9, 2024 • 10.4B • 389k • 388

reacted to Kseniase's post with 🔥 2 days ago

Post

6624

15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention “heads” are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by jointly compressing keys and values into a low-rank latent vector, which shrinks the key-value cache needed at inference time.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.
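
To make the first few definitions concrete, here is a minimal sketch in plain Python (standard library only). It is not taken from any of the cited papers: the function names (`soft_attention`, `hard_attention`, `attend`) and the toy vectors are made up for illustration, the learned query/key/value projections are left out, and the scaled dot-product score is just one common choice of scoring function. Hard attention is shown as sampling one position from the attention distribution, which is one way to realize a discrete, stochastic selection.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(query, keys):
    """Scaled dot-product scores, normalized with softmax (an assumed scoring choice)."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])


def soft_attention(query, keys, values):
    """Soft attention: a weighted sum over ALL values."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


def hard_attention(query, keys, values, rng):
    """Hard attention: sample ONE position and return only its value."""
    weights = attention_weights(query, keys)
    index = rng.choices(range(len(values)), weights=weights, k=1)[0]
    return values[index], index


def attend(queries, keys, values):
    """Apply soft attention for every query in a sequence."""
    return [soft_attention(q, keys, values)[0] for q in queries]


# Toy sequences (hypothetical data, no learned projections).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = [[2.0, 0.0], [0.0, 2.0]]

# Self-attention: queries, keys, and values all come from the same sequence.
self_out = attend(x, x, x)

# Cross-attention: queries come from x, keys and values from another sequence.
cross_out = attend(x, memory, memory)

picked_value, picked_index = hard_attention(x[0], memory, memory, random.Random(0))
```

The only difference between the self-attention and cross-attention calls is where the keys and values come from. Multi-head attention would run several such `attend` calls in parallel, each on its own learned projection of the inputs, and concatenate the results.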

See other types in the comments 👇