Yamata Zen
yamatazen
AI & ML interests
None yet
Recent Activity
- Liked a model 3 days ago: lightx2v/Qwen-Image-Edit-2511-Lightning
- Liked a model 3 days ago: unsloth/Qwen-Image-Edit-2511-GGUF
- Liked a model 3 days ago: Qwen/Qwen-Image-Edit-2511
Organizations
None yet
Collections
Autoregressive image generation
AGI
- Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
  Paper • 2507.00951 • Published • 23
- On Path to Multimodal Generalist: General-Level and General-Bench
  Paper • 2505.04620 • Published • 82
- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
  Paper • 2507.06952 • Published • 7
Multilingual LLMs
- Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability
  Paper • 2306.06688 • Published
- Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
  Paper • 2412.14471 • Published
- Language Models' Factuality Depends on the Language of Inquiry
  Paper • 2502.17955 • Published • 33
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
  Paper • 2410.01335 • Published • 5
AI censorship
- GuardReasoner: Towards Reasoning-based LLM Safeguards
  Paper • 2501.18492 • Published • 88
- Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
  Paper • 2412.19512 • Published • 9
- Course-Correction: Safety Alignment Using Synthetic Preferences
  Paper • 2407.16637 • Published • 26
- Refusal in Language Models Is Mediated by a Single Direction
  Paper • 2406.11717 • Published • 4
Grokking
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
  Paper • 2405.15071 • Published • 41
- Grokking at the Edge of Numerical Stability
  Paper • 2501.04697 • Published • 2
- Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
  Paper • 2506.21551 • Published • 28
Optimizers
- CAME: Confidence-guided Adaptive Memory Efficient Optimization
  Paper • 2307.02047 • Published • 2
- Practical Efficiency of Muon for Pretraining
  Paper • 2505.02222 • Published • 40
- AdaMuon: Adaptive Muon Optimizer
  Paper • 2507.11005 • Published • 2
- Muon is Scalable for LLM Training
  Paper • 2502.16982 • Published • 8
GGUF tools
Model merging
- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 21
- Model Stock: All we need is just a few fine-tuned models
  Paper • 2403.19522 • Published • 13
- Mergenetic: a Simple Evolutionary Model Merging Library
  Paper • 2505.11427 • Published • 14
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
  Paper • 2410.01335 • Published • 5
Japanese LLMs
- mradermacher/Himeyuri-v0.1-12B-i1-GGUF
  12B • Updated • 73 • 2
- spow12/ChatWaifu_12B_v2.0
  Text Generation • 12B • Updated • 56 • 22
- Local-Novel-LLM-project/Vecteus-v1
  Text Generation • 7B • Updated • 120 • 30
- Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
  Paper • 2412.14471 • Published
LLM leaderboards
- UGI Leaderboard
  📢 Running • 1.35k • Uncensored General Intelligence Leaderboard
- Open Japanese LLM Leaderboard
  🌸 Paused • 104 • Explore and compare LLM models with interactive filters and visualizations
- Open LLM Leaderboard
  🏆 Running on CPU Upgrade • 13.8k • Track, rank and evaluate open LLMs and chatbots
- LMArena Leaderboard
  🏆 Running • 4.7k • Display LMArena Leaderboard
Genshin Impact