PZ PRO

philipp-zettl

AI & ML interests

NLP/CV/Multimodal learning

Recent Activity

updated a model about 19 hours ago
philipp-zettl/T5-small-tinyqa
published a model about 19 hours ago
philipp-zettl/T5-small-tinyqa
liked a Space about 24 hours ago
elismasilva/mixture-of-diffusers-sdxl-tiling
View all activity

Organizations

Blog-explorers, easybits

philipp-zettl's activity

New activity in philipp-zettl/chessPT 1 day ago

Training Data Size

#3 opened 2 days ago by nh185285
reacted to schuler's post with 🔥 2 days ago

📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.

The secret? Grouped pointwise convolutions. Yes, we brought a method from computer vision into the transformer arena.

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
  • 1 reply
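
The core trick translates to a few lines of PyTorch. Below is a minimal sketch, not the paper's implementation (see the linked repo for that): the GroupedPointwiseFFN name, the 512/2048 dimensions, and groups=4 are all illustrative choices. A grouped pointwise (kernel_size=1) convolution is a per-token linear map whose channels are split into independent groups, so with groups=g the weight count shrinks by roughly a factor of g versus nn.Linear.

```python
import torch
import torch.nn as nn

class GroupedPointwiseFFN(nn.Module):
    """Hypothetical drop-in for a dense projection, built from a grouped
    pointwise (kernel_size=1) convolution. Illustrative sketch only."""

    def __init__(self, in_features: int, out_features: int, groups: int = 4):
        super().__init__()
        # groups=g partitions the channels: each output channel connects to
        # only in_features // g inputs, shrinking the weight tensor g-fold.
        self.proj = nn.Conv1d(in_features, out_features,
                              kernel_size=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is (batch, seq_len, features); Conv1d wants (batch, channels, seq_len)
        return self.proj(x.transpose(1, 2)).transpose(1, 2)

dense = nn.Linear(512, 2048)
grouped = GroupedPointwiseFFN(512, 2048, groups=4)
print(sum(p.numel() for p in dense.parameters()))    # 1,050,624
print(sum(p.numel() for p in grouped.parameters()))  # 264,192, ~75% fewer
```

Because the groups are independent, no information crosses between them; grouped designs usually add some cross-group mixing (e.g. channel interleaving) between layers, which this sketch omits.
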
New activity in philipp-zettl/chessPT 4 days ago

Any results?

#2 opened 6 days ago by AlvaroMros
reacted to hexgrad's post with 🔥 12 days ago
upvoted an article 16 days ago

FineWeb2-C: Help Build Better Language Models in Your Language

By davanstrien and 5 others
reacted to mitkox's post with 🚀 16 days ago

llama.cpp is 26.8% faster than ollama.
I upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B model on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.
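
Numbers like these are easy to sanity-check locally. Here is a rough wall-clock harness in Python, with loud assumptions: the GGUF path and ollama tag are placeholders for whatever you have installed, the ollama model must already be pulled, and depending on your llama.cpp version you may need an extra flag (e.g. -no-cnv) to keep llama-cli from entering interactive chat. It measures total duration only; for the per-phase breakdown above, use ollama run --verbose and the timing summary llama-cli prints on exit.

```python
# Crude end-to-end timing of one prompt through both runtimes.
# A sketch, not the original benchmark harness; assumes `llama-cli`
# (from llama.cpp) and `ollama` are on PATH.
import subprocess
import time

PROMPT = "Summarize the rules of chess in three sentences."

def timed(cmd):
    """Run a command to completion and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

llama_s = timed([
    "llama-cli",
    "-m", "DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf",  # placeholder path
    "-p", PROMPT,
    "-n", "128",  # cap generation length
])
ollama_s = timed(["ollama", "run", "deepseek-r1:1.5b", PROMPT])

print(f"llama.cpp {llama_s:.2f} s vs ollama {ollama_s:.2f} s "
      f"-> ollama {(ollama_s / llama_s - 1) * 100:.1f}% slower")
```

Wall-clock totals conflate loading, prompt processing, and generation, and the two runs may emit different token counts, so treat this as a smoke test rather than a benchmark.
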

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.