Kenneth Hamilton PRO

ZennyKenny

AI & ML interests

Building and enablement @ montebello.ai. Certified vibe coder.

Organizations

scikit-learn, TorchGeo, Kornia AI, Blog-explorers, OpenLLM France, Team Tonic, ZeroGPU Explorers, Data is Better Together - Russian Language Team, The Nevsky Collective, Plan Communications, MLX Community, Social Post Explorers, Hugging Face Discord Community, Data Is Better Together Contributor

ZennyKenny's activity

replied to clem's post 1 day ago

That's largely been Big Tech's move when they open-source or open-weight their models, I think. What they are actually releasing is just a watered-down version of the product that they are capitalizing on. Even DeepSeek and other open-source-first teams keep the models they capitalize on private.

Maybe the same people who sold your data out the backdoor whilst championing your "privacy rights" on their platform by letting you block people have suddenly had a massive change of heart. But something tells me the play is just to further widen the gap between Big Tech and emerging players by watering down the market so much that only companies who already have compute-maximalist infrastructure will be able to train meaningful models.

Maybe I'm just a cynic though.

posted an update 2 days ago
A few new Russian-language synthetic datasets. The labelling is good, but some of the syntax and grammar is not great.

Great for Russian-language classification models, probably not great for fine-tuning Russian-language text generation.

- Virtual Assistant Query / Responses: ZennyKenny/ru_virtual_assistant_chatgpt_distill
- LLM Query / Responses: ZennyKenny/russian_llm_response_chatgpt_distill

Crazy how much language drift is still an issue, especially given that Russian constitutes nearly 5% of the content on the internet.
replied to their post 8 days ago

Definitely true about the underlying model, but the eval dataset and notebook are!

posted an update 8 days ago
Besides being the coolest-named benchmark in the game, HellaSwag is an important measurement of здравый смысл (or common sense) in LLMs.

- More on HellaSwag: https://github.com/rowanz/hellaswag

I spent the afternoon benchmarking YandexGPT Pro 4th Gen, one of the Russian tech giant's premier models.

- Yandex HF Org: yandex
- More on Yandex models: https://yandex.cloud/ru/docs/foundation-models/concepts/yandexgpt/models

The eval notebook is available on GitHub and the resulting dataset is already on the HF Hub!

- Eval Notebook: https://github.com/kghamilton89/ai-explorer/blob/main/yandex-hellaswag/hellaswag-assess.ipynb
- Eval Dataset: ZennyKenny/yandexgptpro_4th_gen-hellaswag

And of course, everyone wants to see the results, so have a look at them in the context of other zero-shot experiments that I was able to find!
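
For anyone curious what a zero-shot HellaSwag run boils down to, here is a minimal sketch. It is not the code from the notebook linked above. It assumes the model is prompted to answer with a single digit, that items follow the public HellaSwag layout (`ctx`, `endings`, `label`), and that replies have already been collected as plain strings. The helper names `build_prompt`, `parse_choice`, and `accuracy` are hypothetical.

```python
import re
from typing import Optional


def build_prompt(ctx: str, endings: list[str]) -> str:
    """Format one HellaSwag item as a numbered multiple-choice question."""
    options = "\n".join(f"{i}. {ending}" for i, ending in enumerate(endings))
    return (
        "Pick the most plausible continuation. Answer with a single digit.\n\n"
        f"Context: {ctx}\n{options}\nAnswer:"
    )


def parse_choice(reply: str, n_options: int = 4) -> Optional[int]:
    """Return the first in-range digit found in the model reply, else None."""
    for match in re.finditer(r"[0-9]", reply):
        idx = int(match.group())
        if idx < n_options:
            return idx
    return None


def accuracy(items: list[dict], replies: list[str]) -> float:
    """Share of items whose parsed choice equals the gold label.

    Replies that cannot be parsed count as wrong (an assumption of this sketch).
    """
    if not items:
        return 0.0
    correct = sum(
        parse_choice(reply, len(item["endings"])) == int(item["label"])
        for item, reply in zip(items, replies)
    )
    return correct / len(items)
```

In use, each item is turned into a prompt with `build_prompt`, the prompt is sent to whichever model is being benchmarked, and the raw replies are passed to `accuracy` together with the items. Scoring by parsed digit is only one option; log-likelihood scoring of each ending is another common approach when the API exposes token probabilities.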
reacted to giadap's post with 🔥 8 days ago
We've all become experts at clicking "I agree" without a second thought. In my latest blog post, I explore why these traditional consent models are increasingly problematic in the age of generative AI.

I found three fundamental challenges:
- Scope problem: how can you know what you're agreeing to when AI could use your data in different ways?
- Temporality problem: once an AI system learns from your data, good luck trying to make it "unlearn" it.
- Autonomy trap: the data you share today could create systems that pigeonhole you tomorrow.

Individual users shouldn't bear all the responsibility, while big tech holds all the cards. We need better approaches to level the playing field, from collective advocacy and stronger technological safeguards to establishing "data fiduciaries" with a legal duty to protect our digital interests.

Available here: https://huggingface.co/blog/giadap/beyond-consent