Kenneth Hamilton

ZennyKenny

https://nevskycollective.com

AI & ML interests

Building and enablement @ montebello.ai. Certified vibe coder.

Recent Activity

replied to clem's post 1 day ago

Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible—just look at the “T” in ChatGPT, which comes from the Transformer architecture openly shared by Google. Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating. With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization—powered by openness and collaboration, in the US and around the world. This is incredibly exciting. Let’s go, open science and open-source AI!

ZennyKenny's activity

replied to clem's post 1 day ago

I think that's largely been Big Tech's move when they open-source or open-weight their models: what they actually release is a watered-down version of the product they are capitalizing on. Even DeepSeek and other open-source-first teams keep their monetized models private.

Maybe the same people who sold your data out the back door whilst championing your "privacy rights" on their platform (by letting you block people) have suddenly had a massive change of heart. But something tells me the play is to further widen the gap between Big Tech and emerging players by watering down the market so much that only companies that already have compute-maximalist infrastructure will be able to train meaningful models.

Maybe I'm just a cynic though.

liked a Space 1 day ago

DeepSite

Generate any application with DeepSeek

updated a dataset 1 day ago

ZennyKenny/yandex-alice-sessions-large-syn

published a dataset 1 day ago

ZennyKenny/yandex-alice-sessions-large-syn

updated a dataset 2 days ago

ZennyKenny/ru-image-generation

published a dataset 2 days ago

ZennyKenny/ru-image-generation

posted an update 2 days ago

A few new Russian-language synthetic datasets. The labelling is good, but some of the syntax and grammar is not great.

Great for Russian-language classification models, but probably not great for fine-tuning Russian-language text generation models.

- Virtual Assistant Query / Responses: ZennyKenny/ru_virtual_assistant_chatgpt_distill
- LLM Query / Responses: ZennyKenny/russian_llm_response_chatgpt_distill

Crazy how much language drift is still an issue, especially given that Russian constitutes nearly 5% of the content on the internet.

updated 2 datasets 2 days ago

ZennyKenny/russian_llm_response_chatgpt_distill

ZennyKenny/ru_virtual_assistant_chatgpt_distill

published a dataset 2 days ago

ZennyKenny/ru_virtual_assistant_chatgpt_distill

liked a model 2 days ago

all-hands/openhands-lm-32b-v0.1

Text Generation

published a dataset 5 days ago

ZennyKenny/russian_llm_response_chatgpt_distill

liked a Space 5 days ago

First Llm Classifier

Learn to use LLMs to organize and analyze massive datasets

updated a Space 7 days ago

Note To Text

Convert handwritten notes to digital format using AI

replied to their post 8 days ago

Definitely true about the underlying model, but the eval dataset and notebook are!

posted an update 8 days ago

Besides being the coolest-named benchmark in the game, HellaSwag is an important measure of common sense (здравый смысл) in LLMs.

- More on HellaSwag: https://github.com/rowanz/hellaswag

I spent the afternoon benchmarking YandexGPT Pro 4th Gen, one of the Russian tech giant's premier models.

- Yandex HF Org: yandex
- More on Yandex models: https://yandex.cloud/ru/docs/foundation-models/concepts/yandexgpt/models

The eval notebook is available on GitHub and the resulting dataset is already on the HF Hub!

- Eval Notebook: https://github.com/kghamilton89/ai-explorer/blob/main/yandex-hellaswag/hellaswag-assess.ipynb
- Eval Dataset: ZennyKenny/yandexgptpro_4th_gen-hellaswag

And of course, everyone wants to see the results, so have a look at how they compare with the other zero-shot experiments I was able to find!
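HellaSwag presents each item as a context plus four candidate endings, and zero-shot accuracy is the share of items where the model picks the correct one. Below is a minimal Python sketch of that scoring step. It assumes the model is prompted to answer with the index of its chosen ending, which may not match how the linked notebook prompts or parses YandexGPT. The field names loosely mirror the public HellaSwag release (`ctx`, `endings`, `label`); `build_prompt`, `parse_choice`, `zero_shot_accuracy`, and the toy items are hypothetical illustrations, not real benchmark data.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HellaSwagItem:
    ctx: str            # context the model has to complete
    endings: List[str]  # candidate endings (four per item in HellaSwag)
    label: int          # index of the correct ending


def build_prompt(item: HellaSwagItem) -> str:
    """Hypothetical zero-shot prompt: list numbered endings, ask for the number only."""
    options = "\n".join(f"{i}. {ending}" for i, ending in enumerate(item.endings))
    return (
        f"Context: {item.ctx}\n"
        "Which ending is most plausible?\n"
        f"{options}\n"
        "Answer with the number only."
    )


def parse_choice(reply: str, num_endings: int = 4) -> Optional[int]:
    """Take the first integer in a model reply; None if missing or out of range."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    choice = int(match.group())
    return choice if 0 <= choice < num_endings else None


def zero_shot_accuracy(items: List[HellaSwagItem], predictions: List[Optional[int]]) -> float:
    """Share of items whose predicted index equals the gold label (None counts as wrong)."""
    if len(items) != len(predictions):
        raise ValueError("expected exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(pred == item.label for item, pred in zip(items, predictions))
    return correct / len(items)


# Hypothetical toy items, not taken from the real benchmark
items = [
    HellaSwagItem("A woman picks up a guitar.",
                  ["She starts to play a song.", "She eats the guitar.",
                   "The guitar flies away.", "She paints the ceiling."], 0),
    HellaSwagItem("A man ties his running shoes.",
                  ["He goes to sleep in the shoes.", "He jogs down the street.",
                   "He bakes bread.", "He swims across the room."], 1),
    HellaSwagItem("A chef cracks two eggs into a bowl.",
                  ["She throws the bowl out the window.", "She reads a newspaper.",
                   "She whisks the eggs together.", "She plants the eggs in the garden."], 2),
]

# Hypothetical raw replies from a chat model, one per item
replies = ["0", "The answer is 1.", "I am not sure."]
predictions = [parse_choice(r) for r in replies]  # [0, 1, None]
print(f"accuracy: {zero_shot_accuracy(items, predictions):.3f}")  # accuracy: 0.667
```

Parsing a generated answer suits a chat-only API like this one. When token log-probabilities are available, the more common HellaSwag setup instead scores each ending by its (often length-normalized) likelihood and picks the highest.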

updated a dataset 8 days ago

ZennyKenny/yandexgptpro_4th_gen-hellaswag

published a dataset 8 days ago

ZennyKenny/yandexgptpro_4th_gen-hellaswag

reacted to giadap's post with 🔥 8 days ago

We've all become experts at clicking "I agree" without a second thought. In my latest blog post, I explore why these traditional consent models are increasingly problematic in the age of generative AI.

I found three fundamental challenges:
- Scope problem: how can you know what you're agreeing to when AI could use your data in different ways?
- Temporality problem: once an AI system learns from your data, good luck trying to make it "unlearn" it.
- Autonomy trap: the data you share today could create systems that pigeonhole you tomorrow.

Individual users shouldn't bear all the responsibility, while big tech holds all the cards. We need better approaches to level the playing field, from collective advocacy and stronger technological safeguards to establishing "data fiduciaries" with a legal duty to protect our digital interests.

Available here: https://huggingface.co/blog/giadap/beyond-consent