Nathan Habib PRO

SaylorTwift

AI & ML interests

None yet

Recent Activity

updated a Space 2 days ago

OpenEvals/open_benchmark_index

liked a Space 3 days ago

OpenEvals/find-a-leaderboard

liked a Space 3 days ago

yourbench/advanced

Organizations

Posts 1

Post

1057

How do I test an LLM for my unique needs?
If you work in finance, law, or medicine, generic benchmarks are not enough.
This blog post uses Argilla, Distilabel, and 🌤️Lighteval to generate an evaluation dataset and evaluate models on it.

https://github.com/argilla-io/argilla-cookbook/blob/main/domain-eval/README.md

Articles 12

Article

9

Tricks from OpenAI gpt-oss that you 🫵 can also use in transformers

Collections 7

Papers 1

arxiv:2310.16944

Spaces 11

EvalFlip - AI Benchmark Universe 🚀

Explore AI benchmarks for math, QA, and multitask understanding

Smollm3 Mmlu Pro

Display log files in a themed view

Inspect Bundle

Hf Providers Tool Calling Dashboard

Load and analyze BFCL results from JSON files

Lighteval Wandb

Visualize project metrics and runs

Wanddb

Visualize project metrics and runs

Models 2

SaylorTwift/gpt2_test

Text Generation • 0.1B • Updated Sep 23, 2024 • 770

SaylorTwift/xlm-roberta-base-finetuned-panx-fr

Updated Mar 13, 2023

Datasets 47

SaylorTwift/lighteval-tasks-database

Viewer • Updated Sep 25 • 1.36k • 22

SaylorTwift/winograd_wsc

Updated Aug 13 • 11

SaylorTwift/xcopa

Updated Aug 13 • 313

SaylorTwift/qasper

Updated Aug 13 • 12

SaylorTwift/social_i_qa

Updated Aug 13 • 10

SaylorTwift/math_qa

Updated Aug 13 • 15

SaylorTwift/details_meta-llama__Llama-3.1-8B-Instruct_private

Viewer • Updated Jul 1 • 21 • 2.51k

SaylorTwift/details_HuggingFaceTB__SmolLM2-1.7B-Instruct

Viewer • Updated Jun 25 • 1.35k • 63

SaylorTwift/details_Qwen__Qwen3-14B

Viewer • Updated Jun 25 • 403 • 67

SaylorTwift/details_HuggingFaceTB__SmolLM2-1.7B

Viewer • Updated Jun 25 • 202 • 22
