---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:156
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: How is the author planning to utilize prompts in their Datasette project?
sentences:
- >-
January
7th: It’s OK to call it Artificial Intelligence
9th: What I should have said about the term Artificial Intelligence
17th: Talking about Open Source LLMs on Oxide and Friends
26th: LLM 0.13: The annotated release notes
February
21st: The killer app of Gemini Pro 1.5 is video
March
5th: Prompt injection and jailbreaking are not the same thing
8th: The GPT-4 barrier has finally been broken
22nd: Claude and ChatGPT for ad-hoc sidequests
23rd: Building and testing C extensions for SQLite with ChatGPT Code
Interpreter
26th: llm cmd undo last git commit—a new plugin for LLM
April
8th: Building files-to-prompt entirely using Claude 3 Opus
10th: Three major LLM releases in 24 hours (plus weeknotes)
- >-
Then in December, the Chatbot Arena team introduced a whole new
leaderboard for this feature, driven by users building the same
interactive app twice with two different models and voting on the
answer. Hard to come up with a more convincing argument that this
feature is now a commodity that can be effectively implemented against
all of the leading models.
I’ve been tinkering with a version of this myself for my Datasette
project, with the goal of letting users use prompts to build and iterate
on custom widgets and data visualizations against their own data. I also
figured out a similar pattern for writing one-shot Python programs,
enabled by uv.
- >-
Another common technique is to use larger models to help create training
data for their smaller, cheaper alternatives—a trick used by an
increasing number of labs. DeepSeek v3 used “reasoning” data created by
DeepSeek-R1. Meta’s Llama 3.3 70B fine-tuning used over 25M
synthetically generated examples.
Careful design of the training data that goes into an LLM appears to be
the entire game for creating these models. The days of just grabbing a
full scrape of the web and indiscriminately dumping it into a training
run are long gone.
LLMs somehow got even harder to use
- source_sentence: What are the potential pitfalls of using LLMs as power-user tools?
sentences:
- >-
Another common technique is to use larger models to help create training
data for their smaller, cheaper alternatives—a trick used by an
increasing number of labs. DeepSeek v3 used “reasoning” data created by
DeepSeek-R1. Meta’s Llama 3.3 70B fine-tuning used over 25M
synthetically generated examples.
Careful design of the training data that goes into an LLM appears to be
the entire game for creating these models. The days of just grabbing a
full scrape of the web and indiscriminately dumping it into a training
run are long gone.
LLMs somehow got even harder to use
- >-
A drum I’ve been banging for a while is that LLMs are power-user
tools—they’re chainsaws disguised as kitchen knives. They look
deceptively simple to use—how hard can it be to type messages to a
chatbot?—but in reality you need a huge depth of both understanding and
experience to make the most of them and avoid their many pitfalls.
If anything, this problem got worse in 2024.
We’ve built computer systems you can talk to in human language, that
will answer your questions and usually get them right! ... depending on
the question, and how you ask it, and whether it’s accurately reflected
in the undocumented and secret training set.
- >-
These abilities are just a few weeks old at this point, and I don’t
think their impact has been fully felt yet. If you haven’t tried them
out yet you really should.
Both Gemini and OpenAI offer API access to these features as well.
OpenAI started with a WebSocket API that was quite challenging to use,
but in December they announced a new WebRTC API which is much easier to
get started with. Building a web app that a user can talk to via voice
is easy now!
Prompt driven app generation is a commodity already
This was possible with GPT-4 in 2023, but the value it provides became
evident in 2024.
- source_sentence: What challenges are associated with using LLMs in the year of slop?
sentences:
- >-
So far, I think they’re a net positive. I’ve used them on a personal
level to improve my productivity (and entertain myself) in all sorts of
different ways. I think people who learn how to use them effectively can
gain a significant boost to their quality of life.
A lot of people are yet to be sold on their value! Some think their
negatives outweigh their positives, some think they are all hot air, and
some even think they represent an existential threat to humanity.
They’re actually quite easy to build
The most surprising thing we’ve learned about LLMs this year is that
they’re actually quite easy to build.
- |-
The year of slop
Synthetic training data works great
LLMs somehow got even harder to use
Knowledge is incredibly unevenly distributed
LLMs need better criticism
Everything tagged “llms” on my blog in 2024
- >-
Meta’s Llama 3.2 models deserve a special mention. They may not be GPT-4
class, but at 1B and 3B sizes they punch massively above their weight. I
run Llama 3.2 3B on my iPhone using the free MLC Chat iOS app and it’s a
shockingly capable model for its tiny (<2GB) size. Try firing it up and
asking it for “a plot outline of a Netflix Christmas movie where a data
journalist falls in love with a local ceramacist”. Here’s what I got, at
a respectable 20 tokens per second:
- source_sentence: >-
What capabilities does Google’s Gemini have regarding audio input and
output?
sentences:
- >-
There’s a flipside to this too: a lot of better informed people have
sworn off LLMs entirely because they can’t see how anyone could benefit
from a tool with so many flaws. The key skill in getting the most out of
LLMs is learning to work with tech that is both inherently unreliable
and incredibly powerful at the same time. This is a decidedly
non-obvious skill to acquire!
There is so much space for helpful education content here, but we need
to do do a lot better than outsourcing it all to AI grifters with
bombastic Twitter threads.
Knowledge is incredibly unevenly distributed
Most people have heard of ChatGPT by now. How many have heard of Claude?
- >-
There’s still plenty to worry about with respect to the environmental
impact of the great AI datacenter buildout, but a lot of the concerns
over the energy cost of individual prompts are no longer credible.
Here’s a fun napkin calculation: how much would it cost to generate
short descriptions of every one of the 68,000 photos in my personal
photo library using Google’s Gemini 1.5 Flash 8B (released in October),
their cheapest model?
Each photo would need 260 input tokens and around 100 output tokens.
260 * 68,000 = 17,680,000 input tokens
17,680,000 * $0.0375/million = $0.66
100 * 68,000 = 6,800,000 output tokens
6,800,000 * $0.15/million = $1.02
- >-
Your browser does not support the audio element.
OpenAI aren’t the only group with a multi-modal audio model. Google’s
Gemini also accepts audio input, and the Google Gemini apps can speak in
a similar way to ChatGPT now. Amazon also pre-announced voice mode for
Amazon Nova, but that’s meant to roll out in Q1 of 2025.
Google’s NotebookLM, released in September, took audio output to a new
level by producing spookily realistic conversations between two “podcast
hosts” about anything you fed into their tool. They later added custom
instructions, so naturally I turned them into pelicans:
Your browser does not support the audio element.
- source_sentence: >-
What improvements were noted in the intonation of ChatGPT Advanced Voice
mode during its rollout?
sentences:
- >-
When ChatGPT Advanced Voice mode finally did roll out (a slow roll from
August through September) it was spectacular. I’ve been using it
extensively on walks with my dog and it’s amazing how much the
improvement in intonation elevates the material. I’ve also had a lot of
fun experimenting with the OpenAI audio APIs.
Even more fun: Advanced Voice mode can do accents! Here’s what happened
when I told it I need you to pretend to be a California brown pelican
with a very thick Russian accent, but you talk to me exclusively in
Spanish.
- >-
When @v0 first came out we were paranoid about protecting the prompt
with all kinds of pre and post processing complexity.
We completely pivoted to let it rip. A prompt without the evals, models,
and especially UX is like getting a broken ASML machine without a manual
- >-
January
7th: It’s OK to call it Artificial Intelligence
9th: What I should have said about the term Artificial Intelligence
17th: Talking about Open Source LLMs on Oxide and Friends
26th: LLM 0.13: The annotated release notes
February
21st: The killer app of Gemini Pro 1.5 is video
March
5th: Prompt injection and jailbreaking are not the same thing
8th: The GPT-4 barrier has finally been broken
22nd: Claude and ChatGPT for ad-hoc sidequests
23rd: Building and testing C extensions for SQLite with ChatGPT Code
Interpreter
26th: llm cmd undo last git commit—a new plugin for LLM
April
8th: Building files-to-prompt entirely using Claude 3 Opus
10th: Three major LLM releases in 24 hours (plus weeknotes)
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.75
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.75
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.20000000000000004
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.10000000000000002
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.75
name: Cosine Recall@1
- type: cosine_recall@3
value: 1
name: Cosine Recall@3
- type: cosine_recall@5
value: 1
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8968216255952429
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.861111111111111
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8611111111111112
name: Cosine Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

This is a sentence-transformers model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description

- Model Type: Sentence Transformer
- Base model: [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l)
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
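For illustration, this is what the stack above computes: run the base BERT encoder, keep the [CLS] token embedding (`pooling_mode_cls_token: True`), then L2-normalize it. A minimal sketch with the plain `transformers` API, assuming the repository's transformer weights load with `AutoModel`, as they do for typical sentence-transformers repos; the packaged `SentenceTransformer` pipeline shown below is the recommended path:

```python
# Sketch of the pipeline above: BERT encoder -> CLS-token pooling -> L2 normalization.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ngiometti/legal-ft-2")
encoder = AutoModel.from_pretrained("ngiometti/legal-ft-2")

batch = tokenizer(
    ["An example sentence"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # [batch, seq_len, 1024]

cls_embedding = token_embeddings[:, 0]  # pooling_mode_cls_token=True
embedding = torch.nn.functional.normalize(cls_embedding, p=2, dim=1)  # Normalize()
```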
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ngiometti/legal-ft-2")
# Run inference
sentences = [
    'What improvements were noted in the intonation of ChatGPT Advanced Voice mode during its rollout?',
    'When ChatGPT Advanced Voice mode finally did roll out (a slow roll from August through September) it was spectacular. I’ve been using it extensively on walks with my dog and it’s amazing how much the improvement in intonation elevates the material. I’ve also had a lot of fun experimenting with the OpenAI audio APIs.\nEven more fun: Advanced Voice mode can do accents! Here’s what happened when I told it I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me exclusively in Spanish.',
    'January\n\n7th: It’s OK to call it Artificial Intelligence\n\n9th: What I should have said about the term Artificial Intelligence\n\n17th: Talking about Open Source LLMs on Oxide and Friends\n\n26th: LLM 0.13: The annotated release notes\n\n\n\nFebruary\n\n21st: The killer app of Gemini Pro 1.5 is video\n\n\n\nMarch\n\n5th: Prompt injection and jailbreaking are not the same thing\n\n8th: The GPT-4 barrier has finally been broken\n\n22nd: Claude and ChatGPT for ad-hoc sidequests\n\n23rd: Building and testing C extensions for SQLite with ChatGPT Code Interpreter\n\n26th: llm cmd undo last git commit—a new plugin for LLM\n\n\n\nApril\n\n8th: Building files-to-prompt entirely using Claude 3 Opus\n\n10th: Three major LLM releases in 24 hours (plus weeknotes)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
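The same two calls extend naturally to semantic search: embed a query and a corpus, then rank corpus entries by cosine similarity. A minimal sketch, continuing from the snippet above with made-up example documents:

```python
import torch

# Hypothetical mini-corpus; any list of strings works.
corpus = [
    "Google’s Gemini also accepts audio input and can speak its responses.",
    "Careful design of the training data appears to be the entire game.",
]
query = "Which model can handle audio input?"

# Embeddings are L2-normalized (see the Normalize() module above), and
# model.similarity defaults to cosine similarity for this model.
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode([query], convert_to_tensor=True)

scores = model.similarity(query_embedding, corpus_embeddings)[0]  # shape [len(corpus)]
best = torch.topk(scores, k=1)
print(corpus[int(best.indices[0])], float(best.values[0]))
```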
## Evaluation

### Metrics

#### Information Retrieval

- Evaluated with `InformationRetrievalEvaluator`
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.75   |
| cosine_accuracy@3   | 1.0    |
| cosine_accuracy@5   | 1.0    |
| cosine_accuracy@10  | 1.0    |
| cosine_precision@1  | 0.75   |
| cosine_precision@3  | 0.3333 |
| cosine_precision@5  | 0.2    |
| cosine_precision@10 | 0.1    |
| cosine_recall@1     | 0.75   |
| cosine_recall@3     | 1.0    |
| cosine_recall@5     | 1.0    |
| cosine_recall@10    | 1.0    |
| cosine_ndcg@10      | 0.8968 |
| cosine_mrr@10       | 0.8611 |
| cosine_map@100      | 0.8611 |
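The queries, corpus, and relevance judgments behind these numbers are not published with this card, but an equivalent evaluation can be sketched with the same evaluator; the ids and texts below are hypothetical stand-ins:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("ngiometti/legal-ft-2")

# Hypothetical evaluation data: query id -> text, doc id -> text,
# and the set of relevant doc ids per query.
queries = {"q1": "What capabilities does Google’s Gemini have regarding audio?"}
corpus = {
    "d1": "Google’s Gemini also accepts audio input, and the Gemini apps can speak.",
    "d2": "Meta’s Llama 3.2 models punch massively above their weight.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
results = evaluator(model)  # dict of metrics; key names match the table above
print(results["cosine_ndcg@10"])
```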
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 156 training samples
- Columns: `sentence_0` and `sentence_1`
- Approximate statistics based on the first 156 samples:
  |         | sentence_0                                              | sentence_1                                                |
  |:--------|:--------------------------------------------------------|:----------------------------------------------------------|
  | type    | string                                                   | string                                                     |
  | details | min: 14 tokens<br>mean: 20.31 tokens<br>max: 36 tokens   | min: 43 tokens<br>mean: 130.44 tokens<br>max: 204 tokens   |
- Samples:

  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | What are some potential applications of Large Language Models (LLMs) mentioned in the context? | Large Language Models<br>They’re actually quite easy to build<br>You can run LLMs on your own devices<br>Hobbyists can build their own fine-tuned models<br>We don’t yet know how to build GPT-4<br>Vibes Based Development<br>LLMs are really smart, and also really, really dumb<br>Gullibility is the biggest unsolved problem<br>Code may be the best application<br>The ethics of this space remain diabolically complex<br>My blog in 2023 |
  | What is identified as the biggest unsolved problem related to LLMs? | Large Language Models<br>They’re actually quite easy to build<br>You can run LLMs on your own devices<br>Hobbyists can build their own fine-tuned models<br>We don’t yet know how to build GPT-4<br>Vibes Based Development<br>LLMs are really smart, and also really, really dumb<br>Gullibility is the biggest unsolved problem<br>Code may be the best application<br>The ethics of this space remain diabolically complex<br>My blog in 2023 |
  | What improvements were noted in the intonation of ChatGPT Advanced Voice mode during its rollout? | When ChatGPT Advanced Voice mode finally did roll out (a slow roll from August through September) it was spectacular. I’ve been using it extensively on walks with my dog and it’s amazing how much the improvement in intonation elevates the material. I’ve also had a lot of fun experimenting with the OpenAI audio APIs.<br>Even more fun: Advanced Voice mode can do accents! Here’s what happened when I told it I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me exclusively in Spanish. |
- Loss: `MatryoshkaLoss` with these parameters:

  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [768, 512, 256, 128, 64],
      "matryoshka_weights": [1, 1, 1, 1, 1],
      "n_dims_per_step": -1
  }
  ```
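Because `MatryoshkaLoss` trains nested prefixes of the embedding (768, 512, 256, 128, and 64 dimensions here), you can trade a little accuracy for smaller vectors by truncating at load time. A short sketch using the `truncate_dim` argument:

```python
from sentence_transformers import SentenceTransformer

# Keep only the first 256 embedding dimensions; any of the trained
# matryoshka_dims (768/512/256/128/64) is a sensible choice.
small_model = SentenceTransformer("ngiometti/legal-ft-2", truncate_dim=256)
embeddings = small_model.encode(["An example sentence"])
print(embeddings.shape)
# (1, 256)
```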
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin
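Put together, a minimal sketch (assumed, not the verbatim script) of reproducing this setup with the sentence-transformers v3 trainer API; `train_dataset` below is a one-pair stand-in for the 156 unpublished (sentence_0, sentence_1) pairs:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")

# Placeholder for the real 156 question/passage pairs, which are not published.
train_dataset = Dataset.from_dict({
    "sentence_0": ["What capabilities does Gemini have regarding audio?"],
    "sentence_1": ["Google’s Gemini also accepts audio input, and the apps can speak."],
})

inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="legal-ft-2",
    num_train_epochs=10,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    # eval_strategy="steps" was used together with an evaluator; omitted here.
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()
```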
#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
### Training Logs

| Epoch | Step | cosine_ndcg@10 |
|:------|:-----|:---------------|
| 1.0   | 16   | 0.9122         |
| 2.0   | 32   | 0.9093         |
| 3.0   | 48   | 0.8968         |
| 3.125 | 50   | 0.8968         |
| 4.0   | 64   | 0.8939         |
| 5.0   | 80   | 0.8908         |
| 6.0   | 96   | 0.8908         |
| 6.25  | 100  | 0.8908         |
| 7.0   | 112  | 0.8939         |
| 8.0   | 128  | 0.8968         |
| 9.0   | 144  | 0.8968         |
| 9.375 | 150  | 0.8968         |
| 10.0  | 160  | 0.8968         |
### Framework Versions
- Python: 3.13.1
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```