metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:798
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-m
widget:
- source_sentence: >-
What is the definition of a sponsor-investigator according to the provided
context?
sentences:
- >-
§ 312.47 Meetings.
(a) General. Meetings between a sponsor and the agency are frequently
useful in resolving questions and
issues raised during the course of a clinical investigation. FDA
encourages such meetings to the extent
that they aid in the evaluation of the drug and in the solution of
scientific problems concerning the drug, to
the extent that FDA's resources permit. The general principle underlying
the conduct of such meetings is
- >-
employees to conduct an investigation that it has initiated is a
sponsor, not a sponsor-investigator, and
the employees are investigators.
Sponsor-Investigator means an individual who both initiates and conducts
an investigation, and under whose
immediate direction the investigational drug is administered or
dispensed. The term does not include any
person other than an individual. The requirements applicable to a
sponsor-investigator under this part
- >-
practice regulations in part 58, or, if the study was not conducted in
compliance with those
regulations, a brief statement of the reason for the noncompliance.
(9) Previous human experience with the investigational drug. A summary
of previous human experience
known to the applicant, if any, with the investigational drug. The
information is required to include
the following:
(i) If the investigational drug has been investigated or marketed
previously, either in the United
- source_sentence: What is the primary purpose of Phase 1 studies in drug development?
sentences:
- |-
§ 312.53 Selecting investigators and monitors.
§ 312.54 Emergency research under § 50.24 of this chapter.
§ 312.55 Informing investigators.
This content is from the eCFR and is authoritative but unofficial.
21 CFR Part 312 (up to date as of 1/23/2025)
Investigational New Drug Application 21 CFR Part 312 (Jan. 23, 2025)
21 CFR Part 312 (Jan. 23, 2025) (enhanced display) page 1 of 54
- >-
relevant to the safety of the drug as are required under § 312.32. The
sponsor shall make annual reports
on the progress of the investigation in accordance with § 312.33.
(d) A sponsor who determines that its investigational drug presents an
unreasonable and significant risk to
subjects shall discontinue those investigations that present the risk,
notify FDA, all institutional review
boards, and all investigators who have at any time participated in the
investigation of the discontinuance,
- >-
are typically closely monitored and may be conducted in patients or
normal volunteer subjects.
These studies are designed to determine the metabolism and pharmacologic
actions of the drug in
humans, the side effects associated with increasing doses, and, if
possible, to gain early evidence on
effectiveness. During Phase 1, sufficient information about the drug's
pharmacokinetics and
pharmacological effects should be obtained to permit the design of
well-controlled, scientifically
- source_sentence: >-
What is the required format for numbering submissions related to the
investigation?
sentences:
- >-
using a single, three-digit serial number. The initial IND is required
to be numbered 000; each subsequent
submission (e.g., amendment, report, or correspondence) is required to
be numbered chronologically in
sequence.
(f) Identification of exception from informed consent. If the
investigation involves an exception from informed
consent under § 50.24 of this chapter, the sponsor shall prominently
identify on the cover sheet that the
- >-
response time, a sponsor may not proceed with a clinical trial on which
a clinical hold has been imposed
until the sponsor has been notified by FDA that the hold has been
lifted.
(f) Appeal. If the sponsor disagrees with the reasons cited for the
clinical hold, the sponsor may request
reconsideration of the decision in accordance with § 312.48.
(g) Conversion of IND on clinical hold to inactive status. If all
investigations covered by an IND remain on
- >-
investigator, the sponsor of any investigation in which the investigator
has been named as a participant,
and the reviewing institutional review boards (IRBs) that the
investigator is not eligible to receive test
articles under this part. The notification to the investigator, sponsor,
and IRBs will provide a statement of
21 CFR Part 312 (up to date as of 1/23/2025)
Investigational New Drug Application 21 CFR 312.66
21 CFR 312.70(b) (enhanced display) page 37 of 54
- source_sentence: What are the regions mentioned in the context where drugs can be exported?
sentences:
- >-
Africa, or to any country in the European Union or the European Economic
Area, and complies with
the laws of the country to which it is being exported, the applicable
provisions of section 802(c), (f),
and (g) of the act, and § 1.101 of this chapter. Drugs exported under
this paragraph that are not the
subject of an IND are exempt from the label requirement in § 312.6(a);
or
(4) Except as provided in paragraph (b)(5) of this section, the person
exporting the drug sends an email
- >-
before its implementation. Protocol amendments to add a new investigator
or to provide additional
information about investigators may be grouped and submitted at 30-day
intervals. When several
submissions of new protocols or protocol changes are anticipated during
a short period, the sponsor is
encouraged, to the extent feasible, to include these all in a single
submission.
21 CFR Part 312 (up to date as of 1/23/2025)
Investigational New Drug Application 21 CFR 312.30(b)(2)(i)(b)
- >-
that apply to specific types of expanded access are described in §§
312.310 through 312.320.
(a) Scope. This subpart contains the requirements for the use of
investigational new drugs and approved
drugs where availability is limited by a risk evaluation and mitigation
strategy (REMS) when the primary
purpose is to diagnose, monitor, or treat a patient's disease or
condition. The aim of this subpart is to
- source_sentence: >-
What regulatory framework does 21 CFR Part 312 pertain to as of January
23, 2025?
sentences:
- >-
risk-benefit judgment in making the final decision on approvability. As
part of this evaluation, consistent
with the statement of purpose in § 312.80, FDA will consider whether the
benefits of the drug outweigh
the known and potential risks of the drug and the need to answer
remaining questions about risks and
benefits of the drug, taking into consideration the severity of the
disease and the absence of satisfactory
alternative therapy.
- >-
provide for disposition of the unused supplies of the drug under §
312.59.
(b) Case histories. An investigator is required to prepare and maintain
adequate and accurate case histories
that record all observations and other data pertinent to the
investigation on each individual administered
the investigational drug or employed as a control in the investigation.
Case histories include the case
report forms and supporting data including, for example, signed and
dated consent forms and medical
- |-
§ 312.315 Intermediate-size patient populations.
21 CFR Part 312 (up to date as of 1/23/2025)
Investigational New Drug Application 21 CFR Part 312 (Jan. 23, 2025)
21 CFR Part 312 (Jan. 23, 2025) (enhanced display) page 2 of 54
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.92
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.99
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.99
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.92
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.33000000000000007
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19799999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999998
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.92
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.99
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.99
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9637992620139386
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9516666666666665
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9516666666666667
name: Cosine Map@100
SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
This is a sentence-transformers model fine-tuned from Snowflake/snowflake-arctic-embed-m. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Snowflake/snowflake-arctic-embed-m
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
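Because this model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128, and 64, a prefix of the full embedding should remain usable as a smaller embedding. The following is a minimal sketch of that idea using only the standard library. The helper `truncate_and_normalize` and the toy vector are hypothetical and stand in for a real model output.

```python
import math

# Dimensions used during Matryoshka training (taken from this card).
MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if dim > len(embedding):
        raise ValueError("dim exceeds embedding length")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return list(prefix)
    return [x / norm for x in prefix]


# Toy 768-dimensional vector standing in for a real embedding (made-up data).
toy = [math.sin(i + 1) for i in range(768)]
small = truncate_and_normalize(toy, 64)
print(len(small))
```

Renormalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors. How much retrieval quality is retained at each size is not reported in this card and would need to be measured.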
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
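The architecture above applies CLS-token pooling (`pooling_mode_cls_token: True`) followed by `Normalize()`. The sketch below illustrates what those two steps do on a toy token matrix, using plain Python lists. The function names `cls_pool` and `l2_normalize` are illustrative and are not part of the library.

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: use the first token's vector as the sentence vector."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Toy transformer output: 3 tokens with 4 dimensions each (the real model uses 768).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # first token, i.e. [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]
sentence_vector = l2_normalize(cls_pool(token_embeddings))
print(sentence_vector)  # [0.6, 0.8, 0.0, 0.0]
```

Since the output has unit length, the cosine similarity between two such vectors equals their dot product.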
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("philipk22/ind312-ft-v0")
# Run inference
sentences = [
'What regulatory framework does 21 CFR Part 312 pertain to as of January 23, 2025?',
'§ 312.315 Intermediate-size patient populations.\n21 CFR Part 312 (up to date as of 1/23/2025)\nInvestigational New Drug Application 21 CFR Part 312 (Jan. 23, 2025)\n21 CFR Part 312 (Jan. 23, 2025) (enhanced display) page 2 of 54',
'risk-benefit judgment in making the final decision on approvability. As part of this evaluation, consistent\nwith the statement of purpose in § 312.80, FDA will consider whether the benefits of the drug outweigh\nthe known and potential risks of the drug and the need to answer remaining questions about risks and\nbenefits of the drug, taking into consideration the severity of the disease and the absence of satisfactory\nalternative therapy.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
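For semantic search, the similarity scores are typically used to rank passages against a query. The following is a small sketch of that ranking step with cosine similarity, written without any dependencies. The vectors are toy values, not outputs of this model, and `rank_passages` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional vectors; real embeddings from this model have 768 values.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_passages(query, passages))  # [1, 2, 0]
```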
Evaluation
Metrics
Information Retrieval
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.92 |
cosine_accuracy@3 | 0.99 |
cosine_accuracy@5 | 0.99 |
cosine_accuracy@10 | 1.0 |
cosine_precision@1 | 0.92 |
cosine_precision@3 | 0.33 |
cosine_precision@5 | 0.198 |
cosine_precision@10 | 0.1 |
cosine_recall@1 | 0.92 |
cosine_recall@3 | 0.99 |
cosine_recall@5 | 0.99 |
cosine_recall@10 | 1.0 |
cosine_ndcg@10 | 0.9638 |
cosine_mrr@10 | 0.9517 |
cosine_map@100 | 0.9517 |
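The values in the table fit a simple pattern: precision@k equals accuracy@k divided by k (0.99 / 3 = 0.33, 0.99 / 5 = 0.198, 1.0 / 10 = 0.1), recall@k equals accuracy@k, and MRR@10 matches MAP@100. That is what one would expect if every query has exactly one relevant passage. This is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched as follows, with `ranks` being hypothetical toy data.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy@k, precision@k and recall@k.

    `ranks` holds, per query, the 1-based rank of its single relevant
    passage, or None if that passage was not retrieved at all.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    n = len(ranks)
    accuracy = hits / n
    precision = hits / (n * k)  # at most one relevant passage among k results
    recall = hits / n           # exactly one relevant passage per query
    return accuracy, precision, recall


def mean_reciprocal_rank(ranks, k):
    """Average of 1/rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)


# Toy ranks for four queries (made-up data, not the evaluation set).
ranks = [1, 1, 2, 4]
print(retrieval_metrics(ranks, 3))      # (0.75, 0.25, 0.75)
print(mean_reciprocal_rank(ranks, 10))  # 0.6875
```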
Training Details
Training Dataset
Unnamed Dataset
- Size: 798 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 798 samples:

Statistic | sentence_0 | sentence_1 |
---|---|---|
type | string | string |
min | 12 tokens | 19 tokens |
mean | 20.82 tokens | 93.06 tokens |
max | 46 tokens | 158 tokens |
- Samples:

sentence_0: What is the scope of Part 312 in Title 21 regarding investigational new drug applications?
sentence_1:
Title 21 —Food and Drugs
Chapter I —Food and Drug Administration, Department of Health and Human Services
Subchapter D —Drugs for Human Use
Part 312 Investigational New Drug Application
Subpart A General Provisions
§ 312.1 Scope.
§ 312.2 Applicability.
§ 312.3 Definitions and interpretations.
§ 312.6 Labeling of an investigational new drug.
§ 312.7 Promotion of investigational drugs.
§ 312.8 Charging for investigational drugs under an IND.
§ 312.10 Waivers.

sentence_0: How does § 3126 address the labeling requirements for investigational new drugs?
sentence_1:
Title 21 —Food and Drugs
Chapter I —Food and Drug Administration, Department of Health and Human Services
Subchapter D —Drugs for Human Use
Part 312 Investigational New Drug Application
Subpart A General Provisions
§ 312.1 Scope.
§ 312.2 Applicability.
§ 312.3 Definitions and interpretations.
§ 312.6 Labeling of an investigational new drug.
§ 312.7 Promotion of investigational drugs.
§ 312.8 Charging for investigational drugs under an IND.
§ 312.10 Waivers.

sentence_0: What are the general principles outlined in § 31222 regarding the IND submission?
sentence_1:
§ 312.10 Waivers.
Subpart B Investigational New Drug Application (IND)
§ 312.20 Requirement for an IND.
§ 312.21 Phases of an investigation.
§ 312.22 General principles of the IND submission.
§ 312.23 IND content and format.
§ 312.30 Protocol amendments.
§ 312.31 Information amendments.
§ 312.32 IND safety reporting.
§ 312.33 Annual reports.
§ 312.38 Withdrawal of an IND.
Subpart C Administrative Actions

- Loss: MatryoshkaLoss with these parameters:
{
  "loss": "MultipleNegativesRankingLoss",
  "matryoshka_dims": [768, 512, 256, 128, 64],
  "matryoshka_weights": [1, 1, 1, 1, 1],
  "n_dims_per_step": -1
}
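To make the loss configuration concrete, here is a rough sketch of how a Matryoshka-style objective can wrap an in-batch-negatives ranking loss: the base loss is computed on truncated prefixes of the embeddings and the results are combined as a weighted sum. This is a simplified pure-Python illustration, not the library's implementation. The `scale=20.0` value and the use of cosine similarity are assumptions about the base loss defaults, and the toy batch is made up.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: positives[j] with j != i act as negatives for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Cross-entropy with the matching positive as the target class,
        # using the log-sum-exp trick for numerical stability.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated embedding prefixes."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_a = [a[:dim] for a in anchors]
        truncated_p = [p[:dim] for p in positives]
        loss += weight * mnr_loss(truncated_a, truncated_p)
    return loss


# Toy 4-dimensional batch; this card uses dims [768, 512, 256, 128, 64].
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

With all weights set to 1, as in the parameters above, each truncated size contributes equally to the total, which pushes the leading dimensions to carry most of the ranking signal.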
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- num_train_epochs: 10
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
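The hyperparameters above specify a linear learning-rate schedule with a base rate of 5e-05 and no warmup. As a rough sketch of what that implies, the function below computes the learning rate at a given optimizer step. The default `total_steps=800` is taken from the final step in the Training Logs table, and `linear_lr` itself is a hypothetical helper, not a library function.

```python
def linear_lr(step, base_lr=5e-05, total_steps=800, warmup_steps=0):
    """Learning rate under a linear decay schedule with optional linear warmup."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 400, 800):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at the full 5e-05, is halved by the midpoint of training, and reaches zero at the final step.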
Training Logs
Epoch | Step | Training Loss | cosine_ndcg@10 |
---|---|---|---|
0.625 | 50 | - | 0.9091 |
1.0 | 80 | - | 0.9209 |
1.25 | 100 | - | 0.9329 |
1.875 | 150 | - | 0.9439 |
2.0 | 160 | - | 0.9379 |
2.5 | 200 | - | 0.9367 |
3.0 | 240 | - | 0.9459 |
3.125 | 250 | - | 0.9432 |
3.75 | 300 | - | 0.9479 |
4.0 | 320 | - | 0.9515 |
4.375 | 350 | - | 0.9509 |
5.0 | 400 | - | 0.9581 |
5.625 | 450 | - | 0.9551 |
6.0 | 480 | - | 0.9604 |
6.25 | 500 | 0.3078 | 0.9577 |
6.875 | 550 | - | 0.9651 |
7.0 | 560 | - | 0.9651 |
7.5 | 600 | - | 0.9641 |
8.0 | 640 | - | 0.9641 |
8.125 | 650 | - | 0.9638 |
8.75 | 700 | - | 0.9638 |
9.0 | 720 | - | 0.9638 |
9.375 | 750 | - | 0.9601 |
10.0 | 800 | - | 0.9638 |
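The Epoch and Step columns are consistent with the dataset size and batch size reported in this card. The short calculation below reproduces the relationship, assuming a single training device and that the last partial batch is kept (`dataloader_drop_last: False`). The helper `epoch_at` is illustrative only.

```python
import math

train_samples = 798  # dataset size from this card
batch_size = 10      # per_device_train_batch_size
epochs = 10          # num_train_epochs

# Assumption: one device, and the final partial batch of 8 samples is kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch


print(steps_per_epoch, total_steps)  # 80 800
print(epoch_at(50), epoch_at(500))   # 0.625 6.25
```

Rows appear both at every 50th step and at each epoch boundary (multiples of 80), and the only logged training loss, 0.3078, falls at step 500.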
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}