dptrsa/STAR-QA

@@ -7,40 +7,17 @@ tags:
 license: apache-2.0
 ---
-# {MODEL_NAME}
 Sentence Transformer for Assurance & Risk Question-Answering (STAR-QA) is a fine-tuned [sentence-transformers](https://www.SBERT.net) model based on ALL-MPNET-BASE-V2. It has been developed to produce **SOTA embeddings for audit, risk-management, compliance and associated regulatory documents**. The model maps sentence pairs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search as part of retrieval-augmented generation pipelines.
-<!--- Describe your model here -->
-## Usage (Sentence-Transformers)
-Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
-```
-pip install -U sentence-transformers
-```
-Then you can use the model like this:
-```python
-from sentence_transformers import SentenceTransformer
-sentences = ["This is an example sentence", "Each sentence is converted"]
-model = SentenceTransformer('{MODEL_NAME}')
-embeddings = model.encode(sentences)
-print(embeddings)
-```
 ## Evaluation Results
 The model was evaluated on a held-out sample from the STAR-QA dataset (see below) using `sentence_transformers.evaluation.InformationRetrievalEvaluator`. Reported metrics include precision and recall @ 3 candidates, as well as MRR @ 10, MAP @ 10 and NDCG @ 100. The fine-tuned model was also benchmarked against its base model using the same methodology.
 ## Training Data
-The model was fine-tuned on a corpus of audit, risk-management, compliance and associated regulatory documents sourced from the public internet. Documents were cleaned and chunked into 2-sentence blocks. Each block was then sent to a state-of-the-art LLM with the following prompt:
-"Write a question about {document_topic} for which this is the answer: {block}"
 The resulting question and its associated ground-truth answer (collectively a "pair") constitute a single training example for the fine-tuning step.
@@ -90,4 +67,10 @@ SentenceTransformer(
 ## Citing & Authors
-@misc{Theron_2024, title={Sentence Transformer for Assurance &#38; Risk Question-Answering (STAR-QA)}, url={https://huggingface.co/dptrsa/STAR-QA}, author={Theron, Daniel}, year={2024}, month={Feb} }

 license: apache-2.0
 ---
+# Sentence Transformer for Assurance & Risk Question-Answering (STAR-QA)
 Sentence Transformer for Assurance & Risk Question-Answering (STAR-QA) is a fine-tuned [sentence-transformers](https://www.SBERT.net) model based on ALL-MPNET-BASE-V2. It has been developed to produce **SOTA embeddings for audit, risk-management, compliance and associated regulatory documents**. The model maps sentence pairs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search as part of retrieval-augmented generation pipelines.
 ## Evaluation Results
 The model was evaluated on a held-out sample from the STAR-QA dataset (see below) using `sentence_transformers.evaluation.InformationRetrievalEvaluator`. Reported metrics include precision and recall @ 3 candidates, as well as MRR @ 10, MAP @ 10 and NDCG @ 100. The fine-tuned model was also benchmarked against its base model using the same methodology.
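As a rough illustration of what some of the retrieval metrics named above measure, the following is a minimal pure-Python sketch for a single query. It is not the evaluator's own implementation, and the document IDs, the ranking, and the function names are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking returned for one query, best match first.
ranked = ["d3", "d1", "d7", "d2"]
# Hypothetical ground-truth answers for that query.
relevant = {"d1", "d2"}

p_at_3 = precision_at_k(ranked, relevant, k=3)
r_at_3 = recall_at_k(ranked, relevant, k=3)
rr_at_10 = reciprocal_rank_at_k(ranked, relevant, k=10)
```

MRR @ 10 is the mean of the reciprocal rank over all evaluation queries; in this dataset each question presumably has a single ground-truth answer block, so recall @ 3 would be either 0 or 1 per query.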
 ## Training Data
+The model was fine-tuned on a corpus of audit, risk-management, compliance and associated regulatory documents sourced from the public internet. Documents were cleaned and chunked into 2-sentence blocks. Each block was then sent to a state-of-the-art LLM with the following prompt: "Write a question about {document_topic} for which this is the answer: {block}"
 The resulting question and its associated ground-truth answer (collectively a "pair") constitute a single training example for the fine-tuning step.
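The chunking and prompting steps described above could be sketched roughly as follows. This is only an assumption about how such a pipeline might look: the regex-based sentence splitter, the helper names, and the sample text are hypothetical, and the call to the LLM itself is left out.

```python
import re

# Prompt text as quoted in the model card.
PROMPT_TEMPLATE = (
    "Write a question about {document_topic} "
    "for which this is the answer: {block}"
)


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_blocks(text, size=2):
    # Group consecutive sentences into blocks of `size` sentences.
    sentences = split_sentences(text)
    return [
        " ".join(sentences[i:i + size])
        for i in range(0, len(sentences), size)
    ]


def build_prompts(text, document_topic):
    # One prompt per block; each block is the ground-truth answer of its pair.
    return [
        PROMPT_TEMPLATE.format(document_topic=document_topic, block=block)
        for block in chunk_blocks(text)
    ]


# Hypothetical cleaned document text.
sample = (
    "Internal audit reports to the board. It assesses controls. "
    "Findings are rated by severity."
)
blocks = chunk_blocks(sample)
prompts = build_prompts(sample, document_topic="internal audit")
```

Each returned prompt would be sent to the LLM, and the generated question paired with the block it was written for.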
 ## Citing & Authors
+@misc{Theron_2024,
+  title={Sentence Transformer for Assurance &#38; Risk Question-Answering (STAR-QA)},
+  url={https://huggingface.co/dptrsa/STAR-QA},
+  author={Theron, Daniel},
+  year={2024},
+  month={Feb}
+}