Update README.md
README.md CHANGED
@@ -4,12 +4,12 @@ tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
-
+license: apache-2.0
 ---
 
 # {MODEL_NAME}
 
-
+Sentence Transformer for Assurance & Risk Question-Answering (STAR-QA) is a fine-tuned [sentence-transformers](https://www.SBERT.net) model based on ALL-MPNET-BASE-V2. It was developed to produce **state-of-the-art embeddings for audit, risk-management, compliance and associated regulatory documents**. The model maps sentences to a 768-dimensional dense vector space and can be used for tasks like clustering and semantic search, for example as part of retrieval-augmented generation pipelines.
 
 <!--- Describe your model here -->
 
@@ -32,17 +32,20 @@ embeddings = model.encode(sentences)
 print(embeddings)
 ```
 
+## Evaluation Results
 
+The model was evaluated on a held-out sample from the STAR-QA dataset (see below) using `sentence_transformers.evaluation.InformationRetrievalEvaluator`. Reported metrics include precision and recall at 3 retrieved candidates, as well as MRR@10, MAP@10 and NDCG@100. This fine-tuned model was also benchmarked against its base model using the same methodology.
 
-## Evaluation Results
+## Training Data
 
-
+The model was fine-tuned on a corpus of audit, risk-management, compliance and associated regulatory documents sourced from the public internet. Documents were cleaned and chunked into 2-sentence blocks, and each block was then sent to a state-of-the-art LLM with the following prompt:
 
-
+"Write a question about {document_topic} for which this is the answer: {block}"
 
+The resulting question and its associated ground-truth answer (collectively a "pair") constitute a single training example for the fine-tuning step.
 
 ## Training
-The model was trained with the parameters:
+The model was fine-tuned with the parameters:
 
 **DataLoader**:
 
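The evaluation methodology added above can be reproduced roughly as follows. This is a minimal sketch, not the card's actual harness: the query/corpus/relevance dicts are invented placeholders standing in for the held-out STAR-QA sample, and the model id is taken from the citation URL below.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("dptrsa/STAR-QA")

# Placeholder held-out sample: query id -> question, corpus id -> 2-sentence
# block, query id -> set of ids of blocks that answer it.
queries = {"q1": "Which risk does segregation of duties primarily mitigate?"}
corpus = {
    "d1": "Segregation of duties is a key internal control. It reduces the risk of fraud and error by dividing responsibilities.",
    "d2": "An unrelated filler block about regulatory reporting deadlines.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries,
    corpus,
    relevant_docs,
    precision_recall_at_k=[3],  # P/R @ 3 candidates
    mrr_at_k=[10],              # MRR @ 10
    map_at_k=[10],              # MAP @ 10
    ndcg_at_k=[100],            # NDCG @ 100
    name="star-qa-heldout",
)
print(evaluator(model))  # rerun with the base model for the benchmark comparison
```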
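The pair-generation step described in the new `## Training Data` section might look like the sketch below. The card does not name its cleaning pipeline or LLM, so `ask_llm` is a hypothetical stand-in, and the NLTK sentence splitter is an assumption.

```python
from nltk.tokenize import sent_tokenize  # assumed splitter; run nltk.download("punkt") once beforehand

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to the state-of-the-art LLM used."""
    raise NotImplementedError

def make_pairs(document_text: str, document_topic: str) -> list[tuple[str, str]]:
    """Chunk a cleaned document into 2-sentence blocks and generate one
    (question, ground-truth answer) training pair per block."""
    sentences = sent_tokenize(document_text)
    blocks = [" ".join(sentences[i:i + 2]) for i in range(0, len(sentences), 2)]
    pairs = []
    for block in blocks:
        question = ask_llm(
            f"Write a question about {document_topic} for which this is the answer: {block}"
        )
        pairs.append((question, block))  # one fine-tuning example
    return pairs
```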
@@ -76,7 +79,6 @@ Parameters of the fit()-Method:
 }
 ```
 
-
 ## Full Model Architecture
 ```
 SentenceTransformer(
@@ -88,4 +90,4 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-
+@misc{Theron_2024, title={Sentence Transformer for Assurance & Risk Question-Answering (STAR-QA)}, url={https://huggingface.co/dptrsa/STAR-QA}, author={Theron, Daniel}, year={2024}, month={Feb} }
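For reference, a fit()-style fine-tuning run over such pairs, in the shape the **DataLoader** and fit()-Method sections above describe, is sketched below. The loss, batch size, epochs and warmup steps are illustrative assumptions, not the parameters the card actually lists.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

# (question, ground-truth answer) pairs as produced by the generation step
pairs = [("Which risk does segregation of duties primarily mitigate?",
          "Segregation of duties reduces the risk of fraud and error by dividing responsibilities.")]

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")  # the base model

train_examples = [InputExample(texts=[question, answer]) for question, answer in pairs]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MultipleNegativesRankingLoss(model)  # a common choice for QA pairs; assumed

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,          # illustrative
    warmup_steps=100,  # illustrative
)
```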