dptrsa committed
Commit 39c4bb0 · verified · 1 Parent(s): e886fe6

Update README.md

Files changed (1): README.md +10 -8
README.md CHANGED
@@ -4,12 +4,12 @@ tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
-
  ---

  # {MODEL_NAME}

- This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.

  <!--- Describe your model here -->
@@ -32,17 +32,20 @@ embeddings = model.encode(sentences)
  print(embeddings)
  ```

- ## Evaluation Results

- <!--- Describe how your model was evaluated -->

- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})

  ## Training
- The model was trained with the parameters:

  **DataLoader**:
@@ -76,7 +79,6 @@ Parameters of the fit()-Method:
  }
  ```

-
  ## Full Model Architecture
  ```
  SentenceTransformer(
@@ -88,4 +90,4 @@ SentenceTransformer(

  ## Citing & Authors

- <!--- Describe where people can find more information -->
 
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
+ license: apache-2.0
  ---

  # {MODEL_NAME}

+ Sentence Transformer for Assurance & Risk Question-Answering (STAR-QA) is a fine-tuned [sentence-transformers](https://www.SBERT.net) model based on ALL-MPNET-BASE-V2. It has been developed to produce **SOTA embeddings for audit, risk-management, compliance and associated regulatory documents**. The model maps sentence pairs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search as part of retrieval-augmented generation pipelines.

  <!--- Describe your model here -->
 
  print(embeddings)
  ```

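For semantic search, the embeddings produced above are typically ranked by cosine similarity against a query embedding. A minimal sketch of that ranking step, using small placeholder vectors in place of real 768-dimensional embeddings so no model download is needed:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

# Placeholder 4-d vectors standing in for the model's 768-d embeddings.
docs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(cosine_top_k(query, docs, k=2))
```

In practice the `docs` matrix would be the output of `model.encode(...)` over a document corpus, with the top-ranked chunks passed to a generator in a RAG pipeline.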
+ ## Evaluation Results

+ The model was evaluated on a held-out sample from the STAR-QA dataset (see below) using `sentence-transformers.InformationRetrievalEvaluator`. Reported metrics include precision and recall at 3 candidates, as well as MRR@10, MAP@10 and NDCG@100. This fine-tuned model was also benchmarked against its base model using the same methodology.

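MRR@10, one of the metrics listed above, averages the reciprocal rank of the first relevant document retrieved per query. A minimal pure-Python sketch (the function and argument names are illustrative, not the evaluator's API):

```python
def mrr_at_k(ranked_lists, relevant_ids, k=10):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit
    within the top-k candidates (0 if the relevant item is not retrieved)."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_ids):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id == rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two queries: relevant doc at rank 1 and rank 2 -> MRR = (1 + 0.5) / 2
print(mrr_at_k([["a", "b"], ["c", "a"]], ["a", "a"]))  # 0.75
```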
+ ## Training Data

+ The model was fine-tuned on a corpus of audit, risk-management, compliance and associated regulatory documents sourced from the public internet. Documents were cleaned and chunked into 2-sentence blocks. Each block was then sent to a state-of-the-art LLM with the following prompt:

+ "Write a question about {document_topic} for which this is the answer: {block}"

+ The resulting question and its associated ground-truth answer (collectively a "pair") constitute a single training example for the fine-tuning step.

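The chunk-and-prompt step described above can be sketched as follows; the naive regex sentence splitter is an illustrative assumption, not the exact cleaning pipeline used:

```python
import re

# Prompt template quoted from the model card above.
PROMPT = "Write a question about {document_topic} for which this is the answer: {block}"

def two_sentence_blocks(text):
    """Split on sentence-ending punctuation, then group into 2-sentence blocks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    return [" ".join(sentences[i:i + 2]) for i in range(0, len(sentences), 2)]

def make_prompts(text, topic):
    """Build one LLM prompt per 2-sentence block of the document."""
    return [PROMPT.format(document_topic=topic, block=b) for b in two_sentence_blocks(text)]

doc = "Auditors assess controls. Findings are ranked by risk. Remediation is tracked."
for p in make_prompts(doc, "internal audit"):
    print(p)
```

Each generated question, paired with its source block as the ground-truth answer, yields one training example.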
  ## Training
+ The model was fine-tuned with the parameters:

  **DataLoader**:

 
  }
  ```

  ## Full Model Architecture
  ```
  SentenceTransformer(
  ## Citing & Authors

+ @misc{Theron_2024, title={Sentence Transformer for Assurance \& Risk Question-Answering (STAR-QA)}, url={https://huggingface.co/dptrsa/STAR-QA}, author={Theron, Daniel}, year={2024}, month={Feb} }