Update README.md
README.md
Interface of the `process` function (a usage sketch follows the parameter list):

* `question`: `Union[List[str], str]`: an input question (str) or a list of input questions (for batched processing)
* `context`: `Union[List[List[str]], str]`: context(s) to be pruned. This can be either a single string (in the case of a single str question) or a list of lists of contexts (one list of contexts per question), with `len(contexts)` equal to `len(questions)`
* `title`: `Optional[Union[List[List[str]], str]]`, _default: `"first_sentence"`_: an optional argument for defining titles. If `title="first_sentence"`, the first sentence of each context is assumed to be the title. If `title=None`, it is assumed that no titles are provided. Titles can also be passed as a list of lists of str, i.e., titles shaped the same way as contexts. Titles are only used if `always_select_title=True`.
* `threshold` _(float, $\in [0, 1]$, default: 0.1)_: the threshold to use for context pruning. We recommend 0.1 for more conservative pruning (no performance drop or the lowest performance drops) and 0.5 for higher compression, but this value can be tuned further to meet specific use-case requirements.
* `always_select_title` _(bool, default: True)_: if True, the first sentence (title) will be included in the selection whenever the model selects a non-empty set of sentences. This is important, e.g., for Wikipedia passages, to provide proper contextualization for the following sentences.
* `batch_size` _(int, default: 32)_: batch size used for inference
* `reorder` _(bool, default: False)_: if True, the provided contexts for each question will be reordered according to the computed question-passage relevance scores. If False, the original user-provided order of contexts is preserved.
* `top_k` _(int, default: 5)_: if `reorder=True`, specifies the number of top-ranked passages to keep for each question.
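
A minimal usage sketch of `process`, assuming the model is loaded from the Hugging Face Hub with `trust_remote_code=True` (which exposes the custom `process` method); the checkpoint name and the `pruned_context` output key are assumptions shown for illustration:

```python
from transformers import AutoModel

# Load Provence; trust_remote_code=True is needed because `process`
# is defined in the model's custom code on the Hub.
# NOTE: the checkpoint name below is an assumption for illustration.
provence = AutoModel.from_pretrained(
    "naver/provence-reranker-debertav3-v1", trust_remote_code=True
)

question = "What goes on the bottom of Shepherd's pie?"
context = (
    "Shepherd's pie. In early cookery books, the dish was a means of using "
    "leftover roasted meat, and the pie dish was lined on the sides and "
    "bottom with mashed potato, as well as having a mashed potato crust."
)

# Single question + single context: conservative pruning at threshold=0.1.
out = provence.process(question, context, threshold=0.1)
# Assumed output key: the context with question-irrelevant sentences removed.
print(out["pruned_context"])
```
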
* **Provence encodes all sentences in the passage together**: this enables capturing coreferences between sentences and provides more accurate context pruning.
* **Provence automatically detects the number of sentences to keep**, based on a threshold. We found that the default threshold value works well across various domains, but the threshold can be adjusted further to better meet particular use-case needs.
* **Provence is robust to various domains**, being trained on a combination of diverse MS Marco and Natural Questions data.
* **Provence works out-of-the-box with any LLM**.
* **Provence is fast**: we release a standalone DeBERTa-based model [here]() and a unified reranking + context pruning model, which incorporates context pruning into reranking, an already existing stage of modern RAG pipelines. The latter makes context pruning essentially zero-cost in the RAG pipeline! (A batched sketch is shown below.)
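
A hedged sketch of the batched, unified reranking + pruning usage (same assumed checkpoint as above; the exact shape of the batched output is also an assumption):

```python
from transformers import AutoModel

# NOTE: checkpoint name shown for illustration.
provence = AutoModel.from_pretrained(
    "naver/provence-reranker-debertav3-v1", trust_remote_code=True
)

questions = [
    "What goes on the bottom of Shepherd's pie?",
    "When was the Eiffel Tower built?",
]
# One list of candidate contexts per question,
# with len(contexts) equal to len(questions).
contexts = [
    [
        "Shepherd's pie. The pie dish was lined on the sides and bottom with mashed potato.",
        "Cottage pie. A similar dish traditionally made with minced beef.",
    ],
    [
        "Eiffel Tower. Construction started in January 1887 and finished in 1889.",
        "Champ de Mars. The tower stands at one end of this public greenspace.",
    ],
]

# reorder=True reranks the contexts of each question by question-passage
# relevance and keeps only the top_k passages before pruning.
out = provence.process(
    questions,
    contexts,
    threshold=0.1,
    reorder=True,
    top_k=1,
)
# Assumed output shape: one list of pruned top-k contexts per question.
print(out["pruned_context"])
```
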