Update README.md
README.md
Interface of the `process` function (a usage sketch follows the parameter list):

* `question`: `Union[List[str], str]`: an input question (str) or a list of input questions (for batched processing)
* `context`: `Union[List[List[str]], str]`: context(s) to be pruned. This can be either a single string (in the case of a single str question) or a list of lists of contexts (one list of contexts per question), with `len(contexts)` equal to `len(questions)`
* `title`: `Optional[Union[List[List[str]], str]]`, _default: `"first_sentence"`_: an optional argument for defining titles. If `title="first_sentence"`, the first sentence of each context is assumed to be the title. If `title=None`, it is assumed that no titles are provided. Titles can also be passed as a list of lists of str, i.e., titles shaped the same way as contexts. Titles are only used if `always_select_title=True`.
* `threshold` _(float, $\in [0, 1]$, default: 0.1)_: the threshold to use for context pruning. We recommend 0.1 for more conservative pruning (no performance drop or the lowest performance drops) and 0.5 for higher compression, but this value can be tuned further to meet specific use-case requirements.
* `always_select_title` _(bool, default: True)_: if True, the first sentence (title) will be included in the selection whenever the model selects a non-empty set of sentences. This is important, e.g., for Wikipedia passages, to provide proper contextualization for the following sentences.
* `batch_size` _(int, default: 32)_: batch size used for inference
* `reorder` _(bool, default: False)_: if True, the provided contexts for each question will be reordered according to the computed question-passage relevance scores. If False, the original user-provided order of contexts is preserved.
* `top_k` _(int, default: 5)_: if `reorder=True`, specifies the number of top-ranked passages to keep for each question.
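
A minimal usage sketch of `process`, assuming the model is loaded from the Hugging Face Hub with `trust_remote_code=True` (which exposes the custom `process` method); the checkpoint name and the `pruned_context` output key are assumptions shown for illustration:

```python
from transformers import AutoModel

# Load Provence; trust_remote_code=True is needed because `process`
# is defined in the model's custom code on the Hub.
# NOTE: the checkpoint name below is an assumption for illustration.
provence = AutoModel.from_pretrained(
    "naver/provence-reranker-debertav3-v1", trust_remote_code=True
)

question = "What goes on the bottom of Shepherd's pie?"
context = (
    "Shepherd's pie. In early cookery books, the dish was a means of using "
    "leftover roasted meat, and the pie dish was lined on the sides and "
    "bottom with mashed potato, as well as having a mashed potato crust."
)

# Single question + single context: conservative pruning at threshold=0.1.
out = provence.process(question, context, threshold=0.1)
# Assumed output key: the context with question-irrelevant sentences removed.
print(out["pruned_context"])
```
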
* **Provence encodes all sentences in the passage together**: this enables capturing coreferences between sentences and provides more accurate context pruning.
* **Provence automatically detects the number of sentences to keep**, based on a threshold. We found that the default threshold value works well across various domains, but the threshold can be adjusted further to better meet particular use-case needs.
* **Provence is robust to various domains**, being trained on a combination of diverse MS Marco and Natural Questions data.
* **Provence works out-of-the-box with any LLM**.
* **Provence is fast**: we release a standalone DeBERTa-based model [here]() and a unified reranking + context pruning model, which incorporates context pruning into reranking, an already existing stage of modern RAG pipelines. The latter makes context pruning essentially zero-cost in the RAG pipeline! (A batched sketch is shown below.)
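
A hedged sketch of the batched, unified reranking + pruning usage (same assumed checkpoint as above; the exact shape of the batched output is also an assumption):

```python
from transformers import AutoModel

# NOTE: checkpoint name shown for illustration.
provence = AutoModel.from_pretrained(
    "naver/provence-reranker-debertav3-v1", trust_remote_code=True
)

questions = [
    "What goes on the bottom of Shepherd's pie?",
    "When was the Eiffel Tower built?",
]
# One list of candidate contexts per question,
# with len(contexts) equal to len(questions).
contexts = [
    [
        "Shepherd's pie. The pie dish was lined on the sides and bottom with mashed potato.",
        "Cottage pie. A similar dish traditionally made with minced beef.",
    ],
    [
        "Eiffel Tower. Construction started in January 1887 and finished in 1889.",
        "Champ de Mars. The tower stands at one end of this public greenspace.",
    ],
]

# reorder=True reranks the contexts of each question by question-passage
# relevance and keeps only the top_k passages before pruning.
out = provence.process(
    questions,
    contexts,
    threshold=0.1,
    reorder=True,
    top_k=1,
)
# Assumed output shape: one list of pruned top-k contexts per question.
print(out["pruned_context"])
```
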