---
library_name: transformers
license: apache-2.0
datasets:
- Tevatron/msmarco-passage
---

# Qwen2.5-based Setwise reranker fine-tuned on the MS MARCO dataset

GitHub repo: https://github.com/ielab/llm-rankers/tree/main/Rank-R1

# Python code examples

Using the [llm-rankers](https://github.com/ielab/llm-rankers) library:

```Python
from llmrankers.setwise import RankR1SetwiseLlmRanker
from llmrankers.rankers import SearchResult

docs = [SearchResult(docid=i, text=f'this is passage {i}', score=None) for i in range(20)]
query = 'Give me passage 6'

ranker = RankR1SetwiseLlmRanker(
    model_name_or_path='Qwen/Qwen2.5-3B-Instruct',
    lora_name_or_path='ielabgroup/Setwise-SFT-3B-v0.1',
    prompt_file='prompt_setwise.toml',
    num_child=19,
    k=1,
    verbose=True
)

print(ranker.rerank(query, docs)[0])
```

`prompt_setwise.toml` is a TOML file with the following fields:

```toml
prompt_system = "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant provides the user with the answer enclosed within <answer> </answer> tags, i.e., <answer> answer here </answer>."

prompt_user = '''Given the query: "{query}", which of the following documents is most relevant?
{docs}
Please provide only the label of the most relevant document to the query, enclosed in square brackets, within the answer tags. For example, if the third document is the most relevant, the answer should be: <answer>[3]</answer>.'''

pattern = '<answer>(.*?)</answer>'
```

Internally, the above code is equivalent to the following transformers code:

```Python
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig


def get_model(peft_model_name):
    # Load the LoRA adapter on top of its base model and merge the weights
    config = PeftConfig.from_pretrained(peft_model_name)
    base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
    model = PeftModel.from_pretrained(base_model, peft_model_name)
    model = model.merge_and_unload()
    return model


# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-3B-Instruct')
model = get_model('ielabgroup/Setwise-SFT-3B-v0.1').to('cuda:0').eval()

prompt_system = "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant provides the user with the answer enclosed within <answer> </answer> tags, i.e., <answer> answer here </answer>."

prompt_user = '''Given the query: "{query}", which of the following documents is most relevant?
{docs}
Please provide only the label of the most relevant document to the query, enclosed in square brackets, within the answer tags. For example, if the third document is the most relevant, the answer should be: <answer>[3]</answer>.'''

query = 'Give me passage 6'
docs = [f'[{i+1}] this is passage {i+1}' for i in range(20)]
docs = '\n'.join(docs)

messages = [
    {'role': 'system', 'content': prompt_system},
    {'role': 'user', 'content': prompt_user.format(query=query, docs=docs)}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to('cuda:0')

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=2048,
    do_sample=False,
)
# Keep only the newly generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'''
<answer>[6]</answer>
'''

# Extract the answer
pattern = '<answer>(.*?)</answer>'
answer = re.search(pattern, response, re.DOTALL).group(1)  # answer = '[6]'
```
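Rather than hard-coding the prompt strings as above, the same fields can be read from `prompt_setwise.toml` directly. The following is a minimal sketch, not part of the llm-rankers API: it assumes Python 3.11+ for the standard-library `tomllib` module (older versions can use the third-party `toml` package) and reuses `query`, `docs`, and `response` from the block above.

```Python
import re
import tomllib  # standard library since Python 3.11

# Read the same three fields that are hard-coded in the block above
with open('prompt_setwise.toml', 'rb') as f:
    prompt = tomllib.load(f)

messages = [
    {'role': 'system', 'content': prompt['prompt_system']},
    {'role': 'user', 'content': prompt['prompt_user'].format(query=query, docs=docs)}
]

# ... generate `response` as above, then extract the answer with the stored pattern
answer = re.search(prompt['pattern'], response, re.DOTALL).group(1)
```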
> Note that these Setwise rerankers are trained with the prompt format shown above, which includes 20 documents. Other numbers of documents should also work, but this would represent a "zero-shot" setting for the model; a sketch of such a configuration follows below.
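For instance, to rerank 10 candidates in this zero-shot setting, one might lower `num_child` so that all candidates still fit in a single setwise prompt. This is an untested sketch resting on the assumption, suggested by the 20-document example above (`num_child=19`), that `num_child + 1` documents are shown per prompt, i.e. `num_child = len(docs) - 1`:

```Python
from llmrankers.setwise import RankR1SetwiseLlmRanker
from llmrankers.rankers import SearchResult

# 10 candidate documents instead of the 20 used during training
docs = [SearchResult(docid=i, text=f'this is passage {i}', score=None) for i in range(10)]

ranker = RankR1SetwiseLlmRanker(
    model_name_or_path='Qwen/Qwen2.5-3B-Instruct',
    lora_name_or_path='ielabgroup/Setwise-SFT-3B-v0.1',
    prompt_file='prompt_setwise.toml',
    num_child=9,  # assumed: 9 comparison documents per call, so all 10 appear in one prompt
    k=1,
    verbose=True
)

print(ranker.rerank('Give me passage 6', docs)[0])
```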