---
library_name: transformers
license: cc-by-4.0
language:
- en
base_model:
- Equall/Saul-7B-Base
---

# Model Card for Cite-SaulLM-7B

Cite-SaulLM-7B is a PEFT fine-tune of Equall/Saul-7B-Base for legal citation prediction in Australian case law.

## Model Details

### Model Description

Given a passage in which a citation has been masked with a `<CASENAME>` placeholder, the model predicts the name of the case that should be cited there and explains why it should be cited. This repository ships a PEFT adapter that is applied on top of Saul-7B-Base at load time (see Uses).

- **Developed by:** Ehsan Shareghi, Jiuzhou Han, Paul Burgess
- **Model type:** Causal language model (7B parameters)
- **Language(s) (NLP):** English
- **License:** CC BY 4.0
- **Finetuned from model:** [Equall/Saul-7B-Base](https://huggingface.co/Equall/Saul-7B-Base), distributed as a PEFT adapter (see the sketch below)
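
Because only the adapter weights live in this repository, the recorded base model can be checked without downloading the full base weights. A minimal sketch using `peft`'s `PeftConfig` (the repo ID is taken from the loading example under Uses):

```python
from peft import PeftConfig

# Read only the adapter's configuration file from the Hub.
config = PeftConfig.from_pretrained("auslawbench/Cite-SaulLM-7B")
print(config.base_model_name_or_path)  # expected: Equall/Saul-7B-Base
```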

### Model Sources

- **Paper:** https://arxiv.org/pdf/2412.06272

## Uses

Here's how to run the model. The example loads the base model in 8-bit (which requires a CUDA GPU and `bitsandbytes`) and applies the Cite-SaulLM-7B adapter on top:

```python
# pip install git+https://github.com/huggingface/transformers.git
# pip install git+https://github.com/huggingface/peft.git
# pip install bitsandbytes accelerate  # required for 8-bit loading and device_map

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftModel

# Load the base model in 8-bit to reduce GPU memory usage.
model = AutoModelForCausalLM.from_pretrained(
    "Equall/Saul-7B-Base",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("Equall/Saul-7B-Base")
tokenizer.pad_token = tokenizer.eos_token

# Apply the Cite-SaulLM-7B adapter on top of the base model.
model = PeftModel.from_pretrained(
    model,
    "auslawbench/Cite-SaulLM-7B",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model.eval()

# Alpaca-style prompt template used for fine-tuning.
fine_tuned_prompt = """
### Instruction:
{}

### Input:
{}

### Response:
{}"""

example_input = "Many of ZAR’s grounds of appeal related to fact finding. Drawing on principles set down in several other courts and tribunals, the Appeal Panel summarised the circumstances in which leave may be granted for a person to appeal from findings of fact: <CASENAME> at [84]."
model_input = fine_tuned_prompt.format(
    "Predict the name of the case that needs to be cited in the text and explain why it should be cited.",
    example_input,
    "",
)
inputs = tokenizer(model_input, return_tensors="pt").to("cuda")
# Greedy decoding; pass do_sample=True to enable temperature sampling.
outputs = model.generate(**inputs, max_new_tokens=256)
output = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Keep only the predicted case name, which the model emits as <Case Name>.
print(output.split("### Response:")[1].strip().split(">")[0] + ">")
```
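
For deployment, it can be convenient to fold the adapter into the base weights so that inference no longer needs `peft`. A hedged sketch using `merge_and_unload` (this assumes the adapter is LoRA-style and mergeable; merging into 8-bit quantized weights is not generally supported, so the base is loaded in bfloat16 here, and the output directory name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model in bfloat16 so the adapter weights can be merged in.
base = AutoModelForCausalLM.from_pretrained(
    "Equall/Saul-7B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "auslawbench/Cite-SaulLM-7B").merge_and_unload()

# Save a standalone checkpoint; it can then be loaded with a plain
# AutoModelForCausalLM.from_pretrained call, with no peft dependency.
merged.save_pretrained("cite-saullm-7b-merged")  # hypothetical output path
```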

## Citation

**BibTeX:**

```
@misc{shareghi2024auslawcite,
      title={Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study},
      author={Ehsan Shareghi and Jiuzhou Han and Paul Burgess},
      year={2024},
      eprint={2412.06272},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```