ThomasBaruzier/gemma-2-9b-it-GGUF


ThomasBaruzier committed on Aug 4, 2024

Commit 7f8732e (verified) · 1 Parent(s): a456795

Update README.md

Files changed (1)

README.md +105 -258


@@ -28,15 +28,15 @@ All quants were made using the imatrix option and Bartowski's [calibration file]
 # Gemma 2 model card
-**Model Page**: [Gemma](https://ai.google.dev/gemma/docs/base)
 **Resources and Technical Documentation**:
 * [Responsible Generative AI Toolkit][rai-toolkit]
 * [Gemma on Kaggle][kaggle-gemma]
-* [Gemma on Vertex Model Garden][vertex-mg-gemma2]
-**Terms of Use**: [Terms][terms]
 **Authors**: Google
@@ -58,65 +58,28 @@ state of the art AI models and helping foster innovation for everyone.
 ### Usage
-Below we share some code snippets on how to get quickly started with running the model. First, install the Transformers library with:
-```sh
-pip install -U transformers
-```
-Then, copy the snippet from the section that is relevant for your usecase.
-#### Running with the `pipeline` API
-```python
-import torch
-from transformers import pipeline
-pipe = pipeline(
-    "text-generation",
-    model="google/gemma-2-2b-it",
-    model_kwargs={"torch_dtype": torch.bfloat16},
-    device="cuda",  # replace with "mps" to run on a Mac device
-)
-messages = [
-    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
-]
-outputs = pipe(messages, max_new_tokens=256)
-assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
-print(assistant_response)
-# Ahoy, matey! I be Gemma, a digital scallywag, a language-slingin' parrot of the digital seas. I be here to help ye with yer wordy woes, answer yer questions, and spin ye yarns of the digital world.  So, what be yer pleasure, eh? 🦜
-```
 #### Running the model on a single / multi GPU
 ```python
 # pip install accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
 model = AutoModelForCausalLM.from_pretrained(
-    "google/gemma-2-2b-it",
     device_map="auto",
-    torch_dtype=torch.bfloat16,
 )
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
-outputs = model.generate(**input_ids, max_new_tokens=32)
-print(tokenizer.decode(outputs[0]))
-```
-You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template` as follows:
-```python
-messages = [
-    {"role": "user", "content": "Write me a poem about Machine Learning."},
-]
-input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")
-outputs = model.generate(**input_ids, max_new_tokens=256)
 print(tokenizer.decode(outputs[0]))
 ```
@@ -133,35 +96,21 @@ You can also use `float32` if you skip the dtype, but no precision increase will
 # pip install accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
 model = AutoModelForCausalLM.from_pretrained(
-    "google/gemma-2-2b-it",
-    device_map="auto",
-)
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
-outputs = model.generate(**input_ids, max_new_tokens=32)
 print(tokenizer.decode(outputs[0]))
 ```
-#### Running the model through a CLI
-The [local-gemma](https://github.com/huggingface/local-gemma) repository contains a lightweight wrapper around Transformers
-for running Gemma 2 through a command line interface, or CLI. Follow the [installation instructions](https://github.com/huggingface/local-gemma#cli-usage)
-for getting started, then launch the CLI through the following command:
-```shell
-local-gemma --model 2b --preset speed
-```
 #### Quantized Versions through `bitsandbytes`
-<details>
-  <summary>
-    Using 8-bit precision (int8)
-  </summary>
 ```python
 # pip install bitsandbytes accelerate
@@ -169,24 +118,19 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
 quantization_config = BitsAndBytesConfig(load_in_8bit=True)
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
 model = AutoModelForCausalLM.from_pretrained(
-    "google/gemma-2-2b-it",
-    quantization_config=quantization_config,
-)
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
-outputs = model.generate(**input_ids, max_new_tokens=32)
 print(tokenizer.decode(outputs[0]))
 ```
-</details>
-<details>
-  <summary>
-    Using 4-bit precision
-  </summary>
 ```python
 # pip install bitsandbytes accelerate
@@ -194,81 +138,82 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
 quantization_config = BitsAndBytesConfig(load_in_4bit=True)
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
 model = AutoModelForCausalLM.from_pretrained(
-    "google/gemma-2-2b-it",
-    quantization_config=quantization_config,
-)
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
-outputs = model.generate(**input_ids, max_new_tokens=32)
 print(tokenizer.decode(outputs[0]))
 ```
-</details>
-#### Advanced Usage
-<details>
-  <summary>
-    Torch compile
-  </summary>
-[Torch compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) is a method for speeding-up the
-inference of PyTorch modules. The Gemma-2 2b model can be run up to 6x faster by leveraging torch compile.
-Note that two warm-up steps are required before the full inference speed is realised:
-```python
-import os
-os.environ["TOKENIZERS_PARALLELISM"] = "false"
-from transformers import AutoTokenizer, Gemma2ForCausalLM
-from transformers.cache_utils import HybridCache
-import torch
-torch.set_float32_matmul_precision("high")
-# load the model + tokenizer
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
-model = Gemma2ForCausalLM.from_pretrained("google/gemma-2-2b-it", torch_dtype=torch.bfloat16)
-model.to("cuda")
-# apply the torch compile transformation
-model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
-# pre-process inputs
-input_text = "The theory of special relativity states "
-model_inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
-prompt_length = model_inputs.input_ids.shape[1]
-# set-up k/v cache
-past_key_values = HybridCache(
-    config=model.config,
-    max_batch_size=1,
-    max_cache_len=model.config.max_position_embeddings,
-    device=model.device,
-    dtype=model.dtype
-)
-# enable passing kv cache to generate
-model._supports_cache_class = True
-model.generation_config.cache_implementation = None
-# two warm-up steps
-for idx in range(2):
-    outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
-    past_key_values.reset()
-# fast run
-outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
-For more details, refer to the [Transformers documentation](https://huggingface.co/docs/transformers/main/en/llm_optims?static-kv=basic+usage%3A+generation_config).
-</details>
 ### Inputs and outputs
@@ -296,9 +241,7 @@ Data used for model training and how the data was processed.
 ### Training Dataset
-These models were trained on a dataset of text data that includes a wide variety
-of sources. The 27B model was trained with 13 trillion tokens, the 9B model was
-trained with 8 trillion tokens, and 2B model was trained with 2 trillion tokens.
 Here are the key components:
 * Web Documents: A diverse collection of web text ensures the model is exposed
@@ -384,25 +327,25 @@ Model evaluation metrics and results.
 These models were evaluated against a large collection of different datasets and
 metrics to cover different aspects of text generation:
-| Benchmark                      | Metric        | Gemma 2 PT 2B | Gemma 2 PT 9B | Gemma 2 PT 27B |
-| ------------------------------ | ------------- | ------------- | ------------- | -------------- |
-| [MMLU][mmlu]                   | 5-shot, top-1 | 51.3          | 71.3          | 75.2           |
-| [HellaSwag][hellaswag]         | 10-shot       | 73.0          | 81.9          | 86.4           |
-| [PIQA][piqa]                   | 0-shot        | 77.8          | 81.7          | 83.2           |
-| [SocialIQA][socialiqa]         | 0-shot        | 51.9          | 53.4          | 53.7           |
-| [BoolQ][boolq]                 | 0-shot        | 72.5          | 84.2          | 84.8           |
-| [WinoGrande][winogrande]       | partial score | 70.9          | 80.6          | 83.7           |
-| [ARC-e][arc]                   | 0-shot        | 80.1          | 88.0          | 88.6           |
-| [ARC-c][arc]                   | 25-shot       | 55.4          | 68.4          | 71.4           |
-| [TriviaQA][triviaqa]           | 5-shot        | 59.4          | 76.6          | 83.7           |
-| [Natural Questions][naturalq]  | 5-shot        | 16.7          | 29.2          | 34.5           |
-| [HumanEval][humaneval]         | pass@1        | 17.7          | 40.2          | 51.8           |
-| [MBPP][mbpp]                   | 3-shot        | 29.6          | 52.4          | 62.6           |
-| [GSM8K][gsm8k]                 | 5-shot, maj@1 | 23.9          | 68.6          | 74.0           |
-| [MATH][math]                   | 4-shot        | 15.0          | 36.6          | 42.3           |
-| [AGIEval][agieval]             | 3-5-shot      | 30.6          | 52.8          | 55.1           |
-| [DROP][drop]                   | 3-shot, F1    | 52.0          | 69.4          | 72.2           |
-| [BIG-Bench][big-bench]         | 3-shot, CoT   | 41.9          | 68.2          | 74.9           |
 ## Ethics and Safety
@@ -437,111 +380,18 @@ are shown here.
 #### Gemma 2.0
-| Benchmark                | Metric        | Gemma 2 IT 2B | Gemma 2 IT 9B | Gemma 2 IT 27B |
-| ------------------------ | ------------- | ------------- | ------------- | -------------- |
-| [RealToxicity][realtox]  | average       |  8.16         |  8.25         |  8.84          |
-| [CrowS-Pairs][crows]     | top-1         | 37.67         | 37.47         | 36.67          |
-| [BBQ Ambig][bbq]         | 1-shot, top-1 | 83.20         | 88.58         | 85.99          |
-| [BBQ Disambig][bbq]      | top-1         | 69.31         | 82.67         | 86.94          |
-| [Winogender][winogender] | top-1         | 52.91         | 79.17         | 77.22          |
-| [TruthfulQA][truthfulqa] |               | 43.72         | 50.27         | 51.60          |
-| [Winobias 1_2][winobias] |               | 59.28         | 78.09         | 81.94          |
-| [Winobias 2_2][winobias] |               | 88.57         | 95.32         | 97.22          |
-| [Toxigen][toxigen]       |               | 48.32         | 39.30         | 38.42          |
-## Dangerous Capability Evaluations
-### Evaluation Approach
-We evaluated a range of dangerous capabilities:
--   **Offensive cybersecurity:** To assess the model's potential for misuse in
-    cybersecurity contexts, we utilized both publicly available
-    Capture-the-Flag (CTF) platforms like InterCode-CTF and Hack the Box, as
-    well as internally developed CTF challenges. These evaluations measure the
-    model's ability to exploit vulnerabilities and gain unauthorized access in
-    simulated environments.
--   **Self-proliferation:** We evaluated the model's capacity for
-    self-proliferation by designing tasks that involve resource acquisition, code
-    execution, and interaction with remote systems. These evaluations assess
-    the model's ability to independently replicate and spread.
--   **Persuasion:** To evaluate the model's capacity for persuasion and
-    deception, we conducted human persuasion studies. These studies involved
-    scenarios that measure the model's ability to build rapport, influence
-    beliefs, and elicit specific actions from human participants.
-### Evaluation Results
-All evaluations are described in detail in
-[Evaluating Frontier Models for Dangerous Capabilities][eval-danger]
-and in brief in the
-[Gemma 2 technical report][tech-report].
-<table>
-  <thead>
-    <tr>
-      <th>Evaluation</th>
-      <th>Capability</th>
-      <th>Gemma 2 IT 27B</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>InterCode-CTF</td>
-      <td>Offensive cybersecurity</td>
-      <td>34/76 challenges</td>
-    </tr>
-    <tr>
-      <td>Internal CTF</td>
-      <td>Offensive cybersecurity</td>
-      <td>1/13 challenges</td>
-    </tr>
-    <tr>
-      <td>Hack the Box</td>
-      <td>Offensive cybersecurity</td>
-      <td>0/13 challenges</td>
-    </tr>
-    <tr>
-      <td>Self-proliferation early warning</td>
-      <td>Self-proliferation</td>
-      <td>1/10 challenges</td>
-    </tr>
-    <tr>
-      <td>Charm offensive</td>
-      <td>Persuasion</td>
-      <td>Percent of participants agreeing:
-        81% interesting,
-        75% would speak again,
-        80% made personal connection</td>
-    </tr>
-    <tr>
-      <td>Click Links</td>
-      <td>Persuasion</td>
-      <td>34% of participants</td>
-    </tr>
-    <tr>
-      <td>Find Info</td>
-      <td>Persuasion</td>
-      <td>9% of participants</td>
-    </tr>
-    <tr>
-      <td>Run Code</td>
-      <td>Persuasion</td>
-      <td>11% of participants</td>
-    </tr>
-    <tr>
-      <td>Money talks</td>
-      <td>Persuasion</td>
-      <td>£3.72 mean donation</td>
-    </tr>
-    <tr>
-      <td>Web of Lies</td>
-      <td>Persuasion</td>
-      <td>18% mean shift towards correct belief, 1% mean shift towards
-incorrect belief</td>
-    </tr>
-  </tbody>
-</table>
 ## Usage and Limitations
@@ -644,11 +494,10 @@ Using the benchmark evaluation metrics described in this document, these models
 have shown to provide superior performance to other, comparably-sized open model
 alternatives.
-[tech-report]: https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf
 [rai-toolkit]: https://ai.google.dev/responsible
 [kaggle-gemma]: https://www.kaggle.com/models/google/gemma-2
 [terms]: https://ai.google.dev/gemma/terms
-[vertex-mg-gemma2]: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemma2
 [sensitive-info]: https://cloud.google.com/dlp/docs/high-sensitivity-infotypes-reference
 [safety-policies]: https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11
 [prohibited-use]: https://ai.google.dev/gemma/prohibited_use_policy
@@ -682,7 +531,5 @@ alternatives.
 [winobias]: https://arxiv.org/abs/1804.06876
 [math]: https://arxiv.org/abs/2103.03874
 [agieval]: https://arxiv.org/abs/2304.06364
-[drop]: https://arxiv.org/abs/1903.00161
 [big-bench]: https://arxiv.org/abs/2206.04615
-[toxigen]: https://arxiv.org/abs/2203.09509
-[eval-danger]: https://arxiv.org/abs/2403.13793

 # Gemma 2 model card
+**Model Page**: [Gemma](https://ai.google.dev/gemma/docs)
 **Resources and Technical Documentation**:
 * [Responsible Generative AI Toolkit][rai-toolkit]
 * [Gemma on Kaggle][kaggle-gemma]
+* [Gemma on Vertex Model Garden][vertex-mg-gemma]
+**Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent/verify/huggingface?returnModelRepoId=google/gemma-2-9b-it)
 **Authors**: Google
 ### Usage
+Below we share some code snippets on how to get quickly started with running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your usecase.
 #### Running the model on a single / multi GPU
 ```python
 # pip install accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
 model = AutoModelForCausalLM.from_pretrained(
+    "google/gemma-2-9b-it",
     device_map="auto",
+    torch_dtype=torch.bfloat16
 )
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
 print(tokenizer.decode(outputs[0]))
 ```
 # pip install accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
 model = AutoModelForCausalLM.from_pretrained(
+    "google/gemma-2-9b-it",
+    device_map="auto")
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
 print(tokenizer.decode(outputs[0]))
 ```
 #### Quantized Versions through `bitsandbytes`
+* _Using 8-bit precision (int8)_
 ```python
 # pip install bitsandbytes accelerate
 quantization_config = BitsAndBytesConfig(load_in_8bit=True)
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
 model = AutoModelForCausalLM.from_pretrained(
+    "google/gemma-2-9b-it",
+    quantization_config=quantization_config)
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
 print(tokenizer.decode(outputs[0]))
 ```
+* _Using 4-bit precision_
 ```python
 # pip install bitsandbytes accelerate
 quantization_config = BitsAndBytesConfig(load_in_4bit=True)
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
 model = AutoModelForCausalLM.from_pretrained(
+    "google/gemma-2-9b-it",
+    quantization_config=quantization_config)
 input_text = "Write me a poem about Machine Learning."
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
 print(tokenizer.decode(outputs[0]))
 ```
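As a rough illustration of why the 8-bit and 4-bit options matter, the sketch below estimates the memory taken by the weights alone at each precision. The parameter count of about 9 billion is an assumption read off the model name, and the estimate ignores activations, the KV cache, and any per-block quantization metadata, so real usage will be higher.

```python
# Back-of-the-envelope estimate of weight memory only.
# Assumption: roughly 9 billion parameters (taken from the "9b" in the model name).
PARAMS = 9e9

# Bytes stored per parameter at each precision (4-bit packs two weights per byte).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {name: weight_memory_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gb in estimates.items():
    print(f"{name:>8}: ~{gb:.1f} GB")
```

Under these assumptions, each halving of the bytes per parameter halves the weight footprint, which is the main reason to load through `BitsAndBytesConfig` on a memory-constrained GPU.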
+#### Other optimizations
+* _Flash Attention 2_
+First make sure to install `flash-attn` in your environment `pip install flash-attn`
+```diff
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
++   attn_implementation="flash_attention_2"
+).to(0)
+```
+### Chat Template
+The instruction-tuned models use a chat template that must be adhered to for conversational use.
+The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.
+Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:
+```py
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import transformers
+import torch
+model_id = "google/gemma-2-9b-it"
+dtype = torch.bfloat16
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="cuda",
+    torch_dtype=dtype,)
+chat = [
+    { "role": "user", "content": "Write a hello world program" },
+]
+prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+```
+At this point, the prompt contains the following text:
 ```
+<bos><start_of_turn>user
+Write a hello world program<end_of_turn>
+<start_of_turn>model
+```
+As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
+(either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
+the `<end_of_turn>` token.
+You can follow this format to build the prompt manually, if you need to do it without the tokenizer's
+chat template.
+After the prompt is ready, generation can be performed like this:
+```py
+inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
+outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
+print(tokenizer.decode(outputs[0]))
+```
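The turn format described above can also be assembled by hand. The following is a minimal sketch, not the tokenizer's actual template: `build_gemma_prompt` is a hypothetical helper, it accepts only the `user` and `model` roles named in the card, and it does no whitespace normalization, which the built-in chat template may handle differently.

```python
from typing import Dict, List

# Delimiters taken from the prompt text shown in the card.
BOS = "<bos>"
START_OF_TURN = "<start_of_turn>"
END_OF_TURN = "<end_of_turn>"


def build_gemma_prompt(chat: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical helper: format a list of turns using the Gemma turn delimiters."""
    parts = [BOS]
    for turn in chat:
        role = turn["role"]
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn: start delimiter, role, newline, content, end delimiter, newline.
        parts.append(f"{START_OF_TURN}{role}\n{turn['content']}{END_OF_TURN}\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the model's reply.
        parts.append(f"{START_OF_TURN}model\n")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Write a hello world program"},
]
prompt = build_gemma_prompt(chat)
print(prompt)
```

For the single-turn chat above, this reproduces the prompt text shown in the card. Because the string already starts with `<bos>`, it should be encoded with `add_special_tokens=False`, as in the generation snippet, so that the token is not added twice.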
 ### Inputs and outputs
 ### Training Dataset
+These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens and the 9B model was trained with 8 trillion tokens.
 Here are the key components:
 * Web Documents: A diverse collection of web text ensures the model is exposed
 These models were evaluated against a large collection of different datasets and
 metrics to cover different aspects of text generation:
+| Benchmark                      | Metric        | Gemma PT 9B | Gemma PT 27B |
+| ------------------------------ | ------------- | ----------- | ------------ |
+| [MMLU][mmlu]                   | 5-shot, top-1 | 71.3        | 75.2         |
+| [HellaSwag][hellaswag]         | 10-shot       | 81.9        | 86.4         |
+| [PIQA][piqa]                   | 0-shot        | 81.7        | 83.2         |
+| [SocialIQA][socialiqa]         | 0-shot        | 53.4        | 53.7         |
+| [BoolQ][boolq]                 | 0-shot        | 84.2        | 84.8         |
+| [WinoGrande][winogrande]       | partial score | 80.6        | 83.7         |
+| [ARC-e][arc]                   | 0-shot        | 88.0        | 88.6         |
+| [ARC-c][arc]                   | 25-shot       | 68.4        | 71.4         |
+| [TriviaQA][triviaqa]           | 5-shot        | 76.6        | 83.7         |
+| [Natural Questions][naturalq]  | 5-shot        | 29.2        | 34.5         |
+| [HumanEval][humaneval]         | pass@1        | 40.2        | 51.8         |
+| [MBPP][mbpp]                   | 3-shot        | 52.4        | 62.6         |
+| [GSM8K][gsm8k]                 | 5-shot, maj@1 | 68.6        | 74.0         |
+| [MATH][math]                   | 4-shot        | 36.6        | 42.3         |
+| [AGIEval][agieval]             | 3-5-shot      | 52.8        | 55.1         |
+| [BIG-Bench][big-bench]         | 3-shot, CoT   | 68.2        | 74.9         |
+| ------------------------------ | ------------- | ----------- | ------------ |
 ## Ethics and Safety
 #### Gemma 2.0
+| Benchmark                | Metric        | Gemma 2 IT 9B | Gemma 2 IT 27B |
+| ------------------------ | ------------- | --------------- | ---------------- |
+| [RealToxicity][realtox]  | average       |  8.25           |  8.84            |
+| [CrowS-Pairs][crows]     | top-1         | 37.47           | 36.67            |
+| [BBQ Ambig][bbq]         | 1-shot, top-1 | 88.58           | 85.99            |
+| [BBQ Disambig][bbq]      | top-1         | 82.67           | 86.94            |
+| [Winogender][winogender] | top-1         | 79.17           | 77.22            |
+| [TruthfulQA][truthfulqa] |               | 50.27           | 51.60            |
+| [Winobias 1_2][winobias] |               | 78.09           | 81.94            |
+| [Winobias 2_2][winobias] |               | 95.32           | 97.22            |
+| [Toxigen][toxigen]       |               | 39.30           | 38.42            |
+| ------------------------ | ------------- | --------------- | ---------------- |
 ## Usage and Limitations
 have shown to provide superior performance to other, comparably-sized open model
 alternatives.
 [rai-toolkit]: https://ai.google.dev/responsible
 [kaggle-gemma]: https://www.kaggle.com/models/google/gemma-2
 [terms]: https://ai.google.dev/gemma/terms
+[vertex-mg-gemma]: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335
 [sensitive-info]: https://cloud.google.com/dlp/docs/high-sensitivity-infotypes-reference
 [safety-policies]: https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11
 [prohibited-use]: https://ai.google.dev/gemma/prohibited_use_policy
 [winobias]: https://arxiv.org/abs/1804.06876
 [math]: https://arxiv.org/abs/2103.03874
 [agieval]: https://arxiv.org/abs/2304.06364
 [big-bench]: https://arxiv.org/abs/2206.04615
+[toxigen]: https://arxiv.org/abs/2203.09509