Reorganize Docs (#1468)
Files changed:
- README.md +5 -626
- _quarto.yml +3 -3
- docs/config.qmd +439 -11
- docs/dataset-formats/conversation.qmd +71 -0
- docs/dataset-formats/index.qmd +14 -0
- docs/dataset-formats/inst_tune.qmd +165 -0
- docs/dataset-formats/pretraining.qmd +26 -0
- docs/dataset-formats/template_free.qmd +7 -0
- docs/dataset-formats/tokenized.qmd +12 -0
- docs/fsdp_qlora.qmd +1 -1
- docs/input_output.qmd +6 -4

README.md CHANGED
@@ -35,13 +35,12 @@ Features:
  - [Google Colab](#google-colab)
  - [Launching on public clouds via SkyPilot](#launching-on-public-clouds-via-skypilot)
- [Dataset](#dataset)
-  - [How to Add Custom Prompts](#how-to-add-custom-prompts)
-  - [How to Use Custom Pretokenized Dataset](#how-to-use-your-custom-pretokenized-dataset)
- [Config](#config)
  - [Train](#train)
  - [Inference](#inference-playground)
  - [Merge LORA to Base](#merge-lora-to-base)
  - [Special Tokens](#special-tokens)
+  - [All Config Options](#all-config-options)
- Advanced Topics
  - [Multipack](./docs/multipack.qmd)
  - [RLHF & DPO](./docs/rlhf.qmd)
@@ -299,186 +298,9 @@ HF_TOKEN=xx BUCKET=<unique-name> sky spot launch axolotl-spot.yaml --env HF_TOKEN

### Dataset

-Axolotl supports a variety of dataset formats.
-Have dataset(s) in one of the following formats (JSONL recommended):
+Axolotl supports a variety of dataset formats.  It is recommended to use a JSONL.  The schema of the JSONL depends upon the task and the prompt template you wish to use.  Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.

-
-
-- `completion`: raw corpus
-  ```json
-  {"text": "..."}
-  ```
-
-Note: Axolotl usually loads the entire dataset into memory. This will be challenging for large datasets. Use the following config to enable streaming:
-
-```yaml
-pretraining_dataset: # hf path only
-```
-
-#### Supervised finetuning
-
-##### Instruction
-
-- `alpaca`: instruction; input(optional)
-  ```json
-  {"instruction": "...", "input": "...", "output": "..."}
-  ```
-
-<details>
-
-<summary>See other formats</summary>
-
-- `jeopardy`: question and answer
-  ```json
-  {"question": "...", "category": "...", "answer": "..."}
-  ```
-- `oasst`: instruction
-  ```json
-  {"INSTRUCTION": "...", "RESPONSE": "..."}
-  ```
-- `gpteacher`: instruction; input(optional)
-  ```json
-  {"instruction": "...", "input": "...", "response": "..."}
-  ```
-- `reflection`: instruction with reflect; input(optional)
-  ```json
-  {"instruction": "...", "input": "...", "output": "...", "reflection": "...", "corrected": "..."}
-  ```
-- `explainchoice`: question, choices, (solution OR explanation)
-  ```json
-  {"question": "...", "choices": ["..."], "solution": "...", "explanation": "..."}
-  ```
-- `concisechoice`: question, choices, (solution OR explanation)
-  ```json
-  {"question": "...", "choices": ["..."], "solution": "...", "explanation": "..."}
-  ```
-- `summarizetldr`: article and summary
-  ```json
-  {"article": "...", "summary": "..."}
-  ```
-- `alpaca_chat`: basic instruct for alpaca chat
-  ```json
-  {"instruction": "...", "input": "...", "response": "..."}
-  ```
-- `alpaca_chat.load_qa`: question and answer for alpaca chat
-  ```json
-  {"question": "...", "answer": "..."}
-  ```
-- `alpaca_chat.load_concise`: question and answer for alpaca chat, for concise answers
-  ```json
-  {"instruction": "...", "input": "...", "response": "..."}
-  ```
-- `alpaca_chat.load_camel_ai`: question and answer for alpaca chat, for load_camel_ai
-  ```json
-  {"message_1": "...", "message_2": "..."}
-  ```
-- `alpaca_w_system.load_open_orca`: support for open orca datasets with included system prompts, instruct
-  ```json
-  {"system_prompt": "...", "question": "...", "response": "..."}
-  ```
-- `context_qa`: in context question answering from an article
-  ```json
-  {"article": "...", "question": "...", "answer": "..."}
-  ```
-- `context_qa.load_v2`: in context question answering (alternate)
-  ```json
-  {"context": "...", "question": "...", "answer": "..."}
-  ```
-- `context_qa.load_404`: in context question answering from an article, with default response for no answer from context
-  ```json
-  {"article": "...", "unanswerable_question": "..."}
-  ```
-- `creative_acr.load_answer`: instruction and revision
-  ```json
-  {"instruction": "...", "revision": "..."}
-  ```
-- `creative_acr.load_critique`: critique
-  ```json
-  {"scores": "...", "critiques": "...", "instruction": "...", "answer": "..."}
-  ```
-- `creative_acr.load_revise`: critique and revise
-  ```json
-  {"scores": "...", "critiques": "...", "instruction": "...", "answer": "...", "revision": "..."}
-  ```
-- `metharme`: instruction, adds additional eos tokens
-  ```json
-  {"prompt": "...", "generation": "..."}
-  ```
-
-</details>
-
-##### Template-Free
-
-- `input_output`: template-free prompt construction
-  ```json
-  {"segments": [{"label": true|false, "text": "..."}]}
-  ```
-
-This is a special format that allows you to construct prompts without using templates. This is for advanced users who want more freedom with prompt construction.  See [these docs](docs/input_output.qmd) for more details.
-
-##### Conversation
-
-- `sharegpt`: conversations where `from` is `human`/`gpt`. (optional: first row with role `system` to override default system prompt)
-  ```json
-  {"conversations": [{"from": "...", "value": "..."}]}
-  ```
-
-<details>
-
-<summary>See other formats</summary>
-
-- `pygmalion`: pygmalion
-  ```json
-  {"conversations": [{"role": "...", "value": "..."}]}
-  ```
-- `sharegpt.load_role`: conversations where `role` is used instead of `from`
-  ```json
-  {"conversations": [{"role": "...", "value": "..."}]}
-  ```
-- `sharegpt.load_guanaco`: conversations where `from` is `prompter`/`assistant` instead of default sharegpt
-  ```json
-  {"conversations": [{"from": "...", "value": "..."}]}
-  ```
-- `sharegpt_jokes`: creates a chat where bot is asked to tell a joke, then explain why the joke is funny
-  ```json
-  {"conversations": [{"title": "...", "text": "...", "explanation": "..."}]}
-  ```
-
-</details>
-
-Note: `type: sharegpt` opens a special config `conversation:` that enables conversions to many Conversation types. See dataset section under [all yaml options](#all-yaml-options).
-
-#### How to add custom prompts
-
-For a dataset that is preprocessed for instruction purposes:
-
-```json
-{"input": "...", "output": "..."}
-```
-
-You can use this example in your YAML config:
-
-```yaml
-datasets:
-  - path: repo
-    type:
-      system_prompt: ""
-      field_system: system
-      field_instruction: input
-      field_output: output
-      format: "[INST] {instruction} [/INST]"
-      no_input_format: "[INST] {instruction} [/INST]"
-```
-See full config options under [all yaml options](#all-yaml-options).
-
-#### How to use your custom pretokenized dataset
-
-- Do not pass a `type:`
-- Columns in Dataset must be exactly `input_ids`, `attention_mask`, `labels`
-
-```yaml
-- path: ...
-```
+See [these docs](https://openaccess-ai-collective.github.io/axolotl/docs/dataset-formats/) for more information on how to use different dataset formats.

### Config

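To make the new JSONL guidance concrete, here is a minimal sketch of a config entry pointing at a local JSONL file. The path `data/alpaca-sample.jsonl` is hypothetical; any JSONL whose records match the chosen `type` works the same way, per the `ds_type`/`data_files` comments in the config block removed below.

```yaml
datasets:
  # hypothetical local JSONL; each line is one record, e.g.
  # {"instruction": "...", "input": "...", "output": "..."}
  - path: data/alpaca-sample.jsonl
    ds_type: json
    type: alpaca
```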
@@ -563,452 +385,9 @@ See [examples](examples) for quick start. It is recommended to duplicate and modify to your needs.
   - v_proj
 ```

-
+#### All Config Options

-
-
-```yaml
-# This is the huggingface model that contains *.pt, *.safetensors, or *.bin files
-# This can also be a relative path to a model on disk
-base_model: ./llama-7b-hf
-# You can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
-base_model_ignore_patterns:
-# If the base_model repo on hf hub doesn't include configuration .json files,
-# You can set that here, or leave this empty to default to base_model
-base_model_config: ./llama-7b-hf
-# You can specify to choose a specific model revision from huggingface hub
-revision_of_model:
-# Optional tokenizer configuration path in case you want to use a different tokenizer
-# than the one defined in the base model
-tokenizer_config:
-# If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
-model_type: AutoModelForCausalLM
-# Corresponding tokenizer for the model; AutoTokenizer is a good choice
-tokenizer_type: AutoTokenizer
-# Trust remote code for untrusted source
-trust_remote_code:
-# use_fast option for tokenizer loading from_pretrained, defaults to True
-tokenizer_use_fast:
-# Whether to use the legacy tokenizer setting, defaults to True
-tokenizer_legacy:
-# Resize the model embeddings when new tokens are added to multiples of 32
-# This is reported to improve training speed on some models
-resize_token_embeddings_to_32x:
-
-# (Internal use only)
-# Used to identify what the model is based on
-is_falcon_derived_model:
-is_llama_derived_model:
-is_qwen_derived_model:
-# Please note that if you set this to true, `padding_side` will be set to "left" by default
-is_mistral_derived_model:
-
-# optional overrides to the base model configuration
-overrides_of_model_config:
-  # RoPE Scaling https://github.com/huggingface/transformers/pull/24653
-  rope_scaling:
-    type: # linear | dynamic
-    factor: # float
-
-# optional overrides to the bnb 4bit quantization configuration
-# https://huggingface.co/docs/transformers/main/main_classes/quantization#transformers.BitsAndBytesConfig
-bnb_config_kwargs:
-  # These are default values
-  llm_int8_has_fp16_weight: false
-  bnb_4bit_quant_type: nf4
-  bnb_4bit_use_double_quant: true
-
-
-# Whether you are training a 4-bit GPTQ quantized model
-gptq: true
-
-# This will attempt to quantize the model down to 8 bits and use adam 8 bit optimizer
-load_in_8bit: true
-# Use bitsandbytes 4 bit
-load_in_4bit:
-
-# Use CUDA bf16
-bf16: true # bool or 'full' for `bf16_full_eval`. require >=ampere
-# Use CUDA fp16
-fp16: true
-# Use CUDA tf32
-tf32: true # require >=ampere
-
-# No AMP (automatic mixed precision)
-bfloat16: true # require >=ampere
-float16: true
-
-# Limit the memory for all available GPUs to this amount (if an integer, expressed in gigabytes); default: unset
-gpu_memory_limit: 20GiB
-# Do the LoRA/PEFT loading on CPU -- this is required if the base model is so large it takes up most or all of the available GPU VRAM, e.g. during a model and LoRA merge
-lora_on_cpu: true
-
-# A list of one or more datasets to finetune the model with
-datasets:
-  # HuggingFace dataset repo | s3://,gs:// path | "json" for local dataset, make sure to fill data_files
-  - path: vicgalle/alpaca-gpt4
-    # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
-    type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_strategies>.load_<load_fn>
-    ds_type: # Optional[str] (json|arrow|parquet|text|csv) defines the datatype when path is a file
-    data_files: # Optional[str] path to source data files
-    shards: # Optional[int] number of shards to split data into
-    name: # Optional[str] name of dataset configuration to load
-    train_on_split: train # Optional[str] name of dataset split to load from
-
-    # Optional[str] fastchat conversation type, only used with type: sharegpt
-    conversation: # Options (see Conversation 'name'): https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
-    field_human: # Optional[str]. Human key to use for conversation.
-    field_model: # Optional[str]. Assistant key to use for conversation.
-    # Add additional keys from your dataset as input or output roles
-    roles:
-      input: # Optional[List[str]]. These will be masked based on train_on_input
-      output: # Optional[List[str]].
-
-  # Custom user instruction prompt
-  - path: repo
-    type:
-      # The below are defaults. only set what's needed if you use a different column name.
-      system_prompt: ""
-      system_format: "{system}"
-      field_system: system
-      field_instruction: instruction
-      field_input: input
-      field_output: output
-
-      # Customizable to be single line or multi-line
-      # Use {instruction}/{input} as key to be replaced
-      # 'format' can include {input}
-      format: |-
-        User: {instruction} {input}
-        Assistant:
-      # 'no_input_format' cannot include {input}
-      no_input_format: "{instruction} "
-
-      # For `completion` datasets only, uses the provided field instead of `text` column
-      field:
-
-# If false, the datasets will not be shuffled and will keep their original order in `datasets`.
-# The same applies to the `test_datasets` option and the `pretraining_dataset` option. Default is true.
-shuffle_merged_datasets: true
-
-# A list of one or more datasets to eval the model with.
-# You can use either test_datasets, or val_set_size, but not both.
-test_datasets:
-  - path: /workspace/data/eval.jsonl
-    ds_type: json
-    # You need to specify a split. For "json" datasets the default split is called "train".
-    split: train
-    type: completion
-    data_files:
-      - /workspace/data/eval.jsonl
-
-# use RL training: 'dpo', 'ipo', 'kto_pair'
-rl:
-
-# Saves the desired chat template to the tokenizer_config.json for easier inferencing
-# Currently supports chatml and inst (mistral/mixtral)
-chat_template: chatml
-# Changes the default system message
-default_system_message: You are a helpful assistant. Please give a long and detailed answer. # Currently only supports chatml.
-# Axolotl attempts to save the dataset as an arrow after packing the data together so
-# subsequent training attempts load faster, relative path
-dataset_prepared_path: data/last_run_prepared
-# Push prepared dataset to hub
-push_dataset_to_hub: # repo path
-# The maximum number of processes to use while preprocessing your input dataset. This defaults to `os.cpu_count()`
-# if not set.
-dataset_processes: # defaults to os.cpu_count() if not set
-# Keep dataset in memory while preprocessing
-# Only needed if cached dataset is taking too much storage
-dataset_keep_in_memory:
-# push checkpoints to hub
-hub_model_id: # private repo path to push finetuned model
-# how to push checkpoints to hub
-# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
-hub_strategy:
-# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
-# Required to be true when used in combination with `push_dataset_to_hub`
-hf_use_auth_token: # boolean
-# How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc. 0 for no eval.
-val_set_size: 0.04
-# Num shards for whole dataset
-dataset_shard_num:
-# Index of shard to use for whole dataset
-dataset_shard_idx:
-
-# The maximum length of an input to train with; this should typically be less than 2048
-# as most models have a token/context limit of 2048
-sequence_len: 2048
-# Pad inputs so each step uses constant sized buffers
-# This will reduce memory fragmentation and may prevent OOMs, by re-using memory more efficiently
-pad_to_sequence_len:
-# Use efficient multi-packing with block diagonal attention and per sequence position_ids. Recommend set to 'true'
-sample_packing:
-# Set to 'false' if getting errors during eval with sample_packing on.
-eval_sample_packing:
-# You can set these packing optimizations AFTER starting a training at least once.
-# The trainer will provide recommended values for these values.
-sample_packing_eff_est:
-total_num_tokens:
-
-# Passed through to transformers when loading the model when launched without accelerate
-# Use `sequential` when training w/ model parallelism to limit memory
-device_map:
-# Defines the max memory usage per gpu on the system. Passed through to transformers when loading the model.
-max_memory:
-
-# If you want to use 'lora' or 'qlora' or leave blank to train all parameters in original model
-adapter: lora
-# If you already have a lora model trained that you want to load, put that here.
-# This means after training, if you want to test the model, you should set this to the value of `output_dir`.
-# Note that if you merge an adapter to the base model, a new subdirectory `merged` will be created under the `output_dir`.
-lora_model_dir:
-
-# LoRA hyperparameters
-# For more details about the following options, see:
-# https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-in-depth-analysis-with-llama-2
-lora_r: 8
-lora_alpha: 16
-lora_dropout: 0.05
-lora_target_modules:
-  - q_proj
-  - v_proj
-#  - k_proj
-#  - o_proj
-#  - gate_proj
-#  - down_proj
-#  - up_proj
-lora_target_linear: # If true, will target all linear modules
-peft_layers_to_transform: # The layer indices to transform, otherwise, apply to all layers
-
-# If you added new tokens to the tokenizer, you may need to save some LoRA modules because they need to know the new tokens.
-# For LLaMA and Mistral, you need to save `embed_tokens` and `lm_head`. It may vary for other models.
-# `embed_tokens` converts tokens to embeddings, and `lm_head` converts embeddings to token probabilities.
-# https://github.com/huggingface/peft/issues/334#issuecomment-1561727994
-lora_modules_to_save:
-#  - embed_tokens
-#  - lm_head
-
-lora_fan_in_fan_out: false
-
-peft:
-  # Configuration options for loftq initialization for LoRA
-  # https://huggingface.co/docs/peft/developer_guides/quantization#loftq-initialization
-  loftq_config:
-    loftq_bits:  # typically 4 bits
-
-# ReLoRA configuration
-# Must use either 'lora' or 'qlora' adapter, and does not support fsdp or deepspeed
-relora_steps: # Number of steps per ReLoRA restart
-relora_warmup_steps: # Number of per-restart warmup steps
-relora_anneal_steps: # Number of anneal steps for each relora cycle
-relora_prune_ratio: # threshold for optimizer magnitude when pruning
-relora_cpu_offload: # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings
-
-# wandb configuration if you're using it
-# Make sure your `WANDB_API_KEY` environment variable is set (recommended) or you login to wandb with `wandb login`.
-wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
-wandb_project: # Your wandb project name
-wandb_entity: # A wandb Team name if using a Team
-wandb_watch:
-wandb_name: # Set the name of your wandb run
-wandb_run_id: # Set the ID of your wandb run
-wandb_log_model: # "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only at the end of training
-
-# mlflow configuration if you're using it
-mlflow_tracking_uri: # URI to mlflow
-mlflow_experiment_name: # Your experiment name
-hf_mlflow_log_artifacts:  # set to true to copy each saved checkpoint on each save to mlflow artifact registry
-
-# Where to save the full-finetuned model to
-output_dir: ./completed-model
-
-# Whether to use torch.compile and which backend to use
-torch_compile:  # bool
-torch_compile_backend:  # Optional[str]
-
-# Training hyperparameters
-
-# If greater than 1, backpropagation will be skipped and the gradients will be accumulated for the given number of steps.
-gradient_accumulation_steps: 1
-# The number of samples to include in each batch. This is the number of samples sent to each GPU.
-micro_batch_size: 2
-eval_batch_size:
-num_epochs: 4
-warmup_steps: 100  # cannot use with warmup_ratio
-warmup_ratio: 0.05  # cannot use with warmup_steps
-learning_rate: 0.00003
-lr_quadratic_warmup:
-logging_steps:
-eval_steps: # Leave empty to eval at each epoch, integers for every N steps. decimal for fraction of total steps
-evals_per_epoch: # number of times per epoch to run evals, mutually exclusive with eval_steps
-save_strategy: # Set to `no` to skip checkpoint saves
-save_steps: # Leave empty to save at each epoch
-saves_per_epoch: # number of times per epoch to save a checkpoint, mutually exclusive with save_steps
-save_total_limit: # Checkpoints saved at a time
-# Maximum number of iterations to train for. It precedes num_epochs which means that
-# if both are set, num_epochs will not be guaranteed.
-# e.g., when 1 epoch is 1000 steps => `num_epochs: 2` and `max_steps: 100` will train for 100 steps
-max_steps:
-
-eval_table_size: # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
-eval_max_new_tokens: # Total number of tokens generated for predictions sent to wandb. Default is 128
-eval_causal_lm_metrics: # HF evaluate metrics used during evaluation. Default is ["sacrebleu", "comet", "ter", "chrf"]
-
-loss_watchdog_threshold: # High loss value, indicating the learning has broken down (a good estimate is ~2 times the loss at the start of training)
-loss_watchdog_patience: # Number of high-loss steps in a row before the trainer aborts (default: 3)
-
-# Save model as safetensors (require safetensors package)
-save_safetensors:
-
-# Whether to mask out or include the human's prompt from the training labels
-train_on_inputs: false
-# Group similarly sized data to minimize padding.
-# May be slower to start, as it must download and sort the entire dataset.
-# Note that training loss may have an oscillating pattern with this enabled.
-group_by_length: false
-
-# Whether to use gradient checkpointing https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
-gradient_checkpointing: false
-# additional kwargs to pass to the trainer for gradient checkpointing
-# gradient_checkpointing_kwargs:
-#   use_reentrant: true
-
-# Stop training after this many evaluation losses have increased in a row
-# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
-early_stopping_patience: 3
-
-# Specify a scheduler and kwargs to use with the optimizer
-lr_scheduler: # 'one_cycle' | 'log_sweep' | empty for cosine
-lr_scheduler_kwargs:
-cosine_min_lr_ratio: # decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of peak lr
-cosine_constant_lr_ratio: # freeze lr at some percentage of the step, e.g. cosine_constant_lr_ratio=0.8 means start cosine_min_lr at 80% of training step (https://arxiv.org/pdf/2308.04014.pdf)
-
-# For one_cycle optim
-lr_div_factor: # Learning rate div factor
-
-# Specify optimizer
-# Valid values are driven by the Transformers OptimizerNames class, see:
-# https://github.com/huggingface/transformers/blob/95b374952dc27d8511541d6f5a4e22c9ec11fb24/src/transformers/training_args.py#L134
-#
-# Note that not all optimizers may be available in your environment, ex: 'adamw_anyprecision' is part of
-# torchdistx, 'adamw_bnb_8bit' is part of bnb.optim.Adam8bit, etc. When in doubt, it is recommended to start with the optimizer used
-# in the examples/ for your model and fine-tuning use case.
-#
-# Valid values for 'optimizer' include:
-# - adamw_hf
-# - adamw_torch
-# - adamw_torch_fused
-# - adamw_torch_xla
-# - adamw_apex_fused
-# - adafactor
-# - adamw_anyprecision
-# - sgd
-# - adagrad
-# - adamw_bnb_8bit
-# - lion_8bit
-# - lion_32bit
-# - paged_adamw_32bit
-# - paged_adamw_8bit
-# - paged_lion_32bit
-# - paged_lion_8bit
-# - galore_adamw
-# - galore_adamw_8bit
-# - galore_adafactor
-# - galore_adamw_layerwise
-# - galore_adamw_8bit_layerwise
-# - galore_adafactor_layerwise
-optimizer:
-# Dictionary of arguments to pass to the optimizer
-optim_args:
-# For Galore Optimizers the following optim_args are available
-# rank:  # type: int
-# update_proj_gap:  # type: int
-# scale:  # type: float
-# proj_type:  # type: str, default = std
-
-# The target modules to optimize, i.e. the module names that you would like to train, right now this is used only for GaLore algorithm
-optim_target_modules:
-# - self_attn  # for llama
-# - mlp
-
-# Specify weight decay
-weight_decay:
-# adamw hyperparams
-adam_beta1:
-adam_beta2:
-adam_epsilon:
-# Gradient clipping max norm
-max_grad_norm:
-
-# Augmentation techniques
-# NEFT https://arxiv.org/abs/2310.05914, set this to a number (paper default is 5) to add noise to embeddings
-# currently only supported on Llama and Mistral
-neftune_noise_alpha:
-
-# Whether to use bettertransformers
-flash_optimum:
-# Whether to use xformers attention patch https://github.com/facebookresearch/xformers:
-xformers_attention:
-# Whether to use flash attention patch https://github.com/Dao-AILab/flash-attention:
-flash_attention:
-flash_attn_cross_entropy:  # Whether to use flash-attention cross entropy implementation - advanced use only
-flash_attn_rms_norm:  # Whether to use flash-attention rms norm implementation - advanced use only
-flash_attn_fuse_qkv: # Whether to fuse QKV into a single operation
-flash_attn_fuse_mlp: # Whether to fuse part of the MLP into a single operation
-# Whether to use scaled-dot-product attention
-# https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
-sdp_attention:
-# Shifted-sparse attention (only llama) - https://arxiv.org/pdf/2309.12307.pdf
-s2_attention:
-# Resume from a specific checkpoint dir
-resume_from_checkpoint:
-# If resume_from_checkpoint isn't set and you simply want it to start where it left off.
-# Be careful with this being turned on between different models.
-auto_resume_from_checkpoints: false
-
-# Don't mess with this, it's here for accelerate and torchrun
-local_rank:
-
-# Add or change special tokens.
-# If you add tokens here, you don't need to add them to the `tokens` list.
-special_tokens:
-  # bos_token: "<s>"
-  # eos_token: "</s>"
-  # unk_token: "<unk>"
-
-# Add extra tokens.
-tokens:
-
-# FSDP
-fsdp:
-fsdp_config:
-
-# Deepspeed config path. e.g., deepspeed_configs/zero3.json
-deepspeed:
-
-# Advanced DDP Arguments
-ddp_timeout:
-ddp_bucket_cap_mb:
-ddp_broadcast_buffers:
-
-# Path to torch distx for optim 'adamw_anyprecision'
-torchdistx_path:
-
-# Set to HF dataset for type: 'completion' for streaming instead of pre-tokenize
-pretraining_dataset:
-
-# Debug mode
-debug:
-
-# Seed
-seed:
-
-# Allow overwriting the yml config from the cli
-strict:
-```
-
-</details>
+
+See [these docs](docs/config.qmd) for all config options.

<details>
<summary> Understanding of batch size and gradient accumulation steps </summary>
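With the full option reference moved to [docs/config.qmd](docs/config.qmd), a minimal end-to-end config can serve as an anchor. The sketch below is assembled only from values that appear in the block removed above; it is illustrative, not a recommended recipe:

```yaml
base_model: ./llama-7b-hf
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

datasets:
  - path: vicgalle/alpaca-gpt4
    type: alpaca
val_set_size: 0.04

sequence_len: 2048
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 4
learning_rate: 0.00003
output_dir: ./completed-model
```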
|  | |
| 35 | 
             
              - [Google Colab](#google-colab)
         | 
| 36 | 
             
              - [Launching on public clouds via SkyPilot](#launching-on-public-clouds-via-skypilot)
         | 
| 37 | 
             
            - [Dataset](#dataset)
         | 
|  | |
|  | |
| 38 | 
             
            - [Config](#config)
         | 
| 39 | 
             
              - [Train](#train)
         | 
| 40 | 
             
              - [Inference](#inference-playground)
         | 
| 41 | 
             
              - [Merge LORA to Base](#merge-lora-to-base)
         | 
| 42 | 
             
              - [Special Tokens](#special-tokens)
         | 
| 43 | 
            +
              - [All Config Options](#all-config-options)
         | 
| 44 | 
             
            - Advanced Topics
         | 
| 45 | 
             
              - [Multipack](./docs/multipack.qmd)<svg width="24" height="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"><path d="M17 13.5v6H5v-12h6m3-3h6v6m0-6-9 9" class="icon_svg-stroke" stroke="#666" stroke-width="1.5" fill="none" fill-rule="evenodd" stroke-linecap="round" stroke-linejoin="round"></path></svg>
         | 
| 46 | 
             
              - [RLHF & DPO](./docs/rlhf.qmd)<svg width="24" height="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"><path d="M17 13.5v6H5v-12h6m3-3h6v6m0-6-9 9" class="icon_svg-stroke" stroke="#666" stroke-width="1.5" fill="none" fill-rule="evenodd" stroke-linecap="round" stroke-linejoin="round"></path></svg>
         | 
(new lines 298–303:)

  ### Dataset

+ Axolotl supports a variety of dataset formats. It is recommended to use a JSONL format. The schema of the JSONL depends upon the task and the prompt template you wish to use. Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.

+ See [these docs](https://openaccess-ai-collective.github.io/axolotl/docs/dataset-formats/) for more information on how to use different dataset formats.
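For illustration, a single record for an instruction-tuning task (the alpaca schema described in those docs) could look like the following; the field values are invented sample data:

```json
{"instruction": "Summarize the text.", "input": "Axolotl supports many dataset formats.", "output": "Axolotl is flexible about dataset formats."}
```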
(new lines 304–306:)

  ### Config

(new lines 385–390:)

      - v_proj
    ```

+ #### All Config Options

+ See [these docs](docs/config.qmd) for all config options.
(new lines 391–393:)

  <details>
  <summary> Understanding of batch size and gradient accumulation steps </summary>
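As a quick sketch of the arithmetic that section covers: the total effective batch size is `micro_batch_size` × `gradient_accumulation_steps` × the number of GPUs. The values below are illustrative only, not recommendations:

```yaml
# On 2 GPUs this gives an effective batch size of 2 * 4 * 2 = 16
micro_batch_size: 2
gradient_accumulation_steps: 4
```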
    	
_quarto.yml
CHANGED

@@ -30,20 +30,20 @@ website:
            # TODO Edit folder structure after we have more docs.
              - docs/debugging.qmd
              - docs/multipack.qmd
-             - docs/
+             - docs/fsdp_qlora.qmd
              - docs/input_output.qmd
              - docs/rlhf.qmd
              - docs/nccl.qmd
              - docs/mac.qmd
              - docs/multi-node.qmd
+         - section: "Dataset Formats"
+           contents: docs/dataset-formats/*
          - section: "Reference"
            contents:
              - docs/config.qmd
          - docs/faq.qmd


-
-
  format:
    html:
      theme: materia
    	
docs/config.qmd
CHANGED

@@ -3,15 +3,443 @@ title: Config options
  description: A complete list of all configuration options.
  ---

- ```
- (removed lines 7–16: contents truncated in the source)
(added lines 6–444:)

```yaml
# This is the huggingface model that contains *.pt, *.safetensors, or *.bin files
# This can also be a relative path to a model on disk
base_model: ./llama-7b-hf
# You can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
base_model_ignore_patterns:
# If the base_model repo on hf hub doesn't include configuration .json files,
# you can set that here, or leave this empty to default to base_model
base_model_config: ./llama-7b-hf
# You can specify a specific model revision from huggingface hub
revision_of_model:
# Optional tokenizer configuration path in case you want to use a different tokenizer
# than the one defined in the base model
tokenizer_config:
# If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
model_type: AutoModelForCausalLM
# Corresponding tokenizer for the model; AutoTokenizer is a good choice
tokenizer_type: AutoTokenizer
# Trust remote code for untrusted source
trust_remote_code:
# use_fast option for tokenizer loading from_pretrained, defaults to True
tokenizer_use_fast:
# Whether to use the legacy tokenizer setting, defaults to True
tokenizer_legacy:
# Resize the model embeddings when new tokens are added to multiples of 32
# This is reported to improve training speed on some models
resize_token_embeddings_to_32x:

# (Internal use only)
# Used to identify which model family the model is based on
is_falcon_derived_model:
is_llama_derived_model:
is_qwen_derived_model:
# Please note that if you set this to true, `padding_side` will be set to "left" by default
is_mistral_derived_model:

# optional overrides to the base model configuration
overrides_of_model_config:
  # RoPE Scaling https://github.com/huggingface/transformers/pull/24653
  rope_scaling:
    type: # linear | dynamic
    factor: # float

# optional overrides to the bnb 4bit quantization configuration
# https://huggingface.co/docs/transformers/main/main_classes/quantization#transformers.BitsAndBytesConfig
bnb_config_kwargs:
  # These are default values
  llm_int8_has_fp16_weight: false
  bnb_4bit_quant_type: nf4
  bnb_4bit_use_double_quant: true


# Whether you are training a 4-bit GPTQ quantized model
gptq: true

# This will attempt to quantize the model down to 8 bits and use adam 8 bit optimizer
load_in_8bit: true
# Use bitsandbytes 4 bit
load_in_4bit:

# Use CUDA bf16
bf16: true # bool or 'full' for `bf16_full_eval`. requires >=ampere
# Use CUDA fp16
fp16: true
# Use CUDA tf32
tf32: true # requires >=ampere

# No AMP (automatic mixed precision)
bfloat16: true # requires >=ampere
float16: true

# Limit the memory for all available GPUs to this amount (if an integer, expressed in gigabytes); default: unset
gpu_memory_limit: 20GiB
# Do the LoRA/PEFT loading on CPU -- this is required if the base model is so large it takes up most or all of the available GPU VRAM, e.g. during a model and LoRA merge
lora_on_cpu: true

# A list of one or more datasets to finetune the model with
datasets:
  # HuggingFace dataset repo | s3://,gs:// path | "json" for local dataset, make sure to fill data_files
  - path: vicgalle/alpaca-gpt4
  # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
    type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_strategies>.load_<load_fn>
    ds_type: # Optional[str] (json|arrow|parquet|text|csv) defines the datatype when path is a file
    data_files: # Optional[str] path to source data files
    shards: # Optional[int] number of shards to split data into
    name: # Optional[str] name of dataset configuration to load
    train_on_split: train # Optional[str] name of dataset split to load from

    # Optional[str] fastchat conversation type, only used with type: sharegpt
    conversation: # Options (see Conversation 'name'): https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
    field_human: # Optional[str]. Human key to use for conversation.
    field_model: # Optional[str]. Assistant key to use for conversation.
    # Add additional keys from your dataset as input or output roles
    roles:
      input: # Optional[List[str]]. These will be masked based on train_on_input
      output: # Optional[List[str]].

  # Custom user instruction prompt
  - path: repo
    type:
      # The below are defaults. Only set what's needed if you use a different column name.
      system_prompt: ""
      system_format: "{system}"
      field_system: system
      field_instruction: instruction
      field_input: input
      field_output: output

      # Customizable to be single line or multi-line
      # Use {instruction}/{input} as key to be replaced
      # 'format' can include {input}
      format: |-
        User: {instruction} {input}
        Assistant:
      # 'no_input_format' cannot include {input}
      no_input_format: "{instruction} "

      # For `completion` datasets only, uses the provided field instead of `text` column
      field:

# If false, the datasets will not be shuffled and will keep their original order in `datasets`.
# The same applies to the `test_datasets` option and the `pretraining_dataset` option. Default is true.
shuffle_merged_datasets: true

# A list of one or more datasets to eval the model with.
# You can use either test_datasets, or val_set_size, but not both.
test_datasets:
  - path: /workspace/data/eval.jsonl
    ds_type: json
    # You need to specify a split. For "json" datasets the default split is called "train".
    split: train
    type: completion
    data_files:
      - /workspace/data/eval.jsonl

# use RL training: 'dpo', 'ipo', 'kto_pair'
rl:

# Saves the desired chat template to the tokenizer_config.json for easier inferencing
# Currently supports chatml and inst (mistral/mixtral)
chat_template: chatml
# Changes the default system message
default_system_message: You are a helpful assistant. Please give a long and detailed answer. # Currently only supports chatml.
# Axolotl attempts to save the dataset as an arrow after packing the data together so
# subsequent training attempts load faster, relative path
dataset_prepared_path: data/last_run_prepared
# Push prepared dataset to hub
push_dataset_to_hub: # repo path
# The maximum number of processes to use while preprocessing your input dataset. This defaults to `os.cpu_count()`
# if not set.
dataset_processes: # defaults to os.cpu_count() if not set
# Keep dataset in memory while preprocessing
# Only needed if cached dataset is taking too much storage
dataset_keep_in_memory:
# push checkpoints to hub
hub_model_id: # private repo path to push finetuned model
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy:
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: # boolean
# How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc. 0 for no eval.
val_set_size: 0.04
# Num shards for whole dataset
dataset_shard_num:
# Index of shard to use for whole dataset
dataset_shard_idx:

# The maximum length of an input to train with; this should typically be less than 2048
# as most models have a token/context limit of 2048
sequence_len: 2048
# Pad inputs so each step uses constant sized buffers
# This will reduce memory fragmentation and may prevent OOMs, by re-using memory more efficiently
pad_to_sequence_len:
# Use efficient multi-packing with block diagonal attention and per sequence position_ids. Recommended to set to 'true'
sample_packing:
# Set to 'false' if getting errors during eval with sample_packing on.
eval_sample_packing:
# You can set these packing optimizations AFTER starting a training at least once.
# The trainer will provide recommended values for these values.
sample_packing_eff_est:
total_num_tokens:

# Passed through to transformers when loading the model when launched without accelerate
# Use `sequential` when training w/ model parallelism to limit memory
device_map:
# Defines the max memory usage per gpu on the system. Passed through to transformers when loading the model.
max_memory:

# If you want to use 'lora' or 'qlora' or leave blank to train all parameters in original model
adapter: lora
# If you already have a lora model trained that you want to load, put that here.
# This means after training, if you want to test the model, you should set this to the value of `output_dir`.
# Note that if you merge an adapter to the base model, a new subdirectory `merged` will be created under the `output_dir`.
lora_model_dir:

# LoRA hyperparameters
# For more details about the following options, see:
# https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-in-depth-analysis-with-llama-2
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
#  - k_proj
#  - o_proj
#  - gate_proj
#  - down_proj
#  - up_proj
lora_target_linear: # If true, will target all linear modules
peft_layers_to_transform: # The layer indices to transform, otherwise, apply to all layers

# If you added new tokens to the tokenizer, you may need to save some LoRA modules because they need to know the new tokens.
# For LLaMA and Mistral, you need to save `embed_tokens` and `lm_head`. It may vary for other models.
# `embed_tokens` converts tokens to embeddings, and `lm_head` converts embeddings to token probabilities.
# https://github.com/huggingface/peft/issues/334#issuecomment-1561727994
lora_modules_to_save:
#  - embed_tokens
#  - lm_head

lora_fan_in_fan_out: false

peft:
  # Configuration options for loftq initialization for LoRA
  # https://huggingface.co/docs/peft/developer_guides/quantization#loftq-initialization
  loftq_config:
    loftq_bits:  # typically 4 bits

# ReLoRA configuration
# Must use either 'lora' or 'qlora' adapter, and does not support fsdp or deepspeed
relora_steps: # Number of steps per ReLoRA restart
relora_warmup_steps: # Number of per-restart warmup steps
relora_anneal_steps: # Number of anneal steps for each relora cycle
relora_prune_ratio: # threshold for optimizer magnitude when pruning
relora_cpu_offload: # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings

# wandb configuration if you're using it
# Make sure your `WANDB_API_KEY` environment variable is set (recommended) or you login to wandb with `wandb login`.
wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
wandb_project: # Your wandb project name
wandb_entity: # A wandb Team name if using a Team
wandb_watch:
wandb_name: # Set the name of your wandb run
wandb_run_id: # Set the ID of your wandb run
wandb_log_model: # "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only at the end of training

# mlflow configuration if you're using it
mlflow_tracking_uri: # URI to mlflow
mlflow_experiment_name: # Your experiment name
hf_mlflow_log_artifacts:  # set to true to copy each saved checkpoint on each save to mlflow artifact registry

# Where to save the full-finetuned model to
output_dir: ./completed-model

# Whether to use torch.compile and which backend to use
torch_compile:  # bool
torch_compile_backend:  # Optional[str]

# Training hyperparameters

# If greater than 1, backpropagation will be skipped and the gradients will be accumulated for the given number of steps.
gradient_accumulation_steps: 1
# The number of samples to include in each batch. This is the number of samples sent to each GPU.
micro_batch_size: 2
eval_batch_size:
num_epochs: 4
warmup_steps: 100  # cannot use with warmup_ratio
warmup_ratio: 0.05  # cannot use with warmup_steps
learning_rate: 0.00003
lr_quadratic_warmup:
logging_steps:
eval_steps: # Leave empty to eval at each epoch, integers for every N steps. decimal for fraction of total steps
evals_per_epoch: # number of times per epoch to run evals, mutually exclusive with eval_steps
save_strategy: # Set to `no` to skip checkpoint saves
save_steps: # Leave empty to save at each epoch
saves_per_epoch: # number of times per epoch to save a checkpoint, mutually exclusive with save_steps
save_total_limit: # Checkpoints saved at a time
# Maximum number of iterations to train for. It precedes num_epochs which means that
# if both are set, num_epochs will not be guaranteed.
# e.g., when 1 epoch is 1000 steps => `num_epochs: 2` and `max_steps: 100` will train for 100 steps
max_steps:

eval_table_size: # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
eval_max_new_tokens: # Total number of tokens generated for predictions sent to wandb. Default is 128
eval_causal_lm_metrics: # HF evaluate metrics used during evaluation. Default is ["sacrebleu", "comet", "ter", "chrf"]

loss_watchdog_threshold: # High loss value, indicating the learning has broken down (a good estimate is ~2 times the loss at the start of training)
loss_watchdog_patience: # Number of high-loss steps in a row before the trainer aborts (default: 3)

# Save model as safetensors (requires safetensors package)
save_safetensors:

# Whether to mask out or include the human's prompt from the training labels
train_on_inputs: false
# Group similarly sized data to minimize padding.
# May be slower to start, as it must download and sort the entire dataset.
# Note that training loss may have an oscillating pattern with this enabled.
group_by_length: false

# Whether to use gradient checkpointing https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
gradient_checkpointing: false
# additional kwargs to pass to the trainer for gradient checkpointing
# gradient_checkpointing_kwargs:
#   use_reentrant: true

# Stop training after this many evaluation losses have increased in a row
# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
early_stopping_patience: 3

# Specify a scheduler and kwargs to use with the optimizer
lr_scheduler: # 'one_cycle' | 'log_sweep' | empty for cosine
lr_scheduler_kwargs:
cosine_min_lr_ratio: # decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of peak lr
cosine_constant_lr_ratio: # freeze lr at some percentage of the step, e.g. cosine_constant_lr_ratio=0.8 means start cosine_min_lr at 80% of training step (https://arxiv.org/pdf/2308.04014.pdf)

# For one_cycle optim
lr_div_factor: # Learning rate div factor

# Specify optimizer
# Valid values are driven by the Transformers OptimizerNames class, see:
# https://github.com/huggingface/transformers/blob/95b374952dc27d8511541d6f5a4e22c9ec11fb24/src/transformers/training_args.py#L134
#
# Note that not all optimizers may be available in your environment, ex: 'adamw_anyprecision' is part of
# torchdistx, 'adamw_bnb_8bit' is part of bnb.optim.Adam8bit, etc. When in doubt, it is recommended to start with the optimizer used
# in the examples/ for your model and fine-tuning use case.
#
# Valid values for 'optimizer' include:
# - adamw_hf
# - adamw_torch
# - adamw_torch_fused
# - adamw_torch_xla
# - adamw_apex_fused
# - adafactor
# - adamw_anyprecision
# - sgd
# - adagrad
# - adamw_bnb_8bit
# - lion_8bit
# - lion_32bit
# - paged_adamw_32bit
# - paged_adamw_8bit
# - paged_lion_32bit
# - paged_lion_8bit
# - galore_adamw
# - galore_adamw_8bit
# - galore_adafactor
# - galore_adamw_layerwise
# - galore_adamw_8bit_layerwise
# - galore_adafactor_layerwise
optimizer:
# Dictionary of arguments to pass to the optimizer
optim_args:
# For Galore Optimizers the following optim_args are available
# rank:  # type: int
# update_proj_gap  # type: int
# scale  # type: float
# proj_type:  # type: str, default = std

# The target modules to optimize, i.e. the module names that you would like to train; right now this is used only for the GaLore algorithm
optim_target_modules:
# - self_attn  # for llama
# - mlp

# Specify weight decay
weight_decay:
# adamw hyperparams
adam_beta1:
adam_beta2:
adam_epsilon:
# Gradient clipping max norm
max_grad_norm:

# Augmentation techniques
# NEFT https://arxiv.org/abs/2310.05914, set this to a number (paper default is 5) to add noise to embeddings
# currently only supported on Llama and Mistral
neftune_noise_alpha:

# Whether to use bettertransformers
flash_optimum:
# Whether to use xformers attention patch https://github.com/facebookresearch/xformers:
xformers_attention:
# Whether to use flash attention patch https://github.com/Dao-AILab/flash-attention:
flash_attention:
flash_attn_cross_entropy:  # Whether to use flash-attention cross entropy implementation - advanced use only
flash_attn_rms_norm:  # Whether to use flash-attention rms norm implementation - advanced use only
flash_attn_fuse_qkv: # Whether to fuse QKV into a single operation
flash_attn_fuse_mlp: # Whether to fuse part of the MLP into a single operation
# Whether to use scaled-dot-product attention
# https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
sdp_attention:
# Shifted-sparse attention (only llama) - https://arxiv.org/pdf/2309.12307.pdf
s2_attention:
# Resume from a specific checkpoint dir
resume_from_checkpoint:
# If resume_from_checkpoint isn't set and you simply want training to start where it left off.
# Be careful with this being turned on between different models.
auto_resume_from_checkpoints: false

# Don't mess with this, it's here for accelerate and torchrun
local_rank:

# Add or change special tokens.
# If you add tokens here, you don't need to add them to the `tokens` list.
special_tokens:
  # bos_token: "<s>"
  # eos_token: "</s>"
  # unk_token: "<unk>"

# Add extra tokens.
tokens:

# FSDP
fsdp:
fsdp_config:

# Deepspeed config path. e.g., deepspeed_configs/zero3.json
deepspeed:

# Advanced DDP Arguments
ddp_timeout:
ddp_bucket_cap_mb:
ddp_broadcast_buffers:

# Path to torch distx for optim 'adamw_anyprecision'
torchdistx_path:

# Set to HF dataset for type: 'completion' for streaming instead of pre-tokenize
pretraining_dataset:

# Debug mode
debug:

# Seed
seed:

# Allow overwriting the yml config from the cli
strict:
```
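To make the listing concrete, below is a minimal sketch of a LoRA fine-tune assembled only from options documented above; the paths, dataset, and hyperparameter values are placeholders rather than recommendations:

```yaml
base_model: ./llama-7b-hf        # placeholder model path
load_in_8bit: true               # 8-bit quantized base weights
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
datasets:
  - path: vicgalle/alpaca-gpt4
    type: alpaca
val_set_size: 0.04
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 4
learning_rate: 0.00003
output_dir: ./completed-model
```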
    	
docs/dataset-formats/conversation.qmd
ADDED

@@ -0,0 +1,71 @@
---
title: Conversation
description: Conversation format for supervised fine-tuning.
order: 1
---

## Formats

### sharegpt

Conversations where `from` is `human`/`gpt`. (Optional: a first row with the role `system` overrides the default system prompt.)

```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
```

Note: `type: sharegpt` opens a special config `conversation:` that enables conversions to many Conversation types. See [the docs](../docs/config.qmd) for all config options.

### pygmalion

```{.json filename="data.jsonl"}
{"conversations": [{"role": "...", "value": "..."}]}
```

### sharegpt.load_role

Conversations where `role` is used instead of `from`.

```{.json filename="data.jsonl"}
{"conversations": [{"role": "...", "value": "..."}]}
```

### sharegpt.load_guanaco

Conversations where `from` is `prompter`/`assistant` instead of the default sharegpt roles.

```{.json filename="data.jsonl"}
{"conversations": [{"from": "...", "value": "..."}]}
```

### sharegpt_jokes

Creates a chat where the bot is asked to tell a joke, then explain why the joke is funny.

```{.json filename="data.jsonl"}
{"conversations": [{"title": "...", "text": "...", "explanation": "..."}]}
```

## How to add custom prompts for instruction-tuning

For a dataset that is preprocessed for instruction purposes:

```{.json filename="data.jsonl"}
{"input": "...", "output": "..."}
```

You can use this example in your YAML config:

```{.yaml filename="config.yaml"}
datasets:
  - path: repo
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
```

See the full config options [here](../docs/config.qmd).
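For a concrete sense of the sharegpt schema above, a single training row might look like this; the conversation text is invented sample data:

```{.json filename="data.jsonl"}
{"conversations": [{"from": "system", "value": "You are a helpful assistant."}, {"from": "human", "value": "Tell me about axolotls."}, {"from": "gpt", "value": "Axolotls are neotenic salamanders native to Mexico."}]}
```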
    	
docs/dataset-formats/index.qmd
ADDED

@@ -0,0 +1,14 @@
---
title: Dataset Formats
description: Supported dataset formats.
listing:
  fields: [title, description]
  type: table
  sort-ui: false
  filter-ui: false
  max-description-length: 250
---

Axolotl supports a variety of dataset formats. It is recommended to use a JSONL format. The schema of the JSONL depends upon the task and the prompt template you wish to use. Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.

Below, these formats are organized by task:
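Whichever format you choose, a local JSONL file can be wired in through the `datasets` entry of your config. Per the config reference, `path` is set to `json` for a local dataset and the file goes under `data_files`; the filename below is a placeholder:

```{.yaml filename="config.yaml"}
datasets:
  - path: json            # "json" signals a local dataset
    data_files: data.jsonl
    type: alpaca          # choose the type matching your schema
```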
    	
        docs/dataset-formats/inst_tune.qmd
    ADDED
    
    | @@ -0,0 +1,165 @@ | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            title: Instruction Tuning
         | 
| 3 | 
            +
            description: Instruction tuning formats for supervised fine-tuning.
         | 
| 4 | 
            +
            order: 2
         | 
| 5 | 
            +
            ---
         | 
| 6 | 
            +
             | 
| 7 | 
            +
            ## alpaca
         | 
| 8 | 
            +
             | 
| 9 | 
            +
instruction; input (optional)
         | 
| 10 | 
            +
             | 
| 11 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 12 | 
            +
            {"instruction": "...", "input": "...", "output": "..."}
         | 
| 13 | 
            +
            ```
         | 
| 14 | 
            +
             | 
| 15 | 
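Each heading on this page is a value you can pass to `type:` in your config. For example, to fine-tune on the alpaca format above, a minimal sketch (with `data.jsonl` standing in for your dataset) would be:

```{.yaml filename="config.yaml"}
datasets:
  - path: data.jsonl   # local file or HuggingFace dataset path
    ds_type: json      # loader hint when using a local file
    type: alpaca
```

The same pattern applies to every other format below.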
            +
            ## jeopardy
         | 
| 16 | 
            +
             | 
| 17 | 
            +
            question and answer
         | 
| 18 | 
            +
             | 
| 19 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 20 | 
            +
            {"question": "...", "category": "...", "answer": "..."}
         | 
| 21 | 
            +
            ```
         | 
| 22 | 
            +
             | 
| 23 | 
            +
            ## oasst
         | 
| 24 | 
            +
             | 
| 25 | 
            +
            instruction
         | 
| 26 | 
            +
             | 
| 27 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 28 | 
            +
            {"INSTRUCTION": "...", "RESPONSE": "..."}
         | 
| 29 | 
            +
            ```
         | 
| 30 | 
            +
             | 
| 31 | 
            +
            ## gpteacher
         | 
| 32 | 
            +
             | 
| 33 | 
            +
instruction; input (optional)
         | 
| 34 | 
            +
             | 
| 35 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 36 | 
            +
            {"instruction": "...", "input": "...", "response": "..."}
         | 
| 37 | 
            +
            ```
         | 
| 38 | 
            +
             | 
| 39 | 
            +
            ## reflection
         | 
| 40 | 
            +
             | 
| 41 | 
            +
instruction with reflection; input (optional)
         | 
| 42 | 
            +
             | 
| 43 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 44 | 
            +
            {"instruction": "...", "input": "...", "output": "...", "reflection": "...", "corrected": "..."}
         | 
| 45 | 
            +
            ```
         | 
| 46 | 
            +
             | 
| 47 | 
            +
            ## explainchoice
         | 
| 48 | 
            +
             | 
| 49 | 
            +
            question, choices, (solution OR explanation)
         | 
| 50 | 
            +
             | 
| 51 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 52 | 
            +
            {"question": "...", "choices": ["..."], "solution": "...", "explanation": "..."}
         | 
| 53 | 
            +
            ```
         | 
| 54 | 
            +
             | 
| 55 | 
            +
            ## concisechoice
         | 
| 56 | 
            +
             | 
| 57 | 
            +
            question, choices, (solution OR explanation)
         | 
| 58 | 
            +
             | 
| 59 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 60 | 
            +
            {"question": "...", "choices": ["..."], "solution": "...", "explanation": "..."}
         | 
| 61 | 
            +
            ```
         | 
| 62 | 
            +
             | 
| 63 | 
            +
            ## summarizetldr
         | 
| 64 | 
            +
             | 
| 65 | 
            +
            article and summary
         | 
| 66 | 
            +
             | 
| 67 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 68 | 
            +
            {"article": "...", "summary": "..."}
         | 
| 69 | 
            +
            ```
         | 
| 70 | 
            +
             | 
| 71 | 
            +
            ## alpaca_chat
         | 
| 72 | 
            +
             | 
| 73 | 
            +
            basic instruct for alpaca chat
         | 
| 74 | 
            +
             | 
| 75 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 76 | 
            +
            {"instruction": "...", "input": "...", "response": "..."}
         | 
| 77 | 
            +
            ```
         | 
| 78 | 
            +
             | 
| 79 | 
            +
            ## alpaca_chat.load_qa
         | 
| 80 | 
            +
             | 
| 81 | 
            +
            question and answer for alpaca chat
         | 
| 82 | 
            +
             | 
| 83 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 84 | 
            +
            {"question": "...", "answer": "..."}
         | 
| 85 | 
            +
            ```
         | 
| 86 | 
            +
             | 
| 87 | 
            +
            ## alpaca_chat.load_concise
         | 
| 88 | 
            +
             | 
| 89 | 
            +
            question and answer for alpaca chat, for concise answers
         | 
| 90 | 
            +
             | 
| 91 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 92 | 
            +
            {"instruction": "...", "input": "...", "response": "..."}
         | 
| 93 | 
            +
            ```
         | 
| 94 | 
            +
             | 
| 95 | 
            +
            ## alpaca_chat.load_camel_ai
         | 
| 96 | 
            +
             | 
| 97 | 
            +
question and answer for alpaca chat, for CamelAI-style datasets
         | 
| 98 | 
            +
             | 
| 99 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 100 | 
            +
            {"message_1": "...", "message_2": "..."}
         | 
| 101 | 
            +
            ```
         | 
| 102 | 
            +
             | 
| 103 | 
            +
            ## alpaca_w_system.load_open_orca
         | 
| 104 | 
            +
             | 
| 105 | 
            +
instruct; supports Open Orca datasets with included system prompts
         | 
| 106 | 
            +
             | 
| 107 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 108 | 
            +
            {"system_prompt": "...", "question": "...", "response": "..."}
         | 
| 109 | 
            +
            ```
         | 
| 110 | 
            +
             | 
| 111 | 
            +
            ## context_qa
         | 
| 112 | 
            +
             | 
| 113 | 
            +
in-context question answering from an article
         | 
| 114 | 
            +
             | 
| 115 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 116 | 
            +
            {"article": "...", "question": "...", "answer": "..."}
         | 
| 117 | 
            +
            ```
         | 
| 118 | 
            +
             | 
| 119 | 
            +
            ## context_qa.load_v2
         | 
| 120 | 
            +
             | 
| 121 | 
            +
in-context question answering (alternate)
         | 
| 122 | 
            +
             | 
| 123 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 124 | 
            +
            {"context": "...", "question": "...", "answer": "..."}
         | 
| 125 | 
            +
            ```
         | 
| 126 | 
            +
             | 
| 127 | 
            +
            ## context_qa.load_404
         | 
| 128 | 
            +
             | 
| 129 | 
            +
in-context question answering from an article, with a default response when the context contains no answer
         | 
| 130 | 
            +
             | 
| 131 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 132 | 
            +
            {"article": "...", "unanswerable_question": "..."}
         | 
| 133 | 
            +
            ```
         | 
| 134 | 
            +
             | 
| 135 | 
            +
            ## creative_acr.load_answer
         | 
| 136 | 
            +
             | 
| 137 | 
            +
            instruction and revision
         | 
| 138 | 
            +
             | 
| 139 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 140 | 
            +
            {"instruction": "...", "revision": "..."}
         | 
| 141 | 
            +
            ```
         | 
| 142 | 
            +
             | 
| 143 | 
            +
            ## creative_acr.load_critique
         | 
| 144 | 
            +
             | 
| 145 | 
            +
            critique
         | 
| 146 | 
            +
             | 
| 147 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 148 | 
            +
            {"scores": "...", "critiques": "...", "instruction": "...", "answer": "..."}
         | 
| 149 | 
            +
            ```
         | 
| 150 | 
            +
             | 
| 151 | 
            +
            ## creative_acr.load_revise
         | 
| 152 | 
            +
             | 
| 153 | 
            +
            critique and revise
         | 
| 154 | 
            +
             | 
| 155 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 156 | 
            +
            {"scores": "...", "critiques": "...", "instruction": "...", "answer": "...", "revision": "..."}
         | 
| 157 | 
            +
            ```
         | 
| 158 | 
            +
             | 
| 159 | 
            +
            ## metharme
         | 
| 160 | 
            +
             | 
| 161 | 
            +
instruction; adds additional EOS tokens
         | 
| 162 | 
            +
             | 
| 163 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 164 | 
            +
            {"prompt": "...", "generation": "..."}
         | 
| 165 | 
            +
            ```
         | 
    	
        docs/dataset-formats/pretraining.qmd
    ADDED
    
    | @@ -0,0 +1,26 @@ | |
|  | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            title: Pre-training
         | 
| 3 | 
            +
            description: Data format for a pre-training completion task.
         | 
| 4 | 
            +
            order: 3
         | 
| 5 | 
            +
            ---
         | 
| 6 | 
            +
             | 
| 7 | 
            +
For pretraining, there is no prompt template and there are no roles. The only required field is `text`:
         | 
| 8 | 
            +
             | 
| 9 | 
            +
            ```{.json filename="data.jsonl"}
         | 
| 10 | 
            +
            {"text": "first row"}
         | 
| 11 | 
            +
            {"text": "second row"}
         | 
| 12 | 
            +
            ...
         | 
| 13 | 
            +
            ```
         | 
| 14 | 
            +
             | 
| 15 | 
            +
            :::{.callout-note}
         | 
| 16 | 
            +
             | 
| 17 | 
            +
            ### Streaming is recommended for large datasets
         | 
| 18 | 
            +
             | 
| 19 | 
            +
Axolotl usually loads the entire dataset into memory, which can be challenging for large datasets. Use the following config to enable streaming:
         | 
| 20 | 
            +
             | 
| 21 | 
            +
            ```{.yaml filename="config.yaml"}
         | 
| 22 | 
            +
            pretraining_dataset: # hf path only
         | 
| 23 | 
            +
            ...
         | 
| 24 | 
            +
            ```
         | 
| 25 | 
            +
             | 
| 26 | 
            +
            :::
         | 
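For example, a sketch with a placeholder Hub path:

```{.yaml filename="config.yaml"}
# Streams the corpus from the HuggingFace Hub rather than loading it into memory.
pretraining_dataset: org/raw-text-corpus   # placeholder; hf path only
```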
    	
        docs/dataset-formats/template_free.qmd
    ADDED
    
    | @@ -0,0 +1,7 @@ | |
|  | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            title: Template-Free
         | 
| 3 | 
            +
            description: Construct prompts without a template.
         | 
| 4 | 
            +
            order: 4
         | 
| 5 | 
            +
            ---
         | 
| 6 | 
            +
             | 
| 7 | 
            +
            See [these docs](../input_output.qmd).
         | 
    	
        docs/dataset-formats/tokenized.qmd
    ADDED
    
    | @@ -0,0 +1,12 @@ | |
|  | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            title: Custom Pre-Tokenized Dataset
         | 
| 3 | 
            +
            description: How to use a custom pre-tokenized dataset.
         | 
| 4 | 
            +
            order: 5
         | 
| 5 | 
            +
            ---
         | 
| 6 | 
            +
             | 
| 7 | 
            +
            - Do not pass a `type:` in your axolotl config.
         | 
| 8 | 
            +
- Columns in the dataset must be exactly `input_ids`, `attention_mask`, and `labels`.
         | 
| 9 | 
            +
             | 
| 10 | 
            +
```{.yaml filename="config.yaml"}
         | 
| 11 | 
            +
datasets:
  - path: ...
         | 
| 12 | 
            +
            ```
         | 
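For reference, a conforming row would pair token IDs with aligned labels, e.g. `{"input_ids": [1, 15043], "attention_mask": [1, 1], "labels": [-100, 15043]}` (illustrative values), where `-100` is the conventional ignore index for positions that should not contribute to the loss.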
    	
        docs/fsdp_qlora.qmd
    CHANGED
    
    | @@ -1,5 +1,5 @@ | |
| 1 | 
             
            ---
         | 
| 2 | 
            -
            title: FDSP + QLoRA
         | 
| 3 | 
             
            description: Use FSDP with QLoRA to fine-tune large LLMs on consumer GPUs.
         | 
| 4 | 
             
            format:
         | 
| 5 | 
             
              html:
         | 
|  | |
| 1 | 
             
            ---
         | 
| 2 | 
            +
            title: "FDSP + QLoRA"
         | 
| 3 | 
             
            description: Use FSDP with QLoRA to fine-tune large LLMs on consumer GPUs.
         | 
| 4 | 
             
            format:
         | 
| 5 | 
             
              html:
         | 
    	
        docs/input_output.qmd
    CHANGED
    
    | @@ -91,8 +91,9 @@ format into a jsonl file (below is the first row from the file | |
| 91 |  | 
| 92 | 
             
            ```bash
         | 
| 93 | 
             
            $ head -n1 output.jsonl | python -m json.tool
         | 
|  | |
| 94 |  | 
| 95 | 
            -
            {.cell-output .cell-output-stdout}
         | 
| 96 | 
             
                {
         | 
| 97 | 
             
                    "segments": [
         | 
| 98 | 
             
                        {
         | 
| @@ -113,7 +114,7 @@ $ head -n1 output.jsonl | python -m json.tool | |
| 113 | 
             
                        }
         | 
| 114 | 
             
                    ]
         | 
| 115 | 
             
                }
         | 
| 116 | 
            -
             | 
| 117 |  | 
| 118 | 
             
            Set `label:false` when you want to mask a segment of text so that the
         | 
| 119 | 
             
            model isn't trained on it. Some things to keep in mind:
         | 
| @@ -238,8 +239,9 @@ version is repeated below for reference): | |
| 238 |  | 
| 239 | 
             
            ```bash
         | 
| 240 | 
             
            $ head -n1 output.jsonl | python -m json.tool
         | 
|  | |
| 241 |  | 
| 242 | 
            -
            {.cell-output .cell-output-stdout}
         | 
| 243 | 
             
                {
         | 
| 244 | 
             
                    "segments": [
         | 
| 245 | 
             
                        {
         | 
| @@ -260,4 +262,4 @@ $ head -n1 output.jsonl | python -m json.tool | |
| 260 | 
             
                        }
         | 
| 261 | 
             
                    ]
         | 
| 262 | 
             
                }
         | 
| 263 | 
            -
             | 
|  | |
| 91 |  | 
| 92 | 
             
            ```bash
         | 
| 93 | 
             
            $ head -n1 output.jsonl | python -m json.tool
         | 
| 94 | 
            +
            ```
         | 
| 95 |  | 
| 96 | 
            +
            :::{.cell-output .cell-output-stdout}
         | 
| 97 | 
             
                {
         | 
| 98 | 
             
                    "segments": [
         | 
| 99 | 
             
                        {
         | 
|  | |
| 114 | 
             
                        }
         | 
| 115 | 
             
                    ]
         | 
| 116 | 
             
                }
         | 
| 117 | 
            +
            :::
         | 
| 118 |  | 
| 119 | 
             
            Set `label:false` when you want to mask a segment of text so that the
         | 
| 120 | 
             
            model isn't trained on it. Some things to keep in mind:
         | 
|  | |
| 239 |  | 
| 240 | 
             
            ```bash
         | 
| 241 | 
             
            $ head -n1 output.jsonl | python -m json.tool
         | 
| 242 | 
            +
            ```
         | 
| 243 |  | 
| 244 | 
            +
            :::{.cell-output .cell-output-stdout}
         | 
| 245 | 
             
                {
         | 
| 246 | 
             
                    "segments": [
         | 
| 247 | 
             
                        {
         | 
|  | |
| 262 | 
             
                        }
         | 
| 263 | 
             
                    ]
         | 
| 264 | 
             
                }
         | 
| 265 | 
            +
            :::
         |