The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `4`
        More than one GPU was found, enabling multi-GPU training.
        If this was unintended please pass in `--num_processes=1`.
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
gradient_accumulation_steps: 4
Params using prompt template alpaca:
base_model: baichuan-inc/Baichuan2-7B-Base
data_path: ../../data/belle_dolphine/p14.jsonl
output_dir: ../out/lora/p14
batch_size: 32
micro_batch_size: 2
num_epochs: 1
learning_rate: 0.0004
cutoff_len: 4096
val_set_size: 0
lr_scheduler: cosine
warmup_steps: 100
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['gate_proj', 'down_proj', 'up_proj']
train_on_inputs: False
add_eos_token: False
group_by_length: False
wandb_project: lora-moe
wandb_run_name: belle_dolphine-p14
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
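The `gradient_accumulation_steps: 4` line is printed once per rank and follows from the hyperparameters above: batch_size / (micro_batch_size × num_processes) = 32 / (2 × 4) = 4 accumulation steps per optimizer update. The LoRA settings map onto a standard peft `LoraConfig`; below is a minimal sketch under that assumption, not the run's actual training script (`trust_remote_code=True` is needed because Baichuan2 ships custom modeling code):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan2-7B-Base",
        trust_remote_code=True,  # Baichuan2 ships custom modeling code
    )

    lora_config = LoraConfig(
        r=16,                    # lora_r
        lora_alpha=16,           # lora_alpha
        lora_dropout=0.05,       # lora_dropout
        target_modules=["gate_proj", "down_proj", "up_proj"],  # MLP projections only
        bias="none",
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # prints the "trainable params: ..." line

Targeting only the MLP projections (`gate_proj`, `down_proj`, `up_proj`) leaves the attention weights frozen, which is why the trainable fraction reported below is so small.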
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None
Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 8.57s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.16s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 8.65s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.23s/it]
trainable params: 23,199,744 || all params: 7,529,172,992 || trainable%: 0.30813137146205183
Map:   0%|          | 0/217273 [00:00<?, ? examples/s]
Map:   1%|          | 1702/217273 [00:01<03:58, 904.13 examples/s]
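The `BOS EOS and PAD token id` line appears once per rank: the Baichuan2 tokenizer loads with `pad_token_id = 0`, while the script evidently expects it to be unset (`None`). A minimal sketch of that check, assuming the standard `transformers` tokenizer API; the two trailing assignments mirror what alpaca-lora-derived scripts commonly do right after this print and are an assumption, not something visible in this log:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "baichuan-inc/Baichuan2-7B-Base",
        trust_remote_code=True,
    )

    # Reproduces the logged check: BOS=1, EOS=2, PAD=0.
    print(
        "pre-trained model's BOS EOS and PAD token id:",
        tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id,
    )

    # Assumption: alpaca-lora-derived scripts usually pin the pad id and pad
    # on the left after this check; the log does not show these lines.
    tokenizer.pad_token_id = 0
    tokenizer.padding_side = "left"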
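Each LoRA-adapted projection adds r × (in_features + out_features) parameters, so the logged count can be checked by hand. Assuming Baichuan2-7B's published shapes (hidden_size 4096, intermediate_size 11008, 32 decoder layers), 16 × (4096 + 11008) = 241,664 parameters per projection, times 3 projections times 32 layers, gives exactly 23,199,744, i.e. about 0.308% of the 7.53B total (base weights plus adapters):

    # Sanity check of the logged count, assuming Baichuan2-7B's published shapes:
    # hidden_size=4096, intermediate_size=11008, 32 decoder layers.
    r, hidden, inter, layers = 16, 4096, 11008, 32
    per_proj = r * (hidden + inter)     # lora_A (r x in) + lora_B (out x r)
    total = per_proj * 3 * layers       # gate_proj, down_proj, up_proj per layer
    print(f"{total:,}")                 # 23,199,744
    print(100 * total / 7_529_172_992)  # ~0.30813

The `Map:` lines that follow are the `datasets` tokenization pass over the 217,273 training examples, with progress bars from the four ranks interleaved.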