jameshwadedow committed
Commit 4103281 · 1 Parent(s): 70ba9c2

[feat] tell user that button doesn't work

Files changed (2)
  1. axolotl-config.md +411 -0
  2. src/axolotl_ui/app.py +6 -2
axolotl-config.md CHANGED
@@ -0,0 +1,411 @@
+ # This is the huggingface model that contains *.pt, *.safetensors, or *.bin files
+ # This can also be a relative path to a model on disk
+ base_model: ./llama-7b-hf
+ # You can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
+ base_model_ignore_patterns:
+ # If the base_model repo on hf hub doesn't include configuration .json files,
+ # you can set that here, or leave this empty to default to base_model
+ base_model_config: ./llama-7b-hf
+ # You can specify to choose a specific model revision from huggingface hub
+ model_revision:
+ # Optional tokenizer configuration override in case you want to use a different tokenizer
+ # than the one defined in the base model
+ tokenizer_config:
+ # If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
+ model_type: AutoModelForCausalLM
+ # Corresponding tokenizer for the model; AutoTokenizer is a good choice
+ tokenizer_type: AutoTokenizer
+ # Trust remote code for untrusted source
+ trust_remote_code:
+ # use_fast option for tokenizer loading from_pretrained, defaults to True
+ tokenizer_use_fast:
+ # Whether to use the legacy tokenizer setting, defaults to True
+ tokenizer_legacy:
+ # Resize the model embeddings when new tokens are added to multiples of 32
+ # This is reported to improve training speed on some models
+ resize_token_embeddings_to_32x:
+
+ # Used to identify what the model is based on
+ is_falcon_derived_model:
+ is_llama_derived_model:
+ # Please note that if you set this to true, `padding_side` will be set to "left" by default
+ is_mistral_derived_model:
+ is_qwen_derived_model:
+
+ # Optional overrides to the base model configuration
+ model_config:
+   # RoPE Scaling https://github.com/huggingface/transformers/pull/24653
+   rope_scaling:
+     type: # linear | dynamic
+     factor: # float
+
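+ # Illustrative example of the rope_scaling block above (values are assumptions, not part
+ # of the base config): a linear scaling with factor 2.0 stretches position indices so a
+ # model trained at sequence_len 2048 can be fine-tuned at roughly twice that context:
+ #   rope_scaling:
+ #     type: linear
+ #     factor: 2.0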
+ # Optional overrides to the bnb 4bit quantization configuration
+ # https://huggingface.co/docs/transformers/main/main_classes/quantization#transformers.BitsAndBytesConfig
+ bnb_config_kwargs:
+   # These are default values
+   llm_int8_has_fp16_weight: false
+   bnb_4bit_quant_type: nf4
+   bnb_4bit_use_double_quant: true
+
+ # Whether you are training a 4-bit GPTQ quantized model
+ gptq: true
+ gptq_groupsize: 128 # group size
+ gptq_model_v1: false # v1 or v2
+
+ # This will attempt to quantize the model down to 8 bits and use adam 8 bit optimizer
+ load_in_8bit: true
+ # Use bitsandbytes 4 bit
+ load_in_4bit:
+
+ # Use CUDA bf16
+ bf16: true # bool or 'full' for `bf16_full_eval`. require >=ampere
+ # Use CUDA fp16
+ fp16: true
+ # Use CUDA tf32
+ tf32: true # require >=ampere
+
+ # No AMP (automatic mixed precision)
+ bfloat16: true # require >=ampere
+ float16: true
+
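+ # Illustrative combination of the precision flags above (an assumption, not a
+ # recommendation): on Ampere or newer GPUs a common choice is
+ #   bf16: true
+ #   tf32: true
+ #   fp16: false
+ # while older GPUs generally fall back to fp16: true with bf16/tf32 unset.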
+ # Limit the memory for all available GPUs to this amount (if an integer, expressed in gigabytes); default: unset
+ gpu_memory_limit: 20GiB
+ # Do the LoRA/PEFT loading on CPU -- this is required if the base model is so large it takes up most or all of the available GPU VRAM, e.g. during a model and LoRA merge
+ lora_on_cpu: true
+
+ # A list of one or more datasets to finetune the model with
+ datasets:
+   # HuggingFace dataset repo | s3://,gs:// path | "json" for local dataset, make sure to fill data_files
+   - path: vicgalle/alpaca-gpt4
+     # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
+     type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_strategies>.load_<load_fn>
+     ds_type: # Optional[str] (json|arrow|parquet|text|csv) defines the datatype when path is a file
+     data_files: # Optional[str] path to source data files
+     shards: # Optional[int] number of shards to split data into
+     name: # Optional[str] name of dataset configuration to load
+     train_on_split: train # Optional[str] name of dataset split to load from
+
+     # Optional[str] fastchat conversation type, only used with type: sharegpt
+     conversation: # Options (see Conversation 'name'): https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
+     field_human: # Optional[str]. Human key to use for conversation.
+     field_model: # Optional[str]. Assistant key to use for conversation.
+
+   # Custom user instruction prompt
+   - path: repo
+     type:
+       # The below are defaults. Only set what's needed if you use a different column name.
+       system_prompt: ""
+       system_format: "{system}"
+       field_system: system
+       field_instruction: instruction
+       field_input: input
+       field_output: output
+
+       # Customizable to be single line or multi-line
+       # Use {instruction}/{input} as key to be replaced
+       # 'format' can include {input}
+       format: |-
+         User: {instruction} {input}
+         Assistant:
+       # 'no_input_format' cannot include {input}
+       no_input_format: "{instruction} "
+
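+       # Illustrative example (the sample row is hypothetical): with the default field
+       # names above, a JSONL row such as
+       #   {"instruction": "Summarize the text", "input": "Axolotl is a fine-tuning tool.", "output": "..."}
+       # is rendered by the 'format' template as
+       #   User: Summarize the text Axolotl is a fine-tuning tool.
+       #   Assistant:
+       # and rows without an 'input' value use 'no_input_format' instead.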
+       # For `completion` datasets only, uses the provided field instead of `text` column
+       field:
+
+ # A list of one or more datasets to eval the model with.
+ # You can use either test_datasets, or val_set_size, but not both.
+ test_datasets:
+   - path: /workspace/data/eval.jsonl
+     ds_type: json
+     # You need to specify a split. For "json" datasets the default split is called "train".
+     split: train
+     type: completion
+     data_files:
+       - /workspace/data/eval.jsonl
+
+ # Use RL training: dpo, ipo, kto_pair
+ rl:
+
+ # Saves the desired chat template to the tokenizer_config.json for easier inferencing
+ # Currently supports chatml and inst (mistral/mixtral)
+ chat_template: chatml
+ # Changes the default system message
+ default_system_message: You are a helpful assistant. Please give a long and detailed answer. # Currently only supports chatml.
+ # Axolotl attempts to save the dataset as an arrow after packing the data together so
+ # subsequent training attempts load faster, relative path
+ dataset_prepared_path: data/last_run_prepared
+ # Push prepared dataset to hub
+ push_dataset_to_hub: # repo path
+ # The maximum number of processes to use while preprocessing your input dataset. This defaults to `os.cpu_count()`
+ # if not set.
+ dataset_processes: # defaults to os.cpu_count() if not set
+ # Keep dataset in memory while preprocessing
+ # Only needed if cached dataset is taking too much storage
+ dataset_keep_in_memory:
+ # Push checkpoints to hub
+ hub_model_id: # repo path to push finetuned model
+ # How to push checkpoints to hub
+ # https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
+ hub_strategy:
+ # Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
+ # Required to be true when used in combination with `push_dataset_to_hub`
+ hf_use_auth_token: # boolean
+ # How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc. 0 for no eval.
+ val_set_size: 0.04
+ # Num shards for whole dataset
+ dataset_shard_num:
+ # Index of shard to use for whole dataset
+ dataset_shard_idx:
+
+ # The maximum length of an input to train with; this should typically be less than 2048
+ # as most models have a token/context limit of 2048
+ sequence_len: 2048
+ # Pad inputs so each step uses constant sized buffers
+ # This will reduce memory fragmentation and may prevent OOMs, by re-using memory more efficiently
+ pad_to_sequence_len:
+ # Use efficient multi-packing with block diagonal attention and per sequence position_ids. Recommended to set to 'true'
+ sample_packing:
+ # Set to 'false' if getting errors during eval with sample_packing on.
+ eval_sample_packing:
+ # You can set these packing optimizations AFTER starting a training at least once.
+ # The trainer will provide recommended values for these values.
+ sample_packing_eff_est:
+ total_num_tokens:
+
+ # Passed through to transformers when loading the model when launched without accelerate
+ # Use `sequential` when training w/ model parallelism to limit memory
+ device_map:
+ # Defines the max memory usage per gpu on the system. Passed through to transformers when loading the model.
+ max_memory:
+
+ # Set to 'lora' or 'qlora', or leave blank to train all parameters in the original model
+ adapter: lora
+ # If you already have a lora model trained that you want to load, put that here.
+ # This means after training, if you want to test the model, you should set this to the value of `output_dir`.
+ # Note that if you merge an adapter to the base model, a new subdirectory `merged` will be created under the `output_dir`.
+ lora_model_dir:
+
+ # LoRA hyperparameters
+ # For more details about the following options, see:
+ # https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-in-depth-analysis-with-llama-2
+ lora_r: 8
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+   - q_proj
+   - v_proj
+ # - k_proj
+ # - o_proj
+ # - gate_proj
+ # - down_proj
+ # - up_proj
+ lora_target_linear: # If true, will target all linear modules
+ peft_layers_to_transform: # The layer indices to transform, otherwise, apply to all layers
+
+ # If you added new tokens to the tokenizer, you may need to save some LoRA modules because they need to know the new tokens.
+ # For LLaMA and Mistral, you need to save `embed_tokens` and `lm_head`. It may vary for other models.
+ # `embed_tokens` converts tokens to embeddings, and `lm_head` converts embeddings to token probabilities.
+ # https://github.com/huggingface/peft/issues/334#issuecomment-1561727994
+ lora_modules_to_save:
+ # - embed_tokens
+ # - lm_head
+
+ lora_fan_in_fan_out: false
+
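+ # Illustrative QLoRA-style variant of the adapter settings above (values are common
+ # starting points offered as an assumption, not a recommendation; assumes a 4-bit base):
+ #   adapter: qlora
+ #   load_in_4bit: true
+ #   lora_r: 32
+ #   lora_alpha: 16
+ #   lora_dropout: 0.05
+ #   lora_target_linear: true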
+ peft:
+   # Configuration options for loftq initialization for LoRA
+   # https://huggingface.co/docs/peft/developer_guides/quantization#loftq-initialization
+   loftq_config:
+     loftq_bits: # typically 4 bits
+
+ # ReLoRA configuration
+ # Must use either 'lora' or 'qlora' adapter, and does not support fsdp or deepspeed
+ relora_steps: # Number of steps per ReLoRA restart
+ relora_warmup_steps: # Number of per-restart warmup steps
+ relora_cpu_offload: # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings
+
+ # wandb configuration if you're using it
+ # Make sure your `WANDB_API_KEY` environment variable is set (recommended) or you log in to wandb with `wandb login`.
+ wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
+ wandb_project: # Your wandb project name
+ wandb_entity: # A wandb Team name if using a Team
+ wandb_watch:
+ wandb_name: # Set the name of your wandb run
+ wandb_run_id: # Set the ID of your wandb run
+ wandb_log_model: # "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only at the end of training
+
+ # mlflow configuration if you're using it
+ mlflow_tracking_uri: # URI to mlflow
+ mlflow_experiment_name: # Your experiment name
+
+ # Where to save the full-finetuned model to
+ output_dir: ./completed-model
+
+ # Whether to use torch.compile and which backend to use
+ torch_compile: # bool
+ torch_compile_backend: # Optional[str]
+
+ # Training hyperparameters
+
+ # If greater than 1, the optimizer step will be skipped and gradients will be accumulated for the given number of steps.
+ gradient_accumulation_steps: 1
+ # The number of samples to include in each batch. This is the number of samples sent to each GPU.
+ micro_batch_size: 2
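+ # Illustrative arithmetic: the effective (global) batch size is
+ #   micro_batch_size * gradient_accumulation_steps * number_of_gpus
+ # e.g. with the values above on 4 GPUs (an assumed count): 2 * 1 * 4 = 8 samples per optimizer step.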
+ eval_batch_size:
+ num_epochs: 4
+ warmup_steps: 100 # cannot use with warmup_ratio
+ warmup_ratio: 0.05 # cannot use with warmup_steps
+ learning_rate: 0.00003
+ lr_quadratic_warmup:
+ logging_steps:
+ eval_steps: # Leave empty to eval at each epoch, integer for every N steps, decimal for fraction of total steps
+ evals_per_epoch: # number of times per epoch to run evals, mutually exclusive with eval_steps
+ save_strategy: # Set to `no` to skip checkpoint saves
+ save_steps: # Leave empty to save at each epoch
+ saves_per_epoch: # number of times per epoch to save a checkpoint, mutually exclusive with save_steps
+ save_total_limit: # Maximum number of checkpoints to keep at a time
+ # Maximum number of iterations to train for. It takes precedence over num_epochs:
+ # if both are set, num_epochs is not guaranteed to be reached.
+ # e.g., when 1 epoch is 1000 steps => `num_epochs: 2` and `max_steps: 100` will train for 100 steps
+ max_steps:
+
+ eval_table_size: # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
+ eval_table_max_new_tokens: # Total number of tokens generated for predictions sent to wandb. Default is 128
+
+ loss_watchdog_threshold: # High loss value, indicating the learning has broken down (a good estimate is ~2 times the loss at the start of training)
+ loss_watchdog_patience: # Number of high-loss steps in a row before the trainer aborts (default: 3)
+
+ # Save model as safetensors (requires safetensors package)
+ save_safetensors:
+
+ # Whether to mask out or include the human's prompt from the training labels
+ train_on_inputs: false
+ # Group similarly sized data to minimize padding.
+ # May be slower to start, as it must download and sort the entire dataset.
+ # Note that training loss may have an oscillating pattern with this enabled.
+ group_by_length: false
+
+ # Whether to use gradient checkpointing https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
+ gradient_checkpointing: false
+ # Additional kwargs to pass to the trainer for gradient checkpointing
+ # gradient_checkpointing_kwargs:
+ #   use_reentrant: false
+
+ # Stop training after this many evaluation losses have increased in a row
+ # https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
+ early_stopping_patience: 3
+
+ # Specify a scheduler and kwargs to use with the optimizer
+ lr_scheduler: # 'one_cycle' | 'log_sweep' | empty for cosine
+ lr_scheduler_kwargs:
+ cosine_min_lr_ratio: # decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of peak lr
+
+ # For one_cycle optim
+ lr_div_factor: # Learning rate div factor
+
+ # For log_sweep optim
+ log_sweep_min_lr:
+ log_sweep_max_lr:
+
+ # Specify optimizer
+ # Valid values are driven by the Transformers OptimizerNames class, see:
+ # https://github.com/huggingface/transformers/blob/95b374952dc27d8511541d6f5a4e22c9ec11fb24/src/transformers/training_args.py#L134
+ #
+ # Note that not all optimizers may be available in your environment, ex: 'adamw_anyprecision' is part of
+ # torchdistx, 'adamw_bnb_8bit' is part of bnb.optim.Adam8bit, etc. When in doubt, it is recommended to start with the optimizer used
+ # in the examples/ for your model and fine-tuning use case.
+ #
+ # Valid values for 'optimizer' include:
+ # - adamw_hf
+ # - adamw_torch
+ # - adamw_torch_fused
+ # - adamw_torch_xla
+ # - adamw_apex_fused
+ # - adafactor
+ # - adamw_anyprecision
+ # - sgd
+ # - adagrad
+ # - adamw_bnb_8bit
+ # - lion_8bit
+ # - lion_32bit
+ # - paged_adamw_32bit
+ # - paged_adamw_8bit
+ # - paged_lion_32bit
+ # - paged_lion_8bit
+ optimizer:
+ # Specify weight decay
+ weight_decay:
+ # adamw hyperparams
+ adam_beta1:
+ adam_beta2:
+ adam_epsilon:
+ # Gradient clipping max norm
+ max_grad_norm:
+
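+ # Illustrative example of the optimizer settings above (values are common choices,
+ # shown as an assumption rather than a recommendation):
+ #   optimizer: adamw_bnb_8bit
+ #   weight_decay: 0.01
+ #   adam_beta1: 0.9
+ #   adam_beta2: 0.999
+ #   adam_epsilon: 1.0e-8
+ #   max_grad_norm: 1.0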
+ # Augmentation techniques
+ # NEFT https://arxiv.org/abs/2310.05914, set this to a number (paper default is 5) to add noise to embeddings
+ # currently only supported on Llama and Mistral
+ neftune_noise_alpha:
+
+ # Whether to use BetterTransformer
+ flash_optimum:
+ # Whether to use xformers attention patch https://github.com/facebookresearch/xformers:
+ xformers_attention:
+ # Whether to use flash attention patch https://github.com/Dao-AILab/flash-attention:
+ flash_attention:
+ flash_attn_cross_entropy: # Whether to use flash-attention cross entropy implementation - advanced use only
+ flash_attn_rms_norm: # Whether to use flash-attention rms norm implementation - advanced use only
+ flash_attn_fuse_qkv: # Whether to fuse QKV into a single operation
+ flash_attn_fuse_mlp: # Whether to fuse part of the MLP into a single operation
+ # Whether to use scaled-dot-product attention
+ # https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
+ sdp_attention:
+ # Shifted-sparse attention (only llama) - https://arxiv.org/pdf/2309.12307.pdf
+ s2_attention:
+ # Resume from a specific checkpoint dir
+ resume_from_checkpoint:
+ # Set this if resume_from_checkpoint isn't set and you simply want training to start where it left off.
+ # Be careful with this being turned on between different models.
+ auto_resume_from_checkpoints: false
+
+ # Don't mess with this, it's here for accelerate and torchrun
+ local_rank:
+
+ # Add or change special tokens.
+ # If you add tokens here, you don't need to add them to the `tokens` list.
+ special_tokens:
+   # bos_token: "<s>"
+   # eos_token: "</s>"
+   # unk_token: "<unk>"
+
+ # Add extra tokens.
+ tokens:
+
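+ # Illustrative example matching the chatml chat_template set earlier (the token strings
+ # are an assumption about your tokenizer; adjust as needed):
+ #   special_tokens:
+ #     eos_token: "<|im_end|>"
+ #   tokens:
+ #     - "<|im_start|>"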
+ # FSDP
+ fsdp:
+ fsdp_config:
+
+ # Deepspeed config path. e.g., deepspeed_configs/zero3.json
+ deepspeed:
+
+ # Advanced DDP Arguments
+ ddp_timeout:
+ ddp_bucket_cap_mb:
+ ddp_broadcast_buffers:
+
+ # Path to torch distx for optim 'adamw_anyprecision'
+ torchdistx_path:
+
+ # Set to an HF dataset for type: 'completion' for streaming instead of pre-tokenizing
+ pretraining_dataset:
+
+ # Debug mode
+ debug:
+
+ # Seed
+ seed:
+
+ # Allow overwriting the yml config using the cli
+ strict:
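+ # Once filled in, a config like this is typically passed to the axolotl trainer,
+ # e.g. `accelerate launch -m axolotl.cli.train your-config.yml` (command taken from
+ # upstream axolotl docs; the config filename here is a placeholder).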
src/axolotl_ui/app.py CHANGED
@@ -1,6 +1,6 @@
  from pathlib import Path
 
- from shiny import App, Inputs, Outputs, Session, ui
+ from shiny import App, Inputs, Outputs, Session, ui, reactive
  import shinyswatch
  from htmltools import HTML
 
@@ -103,7 +103,11 @@ app_ui = ui.page_fillable(
 
 
  def server(input: Inputs, output: Outputs, session: Session):
-     return ()
+     @reactive.Effect
+     @reactive.event(input.create_space)
+     def _():
+         ui.notification_show("This is not yet implemented.", type="warning")
+
 
 
  app = App(