Set up Environment

Pull the Docker image from Docker Hub: chunyl/pytorch-transformers:v2
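That is, using the standard Docker CLI:

docker pull chunyl/pytorch-transformers:v2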

Set SCRIPTPATH in run_docker.sh to the absolute path of this project on your machine.
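As a sketch, assuming the project was cloned to the hypothetical path /home/alice/Optimus, the edited line in run_docker.sh would become:

SCRIPTPATH="/home/alice/Optimus/code"  # hypothetical path; use your own absolute project path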

From this directory ("code"), run Docker:

sh scripts/scripts_docker/run_docker.sh

Fine-tune Language Models

sh scripts/scripts_local/run_ft_lm_vae_optimus.sh

The main training script is run_lm_vae_training.py; it runs the fine-tuning loop and takes the following options (among others) as arguments:

  • --checkpoint_dir: the folder where the pre-trained Optimus checkpoint is saved.
  • --gloabl_step_eval: specifies which checkpoint to load (the global step at which Optimus was trained).
  • --train_data_file and --eval_data_file: the paths to the training and testing datasets for the downstream fine-tuning.
  • --dataset: the dataset used for fine-tuning, such as Penn.
  • --num_train_epochs: number of training epochs (type=int); default 1.
  • --dim_target_kl: the hyper-parameter used for dimension-wise thresholding in fine-tuning (type=float); default 0.5.
  • --beta: the maximum beta value used in the cyclical annealing schedule during fine-tuning (type=float); default 1.0.
  • --ratio_zero: the proportion of each cycle in which beta=0 during fine-tuning (type=float); default 0.5.
  • --ratio_increase: the proportion of each cycle in which beta increases from 0 to its maximum value in the cyclical annealing schedule (type=float); default 0.25.

For more options, see run_lm_vae_training.py and the examples provided in run_ft_lm_vae_optimus.sh, or the additional scripts we used to run the code on a cluster.
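As an illustration of the options above, a fine-tuning run might look like the sketch below. The data paths and the checkpoint step are hypothetical placeholders, not the authors' settings; note that with the default ratios, beta is held at 0 for the first half of each annealing cycle and then rises linearly to its maximum over the next quarter.

# Hypothetical example; replace the paths and checkpoint step with your own.
python run_lm_vae_training.py \
    --checkpoint_dir ./output/pretrained_optimus \
    --gloabl_step_eval 508523 \
    --dataset Penn \
    --train_data_file ./data/ptb/train.txt \
    --eval_data_file ./data/ptb/test.txt \
    --num_train_epochs 1 \
    --dim_target_kl 0.5 \
    --beta 1.0 \
    --ratio_zero 0.5 \
    --ratio_increase 0.25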

Play with the latent space

sh scripts/scripts_local/eval_optimus_latent_space.sh

The main script here is run_latent_generation.py; it evaluates the various ways to generate text conditioned on latent vectors, taking the following options (among others) as arguments:

  • --play_mode: the script currently supports two ways to play with the pre-trained VAE models: [reconstrction, interpolation] (mode names spelled as in the script).
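For instance, interpolation could be selected as below; this is a minimal sketch, and any further arguments (checkpoint locations, input files, and so on) follow the full argument list in run_latent_generation.py.

# Minimal hypothetical example; other required flags depend on your setup.
python run_latent_generation.py --play_mode interpolation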