HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment


This repo contains the model checkpoints of our ICML 2025 paper Hierarchical Graph Tokenization for Molecule-Language Alignment, which was also presented at the ICML 2024 Workshop on Foundation Models in the Wild. 😆😆😆

File Structures

The pretrained Hierarchical VQ-VAE model is stored in hivqvae.pth. The checkpoints of the graph-language models based on llama2-7b-chat and vicuna-v1-3-7b are contained in /llama2 and /vicuna, respectively. Inside each directory, the remaining checkpoints are organized as follows (using vicuna as an example); a minimal download/loading sketch follows the list:

  • llava-hvqvae2-vicuna-v1-3-7b-pretrain: model after stage 1 pretraining;
  • graph-text-molgen: models finetuned using Mol-Instruction data under different tasks, e.g., forward reaction prediction;
  • molcap-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-50ep: model finetuned on the ChEBI-20 dataset for molecular captioning;
  • MoleculeNet-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-large*: models finetuned on different classification-based molecular property prediction tasks.
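
The snippet below is a minimal sketch of how the files above could be fetched and inspected. It assumes the checkpoints are hosted in this Hugging Face repository; the repo id shown is a placeholder and must be replaced with the actual one.

import torch
from huggingface_hub import snapshot_download

# Placeholder repo id -- replace with the actual Hugging Face repo id of this model card.
local_dir = snapshot_download(repo_id="<org>/HIGHT-checkpoints")

# hivqvae.pth stores the pretrained hierarchical VQ-VAE tokenizer weights.
vqvae_ckpt = torch.load(f"{local_dir}/hivqvae.pth", map_location="cpu")
print(type(vqvae_ckpt))  # typically a state_dict or a dict wrapping one

# The graph-language checkpoints live under /llama2 and /vicuna, e.g.
# f"{local_dir}/vicuna/molcap-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-50ep".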

Citation

If you find our model, paper and repo useful, please cite our paper:

@inproceedings{chen2025hierarchical,
title={Hierarchical Graph Tokenization for Molecule-Language Alignment},
author={Yongqiang Chen and Quanming Yao and Juzheng Zhang and James Cheng and Yatao Bian},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=wpbNczwAwV}
}