HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment
This repo contains the model checkpoints of our ICML 2025 paper: Hierarchical Graph Tokenization for Molecule-Language Alignment, which was also presented at the ICML 2024 Workshop on Foundation Models in the Wild.
File Structures
The pretrained hierarchical VQ-VAE model is stored in `hivqvae.pth`.
The checkpoints of the graph-language models based on llama2-7b-chat and vicuna-v1-3-7b are contained in `/llama2` and `/vicuna`, respectively.
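The VQ-VAE checkpoint can be inspected with plain PyTorch before wiring it into the full pipeline. The sketch below is a minimal example assuming only the filename from this card; the actual model class and its loading code live in the HIGHT codebase.

```python
# Minimal sketch: inspect the pretrained hierarchical VQ-VAE checkpoint.
# Only the filename (hivqvae.pth) comes from this card; how the weights are
# consumed by the tokenizer is defined in the HIGHT code.
import torch

state_dict = torch.load("hivqvae.pth", map_location="cpu")

# Print parameter names and shapes to see what the checkpoint contains.
for name, value in state_dict.items():
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```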
Inside each directory, the remaining checkpoints are organized as follows (using `vicuna` as an example):

- `llava-hvqvae2-vicuna-v1-3-7b-pretrain`: model after stage 1 pretraining;
- `graph-text-molgen`: models finetuned on Mol-Instruction data for different tasks, e.g., forward reaction prediction;
- `molcap-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-50ep`: model finetuned on the CHEBI-20 dataset for molecular captioning;
- `MoleculeNet-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-large*`: models finetuned for different classification-based molecular property prediction tasks (see the LoRA-loading sketch below).
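Since the finetuned checkpoints are LoRA adapters, one way to experiment with them is to attach an adapter to its language backbone with PEFT. The sketch below is a rough illustration, not the official loading path: the adapter directory name is taken from this card, while the base-model id (`lmsys/vicuna-7b-v1.3`) and the plain-PEFT loading route are assumptions; the complete graph-language model additionally requires the HIGHT code for the hierarchical graph tokenizer and projector.

```python
# Hypothetical sketch: attach a LoRA checkpoint from this repo to a Vicuna
# backbone with PEFT. The base-model id and loading route are assumptions;
# full inference needs the graph components from the HIGHT codebase.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.3")
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")

# Adapter directory name as listed above, under the /vicuna folder.
model = PeftModel.from_pretrained(
    base, "vicuna/molcap-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-50ep"
)
model.eval()
```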
Citation
If you find our model, paper and repo useful, please cite our paper:
@inproceedings{chen2025hierarchical,
title={Hierarchical Graph Tokenization for Molecule-Language Alignment},
author={Yongqiang Chen and Quanming Yao and Juzheng Zhang and James Cheng and Yatao Bian},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=wpbNczwAwV}
}