Papers
arxiv:2402.02164

Hierarchical Structure Enhances the Convergence and Generalizability of Linear Molecular Representation

Published on Feb 3, 2024
Authors:
,
,
,
,

Abstract

Language models demonstrate fundamental abilities in syntax, semantics, and reasoning, though their performance often depends significantly on the inputs they process. This study introduces TSIS (Simplified TSID) and its variants:TSISD (TSIS with Depth-First Search), TSISO (TSIS in Order), and TSISR (TSIS in Random), as integral components of the t-SMILES framework. These additions complete the framework's design, providing diverse approaches to molecular representation. Through comprehensive analysis and experiments employing deep generative models, including GPT, diffusion models, and reinforcement learning, the findings reveal that the hierarchical structure of t-SMILES is more straightforward to parse than initially anticipated. Furthermore, t-SMILES consistently outperforms other linear representations such as SMILES, SELFIES, and SAFE, demonstrating superior convergence speed and enhanced generalization capabilities.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2402.02164 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2402.02164 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2402.02164 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.