ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction
Abstract
The natural reading order of words is crucial for information extraction from form-like documents. Although Graph Convolutional Networks (GCNs) have recently advanced at modeling the spatial layout patterns of documents, they have limited ability to capture the reading order of the word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to capture the sequential presentation of words in documents. Given the connectivity of a word-level graph, ROPE generates unique reading order codes for neighboring words relative to the target word. We study two fundamental document entity extraction tasks, word labeling and word grouping, on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs, by a margin of up to 8.4% F1-score.
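To make the idea of relative reading order codes concrete, here is a minimal sketch. The abstract does not spell out the coding scheme, so this is an illustration under stated assumptions: each word carries a global reading order index, the target word receives code 0, neighbors that precede it receive negative ranks, and neighbors that follow it receive positive ranks. The function name `reading_order_codes` and its arguments are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def reading_order_codes(
    target: int, neighbors: List[int], order: Dict[int, int]
) -> Dict[int, int]:
    """Assign each neighbor a code relative to the target word.

    Assumed scheme (not confirmed by the abstract): the target gets 0,
    neighbors read before it get -1, -2, ... (closest first), and
    neighbors read after it get 1, 2, ... (closest first).
    """
    # Split the neighbors by whether they come before or after the target.
    before = sorted(
        (n for n in neighbors if order[n] < order[target]), key=order.__getitem__
    )
    after = sorted(
        (n for n in neighbors if order[n] > order[target]), key=order.__getitem__
    )

    codes = {target: 0}
    # Walk backwards from the target so the nearest preceding word gets -1.
    for rank, n in enumerate(reversed(before), start=1):
        codes[n] = -rank
    # Walk forwards from the target so the nearest following word gets 1.
    for rank, n in enumerate(after, start=1):
        codes[n] = rank
    return codes


# Hypothetical document with five words indexed in reading order.
order = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
codes = reading_order_codes(target=2, neighbors=[0, 4, 3], order=order)

# Shifting every global index by a constant leaves the codes unchanged,
# because only the relative order within the neighborhood matters.
shifted = {word: idx + 10 for word, idx in order.items()}
shifted_codes = reading_order_codes(target=2, neighbors=[0, 4, 3], order=shifted)
```

Under these assumptions, the codes depend only on how neighbors are ordered relative to the target, not on their absolute positions in the document, which is one way to read the "equivariant" property named in the title.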