arxiv:2108.07344

IsoScore: Measuring the Uniformity of Embedding Space Utilization

Published on Aug 16, 2021

Abstract

The recent success of distributed word representations has led to an increased interest in analyzing the properties of their spatial distribution. Several studies have suggested that contextualized word embedding models do not isotropically project tokens into vector space. However, current methods designed to measure isotropy, such as average random cosine similarity and the partition score, have not been thoroughly analyzed and are not appropriate for measuring isotropy. We propose IsoScore: a novel tool that quantifies the degree to which a point cloud uniformly utilizes the ambient vector space. Using rigorously designed tests, we demonstrate that IsoScore is the only tool available in the literature that accurately measures how uniformly distributed variance is across dimensions in vector space. Additionally, we use IsoScore to challenge a number of recent conclusions in the NLP literature that have been derived using brittle metrics of isotropy. We caution future studies from using existing tools to measure isotropy in contextualized embedding space as resulting conclusions will be misleading or altogether inaccurate.
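
To make one of the existing metrics named in the abstract concrete, the following is a minimal sketch of average random cosine similarity as it is commonly described: the mean cosine similarity over randomly sampled pairs of points. It is not the paper's code and does not implement IsoScore. The function name, the sample sizes, and the toy data are assumptions chosen for illustration. The toy comparison shifts a point cloud by a constant vector, which leaves the per-dimension variance unchanged but moves the score substantially.

```python
import numpy as np


def average_random_cosine_similarity(points, num_pairs=10_000, seed=0):
    """Mean cosine similarity over randomly sampled pairs of rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    i = rng.integers(0, n, size=num_pairs)
    j = rng.integers(0, n, size=num_pairs)
    keep = i != j  # drop pairs that compare a point with itself
    a, b = points[i[keep]], points[j[keep]]
    dots = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(dots / norms))


# Toy data (assumption): an isotropic Gaussian cloud in 16 dimensions.
rng = np.random.default_rng(0)
centered = rng.standard_normal((2000, 16))

# Same cloud translated by a constant vector: the variance along every
# dimension is identical, only the mean changes.
shifted = centered + 5.0

score_centered = average_random_cosine_similarity(centered)
score_shifted = average_random_cosine_similarity(shifted)

print(f"centered cloud: {score_centered:.3f}")
print(f"shifted cloud:  {score_shifted:.3f}")
```

In this sketch the centered cloud scores close to zero and the shifted cloud scores close to one, even though both spread their variance across dimensions in the same way. This is only one illustration of how such a metric can respond to something other than the distribution of variance; the paper's own tests may differ.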
