Reference: textgraphs
package

see copyright/license https://huggingface.co/spaces/DerwenAI/textgraphs/blob/main/README.md
TextGraphs
class
Construct a lemma graph from the unstructured text source,
then extract ranked phrases using a textgraph
algorithm.
infer_relations_async
method
infer_relations_async(pipe, debug=False)
Gather triples representing inferred relations and build edges, concurrently by running an async queue. https://stackoverflow.com/questions/52582685/using-asyncio-queue-for-producer-consumer-flow
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document
debug
:bool
debugging flag
returns :
typing.List[textgraphs.elem.Edge]
a list of the inferred Edge objects
__init__
method
__init__(factory=None, iri_base="https://github.com/DerwenAI/textgraphs/ns/")
Constructor.
factory
:typing.Optional[textgraphs.pipe.PipelineFactory]
optional PipelineFactory used to configure components
create_pipeline
method
create_pipeline(text_input)
Use the pipeline factory to create a pipeline (e.g., spaCy.Document)
for each text input, which is typically paragraph-length.
text_input
:str
raw text to be parsed by this pipeline
returns :
textgraphs.pipe.Pipeline
a configured pipeline
create_render
method
create_render()
Create an object for rendering the graph in PyVis
HTML+JavaScript.
- returns :
textgraphs.vis.RenderPyVis
a configured RenderPyVis object for generating graph visualizations
collect_graph_elements
method
collect_graph_elements(pipe, text_id=0, para_id=0, debug=False)
Collect the elements of a lemma graph from the results of running
the textgraph
algorithm. These elements include: parse dependencies,
lemmas, entities, and noun chunks.
Make sure to call beforehand: TextGraphs.create_pipeline()
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document
text_id
:int
text (top-level document) identifier
para_id
:int
paragraph identifier
debug
:bool
debugging flag
construct_lemma_graph
method
construct_lemma_graph(debug=False)
Construct the base level of the lemma graph from the collected
elements. This gets represented in NetworkX
as a directed graph
with parallel edges.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
debug
:bool
debugging flag
perform_entity_linking
method
perform_entity_linking(pipe, debug=False)
Perform entity linking based on the KnowledgeGraph
object.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document
debug
:bool
debugging flag
infer_relations
method
infer_relations(pipe, debug=False)
Gather triples representing inferred relations and build edges.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document
debug
:bool
debugging flag
returns :
typing.List[textgraphs.elem.Edge]
a list of the inferred Edge objects
calc_phrase_ranks
method
calc_phrase_ranks(pr_alpha=0.85, debug=False)
Calculate the weights for each node in the lemma graph, then stack-rank the nodes so that entities have priority over lemmas.
Phrase ranks are normalized to sum to 1.0; these normalized ranks represent the ranked entities extracted from the document.
Make sure to call beforehand: TextGraphs.construct_lemma_graph()
pr_alpha
:float
optional alpha parameter for the PageRank algorithm
debug
:bool
debugging flag
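The normalization step above can be sketched in plain Python. The scores below are hypothetical stand-ins for PageRank output, not values computed by the package:

```python
# Hypothetical PageRank-style scores for nodes in a lemma graph;
# in the package, entities get priority over lemmas in the stack ranking.
raw_scores = {"werner herzog": 0.31, "film": 0.22, "direct": 0.12}

# normalize so the phrase ranks sum to 1.0
total = sum(raw_scores.values())
ranks = {phrase: score / total for phrase, score in raw_scores.items()}

# stack-rank the phrases from highest to lowest normalized rank
ranked = sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # highest-ranked phrase
```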
get_phrases
method
get_phrases()
Return the entities extracted from the document.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- yields :
extracted entities
get_phrases_as_df
method
get_phrases_as_df()
Return the ranked extracted entities as a dataframe.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- returns :
pandas.core.frame.DataFrame
a pandas.DataFrame of the extracted entities
export_rdf
method
export_rdf(lang="en")
Extract the entities and relations which have IRIs as RDF triples.
lang
:str
language identifier
returns :
str
RDF triples in N3 (Turtle) format, as a string
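For orientation, N3/Turtle output looks roughly like the following; these triples are illustrative, not actual output of export_rdf():

```turtle
@prefix dbr:  <http://dbpedia.org/resource/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

dbr:Werner_Herzog skos:prefLabel "Werner Herzog"@en .
```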
denormalize_iri
method
denormalize_iri(uri_ref)
Discern between a parsed entity and a linked entity.
- returns :
str
the lemma_key for a parsed entity, or the full IRI for a linked entity
load_bootstrap_ttl
method
load_bootstrap_ttl(ttl_str, debug=False)
Parse a TTL string with an RDF semantic graph representation to load bootstrap definitions for the lemma graph prior to parsing, e.g., for synonyms.
ttl_str
:str
RDF triples in TTL (Turtle/N3) format
debug
:bool
debugging flag
export_kuzu
method
export_kuzu(zip_name="lemma.zip", debug=False)
Export a labeled property graph for KùzuDB (openCypher).
debug
:bool
debugging flag
returns :
str
name of the generated ZIP file
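Pulling together the ordering hints above ("Make sure to call beforehand: ..."), a typical end-to-end flow with this class looks roughly like the sketch below; it assumes textgraphs and its models are installed, and argument details may differ from your configuration:

```python
import asyncio
import textgraphs

tg = textgraphs.TextGraphs()  # uses a default PipelineFactory
pipe = tg.create_pipeline("Werner Herzog is a remarkable filmmaker.")

tg.collect_graph_elements(pipe)   # parse dependencies, lemmas, entities, noun chunks
tg.perform_entity_linking(pipe)   # link entities via the KnowledgeGraph
asyncio.run(tg.infer_relations_async(pipe))  # or the synchronous infer_relations()
tg.construct_lemma_graph()
tg.calc_phrase_ranks()

for phrase in tg.get_phrases():   # ranked entities extracted from the document
    print(phrase)
```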
SimpleGraph
class
An in-memory graph used to build a MultiDiGraph
in NetworkX.
__init__
method
__init__()
Constructor.
reset
method
reset()
Re-initialize the data structures, resetting all but the configuration.
make_node
method
make_node(tokens, key, span, kind, text_id, para_id, sent_id, label=None, length=1, linked=True)
Look up and return a Node
object.
By default, link matching keys into the same node.
Otherwise instantiate a new node if it does not exist already.
tokens
:typing.List[textgraphs.elem.Node]
list of parsed tokens
key
:str
lemma key (invariant)
span
:spacy.tokens.token.Token
token span for the parsed entity
kind
:<enum 'NodeEnum'>
the kind of this Node object
text_id
:int
text (top-level document) identifier
para_id
:int
paragraph identifier
sent_id
:int
sentence identifier
label
:typing.Optional[str]
node label (for a new object)
length
:int
length of token span
linked
:bool
flag for whether this links to an entity
returns :
textgraphs.elem.Node
the constructed Node object
make_edge
method
make_edge(src_node, dst_node, kind, rel, prob, key=None, debug=False)
Look up an edge, creating a new one if it does not exist already, and increment the count if it does.
src_node
:textgraphs.elem.Node
source node in the triple
dst_node
:textgraphs.elem.Node
destination node in the triple
kind
:<enum 'RelEnum'>
the kind of this Edge object
rel
:str
relation label
prob
:float
probability of this Edge within the graph
key
:typing.Optional[str]
lemma key (invariant); generate a key if this is not provided
debug
:bool
debugging flag
returns :
typing.Optional[textgraphs.elem.Edge]
the constructed Edge object; this may be None if the input parameters indicate skipping the edge
dump_lemma_graph
method
dump_lemma_graph()
Dump the lemma graph as a JSON string in node-link format, suitable for serialization and subsequent use in JavaScript, Neo4j, Graphistry, etc.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- returns :
str
a JSON representation of the exported lemma graph in node-link format
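Node-link format is NetworkX's JSON serialization: an object with nodes and links arrays. A minimal, hypothetical dump can be handled with the standard library alone:

```python
import json

# minimal, hypothetical lemma-graph dump in node-link format
json_str = """
{
  "directed": true,
  "multigraph": true,
  "nodes": [{"id": 0, "lemma": "werner herzog"}, {"id": 1, "lemma": "film"}],
  "links": [{"source": 0, "target": 1, "rel": "dep", "key": 0}]
}
"""

graph = json.loads(json_str)
print(len(graph["nodes"]), len(graph["links"]))
```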
load_lemma_graph
method
load_lemma_graph(json_str, debug=False)
Load the lemma graph from a JSON string in node-link format.
debug
:bool
debugging flag
Node
class
A data class representing one node, i.e., an extracted phrase.
__repr__
method
__repr__()
get_linked_label
method
get_linked_label()
When this node has a linked entity, return that IRI.
Otherwise return its label
value.
- returns :
typing.Optional[str]
a label for the linked entity
get_name
method
get_name()
Return a brief name for the graphical depiction of this Node.
- returns :
str
brief label to be used in a graph
get_stacked_count
method
get_stacked_count()
Return a modified count, to redact verbs and linked entities from the stack-rank partitions.
- returns :
int
count, used for re-ranking extracted entities
get_pos
method
get_pos()
Generate a position span for OpenNRE.
- returns :
typing.Tuple[int, int]
a position span needed for OpenNRE relation extraction
Edge
class
A data class representing an edge between two nodes.
__repr__
method
__repr__()
EnumBase
class
A mixin for Enum codecs.
NodeEnum
class
Enumeration for the kinds of node categories
RelEnum
class
Enumeration for the kinds of edge relations
PipelineFactory
class
Factory pattern for building a pipeline, which is one of the more
expensive operations with spaCy
__init__
method
__init__(spacy_model="en_core_web_sm", ner=None, kg=<KnowledgeGraph instance>, infer_rels=[])
Constructor which instantiates the spaCy
pipelines:
tok_pipe
-- regular generator for parsed tokens
ner_pipe
-- with entities merged
aux_pipe
-- spotlight entity linking
which will be needed for parsing and entity linking.
spacy_model
:str
the specific model to use in spaCy pipelines
ner
:typing.Optional[textgraphs.pipe.Component]
optional custom NER component
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking
infer_rels
:typing.List[textgraphs.pipe.InferRel]
a list of components for inferring relations
create_pipeline
method
create_pipeline(text_input)
Instantiate the document pipelines needed to parse the input text.
text_input
:str
raw text to be parsed
returns :
textgraphs.pipe.Pipeline
a configured Pipeline object
Pipeline
class
Manage parsing of a document, which is assumed to be paragraph-sized.
__init__
method
__init__(text_input, tok_pipe, ner_pipe, aux_pipe, kg, infer_rels)
Constructor.
text_input
:str
raw text to be parsed
tok_pipe
:spacy.language.Language
the spaCy.Language pipeline used for tallying individual tokens
ner_pipe
:spacy.language.Language
the spaCy.Language pipeline used for tallying named entities
aux_pipe
:spacy.language.Language
the spaCy.Language pipeline used for auxiliary components (e.g., DBPedia Spotlight)
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking
infer_rels
:typing.List[textgraphs.pipe.InferRel]
a list of components for inferring relations
get_lemma_key
classmethod
get_lemma_key(span, placeholder=False)
Compose a unique, invariant lemma key for the given span.
span
:typing.Union[spacy.tokens.span.Span, spacy.tokens.token.Token]
span of tokens within the lemma
placeholder
:bool
flag for whether to create a placeholder
returns :
str
a composed lemma key
get_ent_lemma_keys
method
get_ent_lemma_keys()
Iterate through the fully qualified lemma keys for an extracted entity.
- yields :
the lemma keys within an extracted entity
link_noun_chunks
method
link_noun_chunks(nodes, debug=False)
Link any noun chunks which are not already subsumed by named entities.
nodes
:dict
dictionary of Node objects in the graph
debug
:bool
debugging flag
returns :
typing.List[textgraphs.elem.NounChunk]
a list of identified noun chunks which are novel
iter_entity_pairs
method
iter_entity_pairs(pipe_graph, max_skip, debug=True)
Iterator for entity pairs for which the algorithm infers relations.
pipe_graph
:networkx.classes.multigraph.MultiGraph
a networkx.MultiGraph representation of the graph, reused for graph algorithms
max_skip
:int
maximum distance between entities for inferred relations
debug
:bool
debugging flag
yields :
pairs of entities within a range, e.g., to use for relation extraction
Component
class
Abstract base class for a spaCy
pipeline component.
augment_pipe
method
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:PipelineFactory
a PipelineFactory used to configure components
NERSpanMarker
class
Configures a spaCy
pipeline component for SpanMarkerNER
__init__
method
__init__(ner_model="tomaarsen/span-marker-roberta-large-ontonotes5")
Constructor.
ner_model
:str
model to be used in SpanMarker
augment_pipe
method
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:textgraphs.pipe.PipelineFactory
the PipelineFactory used to configure this pipeline component
NounChunk
class
A data class representing one noun chunk, i.e., a candidate as an extracted phrase.
__repr__
method
__repr__()
KnowledgeGraph
class
Base class for a knowledge graph interface.
augment_pipe
method
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:PipelineFactory
a PipelineFactory used to configure components
remap_ner
method
remap_ner(label)
Remap the OntoTypes4 values from NER output to more general-purpose IRIs.
label
:typing.Optional[str]
input NER label, an OntoTypes4 value
returns :
typing.Optional[str]
an IRI for the named entity
normalize_prefix
method
normalize_prefix(iri, debug=False)
Normalize the given IRI to use standard namespace prefixes.
iri
:str
input IRI, in fully-qualified domain representation
debug
:bool
debugging flag
returns :
str
the compact IRI representation, using an RDF namespace prefix
perform_entity_linking
method
perform_entity_linking(graph, pipe, debug=False)
Perform entity linking based on "spotlight" and other services.
graph
:textgraphs.graph.SimpleGraph
source graph
pipe
:Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
resolve_rel_iri
method
resolve_rel_iri(rel, lang="en", debug=False)
Resolve a rel
string from a relation extraction model which has
been trained on this knowledge graph.
rel
:str
relation label; many relation extraction projects source these labels from Wikidata
lang
:str
language identifier
debug
:bool
debugging flag
returns :
typing.Optional[str]
a resolved IRI
KGSearchHit
class
A data class representing a hit from a knowledge graph search.
__repr__
method
__repr__()
KGWikiMedia
class
Manage access to WikiMedia-related APIs.
__init__
method
__init__(spotlight_api="https://api.dbpedia-spotlight.org/en", dbpedia_search_api="https://lookup.dbpedia.org/api/search", dbpedia_sparql_api="https://dbpedia.org/sparql", wikidata_api="https://www.wikidata.org/w/api.php", ner_map=OrderedDict([('CARDINAL', {'iri': 'http://dbpedia.org/resource/Cardinal_number', 'definition': 'Numerals that do not fall under another type', 'label': 'cardinal number'}), ('DATE', {'iri': 'http://dbpedia.org/ontology/date', 'definition': 'Absolute or relative dates or periods', 'label': 'date'}), ('EVENT', {'iri': 'http://dbpedia.org/ontology/Event', 'definition': 'Named hurricanes, battles, wars, sports events, etc.', 'label': 'event'}), ('FAC', {'iri': 'http://dbpedia.org/ontology/Infrastructure', 'definition': 'Buildings, airports, highways, bridges, etc.', 'label': 'infrastructure'}), ('GPE', {'iri': 'http://dbpedia.org/ontology/Country', 'definition': 'Countries, cities, states', 'label': 'country'}), ('LANGUAGE', {'iri': 'http://dbpedia.org/ontology/Language', 'definition': 'Any named language', 'label': 'language'}), ('LAW', {'iri': 'http://dbpedia.org/ontology/Law', 'definition': 'Named documents made into laws', 'label': 'law'}), ('LOC', {'iri': 'http://dbpedia.org/ontology/Place', 'definition': 'Non-GPE locations, mountain ranges, bodies of water', 'label': 'place'}), ('MONEY', {'iri': 'http://dbpedia.org/resource/Money', 'definition': 'Monetary values, including unit', 'label': 'money'}), ('NORP', {'iri': 'http://dbpedia.org/ontology/nationality', 'definition': 'Nationalities or religious or political groups', 'label': 'nationality'}), ('ORDINAL', {'iri': 'http://dbpedia.org/resource/Ordinal_number', 'definition': 'Ordinal number, i.e., first, second, etc.', 'label': 'ordinal number'}), ('ORG', {'iri': 'http://dbpedia.org/ontology/Organisation', 'definition': 'Companies, agencies, institutions, etc.', 'label': 'organization'}), ('PERCENT', {'iri': 'http://dbpedia.org/resource/Percentage', 'definition': 'Percentage', 'label': 
'percentage'}), ('PERSON', {'iri': 'http://dbpedia.org/ontology/Person', 'definition': 'People, including fictional', 'label': 'person'}), ('PRODUCT', {'iri': 'http://dbpedia.org/ontology/product', 'definition': 'Vehicles, weapons, foods, etc. (Not services)', 'label': 'product'}), ('QUANTITY', {'iri': 'http://dbpedia.org/resource/Quantity', 'definition': 'Measurements, as of weight or distance', 'label': 'quantity'}), ('TIME', {'iri': 'http://dbpedia.org/ontology/time', 'definition': 'Times smaller than a day', 'label': 'time'}), ('WORK OF ART', {'iri': 'http://dbpedia.org/resource/Work_of_art', 'definition': 'Titles of books, songs, etc.', 'label': 'work of art'})]), ns_prefix=OrderedDict([('dbc', 'http://dbpedia.org/resource/Category:'), ('dbt', 'http://dbpedia.org/resource/Template:'), ('dbr', 'http://dbpedia.org/resource/'), ('yago', 'http://dbpedia.org/class/yago/'), ('dbd', 'http://dbpedia.org/datatype/'), ('dbo', 'http://dbpedia.org/ontology/'), ('dbp', 'http://dbpedia.org/property/'), ('units', 'http://dbpedia.org/units/'), ('dbpedia-commons', 'http://commons.dbpedia.org/resource/'), ('dbpedia-wikicompany', 'http://dbpedia.openlinksw.com/wikicompany/'), ('dbpedia-wikidata', 'http://wikidata.dbpedia.org/resource/'), ('wd', 'http://www.wikidata.org/'), ('wd_ent', 'http://www.wikidata.org/entity/'), ('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'), ('schema', 'https://schema.org/'), ('owl', 'http://www.w3.org/2002/07/owl#')]), min_alias=0.8, min_similarity=0.9)
Constructor.
spotlight_api
:str
DBPedia Spotlight API or equivalent local service
dbpedia_search_api
:str
DBPedia Search API or equivalent local service
dbpedia_sparql_api
:str
DBPedia SPARQL API or equivalent local service
wikidata_api
:str
Wikidata Search API or equivalent local service
ner_map
:dict
named entity map for standardizing IRIs
ns_prefix
:dict
RDF namespace prefixes
min_alias
:float
minimum alias probability threshold for accepting linked entities
min_similarity
:float
minimum label similarity threshold for accepting linked entities
augment_pipe
method
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:textgraphs.pipe.PipelineFactory
a PipelineFactory used to configure components
remap_ner
method
remap_ner(label)
Remap the OntoTypes4 values from NER output to more general-purpose IRIs.
label
:typing.Optional[str]
input NER label, an OntoTypes4 value
returns :
typing.Optional[str]
an IRI for the named entity
normalize_prefix
method
normalize_prefix(iri, debug=False)
Normalize the given IRI using the standard DBPedia namespace prefixes.
iri
:str
input IRI, in fully-qualified domain representation
debug
:bool
debugging flag
returns :
str
the compact IRI representation, using an RDF namespace prefix
perform_entity_linking
method
perform_entity_linking(graph, pipe, debug=False)
Perform entity linking based on DBPedia Spotlight
and other services.
graph
:textgraphs.graph.SimpleGraph
source graph
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
resolve_rel_iri
method
resolve_rel_iri(rel, lang="en", debug=False)
Resolve a rel
string from a relation extraction model which has
been trained on this knowledge graph, which defaults to using the
WikiMedia
graphs.
rel
:str
relation label; many relation extraction projects source these labels from Wikidata
lang
:str
language identifier
debug
:bool
debugging flag
returns :
typing.Optional[str]
a resolved IRI
wikidata_search
method
wikidata_search(query, lang="en", debug=False)
Query the Wikidata search API.
query
:str
query string
lang
:str
language identifier
debug
:bool
debugging flag
returns :
typing.Optional[textgraphs.elem.KGSearchHit]
search hit, if any
dbpedia_search_entity
method
dbpedia_search_entity(query, lang="en", debug=False)
Perform a DBPedia API search.
query
:str
query string
lang
:str
language identifier
debug
:bool
debugging flag
returns :
typing.Optional[textgraphs.elem.KGSearchHit]
search hit, if any
dbpedia_sparql_query
method
dbpedia_sparql_query(sparql, debug=False)
Perform a SPARQL query on DBPedia.
sparql
:str
SPARQL query string
debug
:bool
debugging flag
returns :
dict
dictionary of query results
dbpedia_wikidata_equiv
method
dbpedia_wikidata_equiv(dbpedia_iri, debug=False)
Perform a SPARQL query on DBPedia to find an equivalent Wikidata entity.
dbpedia_iri
:str
IRI in DBpedia
debug
:bool
debugging flag
returns :
typing.Optional[str]
equivalent IRI in Wikidata
LinkedEntity
class
A data class representing one linked entity.
__repr__
method
__repr__()
InferRel
class
Abstract base class for a relation extraction model wrapper.
gen_triples_async
method
gen_triples_async(pipe, queue, debug=False)
Infer relations concurrently, producing triples to a queue.
pipe
:Pipeline
configured pipeline for the current document
queue
:asyncio.queues.Queue
queue of inference tasks to be performed
debug
:bool
debugging flag
gen_triples
method
gen_triples(pipe, debug=False)
Infer relations iteratively, yielding triples through a generator.
pipe
:Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
yields :
generated triples
InferRel_OpenNRE
class
Perform relation extraction based on the OpenNRE
model.
https://github.com/thunlp/OpenNRE
__init__
method
__init__(model="wiki80_cnn_softmax", max_skip=11, min_prob=0.9)
Constructor.
model
:str
the specific model to be used in OpenNRE
max_skip
:int
maximum distance between entities for inferred relations
min_prob
:float
minimum probability threshold for accepting an inferred relation
gen_triples
method
gen_triples(pipe, debug=False)
Iterate on entity pairs to drive OpenNRE, inferring relations represented as triples which get produced by a generator.
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
yields :
generated triples as candidates for inferred relations
InferRel_Rebel
class
Perform relation extraction based on the REBEL
model.
https://github.com/Babelscape/rebel
https://huggingface.co/spaces/Babelscape/mrebel-demo
__init__
method
__init__(lang="en_XX", mrebel_model="Babelscape/mrebel-large")
Constructor.
lang
:str
language identifier
mrebel_model
:str
tokenizer model to be used
tokenize_sent
method
tokenize_sent(text)
Apply the tokenizer manually, since we need to extract special tokens.
text
:str
input text for the sentence to be tokenized
returns :
str
extracted tokens
extract_triplets_typed
method
extract_triplets_typed(text)
Parse the generated text and extract its triplets.
text
:str
input text for the sentence to use in inference
returns :
list
a list of extracted triples
gen_triples
method
gen_triples(pipe, debug=False)
Drive REBEL
to infer relations for each sentence, represented as
triples which get produced by a generator.
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
yields :
generated triples as candidates for inferred relations
RenderPyVis
class
Render the lemma graph as a PyVis
network.
__init__
method
__init__(graph, kg)
Constructor.
graph
:textgraphs.graph.SimpleGraph
source graph to be visualized
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking
render_lemma_graph
method
render_lemma_graph(debug=True)
Prepare the structure of the NetworkX
graph to use for building
and returning a PyVis
network to render.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
debug
:bool
debugging flag
returns :
pyvis.network.Network
a pyvis.network.Network interactive visualization
draw_communities
method
draw_communities(spring_distance=1.4, debug=False)
Cluster the communities in the lemma graph, then draw a
NetworkX graph of the nodes with a specific color for each
community.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
spring_distance
:float
NetworkX parameter used to separate clusters visually
debug
:bool
debugging flag
returns :
typing.Dict[int, int]
a map of the calculated communities
generate_wordcloud
method
generate_wordcloud(background="black")
Generate a tag cloud from the given phrases.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
background
:str
background color for the rendering
returns :
wordcloud.wordcloud.WordCloud
the rendering as a wordcloud.WordCloud object, which can be used to generate PNG images, etc.
NodeStyle
class
Dataclass used for styling PyVis nodes.
__setattr__
method
__setattr__(name, value)
GraphOfRelations
class
Attempt to reproduce results published in "INGRAM: Inductive Knowledge Graph Embedding via Relation Graphs" https://arxiv.org/abs/2305.19987
__init__
method
__init__(source)
Constructor.
source
:textgraphs.graph.SimpleGraph
source graph to be transformed
load_ingram
method
load_ingram(json_file, debug=False)
Load data for a source graph, as illustrated in lee2023ingram
json_file
:pathlib.Path
path for the JSON dataset to loaddebug
:bool
debugging flag
seeds
method
seeds(debug=False)
Prep data for the topological transform illustrated in lee2023ingram
debug
:bool
debugging flag
trace_source_graph
method
trace_source_graph()
Output a "seed" representation of the source graph.
construct_gor
method
construct_gor(debug=False)
Perform the topological transform described by lee2023ingram, constructing a graph of relations (GOR) and calculating affinity scores between entities in the GOR based on their definitions:
we measure the affinity between two relations by considering how many entities are shared between them and how frequently they share the same entity
debug
:bool
debugging flag
tally_frequencies
classmethod
tally_frequencies(counter)
Tally the frequency of shared entities.
counter
:collections.Counter
data collection for the rel_b/entity pairs
returns :
int
tallied values for one relation
get_affinity_scores
method
get_affinity_scores(debug=False)
Reproduce metrics based on the example published in lee2023ingram
debug
:bool
debugging flag
returns :
typing.Dict[tuple, float]
the calculated affinity scores
trace_metrics
method
trace_metrics(scores)
Compare the calculated affinity scores with results from a published example.
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values)
returns :
pandas.core.frame.DataFrame
a pandas.DataFrame where the rows compare expected vs. observed affinity scores
render_gor_plt
method
render_gor_plt(scores)
Visualize the graph of relations using matplotlib
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values)
render_gor_pyvis
method
render_gor_pyvis(scores)
Visualize the graph of relations interactively using PyVis
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values)
returns :
pyvis.network.Network
a pyvis.network.Network representation of the transformed graph
TransArc
class
A data class representing one transformed rel-node-rel triple in a graph of relations.
__repr__
method
__repr__()
RelDir
class
Enumeration for the directions of a relation.
SheafSeed
class
A data class representing a node from the source graph plus its partial edge, based on a Sheaf Theory decomposition of a graph.
__repr__
method
__repr__()
Affinity
class
A data class representing the affinity scores from one entity in the transformed graph of relations.
NB: there are much more efficient ways to calculate these affinity scores using sparse tensor algebra; this approach illustrates the process -- for research and debugging.
__repr__
method
__repr__()
module functions
calc_quantile_bins
function
calc_quantile_bins(num_rows)
Calculate the bins to use for a quantile stripe, using numpy.linspace.
num_rows
:int
number of rows in the target dataframe
returns :
numpy.ndarray
calculated bins, as a numpy.ndarray
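As a rough illustration of what numpy.linspace contributes here, evenly spaced bins over [0, 1] can be reproduced in plain Python (a hypothetical stand-in, not the package's actual code):

```python
def linspace(start: float, stop: float, num: int) -> list:
    """Pure-Python stand-in for numpy.linspace: num evenly spaced points."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

# e.g., bins for a quantile stripe
bins = linspace(0.0, 1.0, 5)
print(bins)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```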
get_repo_version
function
get_repo_version()
Access the Git repository information and return items to identify the version/commit running in production.
- returns :
typing.Tuple[str, str]
version tag and commit hash
root_mean_square
function
root_mean_square(values)
Calculate the root mean square of the values in the given list.
values
:typing.List[float]
list of values to use in the RMS calculation
returns :
float
RMS metric as a float
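The calculation itself is standard; a stdlib-only sketch, equivalent in spirit to this function:

```python
import math

def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

print(root_mean_square([3.0, 4.0]))  # sqrt((9 + 16) / 2) = sqrt(12.5)
```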
stripe_column
function
stripe_column(values, bins)
Stripe a column in a dataframe, by interpolating quantiles into a set of discrete indexes.
values
:list
list of values to stripe
bins
:int
quantile bins; see calc_quantile_bins()
returns :
numpy.ndarray
the striped column values, as a numpy.ndarray
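The idea can be sketched with bisect; this is a hypothetical re-implementation that returns a plain list rather than a numpy.ndarray, and the package's exact interpolation may differ:

```python
import bisect

def stripe_column(values, bins):
    """Map each value onto the index of the quantile bin it falls into."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [bisect.bisect_left(bins, (v - lo) / span) for v in values]

bins = [0.0, 0.25, 0.5, 0.75, 1.0]
print(stripe_column([10, 20, 30, 40, 50], bins))  # [0, 1, 2, 3, 4]
```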