Reference: textgraphs
package

see copyright/license https://huggingface.co/spaces/DerwenAI/textgraphs/blob/main/README.md
TextGraphs
class
Construct a lemma graph from the unstructured text source,
then extract ranked phrases using a textgraph
algorithm.
infer_relations_async
method
infer_relations_async(pipe, debug=False)
Gather triples representing inferred relations and build edges, concurrently by running an async queue. https://stackoverflow.com/questions/52582685/using-asyncio-queue-for-producer-consumer-flow
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document
debug
:bool
debugging flag
returns :
typing.List[textgraphs.elem.Edge]
a list of the inferred Edge objects
__init__
method
__init__(factory=None, iri_base="https://github.com/DerwenAI/textgraphs/ns/")
Constructor.
factory
:typing.Optional[textgraphs.pipe.PipelineFactory]
optional PipelineFactory used to configure components
create_pipeline
method
create_pipeline(text_input)
Use the pipeline factory to create a pipeline (e.g., spaCy.Document)
for each text input, which is typically paragraph-length.
text_input
:str
raw text to be parsed by this pipeline
returns :
textgraphs.pipe.Pipeline
a configured pipeline
create_render
method
create_render()
Create an object for rendering the graph in PyVis
HTML+JavaScript.
- returns :
textgraphs.vis.RenderPyVis
a configured RenderPyVis object for generating graph visualizations
collect_graph_elements
method
collect_graph_elements(pipe, text_id=0, para_id=0, debug=False)
Collect the elements of a lemma graph from the results of running
the textgraph
algorithm. These elements include: parse dependencies,
lemmas, entities, and noun chunks.
Make sure to call beforehand: TextGraphs.create_pipeline()
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document
text_id
:int
text (top-level document) identifier
para_id
:int
paragraph identifier
debug
:bool
debugging flag
construct_lemma_graph
method
construct_lemma_graph(debug=False)
Construct the base level of the lemma graph from the collected
elements. This gets represented in NetworkX
as a directed graph
with parallel edges.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
debug
:bool
debugging flag
perform_entity_linking
method
perform_entity_linking(pipe, debug=False)
Perform entity linking based on the KnowledgeGraph
object.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document
debug
:bool
debugging flag
infer_relations
method
infer_relations(pipe, debug=False)
Gather triples representing inferred relations and build edges.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe
:textgraphs.pipe.Pipeline
configured pipeline for this document
debug
:bool
debugging flag
returns :
typing.List[textgraphs.elem.Edge]
a list of the inferred Edge objects
calc_phrase_ranks
method
calc_phrase_ranks(pr_alpha=0.85, debug=False)
Calculate the weights for each node in the lemma graph, then stack-rank the nodes so that entities have priority over lemmas.
Phrase ranks are normalized to sum to 1.0; these normalized ranks represent the ranked entities extracted from the document.
Make sure to call beforehand: TextGraphs.construct_lemma_graph()
pr_alpha
:float
optional alpha parameter for the PageRank algorithm
debug
:bool
debugging flag
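The normalization step above can be sketched in plain Python. The scores below are hypothetical stand-ins for PageRank output, not values computed by the package:

```python
# Hypothetical PageRank-style scores for nodes in a lemma graph;
# in the package, entities get priority over lemmas in the stack ranking.
raw_scores = {"werner herzog": 0.31, "film": 0.22, "direct": 0.12}

# normalize so the phrase ranks sum to 1.0
total = sum(raw_scores.values())
ranks = {phrase: score / total for phrase, score in raw_scores.items()}

# stack-rank the phrases from highest to lowest normalized rank
ranked = sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # highest-ranked phrase
```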
get_phrases
method
get_phrases()
Return the entities extracted from the document.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- yields :
extracted entities
get_phrases_as_df
method
get_phrases_as_df()
Return the ranked extracted entities as a dataframe.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- returns :
pandas.core.frame.DataFrame
a pandas.DataFrame of the extracted entities
export_rdf
method
export_rdf(lang="en")
Extract the entities and relations which have IRIs as RDF triples.
lang
:str
language identifier
returns :
str
RDF triples in N3 (Turtle) format, as a string
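For orientation, N3/Turtle output looks roughly like the following; these triples are illustrative, not actual output of export_rdf():

```turtle
@prefix dbr:  <http://dbpedia.org/resource/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

dbr:Werner_Herzog skos:prefLabel "Werner Herzog"@en .
```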
denormalize_iri
method
denormalize_iri(uri_ref)
Discern between a parsed entity and a linked entity.
- returns :
str
the lemma_key for a parsed entity, or the full IRI for a linked entity
load_bootstrap_ttl
method
load_bootstrap_ttl(ttl_str, debug=False)
Parse a TTL string with an RDF semantic graph representation to load bootstrap definitions for the lemma graph prior to parsing, e.g., for synonyms.
ttl_str
:str
RDF triples in TTL (Turtle/N3) format
debug
:bool
debugging flag
export_kuzu
method
export_kuzu(zip_name="lemma.zip", debug=False)
Export a labeled property graph for KùzuDB (openCypher).
debug
:bool
debugging flag
returns :
str
name of the generated ZIP file
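Pulling together the ordering hints above ("Make sure to call beforehand: ..."), a typical end-to-end flow with this class looks roughly like the sketch below; it assumes textgraphs and its models are installed, and argument details may differ from your configuration:

```python
import asyncio
import textgraphs

tg = textgraphs.TextGraphs()  # uses a default PipelineFactory
pipe = tg.create_pipeline("Werner Herzog is a remarkable filmmaker.")

tg.collect_graph_elements(pipe)   # parse dependencies, lemmas, entities, noun chunks
tg.perform_entity_linking(pipe)   # link entities via the KnowledgeGraph
asyncio.run(tg.infer_relations_async(pipe))  # or the synchronous infer_relations()
tg.construct_lemma_graph()
tg.calc_phrase_ranks()

for phrase in tg.get_phrases():   # ranked entities extracted from the document
    print(phrase)
```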
SimpleGraph
class
An in-memory graph used to build a MultiDiGraph
in NetworkX.
__init__
method
__init__()
Constructor.
reset
method
reset()
Re-initialize the data structures, resetting all but the configuration.
make_node
method
make_node(tokens, key, span, kind, text_id, para_id, sent_id, label=None, length=1, linked=True)
Look up and return a Node
object.
By default, link matching keys into the same node.
Otherwise instantiate a new node if it does not exist already.
tokens
:typing.List[textgraphs.elem.Node]
list of parsed tokens
key
:str
lemma key (invariant)
span
:spacy.tokens.token.Token
token span for the parsed entity
kind
:<enum 'NodeEnum'>
the kind of this Node object
text_id
:int
text (top-level document) identifier
para_id
:int
paragraph identifier
sent_id
:int
sentence identifier
label
:typing.Optional[str]
node label (for a new object)
length
:int
length of token span
linked
:bool
flag for whether this links to an entity
returns :
textgraphs.elem.Node
the constructed Node object
make_edge
method
make_edge(src_node, dst_node, kind, rel, prob, key=None, debug=False)
Look up an edge, creating a new one if it does not exist already, and increment the count if it does.
src_node
:textgraphs.elem.Node
source node in the triple
dst_node
:textgraphs.elem.Node
destination node in the triple
kind
:<enum 'RelEnum'>
the kind of this Edge object
rel
:str
relation label
prob
:float
probability of this Edge within the graph
key
:typing.Optional[str]
lemma key (invariant); generate a key if this is not provided
debug
:bool
debugging flag
returns :
typing.Optional[textgraphs.elem.Edge]
the constructed Edge object; this may be None if the input parameters indicate skipping the edge
dump_lemma_graph
method
dump_lemma_graph()
Dump the lemma graph as a JSON string in node-link format, suitable for serialization and subsequent use in JavaScript, Neo4j, Graphistry, etc.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
- returns :
str
a JSON representation of the exported lemma graph in node-link format
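Node-link format is NetworkX's JSON serialization: an object with nodes and links arrays. A minimal, hypothetical dump can be handled with the standard library alone:

```python
import json

# minimal, hypothetical lemma-graph dump in node-link format
json_str = """
{
  "directed": true,
  "multigraph": true,
  "nodes": [{"id": 0, "lemma": "werner herzog"}, {"id": 1, "lemma": "film"}],
  "links": [{"source": 0, "target": 1, "rel": "dep", "key": 0}]
}
"""

graph = json.loads(json_str)
print(len(graph["nodes"]), len(graph["links"]))
```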
load_lemma_graph
method
load_lemma_graph(json_str, debug=False)
Load the lemma graph from a JSON string in node-link format.
debug
:bool
debugging flag
Node
class
A data class representing one node, i.e., an extracted phrase.
__repr__
method
__repr__()
get_linked_label
method
get_linked_label()
When this node has a linked entity, return that IRI.
Otherwise return its label
value.
- returns :
typing.Optional[str]
a label for the linked entity
get_name
method
get_name()
Return a brief name for the graphical depiction of this Node.
- returns :
str
brief label to be used in a graph
get_stacked_count
method
get_stacked_count()
Return a modified count, to redact verbs and linked entities from the stack-rank partitions.
- returns :
int
count, used for re-ranking extracted entities
get_pos
method
get_pos()
Generate a position span for OpenNRE.
- returns :
typing.Tuple[int, int]
a position span needed for OpenNRE relation extraction
Edge
class
A data class representing an edge between two nodes.
__repr__
method
__repr__()
EnumBase
class
A mixin for Enum codecs.
NodeEnum
class
Enumeration for the kinds of node categories
RelEnum
class
Enumeration for the kinds of edge relations
PipelineFactory
class
Factory pattern for building a pipeline, which is one of the more
expensive operations with spaCy
__init__
method
__init__(spacy_model="en_core_web_sm", ner=None, kg=<KnowledgeGraph instance>, infer_rels=[])
Constructor which instantiates the spaCy
pipelines:
tok_pipe
-- regular generator for parsed tokens
ner_pipe
-- with entities merged
aux_pipe
-- spotlight entity linking
which will be needed for parsing and entity linking.
spacy_model
:str
the specific model to use in spaCy pipelines
ner
:typing.Optional[textgraphs.pipe.Component]
optional custom NER component
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking
infer_rels
:typing.List[textgraphs.pipe.InferRel]
a list of components for inferring relations
create_pipeline
method
create_pipeline(text_input)
Instantiate the document pipelines needed to parse the input text.
text_input
:str
raw text to be parsed
returns :
textgraphs.pipe.Pipeline
a configured Pipeline object
Pipeline
class
Manage parsing of a document, which is assumed to be paragraph-sized.
__init__
method
__init__(text_input, tok_pipe, ner_pipe, aux_pipe, kg, infer_rels)
Constructor.
text_input
:str
raw text to be parsed
tok_pipe
:spacy.language.Language
the spaCy.Language pipeline used for tallying individual tokens
ner_pipe
:spacy.language.Language
the spaCy.Language pipeline used for tallying named entities
aux_pipe
:spacy.language.Language
the spaCy.Language pipeline used for auxiliary components (e.g., DBPedia Spotlight)
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking
infer_rels
:typing.List[textgraphs.pipe.InferRel]
a list of components for inferring relations
get_lemma_key
classmethod
get_lemma_key(span, placeholder=False)
Compose a unique, invariant lemma key for the given span.
span
:typing.Union[spacy.tokens.span.Span, spacy.tokens.token.Token]
span of tokens within the lemma
placeholder
:bool
flag for whether to create a placeholder
returns :
str
a composed lemma key
get_ent_lemma_keys
method
get_ent_lemma_keys()
Iterate through the fully qualified lemma keys for an extracted entity.
- yields :
the lemma keys within an extracted entity
link_noun_chunks
method
link_noun_chunks(nodes, debug=False)
Link any noun chunks which are not already subsumed by named entities.
nodes
:dict
dictionary of Node objects in the graph
debug
:bool
debugging flag
returns :
typing.List[textgraphs.elem.NounChunk]
a list of identified noun chunks which are novel
iter_entity_pairs
method
iter_entity_pairs(pipe_graph, max_skip, debug=True)
Iterator for entity pairs for which the algorithm infers relations.
pipe_graph
:networkx.classes.multigraph.MultiGraph
a networkx.MultiGraph representation of the graph, reused for graph algorithms
max_skip
:int
maximum distance between entities for inferred relations
debug
:bool
debugging flag
yields :
pairs of entities within a range, e.g., to use for relation extraction
Component
class
Abstract base class for a spaCy
pipeline component.
augment_pipe
method
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:PipelineFactory
a PipelineFactory used to configure components
NERSpanMarker
class
Configures a spaCy
pipeline component for SpanMarkerNER
__init__
method
__init__(ner_model="tomaarsen/span-marker-roberta-large-ontonotes5")
Constructor.
ner_model
:str
model to be used in SpanMarker
augment_pipe
method
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:textgraphs.pipe.PipelineFactory
the PipelineFactory used to configure this pipeline component
NounChunk
class
A data class representing one noun chunk, i.e., a candidate as an extracted phrase.
__repr__
method
__repr__()
KnowledgeGraph
class
Base class for a knowledge graph interface.
augment_pipe
method
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:PipelineFactory
a PipelineFactory used to configure components
remap_ner
method
remap_ner(label)
Remap the OntoTypes4 values from NER output to more general-purpose IRIs.
label
:typing.Optional[str]
input NER label, an OntoTypes4 value
returns :
typing.Optional[str]
an IRI for the named entity
normalize_prefix
method
normalize_prefix(iri, debug=False)
Normalize the given IRI to use standard namespace prefixes.
iri
:str
input IRI, in fully-qualified domain representation
debug
:bool
debugging flag
returns :
str
the compact IRI representation, using an RDF namespace prefix
perform_entity_linking
method
perform_entity_linking(graph, pipe, debug=False)
Perform entity linking based on "spotlight" and other services.
graph
:textgraphs.graph.SimpleGraph
source graph
pipe
:Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
resolve_rel_iri
method
resolve_rel_iri(rel, lang="en", debug=False)
Resolve a rel
string from a relation extraction model which has
been trained on this knowledge graph.
rel
:str
relation label; many relation extraction projects source these labels from Wikidata
lang
:str
language identifier
debug
:bool
debugging flag
returns :
typing.Optional[str]
a resolved IRI
KGSearchHit
class
A data class representing a hit from a knowledge graph search.
__repr__
method
__repr__()
KGWikiMedia
class
Manage access to WikiMedia-related APIs.
__init__
method
__init__(spotlight_api="https://api.dbpedia-spotlight.org/en", dbpedia_search_api="https://lookup.dbpedia.org/api/search", dbpedia_sparql_api="https://dbpedia.org/sparql", wikidata_api="https://www.wikidata.org/w/api.php", ner_map=OrderedDict([('CARDINAL', {'iri': 'http://dbpedia.org/resource/Cardinal_number', 'definition': 'Numerals that do not fall under another type', 'label': 'cardinal number'}), ('DATE', {'iri': 'http://dbpedia.org/ontology/date', 'definition': 'Absolute or relative dates or periods', 'label': 'date'}), ('EVENT', {'iri': 'http://dbpedia.org/ontology/Event', 'definition': 'Named hurricanes, battles, wars, sports events, etc.', 'label': 'event'}), ('FAC', {'iri': 'http://dbpedia.org/ontology/Infrastructure', 'definition': 'Buildings, airports, highways, bridges, etc.', 'label': 'infrastructure'}), ('GPE', {'iri': 'http://dbpedia.org/ontology/Country', 'definition': 'Countries, cities, states', 'label': 'country'}), ('LANGUAGE', {'iri': 'http://dbpedia.org/ontology/Language', 'definition': 'Any named language', 'label': 'language'}), ('LAW', {'iri': 'http://dbpedia.org/ontology/Law', 'definition': 'Named documents made into laws', 'label': 'law'}), ('LOC', {'iri': 'http://dbpedia.org/ontology/Place', 'definition': 'Non-GPE locations, mountain ranges, bodies of water', 'label': 'place'}), ('MONEY', {'iri': 'http://dbpedia.org/resource/Money', 'definition': 'Monetary values, including unit', 'label': 'money'}), ('NORP', {'iri': 'http://dbpedia.org/ontology/nationality', 'definition': 'Nationalities or religious or political groups', 'label': 'nationality'}), ('ORDINAL', {'iri': 'http://dbpedia.org/resource/Ordinal_number', 'definition': 'Ordinal number, i.e., first, second, etc.', 'label': 'ordinal number'}), ('ORG', {'iri': 'http://dbpedia.org/ontology/Organisation', 'definition': 'Companies, agencies, institutions, etc.', 'label': 'organization'}), ('PERCENT', {'iri': 'http://dbpedia.org/resource/Percentage', 'definition': 'Percentage', 'label': 
'percentage'}), ('PERSON', {'iri': 'http://dbpedia.org/ontology/Person', 'definition': 'People, including fictional', 'label': 'person'}), ('PRODUCT', {'iri': 'http://dbpedia.org/ontology/product', 'definition': 'Vehicles, weapons, foods, etc. (Not services)', 'label': 'product'}), ('QUANTITY', {'iri': 'http://dbpedia.org/resource/Quantity', 'definition': 'Measurements, as of weight or distance', 'label': 'quantity'}), ('TIME', {'iri': 'http://dbpedia.org/ontology/time', 'definition': 'Times smaller than a day', 'label': 'time'}), ('WORK OF ART', {'iri': 'http://dbpedia.org/resource/Work_of_art', 'definition': 'Titles of books, songs, etc.', 'label': 'work of art'})]), ns_prefix=OrderedDict([('dbc', 'http://dbpedia.org/resource/Category:'), ('dbt', 'http://dbpedia.org/resource/Template:'), ('dbr', 'http://dbpedia.org/resource/'), ('yago', 'http://dbpedia.org/class/yago/'), ('dbd', 'http://dbpedia.org/datatype/'), ('dbo', 'http://dbpedia.org/ontology/'), ('dbp', 'http://dbpedia.org/property/'), ('units', 'http://dbpedia.org/units/'), ('dbpedia-commons', 'http://commons.dbpedia.org/resource/'), ('dbpedia-wikicompany', 'http://dbpedia.openlinksw.com/wikicompany/'), ('dbpedia-wikidata', 'http://wikidata.dbpedia.org/resource/'), ('wd', 'http://www.wikidata.org/'), ('wd_ent', 'http://www.wikidata.org/entity/'), ('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'), ('schema', 'https://schema.org/'), ('owl', 'http://www.w3.org/2002/07/owl#')]), min_alias=0.8, min_similarity=0.9)
Constructor.
spotlight_api
:str
DBPedia Spotlight API or equivalent local service
dbpedia_search_api
:str
DBPedia Search API or equivalent local service
dbpedia_sparql_api
:str
DBPedia SPARQL API or equivalent local service
wikidata_api
:str
Wikidata Search API or equivalent local service
ner_map
:dict
named entity map for standardizing IRIs
ns_prefix
:dict
RDF namespace prefixes
min_alias
:float
minimum alias probability threshold for accepting linked entities
min_similarity
:float
minimum label similarity threshold for accepting linked entities
augment_pipe
method
augment_pipe(factory)
Encapsulate a spaCy
call to add_pipe()
configuration.
factory
:textgraphs.pipe.PipelineFactory
a PipelineFactory used to configure components
remap_ner
method
remap_ner(label)
Remap the OntoTypes4 values from NER output to more general-purpose IRIs.
label
:typing.Optional[str]
input NER label, an OntoTypes4 value
returns :
typing.Optional[str]
an IRI for the named entity
normalize_prefix
method
normalize_prefix(iri, debug=False)
Normalize the given IRI using the standard DBPedia namespace prefixes.
iri
:str
input IRI, in fully-qualified domain representation
debug
:bool
debugging flag
returns :
str
the compact IRI representation, using an RDF namespace prefix
perform_entity_linking
method
perform_entity_linking(graph, pipe, debug=False)
Perform entity linking based on DBPedia Spotlight
and other services.
graph
:textgraphs.graph.SimpleGraph
source graph
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
resolve_rel_iri
method
resolve_rel_iri(rel, lang="en", debug=False)
Resolve a rel
string from a relation extraction model which has
been trained on this knowledge graph, which defaults to using the
WikiMedia
graphs.
rel
:str
relation label; many relation extraction projects source these labels from Wikidata
lang
:str
language identifier
debug
:bool
debugging flag
returns :
typing.Optional[str]
a resolved IRI
wikidata_search
method
wikidata_search(query, lang="en", debug=False)
Query the Wikidata search API.
query
:str
query string
lang
:str
language identifier
debug
:bool
debugging flag
returns :
typing.Optional[textgraphs.elem.KGSearchHit]
search hit, if any
dbpedia_search_entity
method
dbpedia_search_entity(query, lang="en", debug=False)
Perform a DBPedia API search.
query
:str
query string
lang
:str
language identifier
debug
:bool
debugging flag
returns :
typing.Optional[textgraphs.elem.KGSearchHit]
search hit, if any
dbpedia_sparql_query
method
dbpedia_sparql_query(sparql, debug=False)
Perform a SPARQL query on DBPedia.
sparql
:str
SPARQL query string
debug
:bool
debugging flag
returns :
dict
dictionary of query results
dbpedia_wikidata_equiv
method
dbpedia_wikidata_equiv(dbpedia_iri, debug=False)
Perform a SPARQL query on DBPedia to find an equivalent Wikidata entity.
dbpedia_iri
:str
IRI in DBpedia
debug
:bool
debugging flag
returns :
typing.Optional[str]
equivalent IRI in Wikidata
LinkedEntity
class
A data class representing one linked entity.
__repr__
method
__repr__()
InferRel
class
Abstract base class for a relation extraction model wrapper.
gen_triples_async
method
gen_triples_async(pipe, queue, debug=False)
Infer relations concurrently, producing triples to a queue.
pipe
:Pipeline
configured pipeline for the current document
queue
:asyncio.queues.Queue
queue of inference tasks to be performed
debug
:bool
debugging flag
gen_triples
method
gen_triples(pipe, debug=False)
Infer relations iteratively, yielding triples through a generator.
pipe
:Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
yields :
generated triples
InferRel_OpenNRE
class
Perform relation extraction based on the OpenNRE
model.
https://github.com/thunlp/OpenNRE
__init__
method
__init__(model="wiki80_cnn_softmax", max_skip=11, min_prob=0.9)
Constructor.
model
:str
the specific model to be used in OpenNRE
max_skip
:int
maximum distance between entities for inferred relations
min_prob
:float
minimum probability threshold for accepting an inferred relation
gen_triples
method
gen_triples(pipe, debug=False)
Iterate on entity pairs to drive OpenNRE, inferring relations represented as triples which get produced by a generator.
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
yields :
generated triples as candidates for inferred relations
InferRel_Rebel
class
Perform relation extraction based on the REBEL
model.
https://github.com/Babelscape/rebel
https://huggingface.co/spaces/Babelscape/mrebel-demo
__init__
method
__init__(lang="en_XX", mrebel_model="Babelscape/mrebel-large")
Constructor.
lang
:str
language identifier
mrebel_model
:str
tokenizer model to be used
tokenize_sent
method
tokenize_sent(text)
Apply the tokenizer manually, since we need to extract special tokens.
text
:str
input text for the sentence to be tokenized
returns :
str
extracted tokens
extract_triplets_typed
method
extract_triplets_typed(text)
Parse the generated text and extract its triplets.
text
:str
input text for the sentence to use in inference
returns :
list
a list of extracted triples
gen_triples
method
gen_triples(pipe, debug=False)
Drive REBEL
to infer relations for each sentence, represented as
triples which get produced by a generator.
pipe
:textgraphs.pipe.Pipeline
configured pipeline for the current document
debug
:bool
debugging flag
yields :
generated triples as candidates for inferred relations
RenderPyVis
class
Render the lemma graph as a PyVis
network.
__init__
method
__init__(graph, kg)
Constructor.
graph
:textgraphs.graph.SimpleGraph
source graph to be visualized
kg
:textgraphs.pipe.KnowledgeGraph
knowledge graph used for entity linking
render_lemma_graph
method
render_lemma_graph(debug=True)
Prepare the structure of the NetworkX
graph to use for building
and returning a PyVis
network to render.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
debug
:bool
debugging flag
returns :
pyvis.network.Network
a pyvis.network.Network interactive visualization
draw_communities
method
draw_communities(spring_distance=1.4, debug=False)
Cluster the communities in the lemma graph, then draw a
NetworkX graph of the nodes with a specific color for each
community.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
spring_distance
:float
NetworkX parameter used to separate clusters visually
debug
:bool
debugging flag
returns :
typing.Dict[int, int]
a map of the calculated communities
generate_wordcloud
method
generate_wordcloud(background="black")
Generate a tag cloud from the given phrases.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
background
:str
background color for the rendering
returns :
wordcloud.wordcloud.WordCloud
the rendering as a wordcloud.WordCloud object, which can be used to generate PNG images, etc.
NodeStyle
class
Dataclass used for styling PyVis nodes.
__setattr__
method
__setattr__(name, value)
GraphOfRelations
class
Attempt to reproduce results published in "INGRAM: Inductive Knowledge Graph Embedding via Relation Graphs" https://arxiv.org/abs/2305.19987
__init__
method
__init__(source)
Constructor.
source
:textgraphs.graph.SimpleGraph
source graph to be transformed
load_ingram
method
load_ingram(json_file, debug=False)
Load data for a source graph, as illustrated in lee2023ingram
json_file
:pathlib.Path
path for the JSON dataset to loaddebug
:bool
debugging flag
seeds
method
seeds(debug=False)
Prep data for the topological transform illustrated in lee2023ingram
debug
:bool
debugging flag
trace_source_graph
method
trace_source_graph()
Output a "seed" representation of the source graph.
construct_gor
method
construct_gor(debug=False)
Perform the topological transform described by lee2023ingram, constructing a graph of relations (GOR) and calculating affinity scores between entities in the GOR based on their definitions:
we measure the affinity between two relations by considering how many entities are shared between them and how frequently they share the same entity
debug
:bool
debugging flag
tally_frequencies
classmethod
tally_frequencies(counter)
Tally the frequency of shared entities.
counter
:collections.Counter
data collection for the rel_b/entity pairs
returns :
int
tallied values for one relation
get_affinity_scores
method
get_affinity_scores(debug=False)
Reproduce metrics based on the example published in lee2023ingram
debug
:bool
debugging flag
returns :
typing.Dict[tuple, float]
the calculated affinity scores
trace_metrics
method
trace_metrics(scores)
Compare the calculated affinity scores with results from a published example.
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values)
returns :
pandas.core.frame.DataFrame
a pandas.DataFrame where the rows compare expected vs. observed affinity scores
render_gor_plt
method
render_gor_plt(scores)
Visualize the graph of relations using matplotlib
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values)
render_gor_pyvis
method
render_gor_pyvis(scores)
Visualize the graph of relations interactively using PyVis
scores
:typing.Dict[tuple, float]
the calculated affinity scores between pairs of relations (i.e., observed values)
returns :
pyvis.network.Network
a pyvis.network.Network representation of the transformed graph
TransArc
class
A data class representing one transformed rel-node-rel triple in a graph of relations.
__repr__
method
__repr__()
RelDir
class
Enumeration for the directions of a relation.
SheafSeed
class
A data class representing a node from the source graph plus its partial edge, based on a Sheaf Theory decomposition of a graph.
__repr__
method
__repr__()
Affinity
class
A data class representing the affinity scores from one entity in the transformed graph of relations.
NB: there are much more efficient ways to calculate these affinity scores using sparse tensor algebra; this approach illustrates the process -- for research and debugging.
__repr__
method
__repr__()
module functions
calc_quantile_bins
function
calc_quantile_bins(num_rows)
Calculate the bins to use for a quantile stripe, using numpy.linspace.
num_rows
:int
number of rows in the target dataframe
returns :
numpy.ndarray
calculated bins, as a numpy.ndarray
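As a rough illustration of what numpy.linspace contributes here, evenly spaced bins over [0, 1] can be reproduced in plain Python (a hypothetical stand-in, not the package's actual code):

```python
def linspace(start: float, stop: float, num: int) -> list:
    """Pure-Python stand-in for numpy.linspace: num evenly spaced points."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

# e.g., bins for a quantile stripe
bins = linspace(0.0, 1.0, 5)
print(bins)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```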
get_repo_version
function
get_repo_version()
Access the Git repository information and return items to identify the version/commit running in production.
- returns :
typing.Tuple[str, str]
version tag and commit hash
root_mean_square
function
root_mean_square(values)
Calculate the root mean square of the values in the given list.
values
:typing.List[float]
list of values to use in the RMS calculation
returns :
float
RMS metric as a float
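The calculation itself is standard; a stdlib-only sketch, equivalent in spirit to this function:

```python
import math

def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

print(root_mean_square([3.0, 4.0]))  # sqrt((9 + 16) / 2) = sqrt(12.5)
```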
stripe_column
function
stripe_column(values, bins)
Stripe a column in a dataframe, by interpolating quantiles into a set of discrete indexes.
values
:list
list of values to stripe
bins
:int
quantile bins; see calc_quantile_bins()
returns :
numpy.ndarray
the striped column values, as a numpy.ndarray
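The idea can be sketched with bisect; this is a hypothetical re-implementation that returns a plain list rather than a numpy.ndarray, and the package's exact interpolation may differ:

```python
import bisect

def stripe_column(values, bins):
    """Map each value onto the index of the quantile bin it falls into."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [bisect.bisect_left(bins, (v - lo) / span) for v in values]

bins = [0.0, 0.25, 0.5, 0.75, 1.0]
print(stripe_column([10, 20, 30, 40, 50], bins))  # [0, 1, 2, 3, 4]
```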