Spaces:
Sleeping
Sleeping
!!! note | |
To run this notebook in JupyterLab, load [`examples/ex2_0.ipynb`](https://github.com/DerwenAI/textgraphs/blob/main/examples/ex2_0.ipynb) | |
# bootstrap the _lemma graph_ with RDF triples | |
Show how to bootstrap definitions in a _lemma graph_ by loading RDF, e.g., for synonyms. | |
## environment | |
```python | |
from icecream import ic | |
from pyinstrument import Profiler | |
import pyvis | |
import textgraphs | |
``` | |
```python | |
%load_ext watermark | |
``` | |
```python | |
%watermark | |
``` | |
Last updated: 2024-01-16T17:35:59.608787-08:00 | |
Python implementation: CPython | |
Python version : 3.10.11 | |
IPython version : 8.20.0 | |
Compiler : Clang 13.0.0 (clang-1300.0.29.30) | |
OS : Darwin | |
Release : 21.6.0 | |
Machine : x86_64 | |
Processor : i386 | |
CPU cores : 8 | |
Architecture: 64bit | |
```python | |
%watermark --iversions | |
``` | |
pyvis : 0.3.2 | |
textgraphs: 0.5.0 | |
sys : 3.10.11 (v3.10.11:7d4cc5aa85, Apr 4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)] | |
## load the bootstrap definitions | |
Define the bootstrap RDF triples in N3/Turtle format: we define an entity `Werner` as a synonym for `Werner Herzog` by using the [`skos:broader`](https://www.w3.org/TR/skos-reference/#semantic-relations) relation. Keep in mind that this entity may also refer to other Werners... | |
```python | |
TTL_STR: str = """ | |
@base <https://github.com/DerwenAI/textgraphs/ns/> . | |
@prefix dbo: <http://dbpedia.org/ontology/> . | |
@prefix skos: <http://www.w3.org/2004/02/skos/core#> . | |
<entity/werner_PROPN> a dbo:Person ; | |
skos:prefLabel "Werner"@en . | |
<entity/werner_PROPN_herzog_PROPN> a dbo:Person ; | |
skos:prefLabel "Werner Herzog"@en. | |
dbo:Person skos:definition "People, including fictional"@en ; | |
skos:prefLabel "person"@en . | |
<entity/werner_PROPN_herzog_PROPN> skos:broader <entity/werner_PROPN> . | |
""" | |
``` | |
Provide the source text | |
```python | |
SRC_TEXT: str = """ | |
Werner Herzog is a remarkable filmmaker and an intellectual originally from Germany, the son of Dietrich Herzog. | |
After the war, Werner fled to America to become famous. | |
""" | |
``` | |
set up the statistical stack profiling | |
```python | |
profiler: Profiler = Profiler() | |
profiler.start() | |
``` | |
set up the `TextGraphs` pipeline | |
```python | |
tg: textgraphs.TextGraphs = textgraphs.TextGraphs( | |
factory = textgraphs.PipelineFactory( | |
kg = textgraphs.KGWikiMedia( | |
spotlight_api = textgraphs.DBPEDIA_SPOTLIGHT_API, | |
dbpedia_search_api = textgraphs.DBPEDIA_SEARCH_API, | |
dbpedia_sparql_api = textgraphs.DBPEDIA_SPARQL_API, | |
wikidata_api = textgraphs.WIKIDATA_API, | |
min_alias = textgraphs.DBPEDIA_MIN_ALIAS, | |
min_similarity = textgraphs.DBPEDIA_MIN_SIM, | |
), | |
), | |
) | |
``` | |
load the bootstrap definitions | |
```python | |
tg.load_bootstrap_ttl( | |
TTL_STR, | |
debug = False, | |
) | |
``` | |
parse the input text | |
```python | |
pipe: textgraphs.Pipeline = tg.create_pipeline( | |
SRC_TEXT.strip(), | |
) | |
tg.collect_graph_elements( | |
pipe, | |
debug = False, | |
) | |
tg.construct_lemma_graph( | |
debug = False, | |
) | |
``` | |
## visualize the lemma graph | |
```python | |
render: textgraphs.RenderPyVis = tg.create_render() | |
pv_graph: pyvis.network.Network = render.render_lemma_graph( | |
debug = False, | |
) | |
``` | |
initialize the layout parameters | |
```python | |
pv_graph.force_atlas_2based( | |
gravity = -38, | |
central_gravity = 0.01, | |
spring_length = 231, | |
spring_strength = 0.7, | |
damping = 0.8, | |
overlap = 0, | |
) | |
pv_graph.show_buttons(filter_ = [ "physics" ]) | |
pv_graph.toggle_physics(True) | |
``` | |
```python | |
pv_graph.prep_notebook() | |
pv_graph.show("tmp.fig04.html") | |
``` | |
tmp.fig04.html | |
 | |
Notice how the `Werner` and `Werner Herzog` nodes are now linked? This synonym from the bootstrap definitions above provided means to link more portions of the _lemma graph_ than the demo in `ex0_0` with the same input text. | |
## statistical stack profile instrumentation | |
```python | |
profiler.stop() | |
``` | |
<pyinstrument.session.Session at 0x1522e2110> | |
```python | |
profiler.print() | |
``` | |
_ ._ __/__ _ _ _ _ _/_ Recorded: 17:35:59 Samples: 2846 | |
/_//_/// /_\ / //_// / //_'/ // Duration: 4.111 CPU time: 3.294 | |
/ _/ v4.6.1 | |
Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-4365d4ba-2d4d-4d4b-83e2-eb5ef8abfe26.json | |
4.111 IPythonKernel.dispatch_shell ipykernel/kernelbase.py:378 | |
└─ 4.075 IPythonKernel.execute_request ipykernel/kernelbase.py:721 | |
[9 frames hidden] ipykernel, IPython | |
3.995 ZMQInteractiveShell.run_ast_nodes IPython/core/interactiveshell.py:3394 | |
├─ 3.250 <module> ../ipykernel_4433/1372904243.py:1 | |
│ └─ 3.248 PipelineFactory.__init__ textgraphs/pipe.py:434 | |
│ └─ 3.232 load spacy/__init__.py:27 | |
│ [98 frames hidden] spacy, en_core_web_sm, catalogue, imp... | |
│ 0.496 tokenizer_factory spacy/language.py:110 | |
│ └─ 0.108 _validate_special_case spacy/tokenizer.pyx:573 | |
│ 0.439 <lambda> spacy/language.py:2170 | |
│ └─ 0.085 _validate_special_case spacy/tokenizer.pyx:573 | |
├─ 0.672 <module> ../ipykernel_4433/3257668275.py:1 | |
│ └─ 0.669 TextGraphs.create_pipeline textgraphs/doc.py:103 | |
│ └─ 0.669 PipelineFactory.create_pipeline textgraphs/pipe.py:508 | |
│ └─ 0.669 Pipeline.__init__ textgraphs/pipe.py:216 | |
│ └─ 0.669 English.__call__ spacy/language.py:1016 | |
│ [31 frames hidden] spacy, spacy_dbpedia_spotlight, reque... | |
└─ 0.055 <module> ../ipykernel_4433/72966960.py:1 | |
└─ 0.046 Network.prep_notebook pyvis/network.py:552 | |
[5 frames hidden] pyvis, jinja2 | |
## outro | |
_\[ more parts are in progress, getting added to this demo \]_ | |