<HTML> |
|
<HEAD> |
|
<TITLE>PROLOGDB(5WN) manual page</TITLE> |
|
</HEAD> |
|
<BODY> |
|
<A HREF="#toc">Table of Contents</A><P> |
|
|
|
<H2><A NAME="sect0" HREF="#toc0">NAME </A></H2> |
|
wn_pl - description of Prolog database files |
|
<H2><A NAME="sect1" HREF="#toc1">DESCRIPTION </A></H2> |
|
The files
<B>wn_</B><I>*</I><B>.pl</B> contain the WordNet database in a Prolog-readable format. A Prolog
interface to WordNet is not implemented. <P>
The Prolog database is very large
and may take many minutes to load into the Prolog workspace. A separate
file has been created for each WordNet relation, giving the user the ability
to load only those parts of the database that they are interested in. <P>
|
See
<B>FILES</B>, below, for a list of the database files and <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
and <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
for detailed descriptions of the various WordNet relations (referred to
as <I>operators</I> in this manual page).
|
<H3><A NAME="sect2" HREF="#toc2">File Format </A></H3> |
|
Each Prolog database file
contains information corresponding to the synsets and word senses contained
in the WordNet database. In the Prolog version of the database, the <I>synset_id</I>s
(defined below) are used as unique synset identifiers. <P>
Each line of
a file contains an operator that corresponds to a WordNet relation. All
lines with the same <I>operator</I> value are stored in the file <B>wn_</B><I>operator</I><B>.pl</B>. <P>
|
The general format of a line in a Prolog database file is as follows:
<P>
<blockquote><I>operator</I><B>(</B><I>field1</I><B>,</B><I> ... </I><B>,</B><I>fieldn</I><B>).</B> <BR>
</blockquote>
|
<P> |
|
Each line contains the name of the
operator, followed by a left parenthesis, a comma-separated list of fields,
a right parenthesis, and a period. Note that there are no spaces between
fields or around the punctuation, and each
line is terminated with a newline character.
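<P>
As an illustration only, the line format described above could be split into an operator name and its raw fields with a short Python sketch. The function name <B>split_line</B> and the sample values in the comments are assumptions made for this example; they are not part of the WordNet distribution.

```python
import re

# Matches the general shape  operator(field1,...,fieldn).  described above.
LINE_RE = re.compile(r"^([a-z]+)\((.*)\)\.$")


def split_line(line):
    """Return (operator, raw_fields) for one database line.

    Fields are split on commas that lie outside single-quoted strings.
    Quoted fields are returned with their surrounding quotes intact.
    """
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a database line: %r" % line)
    operator, body = match.groups()
    fields, current, in_quotes = [], [], False
    for ch in body:
        if ch == "'":
            # A doubled quote ('') toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return operator, fields


# Hypothetical sample lines, made up for illustration.
print(split_line("hyp(100000002,100000001).\n"))
print(split_line("g(100000001,'(a gloss, with a comma)').\n"))
```

Splitting on commas outside quotes matters because <I>gloss</I> fields may themselves contain commas.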
|
<H3><A NAME="sect3" HREF="#toc3">Operators </A></H3> |
|
Each WordNet relation
is represented in a separate file by <I>operator</I> name. Some operators are
reflexive (i.e., the "reverse" relation is implicit). So, for example, if
<B>x</B> is a hypernym of <B>y</B>, <B>y</B> is necessarily a hyponym of <B>x</B>. In the Prolog
database, reflected pointers are usually implied for semantic relations.
|
<P> |
|
Semantic relations are represented by a pair of <I>synset_id </I>s, in which |
|
the first <I>synset_id </I> is generally the source of the relation and the second |
|
is the target. If two pairs <I>synset_id </I><B>, </B><I>w_num </I> are present, the operator |
|
represents a lexical relation between word forms. <P> |
|
<B>s(<I>synset_id</I>,<I>w_num</I>,'<I>word</I>',<I>ss_type</I>,<I>sense_number</I>,<I>tag_count</I>).</B><BR>
<blockquote>An <B>s</B> operator is present for every word sense in WordNet. In <B>wn_s.pl</B>,
<I>w_num</I> specifies the word number for <I>word</I> in the synset. </blockquote>
|
<P> |
|
<B>g(<I>synset_id</I>,'(<I>gloss</I>)').</B><BR>
<blockquote>The <B>g</B> operator specifies the gloss for a synset. </blockquote>
|
<P> |
|
<B>hyp(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
|
<blockquote>The <B>hyp </B> operator specifies that the second synset is a hypernym of |
|
the first synset. This relation holds for nouns and verbs. The reflexive |
|
operator, hyponym, implies that the first synset is a hyponym of the second |
|
synset. </blockquote> |
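<P>
Because only the hypernym direction is stored, the implied hyponym relation has to be derived by the reader of the file. A minimal Python sketch of that derivation follows. The function name <B>build_hyponym_index</B> and the <I>synset_id</I> values are hypothetical, and the input is assumed to be already-parsed pairs of fields from <B>wn_hyp.pl</B>.

```python
from collections import defaultdict


def build_hyponym_index(hyp_pairs):
    """Map each hypernym synset_id to the list of its hyponyms.

    hyp_pairs is an iterable of (first, second) tuples taken from
    hyp(first,second) lines.
    """
    hyponyms = defaultdict(list)
    for first, second in hyp_pairs:
        # hyp(first,second): second is a hypernym of first,
        # so first is a hyponym of second.
        hyponyms[second].append(first)
    return dict(hyponyms)


# Made-up identifiers, for illustration only.
pairs = [("100000002", "100000001"), ("100000003", "100000001")]
index = build_hyponym_index(pairs)
print(index["100000001"])
```

The same inversion would apply to the other semantic operators whose reverse relation is described as implied.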
|
<P> |
|
<B>ent(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
<blockquote>The <B>ent</B> operator specifies that the
second synset is an entailment of the first synset. This relation only holds
for verbs. </blockquote>
|
<P> |
|
<B>sim(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
<blockquote>The <B>sim</B> operator specifies that
the second synset is similar in meaning to the first synset. This means
that the second synset is a satellite of the first synset, which is the cluster
head. This relation only holds for adjective synsets contained in adjective
clusters. </blockquote>
|
<P> |
|
<B>mm(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
|
<blockquote>The <B>mm </B> operator specifies that the |
|
second synset is a member meronym of the first synset. This relation only |
|
holds for nouns. The reflexive operator, member holonym, can be implied. |
|
</blockquote> |
|
<P> |
|
<B>ms(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
|
<blockquote>The <B>ms </B> operator specifies that the second |
|
synset is a substance meronym of the first synset. This relation only |
|
holds for nouns. The reflexive operator, substance holonym, can be implied. |
|
</blockquote> |
|
<P> |
|
<B>mp(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
|
<blockquote>The <B>mp </B> operator specifies that the second |
|
synset is a part meronym of the first synset. This relation only holds |
|
for nouns. The reflexive operator, part holonym, can be implied. </blockquote> |
|
<P> |
|
<B>cs(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
|
<blockquote>The <B>cs </B> operator specifies that the second synset is a cause of the |
|
first synset. This relation only holds for verbs. </blockquote> |
|
<P> |
|
<B>vgp(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
|
<blockquote>The <B>vgp </B> operator specifies verb synsets that are similar in meaning |
|
and should be grouped together when displayed in response to a grouped |
|
synset search. </blockquote> |
|
<P> |
|
<B>at(<I>synset_id</I>,<I>synset_id</I>).</B><BR>
<blockquote>The <B>at</B> operator defines the
attribute relation between noun and adjective synset pairs in which the
adjective is a value of the noun. For each pair, both relations are listed
(i.e., each <I>synset_id</I> is both a source and a target). </blockquote>
|
<P> |
|
<B>ant(<I>synset_id</I>,<I>w_num</I>,<I>synset_id</I>,<I>w_num</I>).</B><BR>
<blockquote>The <B>ant</B> operator specifies antonymous <I>word</I>s. This is a lexical relation
that holds for all syntactic categories. For each antonymous pair, both
relations are listed (i.e., each <I>synset_id,w_num</I> pair is both a source and
a target word). </blockquote>
|
<P> |
|
<B>sa(<I>synset_id</I>,<I>w_num</I>,<I>synset_id</I>,<I>w_num</I>).</B><BR>
<blockquote>The <B>sa</B> operator
specifies that additional information about the first word can be obtained
by seeing the second word. This operator is only defined for verbs and
adjectives. There is no reflexive relation (i.e., it cannot be inferred that
additional information about the second word can be obtained from
the first word). </blockquote>
|
<P> |
|
<B>ppl(<I>synset_id</I>,<I>w_num</I>,<I>synset_id</I>,<I>w_num</I>).</B><BR>
|
<blockquote>The <B>ppl </B> operator |
|
specifies that the adjective first word is a participle of the verb second |
|
word. The reflexive operator can be implied. </blockquote> |
|
<P> |
|
<B>per(<I>synset_id</I>,<I>w_num</I>,<I>synset_id</I>,<I>w_num</I>).</B><BR>
|
<blockquote>The <B>per </B> operator specifies two different relations based on the parts |
|
of speech involved. If the first word is in an adjective synset, that |
|
word pertains to either the noun or adjective second word. If the first |
|
word is in an adverb synset, that word is derived from the adjective second |
|
word. </blockquote> |
|
<P> |
|
<B>fr(<I>synset_id</I>,<I>f_num</I>,<I>w_num</I>).</B><BR>
|
<blockquote>The <B>fr </B> operator specifies a generic |
|
sentence frame for one or all words in a synset. The operator is defined |
|
only for verbs. </blockquote> |
|
|
|
<H3><A NAME="sect4" HREF="#toc4">Field Definitions </A></H3> |
|
A <I>synset_id</I> is a nine byte field in
which the first byte defines the syntactic category of the synset and
the remaining eight bytes are a <I>synset_offset</I>, as defined in <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>,
indicating the byte offset in the <B>data.</B><I>pos</I> file that corresponds to the
syntactic category. <P>
|
The syntactic category is encoded as: <P> |
|
<blockquote><B>1 </B><tt> </tt> <tt> </tt> NOUN <BR> |
|
|
|
<B>2 </B><tt> </tt> <tt> </tt> VERB <BR> |
|
<B>3 </B><tt> </tt> <tt> </tt> ADJECTIVE <BR> |
|
<B>4 </B><tt> </tt> <tt> </tt> ADVERB <BR> |
|
</blockquote> |
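<P>
The encoding above can be sketched in Python as follows. The helper name <B>decode_synset_id</B> is an assumption for this example, as is the sample identifier; the sketch also assumes the eight-byte <I>synset_offset</I> is written as decimal digits.

```python
# Category codes from the table above.
CATEGORY = {"1": "NOUN", "2": "VERB", "3": "ADJECTIVE", "4": "ADVERB"}


def decode_synset_id(synset_id):
    """Split a nine-character synset_id into (category, synset_offset)."""
    text = str(synset_id)
    if len(text) != 9 or not text.isdigit() or text[0] not in CATEGORY:
        raise ValueError("not a valid synset_id: %r" % (synset_id,))
    # First byte: syntactic category. Remaining eight bytes: byte offset
    # into the data.pos file for that category.
    return CATEGORY[text[0]], int(text[1:])


# Made-up identifier, for illustration only.
print(decode_synset_id("200001740"))
```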
|
<P> |
|
<I>w_num</I>, if present, indicates which word
in the synset is being referred to. Word numbers are assigned to the <I>word</I>
fields in a synset, from left to right, beginning with 1. When used to
represent lexical WordNet relations, <I>w_num</I> may be 0, indicating that the
relation holds for all words in the synset indicated by the preceding
<I>synset_id</I>. See <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
for a discussion of semantic and lexical
relations. <P>
|
<I>ss_type </I> is a one character code indicating the synset type: |
|
<P> |
|
<blockquote><B>n </B><tt> </tt> <tt> </tt> NOUN <BR> |
|
<B>v </B><tt> </tt> <tt> </tt> VERB <BR> |
|
<B>a </B><tt> </tt> <tt> </tt> ADJECTIVE <BR> |
|
<B>s </B><tt> </tt> <tt> </tt> ADJECTIVE SATELLITE <BR> |
|
<B>r </B><tt> </tt> <tt> </tt> ADVERB <BR> |
|
</blockquote> |
|
<P> |
|
<I>sense_number |
|
</I> specifies the sense number of the word, within the part of speech encoded |
|
in the <I>synset_id </I>, in the WordNet database. <P> |
|
<I>word</I> is the ASCII text of
the word as entered in the synset by the lexicographer, with spaces replaced
by underscore characters (<B>_</B>). The text of the word is case sensitive.
An adjective <I>word</I> is immediately followed by a syntactic marker if one
was specified in the lexicographer file. A syntactic marker is appended,
in parentheses, onto <I>word</I> without any intervening spaces. See <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
for a list of the syntactic markers for adjectives. <P>
|
Each synset has a
<I>gloss</I> that may contain a definition, one or more example sentences, or
both. Note that glosses are enclosed in single forward quotes and parentheses: <B>'(</B><I>gloss</I><B>)'</B>. <P>
|
<I>f_num </I> specifies the generic sentence frame number for word <I>w_num </I> in |
|
the synset indicated by <I>synset_id </I>. Note that when <I>w_num </I> is <B>0 </B>, the frame |
|
number applies to all words in the synset. If non-zero, the frame applies |
|
to that word in the synset. <P> |
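As a rough sketch of the <I>w_num</I> rule just described, the frames that apply to a given word could be collected in Python like this. The function name <B>frames_for_word</B> and the sample facts are hypothetical, and the input is assumed to be <B>fr</B> lines already parsed into integer triples.

```python
def frames_for_word(fr_facts, synset_id, w_num):
    """Return the sorted frame numbers that apply to one word of a synset.

    fr_facts is an iterable of (synset_id, f_num, w_num) integer triples.
    A stored w_num of 0 means the frame applies to every word in the synset.
    """
    return sorted({
        f_num
        for fact_synset, f_num, fact_w_num in fr_facts
        if fact_synset == synset_id and fact_w_num in (0, w_num)
    })


# Made-up facts, for illustration only.
facts = [
    (200001740, 2, 0),  # frame 2 applies to all words in the synset
    (200001740, 8, 1),  # frame 8 applies only to word 1
    (200001740, 9, 2),  # frame 9 applies only to word 2
    (200002000, 2, 0),  # a different synset
]
print(frames_for_word(facts, 200001740, 1))
```
<P>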
|
In WordNet, sense numbers are assigned as
described in <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>.
<I>tag_count</I> is the number of times the sense was
tagged in the Semantic Concordances, and <B>0</B> if it was not instantiated.
|
|
|
<H2><A NAME="sect5" HREF="#toc5">NOTES </A></H2> |
|
Since single forward quotes are used to enclose character strings, |
|
single quote characters found in <I>word </I> and <I>gloss </I> fields are represented |
|
as two adjacent single quote characters. <P> |
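A program that reads the files without a Prolog system has to undo this doubling itself. The Python sketch below shows one way to do so; the helper names <B>unquote_atom</B> and <B>quote_atom</B> are assumptions made for this example.

```python
def unquote_atom(field):
    """Strip the enclosing single quotes and collapse doubled quotes."""
    if len(field) >= 2 and field.startswith("'") and field.endswith("'"):
        return field[1:-1].replace("''", "'")
    # Unquoted fields (numbers, ss_type codes) are returned unchanged.
    return field


def quote_atom(text):
    """Inverse of unquote_atom: double embedded quotes, then enclose."""
    return "'" + text.replace("'", "''") + "'"


print(unquote_atom("'o''clock'"))
print(quote_atom("o'clock"))
```
<P>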
|
The load time can be greatly |
|
reduced by creating "object language" versions of the files, an option |
|
that is supported by some implementations, such as Quintus Prolog. |
|
<H2><A NAME="sect6" HREF="#toc6">ENVIRONMENT |
|
VARIABLES (UNIX) </A></H2> |
|
|
|
<DL> |
|
|
|
<DT><B>WNHOME</B> </DT> |
|
<DD>Base directory for WordNet. Default is <B>/usr/local/WordNet-3.0 |
|
</B>. </DD> |
|
</DL> |
|
|
|
<H2><A NAME="sect7" HREF="#toc7">REGISTRY (WINDOWS) </A></H2> |
|
|
|
<DL> |
|
|
|
<DT><B>HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome</B> </DT> |
|
<DD>Base directory |
|
for WordNet. Default is <B>C:\Program Files\WordNet\3.0 </B>. </DD> |
|
</DL> |
|
|
|
<H2><A NAME="sect8" HREF="#toc8">FILES </A></H2> |
|
All files are
in <B>WNHOME/prolog</B> on Unix platforms and <B>WNHome\prolog</B> on Windows platforms.
|
|
|
<DL> |
|
|
|
<DT><B>wn_s.pl</B> </DT> |
|
<DD>synset pointers </DD> |
|
|
|
<DT><B>wn_g.pl</B> </DT> |
|
<DD>gloss pointers </DD> |
|
|
|
<DT><B>wn_hyp.pl</B> </DT> |
|
<DD>hypernym pointers |
|
</DD> |
|
|
|
<DT><B>wn_ent.pl</B> </DT> |
|
<DD>entailment pointers </DD> |
|
|
|
<DT><B>wn_sim.pl</B> </DT> |
|
<DD>similar pointers </DD> |
|
|
|
<DT><B>wn_mm.pl</B> </DT> |
|
<DD>member |
|
meronym pointers </DD> |
|
|
|
<DT><B>wn_ms.pl</B> </DT> |
|
<DD>substance meronym pointers </DD> |
|
|
|
<DT><B>wn_mp.pl</B> </DT> |
|
<DD>part meronym |
|
pointers </DD> |
|
|
|
<DT><B>wn_cs.pl</B> </DT> |
|
<DD>cause pointers </DD> |
|
|
|
<DT><B>wn_vgp.pl</B> </DT> |
|
<DD>grouped verb pointers </DD> |
|
|
|
<DT><B>wn_at.pl</B> |
|
</DT> |
|
<DD>attribute pointers </DD> |
|
|
|
<DT><B>wn_ant.pl</B> </DT> |
|
<DD>antonym pointers </DD> |
|
|
|
<DT><B>wn_sa.pl</B> </DT> |
|
<DD>see also pointers |
|
</DD> |
|
|
|
<DT><B>wn_ppl.pl</B> </DT> |
|
<DD>participle pointers </DD> |
|
|
|
<DT><B>wn_per.pl</B> </DT> |
|
<DD>pertainym pointers </DD> |
|
|
|
<DT><B>wn_fr.pl</B> </DT> |
|
<DD>frame |
|
pointers </DD> |
|
</DL> |
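<P>
For Unix platforms, the location of a given operator's file can be worked out from <B>WNHOME</B> and the naming pattern above. The following Python sketch is illustrative only: the function name <B>prolog_file</B> and its <B>environ</B> parameter are assumptions for this example, and the fallback is the default directory stated under ENVIRONMENT VARIABLES.

```python
import os


def prolog_file(operator, environ=None):
    """Return the expected Unix path of wn_operator.pl under WNHOME/prolog."""
    env = os.environ if environ is None else environ
    # Fall back to the documented default base directory.
    base = env.get("WNHOME", "/usr/local/WordNet-3.0")
    return os.path.join(base, "prolog", "wn_%s.pl" % operator)


print(prolog_file("hyp", {"WNHOME": "/opt/wordnet"}))
```

Passing the environment as a parameter keeps the function easy to check without changing the real process environment.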
|
|
|
<H2><A NAME="sect9" HREF="#toc9">SEE ALSO </A></H2> |
|
<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>,
<A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>,
<A HREF="wngroups.7WN.html"><B>wngroups</B>(7WN)</A>,
<A HREF="wnpkgs.7WN.html"><B>wnpkgs</B>(7WN)</A>.
|
<P> |
|
|
|
<HR><P> |
|
<A NAME="toc"><B>Table of Contents</B></A><P> |
|
<UL> |
|
<LI><A NAME="toc0" HREF="#sect0">NAME</A></LI> |
|
<LI><A NAME="toc1" HREF="#sect1">DESCRIPTION</A></LI> |
|
<UL> |
|
<LI><A NAME="toc2" HREF="#sect2">File Format</A></LI> |
|
<LI><A NAME="toc3" HREF="#sect3">Operators</A></LI> |
|
<LI><A NAME="toc4" HREF="#sect4">Field Definitions</A></LI> |
|
</UL> |
|
<LI><A NAME="toc5" HREF="#sect5">NOTES</A></LI> |
|
<LI><A NAME="toc6" HREF="#sect6">ENVIRONMENT VARIABLES (UNIX)</A></LI> |
|
<LI><A NAME="toc7" HREF="#sect7">REGISTRY (WINDOWS)</A></LI> |
|
<LI><A NAME="toc8" HREF="#sect8">FILES</A></LI> |
|
<LI><A NAME="toc9" HREF="#sect9">SEE ALSO</A></LI> |
|
</UL> |
|
</BODY></HTML> |
|
|