	<!-- manual page source format generated by PolyglotMan v3.0.3a12, -->
	<!-- available via anonymous ftp from ftp.cs.berkeley.edu:/ucb/people/phelps/tcltk/rman.tar.Z -->

	<HTML>
	<HEAD>
	<TITLE>SENSEIDX(5WN) manual page</TITLE>
	</HEAD>
	<BODY>
	<A HREF="#toc">Table of Contents</A><P>

	<H2><A NAME="sect0" HREF="#toc0">NAME </A></H2>
	index.sense, sense.idx - WordNet's sense index
	<H2><A NAME="sect1" HREF="#toc1">DESCRIPTION </A></H2>
	The WordNet
	sense index provides an alternate method for accessing synsets and word
	senses in the WordNet database. It is useful to applications that retrieve
	synsets or other information related to a specific sense in WordNet, rather
	than all the senses of a word or collocation. It can also be used with
	tools like <B>grep </B> and Perl to find all senses of a word in one or more
	parts of speech. A specific WordNet sense, encoded as a <I>sense_key </I>, can
	be used as an index into this file to obtain its WordNet sense number,
	the database byte offset of the synset containing the sense, and the number
	of times it has been tagged in the semantic concordance texts. <P>
	Concatenating
	the <I>lemma </I> and <I>lex_sense </I> fields of a semantically tagged word (represented
	in a <B>&lt;wf </B>... <B>&gt; </B> attribute/value pair) in a semantic concordance file, using
	<B>% </B> as the concatenation character, creates the <I>sense_key </I> for that sense,
	which can in turn be used to search the sense index file. <P>
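The concatenation described above can be sketched in Python as follows. The function name <I>make_sense_key </I> is hypothetical and is not part of the WordNet library. The lower-casing and underscore substitution are assumptions made here so that the result matches the lemma form used in the index.

```python
def make_sense_key(lemma: str, lex_sense: str) -> str:
    """Join a lemma and a lex_sense value with '%' to form a sense_key."""
    # Assumption: normalize the lemma to lower case and join the words of a
    # collocation with underscores, the form used for lemmas in the index.
    normalized = lemma.lower().replace(" ", "_")
    return normalized + "%" + lex_sense


# Illustrative lex_sense value, not taken from a specific concordance file.
print(make_sense_key("dog", "1:05:00::"))
```

The resulting string is the search key for the sense index file. <P>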
	A <I>sense_key
	</I> is the best way to represent a sense in semantic tagging or other systems
	that refer to WordNet senses. <I>sense_key </I>s are independent of WordNet sense
	numbers and <I>synset_offset </I>s, which vary between versions of the database.
	Using the sense index and a <I>sense_key </I>, the corresponding synset (via
	the <I>synset_offset </I>) and WordNet sense number can easily be obtained. A
	mapping from noun <I>sense_key </I>s in WordNet 1.6 to corresponding 2.0 <I>sense_key
	</I>s is provided with version 2.0, and is described in <A HREF="sensemap.5WN.html"><B>sensemap</B>(5WN)</A>.
	<P>
	See
	<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
	for a thorough discussion of the WordNet database files.
	<H3><A NAME="sect2" HREF="#toc2">File
	Format </A></H3>
	The sense index file lists all of the senses in the WordNet database
	with each line representing one sense. The file is in alphabetical order,
	fields are separated by one space, and each line is terminated with a
	newline character. <P>
	Each line is of the form: <P>
	<blockquote><I>sense_key  synset_offset  sense_number  tag_cnt
	</I> </blockquote>
	<P>
	<I>sense_key </I> is an encoding of the word sense. Programs can construct
	a sense key in this format and use it as a binary search key into the
	sense index file. The format of a <I>sense_key </I> is described below. <P>
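As a rough illustration of using a sense key as a binary search key, the sketch below searches an in-memory list of index lines with Python's <I>bisect </I> module. It is not the WordNet library's own search routine. The function name <I>find_sense_line </I> and the sample lines are made up, and the list is sorted in Python first because the file's collation is only assumed to agree with Python string ordering.

```python
import bisect
from typing import List, Optional


def find_sense_line(sorted_lines: List[str], sense_key: str) -> Optional[str]:
    """Binary-search sorted index lines for the one that starts with sense_key."""
    # Fields are separated by one space, so the key plus a space is the prefix.
    probe = sense_key + " "
    position = bisect.bisect_left(sorted_lines, probe)
    if position != len(sorted_lines) and sorted_lines[position].startswith(probe):
        return sorted_lines[position]
    return None


# Made-up sample lines in the sense index layout; the numbers are not real data.
sample = sorted([
    "cat%1:05:00:: 00000200 1 0",
    "dog%1:05:00:: 00000100 1 3",
    "dog%2:38:00:: 00000300 1 0",
])
print(find_sense_line(sample, "dog%1:05:00::"))
```

For the full file, loading every line into memory is a simplification. A search that seeks within the file on disk avoids that cost. <P>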
	<I>synset_offset
	</I> is the byte offset at which the synset containing the sense is found in
	the database "data" file corresponding to the part of speech encoded in
	the <I>sense_key </I>. <I>synset_offset </I> is an 8 digit, zero-filled decimal integer,
	and can be used with <A HREF="fseek.3.html"><B>fseek</B>(3)</A>
	to read a synset from the data file. When <I>synset_offset </I> is
	passed to the WordNet library function <B>read_synset() </B> along with the syntactic
	category, the function returns a data structure containing the parsed synset. <P>
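The byte-offset lookup can be approximated outside the C library as well. The following Python sketch seeks to an offset and reads one line. It assumes each synset occupies a single newline-terminated ASCII line in the data file. The function names are hypothetical, and the demonstration uses a small stand-in file, not real WordNet data.

```python
import os
import tempfile


def read_synset_line(data_path: str, synset_offset: str) -> str:
    """Return the raw data-file line that begins at synset_offset."""
    # Binary mode makes seek() count bytes, like fseek(3) with SEEK_SET.
    with open(data_path, "rb") as data_file:
        data_file.seek(int(synset_offset, 10))
        return data_file.readline().decode("ascii").rstrip("\n")


def demo() -> str:
    # Stand-in data file with two made-up records, not real WordNet data.
    first = b"placeholder record one\n"
    second = b"placeholder record two\n"
    handle = tempfile.NamedTemporaryFile(delete=False)
    try:
        handle.write(first + second)
        handle.close()
        # The second record starts right after the first one.
        offset = "%08d" % len(first)
        return read_synset_line(handle.name, offset)
    finally:
        handle.close()
        os.remove(handle.name)


print(demo())
```

The offset is parsed with an explicit base of 10 so that the leading zeros are not misread. <P>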
	<I>sense_number
	</I> is a decimal integer indicating the sense number of the word, within
	the part of speech encoded in <I>sense_key </I>, in the WordNet database. See
	<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>
	for information about how sense numbers are assigned. <P>
	<I>tag_cnt
	</I> is a decimal integer giving the number of times the sense is tagged in various
	semantic concordance texts. A <I>tag_cnt </I> of <B>0 </B> indicates that the sense
	has not been semantically tagged.
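<P>
The four fields of a line can be split apart as in the following Python sketch. The names <I>SenseIndexEntry </I> and <I>parse_index_line </I> are hypothetical, and the sample line is illustrative only; its values are not guaranteed to match any release of the database.

```python
from typing import NamedTuple


class SenseIndexEntry(NamedTuple):
    sense_key: str
    synset_offset: str  # kept as text to preserve the 8 digit zero fill
    sense_number: int
    tag_cnt: int


def parse_index_line(line: str) -> SenseIndexEntry:
    """Split one sense index line into its four space-separated fields."""
    fields = line.rstrip("\n").split(" ")
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    sense_key, synset_offset, sense_number, tag_cnt = fields
    return SenseIndexEntry(sense_key, synset_offset, int(sense_number), int(tag_cnt))


# Illustrative line; the numeric values are sample data for this sketch.
entry = parse_index_line("dog%1:05:00:: 02084071 1 42\n")
print(entry.sense_key, entry.synset_offset, entry.sense_number, entry.tag_cnt)
```

Keeping <I>synset_offset </I> as a string retains the zero fill; convert it to an integer only when seeking.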
	<H3><A NAME="sect3" HREF="#toc3">Sense Key Encoding </A></H3>
	A <I>sense_key </I> is represented
	as: <P>
	<blockquote><I>lemma </I><B>% </B><I>lex_sense </I> </blockquote>
	<P>
	where <I>lex_sense </I> is encoded as: <P>
	<blockquote><I>ss_type</I><B>:</B><I>lex_filenum</I><B>:</B><I>lex_id</I><B>:</B><I>head_word</I><B>:</B><I>head_id</I> </blockquote>
	<P>
	<I>lemma </I> is the ASCII text of the word or collocation as found in the
	WordNet database index file corresponding to <I>pos </I>. <I>lemma </I> is in lower case,
	and collocations are formed by joining individual words with an underscore
	(<B>_ </B>) character. <P>
	<I>ss_type </I> is a one digit decimal integer representing the
	synset type for the sense. See <FONT SIZE=-1><B>Synset Type </B></FONT>
	below for a listing of the
	numbers corresponding to each synset type. <P>
	<I>lex_filenum </I> is a two digit
	decimal integer representing the name of the lexicographer file containing
	the synset for the sense. See <A HREF="lexnames.5WN.html"><B>lexnames</B>(5WN)</A>
	for the list of lexicographer
	file names and their corresponding numbers. <P>
	<I>lex_id </I> is a two digit decimal
	integer that, when appended onto <I>lemma </I>, uniquely identifies a sense within
	a lexicographer file. <I>lex_id </I> numbers usually start with <B>00 </B>, and are incremented
	as additional senses of the word are added to the same file, although
	there is no requirement that the numbers be consecutive or begin with
	<B>00 </B>. Note that a value of <B>00 </B> is the default, and therefore is not present
	in lexicographer files. Only non-default <I>lex_id </I> values must be explicitly
	assigned in lexicographer files. See <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>
	for information on the
	format of lexicographer files. <P>
	<I>head_word </I> is only present if the sense
	is in an adjective satellite synset. It is the lemma of the first word
	of the satellite's head synset. <P>
	<I>head_id </I> is a two digit decimal integer
	that, when appended onto <I>head_word </I>, uniquely identifies the sense of
	<I>head_word </I> within a lexicographer file, as described for <I>lex_id </I>. There
	is a value in this field only if <I>head_word </I> is present.
	<H3><A NAME="sect4" HREF="#toc4">Synset Type </A></H3>
	The
	synset type is encoded as follows: <P>
	<blockquote><B>1 </B><tt> </tt> <tt> </tt> NOUN <BR>
	<B>2 </B><tt> </tt> <tt> </tt> VERB <BR>
	<B>3 </B><tt> </tt> <tt> </tt> ADJECTIVE <BR>
	<B>4 </B><tt> </tt> <tt> </tt> ADVERB
	<BR>
	<B>5 </B><tt> </tt> <tt> </tt> ADJECTIVE SATELLITE <BR>
	</blockquote>

	<H2><A NAME="sect5" HREF="#toc5">NOTES </A></H2>
	For non-satellite senses the <I>head_word
	</I> and <I>head_id </I> fields have no values; however, the field separator character
	(<B>: </B>) is still present.
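<P>
A sense key can be taken apart along the same lines. The Python sketch below is one possible decoding, with hypothetical names (<I>SenseKey </I>, <I>parse_sense_key </I>). It assumes that neither <I>lex_sense </I> nor <I>head_word </I> contains a <B>% </B> or <B>: </B> character, and the sample keys are illustrative only.

```python
from typing import NamedTuple

# Synset type numbers as listed in the Synset Type section.
SYNSET_TYPES = {
    1: "NOUN",
    2: "VERB",
    3: "ADJECTIVE",
    4: "ADVERB",
    5: "ADJECTIVE SATELLITE",
}


class SenseKey(NamedTuple):
    lemma: str
    ss_type: int
    lex_filenum: int
    lex_id: int
    head_word: str  # empty for non-satellite senses
    head_id: str    # empty unless head_word is present


def parse_sense_key(sense_key: str) -> SenseKey:
    """Decode lemma%ss_type:lex_filenum:lex_id:head_word:head_id."""
    # Split on the last '%' (assumes lex_sense itself never contains '%').
    lemma, separator, lex_sense = sense_key.rpartition("%")
    if not separator:
        raise ValueError("missing '%' separator")
    parts = lex_sense.split(":")
    # Separators are present even when the head fields are empty.
    if len(parts) != 5:
        raise ValueError("expected 5 lex_sense fields, got %d" % len(parts))
    ss_type, lex_filenum, lex_id, head_word, head_id = parts
    return SenseKey(lemma, int(ss_type), int(lex_filenum), int(lex_id),
                    head_word, head_id)


# Illustrative keys for this sketch, not quoted from the database.
noun = parse_sense_key("dog%1:05:00::")
satellite = parse_sense_key("quick%5:00:00:fast:01")
print(noun.lemma, SYNSET_TYPES[noun.ss_type], repr(noun.head_word))
print(satellite.lemma, SYNSET_TYPES[satellite.ss_type], satellite.head_word)
```

The head fields are kept as strings so that an empty value stays distinguishable from <B>00 </B>.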
	<H2><A NAME="sect6" HREF="#toc6">ENVIRONMENT VARIABLES (UNIX) </A></H2>

	<DL>

	<DT><B>WNHOME</B> </DT>
	<DD>Base directory
	for WordNet. Default is <B>/usr/local/WordNet-3.0 </B>. </DD>

	<DT><B>WNSEARCHDIR</B> </DT>
	<DD>Directory in
	which the WordNet database has been installed. Default is <B>WNHOME/dict
	</B>. </DD>
	</DL>
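A program that reads the sense index directly might resolve these variables as in the following Python sketch. The function names are hypothetical, and the fallback logic is an assumption based only on the defaults stated above; the WordNet library may resolve paths differently.

```python
import os
from typing import Mapping, Optional


def wordnet_search_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return WNSEARCHDIR if set, otherwise WNHOME/dict using the stated default."""
    env = os.environ if environ is None else environ
    home = env.get("WNHOME", "/usr/local/WordNet-3.0")
    return env.get("WNSEARCHDIR", home + "/dict")


def sense_index_path(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the assumed location of index.sense inside the search directory."""
    return wordnet_search_dir(environ) + "/index.sense"


# An empty mapping stands in for an environment with neither variable set.
print(sense_index_path({}))
```

Passing a mapping instead of reading <B>os.environ </B> directly keeps the sketch easy to test. <P>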

	<H2><A NAME="sect7" HREF="#toc7">REGISTRY (WINDOWS) </A></H2>

	<DL>

	<DT><B>HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome</B> </DT>
	<DD>Base directory
	for WordNet. Default is <B>C:\Program Files\WordNet\3.0 </B>. </DD>
	</DL>

	<H2><A NAME="sect8" HREF="#toc8">FILES </A></H2>

	<DL>

	<DT><B>index.sense</B> </DT>
	<DD>sense
	index </DD>
	</DL>

	<H2><A NAME="sect9" HREF="#toc9">SEE ALSO </A></H2>
	<A HREF="binsrch.3WN.html"><B>binsrch</B>(3WN)</A>,
	<A HREF="wnsearch.3WN.html"><B>wnsearch</B>(3WN)</A>,
	<A HREF="lexnames.5WN.html"><B>lexnames</B>(5WN)</A>,
	<A HREF="wnintro.5WN.html"><B>wnintro</B>(5WN)</A>,
	<A HREF="sensemap.5WN.html"><B>sensemap</B>(5WN)</A>,
	<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>,
	<A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>. <P>

	<HR><P>
	<A NAME="toc"><B>Table of Contents</B></A><P>
	<UL>
	<LI><A NAME="toc0" HREF="#sect0">NAME</A></LI>
	<LI><A NAME="toc1" HREF="#sect1">DESCRIPTION</A></LI>
	<UL>
	<LI><A NAME="toc2" HREF="#sect2">File Format</A></LI>
	<LI><A NAME="toc3" HREF="#sect3">Sense Key Encoding</A></LI>
	<LI><A NAME="toc4" HREF="#sect4">Synset Type</A></LI>
	</UL>
	<LI><A NAME="toc5" HREF="#sect5">NOTES</A></LI>
	<LI><A NAME="toc6" HREF="#sect6">ENVIRONMENT VARIABLES (UNIX)</A></LI>
	<LI><A NAME="toc7" HREF="#sect7">REGISTRY (WINDOWS)</A></LI>
	<LI><A NAME="toc8" HREF="#sect8">FILES</A></LI>
	<LI><A NAME="toc9" HREF="#sect9">SEE ALSO</A></LI>
	</UL>
	</BODY></HTML>