Morphological and Orthographic Tools for English
Tools for inflectional morphological analysis and generation, and for
determining the orthography of the indefinite article, are now available.
The tools are:
- morpha: a fast and robust morphological analyser for English based on
finite-state techniques that returns the lemma and inflection type of a
word, given the word form and its part of speech. (The latter is optional,
but accuracy is degraded if it is not present.)
- morphg: generates a word form given a specification of the lemma, part of
speech, and the type of inflection required. Morphg is derived automatically
from morpha, ensuring consistency and reversibility of the tools.
An option controls British English or American English behaviour with
respect to consonant doubling (for example, `travelled' versus `traveled').
- ana: postprocesses text to insert the correct form of the indefinite
article (i.e. `a' or `an'). Ana encodes a set of rules keying off the
pronunciation of the next word (so `an' is produced if the following
word starts with a vowel sound, and `a' otherwise). The tool handles
plain text, part-of-speech-tagged text and SGML, among other possible formats.
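The article rule described for ana can be illustrated with a small Python sketch. This is not ana's implementation or interface: the function names and the tiny exception lists below are hypothetical, chosen only to show why the rule has to key off pronunciation rather than spelling.

```python
import re

# Hypothetical, deliberately tiny exception lists (assumptions for this
# sketch only; ana's real rule set is not reproduced here).
SILENT_H_WORDS = {"hour", "honest", "honour", "honor", "heir"}
CONSONANT_SOUND_WORDS = {"one", "once", "university", "unit", "user",
                         "useful", "unique", "european"}

def starts_with_vowel_sound(word: str) -> bool:
    """Rough guess at whether a word begins with a vowel sound."""
    w = word.lower()
    if w in SILENT_H_WORDS:          # spelled with 'h', pronounced with a vowel
        return True
    if w in CONSONANT_SOUND_WORDS:   # spelled with a vowel, pronounced /j/ or /w/
        return False
    return w[:1] in "aeiou"          # fall back to the first letter

# An indefinite article as a whole word, the gap after it, and the next word.
ARTICLE = re.compile(r"\b(an?)(\s+)([A-Za-z]+)", re.IGNORECASE)

def fix_articles(text: str) -> str:
    """Rewrite each 'a'/'an' to agree with the word that follows it."""
    def _replace(match: re.Match) -> str:
        article, gap, next_word = match.groups()
        form = "an" if starts_with_vowel_sound(next_word) else "a"
        if article[0].isupper():     # preserve sentence-initial capitalisation
            form = form.capitalize()
        return form + gap + next_word
    return ARTICLE.sub(_replace, text)

if __name__ == "__main__":
    print(fix_articles("An university offered a hour of tuition"))
```

A spelling-only test would get `university' and `hour' wrong, which is why the sketch consults the exception lists before falling back to the first letter. It handles plain text only.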
The tools are implemented using widely available Unix utilities, and are
free for research purposes; for any proposed commercial use please contact
John Carroll. Also, send an email if you would like to be notified of
new releases. New features in the works include derivational morphology
for deverbal nouns, and comparative and superlative forms of adjectives.
Recent changes:
September 2003: new version of morpha/g with pre-built binaries for
Linux, Solaris and Mac OS X, and a few classes of misanalysis fixed.
A description of the tools is published in:
Minnen, G., J. Carroll and D. Pearce (2001) `Applied
morphological processing of English'. Natural Language
Engineering, 7(3), 207-223.
Minnen, G., J. Carroll and D. Pearce (2000) `Robust, applied
morphological generation'. In Proceedings of the 1st International
Natural Language Generation Conference, Mitzpe Ramon, Israel, 201-208.
Please refer to one of these papers when describing any research using the
tools.
The tools were produced as part of the UK EPSRC-funded PSET
project and are being further developed on the RASP project.