A simple parsing problem

We will begin this chapter by considering a very simple problem; namely, how we might set about parsing the English sentence 'MediCenter employed nurses' given only Grammar1 from Chapter 4. The problem is simple for two reasons:
(1) The example sentence contains none of the phenomena that make parsing written English less than straightforward.
(2) By choosing an example from written English, we get a head start, as the datum comes to us partially preparsed.

Let us briefly consider this second point. Written English employs a convention whereby words are set off from each other by spaces.

But spoken English is not like that: words are not, in general, separated by silences, the obvious temporal analogue of white space on paper. Nor are their boundaries otherwise marked. So, for spoken English, and indeed for the spoken form of any natural language, the problem is significantly harder: the written analogue would be parsing 'M e d i C e n t e r e m p l o y e d n u r s e s'. Now the parser not only has to do what our parser will have to do, but must also determine the appropriate word boundaries in the string. If that does not seem intuitively very difficult, then consider 'T h e n e a r l y i n g o n e s e r r e d'. Actually, the problem with speech is even worse, since one effect of running spoken words together is to change or even omit the sounds that appear at word edges. The parsing of speech is, thankfully, a topic that falls outside the remit of this book. But the problem of a parser having to determine the position of syntactically relevant boundaries in the absence of white space does not simply go away when we turn our attention from speech back to the written form. Consider the following correctly written Turkish sentence:


çöplüklerimizdekilerdenmiydi

which translates into English as 'Was it from those that were in our garbage cans?' Clearly, a parser for written Turkish is going to have to determine the boundaries of those syntactic and semantic elements whose counterparts in written English would be words set off with white space.
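The boundary-finding problem described above can be made concrete with a small sketch. The Python fragment below is only an illustration, not a method proposed in this book: the function name segmentations and the toy lexicon are assumptions introduced here, and the lexicon is deliberately tiny. It enumerates every way of splitting an unspaced string into words from the lexicon, which shows how many tempting false starts ('then', 'nearly', 'in', 'gone') have to be tried and abandoned before the one complete analysis is found.

```python
def segmentations(text, lexicon):
    """Return every way of splitting `text` into words drawn from `lexicon`."""
    if not text:
        return [[]]          # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this word only if the remainder can also be segmented.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical toy lexicon (an assumption for illustration only).
LEXICON = {
    "the", "then", "near", "nearly", "early", "lying", "in", "gone",
    "ones", "erred", "medicenter", "employed", "nurses",
}

print(segmentations("medicenteremployednurses", LEXICON))
# [['medicenter', 'employed', 'nurses']]
print(segmentations("thenearlyingoneserred", LEXICON))
# [['the', 'near', 'lying', 'ones', 'erred']]
```

With a realistic lexicon the number of candidate prefixes at each position grows quickly, and a string may have several complete segmentations, so a practical system would need at least memoisation and some way of ranking the alternatives.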

With these matters in mind, let us now turn back to parsing our three-word example. For convenience, we repeat Grammar1 in its entirety here.




Rule {simple sentence formation}
    S -> NP VP.
Rule {transitive verb}
    VP -> V NP.
Rule {intransitive verb}
    VP -> V.

Word Dr. Chan:
    <cat> = NP.
Word nurses:
    <cat> = NP.
Word MediCenter:
    <cat> = NP.
Word patients:
    <cat> = NP.
Word died:
    <cat> = V.
Word employed:
    <cat> = V.
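To make the content of Grammar1 tangible, here is one way it might be written down as Python data, together with a naive top-down recognizer. This is a sketch under stated assumptions rather than the parsing strategy developed in this chapter: the names RULES, LEXICON, spans, match_sequence and recognize are all invented for illustration, the input is assumed to be already split into words, and 'Dr. Chan' is assumed to arrive as a single token.

```python
# Grammar1 transcribed as Python data (this representation is an assumption).
RULES = [
    ("S", ["NP", "VP"]),    # simple sentence formation
    ("VP", ["V", "NP"]),    # transitive verb
    ("VP", ["V"]),          # intransitive verb
]
LEXICON = {
    "Dr. Chan": "NP", "nurses": "NP", "MediCenter": "NP", "patients": "NP",
    "died": "V", "employed": "V",
}


def spans(category, words, start):
    """Yield each end position such that words[start:end] is a `category`."""
    # Lexical case: the next word itself has the requested category.
    if start < len(words) and LEXICON.get(words[start]) == category:
        yield start + 1
    # Phrasal case: try every rule whose left-hand side is the category.
    for lhs, rhs in RULES:
        if lhs == category:
            yield from match_sequence(rhs, words, start)


def match_sequence(categories, words, start):
    """Yield each end position after matching `categories` one after another."""
    if not categories:
        yield start
        return
    for mid in spans(categories[0], words, start):
        yield from match_sequence(categories[1:], words, mid)


def recognize(words):
    """True if the whole word list can be analysed as an S."""
    return len(words) in spans("S", words, 0)


print(recognize(["MediCenter", "employed", "nurses"]))  # True
print(recognize(["employed", "nurses"]))                # False
```

The recognizer terminates here only because Grammar1 has no left-recursive rules; it answers yes or no and builds no structure, so it is a recognizer rather than a parser.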

Exercise 5.1




This document was translated by troff2html v0.21 on October 22, 1996.