Briscoe, E. and J. Carroll (1995) `Developing and evaluating a probabilistic LR parser of part-of-speech and punctuation labels'. In Proceedings of the 4th ACL/SIGPARSE International Workshop on Parsing Technologies, Prague, Czech Republic. 48-58.
Reprinted in G. Sampson and D. McCarthy (eds.), Corpus Linguistics: Readings in a Widening Discipline, Continuum, 2004. 267-275.
Extended version published as `A probabilistic LR parser of part-of-speech and punctuation labels', in J. Thomas and M. Short (eds.), Using Corpora for Language Research, Longman, 1996. 135-150.

We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.

Download from arXiv.org e-Print archive.
