Carroll, J. and D. Weir (1997) `Encoding frequency information in lexicalized grammars'. In Proceedings of the 5th ACL/SIGPARSE International Workshop on Parsing Technologies, MIT, Cambridge, MA. 8-17.
Revised version in H. Bunt and A. Nijholt (eds.), Advances in Probabilistic and Other Parsing Technologies, Dordrecht: Kluwer, 2000. 13-28.
Further revised version published as `Encoding frequency information in stochastic parsing models', in R. Bod, R. Scha and K. Sima'an (eds.), Data-Oriented Parsing, CSLI Press, 2002.

We address the issue of how to associate frequency information with lexicalized grammar formalisms, using Lexicalized Tree Adjoining Grammar as a representative framework. We systematically consider a number of alternative probabilistic frameworks, evaluating their adequacy from both theoretical and empirical perspectives using data from existing large treebanks. We also propose three orthogonal approaches for backing off probability estimates to cope with the large number of parameters involved.
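The abstract does not spell out the three back-off approaches. To show roughly what backing off a probability estimate involves, here is a minimal Python sketch under assumed details. It estimates the probability that some elementary tree attaches at a given node of a lexicalized tree. It does this by linearly interpolating a lexicalized estimate (conditioned on tree, anchor word and node) with an unlexicalized one (conditioned on tree and node only), and it falls back entirely to the unlexicalized estimate when the lexical context was never observed. The class name, tree labels and the fixed weight lam are hypothetical. This is a generic interpolation scheme, not one of the approaches proposed in the paper.

```python
from collections import Counter


class BackoffEstimator:
    """Hypothetical two-level back-off estimator (illustrative only)."""

    def __init__(self, lam=0.7):
        self.lam = lam                 # assumed fixed weight on the lexicalized estimate
        self.lex_joint = Counter()     # counts of ((tree, anchor, node), outcome)
        self.lex_ctx = Counter()       # counts of (tree, anchor, node)
        self.unlex_joint = Counter()   # counts of ((tree, node), outcome)
        self.unlex_ctx = Counter()     # counts of (tree, node)

    def observe(self, tree, anchor, node, outcome):
        """Record one attachment event from a (hypothetical) treebank derivation."""
        lex, unlex = (tree, anchor, node), (tree, node)
        self.lex_joint[lex, outcome] += 1
        self.lex_ctx[lex] += 1
        self.unlex_joint[unlex, outcome] += 1
        self.unlex_ctx[unlex] += 1

    def prob(self, tree, anchor, node, outcome):
        """P(outcome | tree, anchor, node), backed off to P(outcome | tree, node)."""
        lex, unlex = (tree, anchor, node), (tree, node)
        p_unlex = (self.unlex_joint[unlex, outcome] / self.unlex_ctx[unlex]
                   if self.unlex_ctx[unlex] else 0.0)
        if self.lex_ctx[lex] == 0:
            return p_unlex             # unseen lexical context: use the coarser estimate alone
        p_lex = self.lex_joint[lex, outcome] / self.lex_ctx[lex]
        return self.lam * p_lex + (1 - self.lam) * p_unlex


est = BackoffEstimator(lam=0.7)
est.observe("t_trans", "eat", "obj", "t_noun")
est.observe("t_trans", "eat", "obj", "t_noun")
est.observe("t_trans", "see", "obj", "t_noun")
est.observe("t_trans", "see", "obj", "t_pron")
print(est.prob("t_trans", "eat", "obj", "t_noun"))     # interpolated estimate
print(est.prob("t_trans", "devour", "obj", "t_noun"))  # unseen anchor: backed-off estimate
```

Because each component estimate is a proper conditional distribution and the weights sum to one, the interpolated probabilities for a given context still sum to one; the price of a single global weight is that well-attested and rare anchors are smoothed equally.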

Download from arXiv.org e-Print archive.
