Oepen, S. and J. Carroll (2000) `Parser engineering and performance profiling', Natural Language Engineering, 6(1), 81-97.
We describe and argue for a strategy of performance profiling and
comparison in the engineering of parsing systems for wide-coverage
linguistic grammars.
A performance profile is a precise, rich, and structured snapshot of
system (and grammar) behaviour at a given development point.
The aim is to characterize system performance at a very detailed
technical level, but at the same time to abstract away from
idiosyncrasies of particular processors.
Profiles are obtained with minimal effort by applying a specialized
profiling tool to a set of structured reference data (taken from both
existing test suites and corpora), in conjunction with a uniform format
for test data and processing results.
The resulting profiles can be analyzed and visualized at various
levels of granularity in order to highlight different aspects of system
performance, thus providing a solid empirical basis for system
refinement and optimization.
Since profiles are stored in a database, comparison with earlier
versions, different parameter settings, or other processing systems is
straightforward.
We apply several salient performance metrics in a contrastive
discussion of various (one-pass, bottom-up, chart-based) parsing
strategies (viz. passive vs. active and uni- vs. bidirectional
approaches).
Based on insights gained from detailed performance profiles, we outline
and evaluate a novel `hyper-active' parsing strategy.
We also present preliminary profiles for techniques for `packing' of
local ambiguities with respect to (partial) subsumption of feature
structures.
Article published in Natural Language Engineering.