John A. Carroll

Current and Past Research Projects and Workshops


Current projects

  • The Ergonomics of Electronic Patient Records: an interdisciplinary development of methodologies for understanding and exploiting free text to enhance the utility of primary care electronic patient records (Wellcome Trust).

  • Ranking Word Senses for Disambiguation: Models and Applications is concerned with developing ways of estimating the frequency distributions of senses of words from raw (unannotated) text (EPSRC).

  • Part of the DELPH-IN (Deep Linguistic Processing with HPSG) collaboration, and affiliated with the LOGON machine translation project in Norway.


Past projects

  • COGENT: Controlled Generation of Text is investigating wide-coverage generation and developing reflective techniques for controlling it effectively. As well as furthering the understanding of wide-coverage generation, the project will deliver a substantial and novel resource to support future research in this area, and practical implementations of wide-coverage controllable generators (EPSRC).

  • MEANING - Developing Multilingual Web-scale Language Technologies: collecting and analysing language data from the WWW on a large scale, building more comprehensive multilingual lexical knowledge bases to support improved word sense disambiguation (EU 5th Framework).

  • DEEP THOUGHT - Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction is concerned with devising methods for combining robust shallow methods for language analysis with deep semantic processing. The approach will be demonstrated in business intelligence, automated email processing and document production support applications (EU 5th Framework).

  • Robust Accurate Statistical Parsing (RASP): integrating and extending several strands of research on robust statistical parsing and automated grammar and lexicon induction, to produce a new parsing toolkit (EPSRC).

  • PSET: Practical Simplification of English Text: building a computer system which takes in English newspaper text across the WWW and outputs a simplified version with broadly similar meaning (for example, with uncommon or unusual words replaced by more common or familiar synonyms, and difficult-to-follow syntactic constructs replaced by simpler ones); the system will be evaluated with people suffering from aphasia, which impairs their comprehension of written English (EPSRC).

  • LEXSYS: Analysis of Naturally-occurring English Text with Stochastic Lexicalized Grammars: developing a robust wide-coverage parsing system for English text, exploiting a combination of: statistical techniques involving online corpora; inheritance hierarchies for imposing structure on NLP data; and lexicalised grammars (EPSRC).

  • SPARKLE (Shallow PARsing and Knowledge extraction for Language Engineering): developing shallow parsing technology in four European languages together with corpus-based lexical acquisition techniques, and deploying parsers in multilingual information retrieval and speech dialogue systems (EU 4th Framework).

  • ILD (Integrated Language Database): producing a prototype system for rapid and efficient development of multilingual language dictionaries from corpus data (DTI/EPSRC under the SALT programme, at Cambridge University).


Workshops

