Semantic Analysis and Information Extraction

project

This work uses document annotation and n-gram analysis to extract, link, and retrieve information from text corpora. A variety of clients have used it, mainly to condense, summarise, and index large volumes of reports.
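As a rough illustration of the n-gram side of this, the sketch below counts word n-grams across a small set of texts. It is a minimal pure-Python example, not the project's actual pipeline and not GATE's API; the `tokenize` and `ngram_counts` helpers and the two sample sentences are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def ngram_counts(documents, n=2):
    """Count word n-grams over an iterable of document strings."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # Slide a window of length n; n-grams never span two documents.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Invented sample sentences, for illustration only.
reports = [
    "Water quality declined in the northern district.",
    "Water quality improved after the new filtration plant opened.",
]
bigrams = ngram_counts(reports, n=2)
print(bigrams.most_common(1))  # [(('water', 'quality'), 2)]
```

Counting per document keeps n-grams from straddling report boundaries, which matters when the reports are unrelated.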

Among other tools, we’ve particularly enjoyed using GATE (General Architecture for Text Engineering).

Achievements include:

  • Large-scale extraction of information (200+ documents collected over 5 years)
  • Derivation of an index of issues with annotations in context for future reference
  • Quantitative analysis of term and issue frequency
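The last two points can be sketched together: an index that records each issue term with its surrounding context, from which term frequencies follow directly. This is a minimal sketch under assumptions, not the method used for clients; `build_issue_index`, the `window` parameter, the document IDs, and the sample text are all hypothetical.

```python
import re
from collections import defaultdict

def build_issue_index(documents, terms, window=30):
    """Map each term to a list of (doc_id, context snippet) pairs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for term in terms:
            # Whole-word, case-insensitive match of the literal term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                # Keep up to `window` characters on each side as context.
                start = max(0, match.start() - window)
                end = min(len(text), match.end() + window)
                index[term].append((doc_id, text[start:end].strip()))
    return dict(index)

# Invented sample documents, for illustration only.
documents = {
    "report-2019": "Flooding damaged the access road. Road repairs are pending.",
    "report-2020": "The access road was reopened after repairs.",
}
index = build_issue_index(documents, ["road", "flooding"])

# Term frequency is the number of indexed occurrences per term.
frequency = {term: len(hits) for term, hits in index.items()}
print(frequency)  # {'road': 3, 'flooding': 1}
```

Storing the document ID alongside each snippet lets a reader go from an index entry back to the source report.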

analysis information-extraction text-analysis semantic-analysis