Thursday 3 March 2022

Information extraction

 

  1. Information extraction is the process of extracting specific (pre-specified) information from textual sources.
    A trivial example is when your email client pulls just the event details (such as the date and time) out of
    a message so that you can add them to your Calendar.

  2. By gathering detailed structured data from texts, information extraction enables:


  • The automation of tasks such as smart content classification, integrated search, management and delivery;

  • Data-driven activities such as mining for patterns and trends, uncovering hidden relationships, etc.
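
The calendar example from point 1 can be sketched in a few lines of Python. This is only a minimal illustration, assuming dates are written like "3 March 2022"; the names `extract_dates`, `MONTHS` and `DATE_PATTERN` are made up for this sketch and do not describe how any real email client works.

```python
import re
from datetime import date

# Month names mapped to their numbers (1-12).
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Assumed date format: day, full month name, four-digit year.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_dates(message: str) -> list[date]:
    """Return every well-formed date mentioned in the message."""
    dates = []
    for day, month, year in DATE_PATTERN.findall(message):
        try:
            dates.append(date(int(year), MONTHS[month], int(day)))
        except ValueError:
            # Skip impossible dates such as "31 February 2022".
            continue
    return dates
```

Everything else in the message is ignored; only the pre-specified piece of information (the date) comes back in a structured form that a calendar could use.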



How Does Information Extraction Work?

Typically, for structured information to be extracted from unstructured texts, the following main subtasks are involved:

  • Pre-processing of the text – this is where the text is prepared for processing with the help of computational
    linguistics tools such as tokenization, sentence splitting, morphological analysis, etc.

  • Finding and classifying concepts – this is where mentions of people, things, locations, events and other pre-specified
    types of concepts are detected and classified.

  • Connecting the concepts – this is the task of identifying relationships between the extracted concepts.

  • Unifying – this subtask is about presenting the extracted data in a standard form.

  • Getting rid of the noise – this subtask involves eliminating duplicate data.

  • Enriching your knowledge base – this is where the extracted knowledge is ingested into your database for further use.
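
The subtasks above can be strung together in a toy pipeline. The sketch below is a deliberately simplified illustration, not a production approach: the `GAZETTEER` lookup table, the single "in" rule for relations and all function names are assumptions made for this example. Real systems usually rely on trained models for each step.

```python
import re

# Hypothetical gazetteer: surface form -> (canonical name, concept type).
GAZETTEER = {
    "ada lovelace": ("Ada Lovelace", "Person"),
    "lovelace": ("Ada Lovelace", "Person"),
    "london": ("London", "Location"),
}


def preprocess(text):
    """Pre-processing: split into sentences, then tokenize each sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [re.findall(r"\w+", s) for s in sentences]


def find_concepts(tokens):
    """Finding, classifying and unifying: look up one- and two-word phrases."""
    found = []
    i = 0
    while i < len(tokens):
        for n in (2, 1):  # prefer the longer match
            window = tokens[i:i + n]
            phrase = " ".join(window).lower()
            if len(window) == n and phrase in GAZETTEER:
                name, concept_type = GAZETTEER[phrase]  # canonical form
                found.append((i, name, concept_type))
                i += n
                break
        else:
            i += 1  # no match starting at this token
    return found


def connect(tokens, concepts):
    """Connecting: toy rule linking a Person to a later Location via 'in'."""
    relations = []
    for i, a, type_a in concepts:
        for j, b, type_b in concepts:
            if (type_a == "Person" and type_b == "Location"
                    and i < j and "in" in tokens[i:j]):
                relations.append((a, "located_in", b))
    return relations


def extract(text, knowledge_base):
    """Run all steps; the set drops duplicates as facts are ingested."""
    for tokens in preprocess(text):
        concepts = find_concepts(tokens)
        for triple in connect(tokens, concepts):
            knowledge_base.add(triple)
    return knowledge_base
```

Calling `extract("Ada Lovelace lived in London. Lovelace worked in London.", set())` runs both sentences through every step. Because both mentions are unified to the same canonical name and the knowledge base is a set, the repeated fact is stored only once.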

Typical Information Extraction Applications

Information extraction can be applied to a wide range of textual sources: from emails and Web pages to reports, presentations,
legal documents and scientific papers. The technology successfully solves challenges related to content management and knowledge
discovery in the areas of:

  • Business intelligence (for enabling analysts to gather structured information from multiple sources);

  • Financial investigation (for analysis and discovery of hidden relationships);

  • Scientific research (for automated discovery of references or suggestion of relevant papers);

  • Media monitoring (for mentions of companies, brands, people);

  • Healthcare records management (for structuring and summarizing patient records);

  • Pharma research (for drug discovery, discovery of adverse effects and automated analysis of clinical trials).
