Information extraction

Information extraction is the process of pulling specific, pre-specified pieces of information out of textual sources.
A simple everyday example is an email client that picks out just the event details from a message
so that you can add them to your calendar.
By gathering detailed structured data from texts, information extraction enables:

The automation of tasks such as smart content classification, integrated search, management and delivery;
Data-driven activities such as mining for patterns and trends, uncovering hidden relationships, etc.
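
To make the calendar example concrete, here is a minimal sketch of pulling an event's start time out of a message. The message text, the EVENT_PATTERN name and the assumed formats (an ISO date followed by a 24-hour time) are all invented for this illustration; real email clients handle far more varied wording and are not claimed to work this way.

```python
import re
from datetime import datetime

# Hypothetical message text, invented for this illustration.
EMAIL_BODY = (
    "Hi team, the quarterly review is on 2024-03-15 at 14:30 "
    "in Room 4B. Please bring your reports."
)

# Assumed formats: an ISO date (YYYY-MM-DD) followed later on the same
# line by a 24-hour time (HH:MM).
EVENT_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}).*?(?P<time>\d{2}:\d{2})"
)


def extract_event_start(text):
    """Return the first date and time found in text as a datetime, or None."""
    match = EVENT_PATTERN.search(text)
    if match is None:
        return None
    return datetime.strptime(
        f"{match.group('date')} {match.group('time')}", "%Y-%m-%d %H:%M"
    )


event_start = extract_event_start(EMAIL_BODY)
print(event_start)
```

The rest of the message is ignored: only the pre-specified fields (here, a date and a time) are kept, which is the core idea behind information extraction.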

How Does Information Extraction Work?

Typically, for structured information to be extracted from unstructured texts, the following main subtasks are involved:

Pre-processing of the text – this is where the text is prepared for processing with the help of computational
linguistics tools such as tokenization, sentence splitting, morphological analysis, etc.
Finding and classifying concepts – this is where mentions of people, things, locations, events and other pre-specified
types of concepts are detected and classified.
Connecting the concepts – this is the task of identifying relationships between the extracted concepts.
Unifying – this subtask is about presenting the extracted data in a standard form.
Getting rid of the noise – this subtask involves eliminating duplicate data.
Enriching your knowledge base – this is where the extracted knowledge is ingested into your database for further use.
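
The subtasks above can be sketched as a toy pipeline. This is only an illustration under simplifying assumptions: pre-processing is reduced to naive sentence splitting, concepts are found with a small hand-made lookup table (GAZETTEER, with invented names), and the only relationship detected is "appears in the same sentence". Production systems typically rely on trained linguistic models for each step.

```python
import re

# Hypothetical gazetteer mapping known names to concept types.
GAZETTEER = {
    "Alice Smith": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Berlin": "LOCATION",
}


def split_sentences(text):
    """Pre-processing: naive split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_concepts(sentence):
    """Finding and classifying concepts: case-insensitive gazetteer lookup."""
    found = []
    for name, concept_type in GAZETTEER.items():
        if re.search(re.escape(name), sentence, flags=re.IGNORECASE):
            # Unifying: always report the canonical spelling from the gazetteer.
            found.append((name, concept_type))
    return found


def connect_concepts(concepts):
    """Connecting the concepts: pair up concepts that share a sentence."""
    relations = []
    for i, (left, _) in enumerate(concepts):
        for right, _ in concepts[i + 1:]:
            relations.append((left, "co-occurs-with", right))
    return relations


def extract(text):
    """Run all subtasks and return the resulting set of facts."""
    knowledge_base = set()  # a set drops duplicate facts (noise removal)
    for sentence in split_sentences(text):
        concepts = find_concepts(sentence)
        for relation in connect_concepts(concepts):
            knowledge_base.add(relation)  # enriching the knowledge base
    return knowledge_base


TEXT = "Alice Smith joined Acme Corp in Berlin. ALICE SMITH now leads Acme Corp."
facts = extract(TEXT)
for fact in sorted(facts):
    print(fact)
```

Both sentences mention Alice Smith and Acme Corp, in different capitalization, yet the relationship between them is stored only once because mentions are unified before facts are added to the set.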

Typical Information Extraction Applications

Information extraction can be applied to a wide range of textual sources: from emails and Web pages to reports, presentations,
legal documents and scientific papers. The technology helps address challenges related to content management and knowledge
discovery in the areas of:

Business intelligence (for enabling analysts to gather structured information from multiple sources);
Financial investigation (for analysis and discovery of hidden relationships);
Scientific research (for automated discovery of references or suggestion of relevant papers);
Media monitoring (for mentions of companies, brands, people);
Healthcare records management (for structuring and summarizing patient records);
Pharma research (for drug discovery, discovery of adverse effects and automated analysis of clinical trials).

Coding Club Of Competitive Programmers
