Attensity’s Entity Extraction automatically identifies and extracts key entities from any text data source, in multiple languages, with no setup or manual creation of rules required.
Attensity Entity Extraction automatically identifies and extracts more than 35 key entity types – such as people, dates, places, companies or other things – from any text data source, in multiple languages. This ability to automatically identify and classify relevant entities makes Attensity one of the most powerful text analysis and extraction tools on the market. Using Attensity, developers can extend the value of their applications by enabling end-users to quickly find the most important pieces of information within large volumes of documents.
And by combining Entity Extraction with Attensity Triples, you get the best of both worlds: “nouns and verbs” for complete automated extraction of entities, relations, and events, plus contextual information that helps disambiguate entities.
Attensity can be integrated into virtually any application that processes textual information. It enables users to create relevant, meaningful structured data from unstructured data, mine large volumes of text for relevant information, and quickly identify trends in data sets, including trends and movements associated with people, places, dates, organizations, etc.
Extraction and Classification
Attensity leverages a core understanding of natural language processing – language-aware tokenization, part-of-speech tagging and noun phrase identification – to automatically extract and classify all entities.
Variant Identification and Grouping
Variant identification and grouping allow Attensity to accurately classify all relevant entities in a document, even one-word entities, and to provide true counts reflecting the number and location of all appearances of a given entity. For example, Attensity recognizes that a later appearance of the word “Smith” refers to the earlier identified person “Joe Smith.”
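The “Smith” / “Joe Smith” case can be illustrated with a minimal Python sketch. This is not Attensity's implementation or API: the function name group_variants, the (text, offset) input shape, and the surname-matching rule are all assumptions made for illustration.

```python
from collections import defaultdict


def group_variants(mentions):
    """Group person mentions under a canonical full name.

    `mentions` is a list of (text, offset) pairs in document order
    (a hypothetical input shape). A one-word mention is attached to the
    first earlier full name whose last word matches it, ignoring case.
    Returns a dict mapping canonical name -> list of offsets.
    """
    groups = defaultdict(list)
    canonical_by_surname = {}  # lowercased last word -> first full name seen
    for text, offset in mentions:
        tokens = text.split()
        if len(tokens) > 1:
            canonical = text
            canonical_by_surname.setdefault(tokens[-1].lower(), canonical)
        else:
            # Fall back to the mention itself when no full name was seen.
            canonical = canonical_by_surname.get(text.lower(), text)
        groups[canonical].append(offset)
    return dict(groups)


mentions = [("Joe Smith", 0), ("Smith", 42), ("Joe Smith", 90), ("Jones", 120)]
grouped = group_variants(mentions)
```

Here grouped["Joe Smith"] holds three offsets, so the count for that person includes the bare “Smith” mention. A production system would also need to handle cases this sketch ignores, such as two different people sharing a surname.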
Normalization
Normalization takes much of the guesswork out of metadata creation, search, data mining and link analysis processes by creating standard formats (e.g., ISO) for certain entity categories such as dates or measurements.
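As a rough illustration of what normalizing a date entity to an ISO format can look like, the following Python sketch converts a few common surface forms to ISO 8601 (YYYY-MM-DD). The function name normalize_date and the list of accepted formats are assumptions, not part of the Attensity product.

```python
from datetime import datetime

# Surface formats this sketch accepts (an illustrative, assumed list).
# "%B" matches full month names in the current locale; English is assumed.
_FORMATS = ("%B %d, %Y", "%d %B %Y", "%m/%d/%Y", "%Y-%m-%d")


def normalize_date(text):
    """Return the date in ISO 8601 form, or None if no format matches."""
    cleaned = text.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


iso_a = normalize_date("March 5, 2011")
iso_b = normalize_date("5 March 2011")
iso_c = normalize_date("03/05/2011")  # read as month/day/year by assumption
```

All three calls yield the same string, which is what makes normalized values usable as search keys or link-analysis join fields. Ambiguous numeric dates (day-first versus month-first) need a locale policy that this sketch simply hard-codes.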
Relevance Scoring
The entities extracted by Attensity are given relevance scores reflecting their importance to the document as a whole, making Attensity an essential part of any data categorization solution.
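How Attensity computes its relevance scores is not described here. Purely to show the idea of scoring entities by their importance to a document, the Python sketch below uses a toy heuristic that is an assumption of this example: entities mentioned more often, and first mentioned earlier, score higher. The name relevance_scores and its inputs are hypothetical.

```python
def relevance_scores(mentions, doc_length):
    """Score entities between 0 and 1 with a toy heuristic.

    `mentions` maps an entity name to the character offsets where it
    appears; `doc_length` is the document length in characters.
    Score = mention count x earliness boost, scaled so the top entity is 1.0.
    """
    raw = {}
    for entity, offsets in mentions.items():
        frequency = len(offsets)
        # Boost ranges from 2.0 (first mention at offset 0) down to 1.0.
        earliness = 1.0 + (1.0 - min(offsets) / doc_length)
        raw[entity] = frequency * earliness
    if not raw:
        return {}
    top = max(raw.values())
    return {entity: score / top for entity, score in raw.items()}


scores = relevance_scores({"Joe Smith": [0, 42, 90], "Jones": [120]}, 200)
```

In this example “Joe Smith” receives the top score because it appears three times and opens the document, while “Jones” appears once, later on. Scores like these could then be thresholded or ranked when categorizing documents.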
Languages Supported
Arabic, Bokmål, Catalan, Croatian, Czech, Danish, Dutch, English, Farsi, Finnish, French, German, Italian, Japanese, Korean, Nynorsk, Portuguese, Russian, Serbian, Simplified Chinese, Slovak, Slovenian, Spanish, Swedish and Traditional Chinese.
Entity Types Supported
Pre-defined entity types vary by language module. For example, the English language module includes:
ADDRESS, CITY, CONTINENT, COUNTRY, CURRENCY, DATE, DAY, DISTRICT, FACILITY, FEDERATION, HOLIDAY, MEASURE, MONTH, NOUN_GROUP, ORGANIZATION, PEOPLE, PERCENT, PERSON, PHONE, PLACE_OTHER, PLACE_REGION, POSITION, PRODUCT, PROP_MISC, SPECIAL, SSN, STATE_PROVINCE, TICKER, TIME, TIME_PERIOD, URI, and YEAR. Sub-entities and sub-types are supported for ADDRESS, CITY, DATE, FACILITY, ORGANIZATION, PLACE_OTHER, PLACE_REGION, URI, COMMON_FACILITY, COMMON_ORGANIZATION, COMMON_PERSON, COMMON_PLACE_OTHER, and COMMON_PLACE_REGION.