ITEA is the Eureka Cluster on software innovation

Deduplication of entities based on Natural Language Processing

Project
18007 DEFRAUDify
Type
New product
Description

In the financial industry, screening of clients (a “know your customer” or KYC process) is required to verify identities and prevent risks. Open-source intelligence (OSINT) is an indispensable part of this process, but it is often still performed manually. Automation is needed to eliminate the manual handling and reduce false-positive results. NLP analysis based on named entity recognition in combination with relation extraction can be used to search for natural persons and their attributes in unstructured text.

Contact
Kit Buurman
Email
kit.buurman@tno.nl
Research area(s)
Natural Language Processing
Technical features

The NLP identification tool requires two inputs: 1) the name of the subject-of-interest and 2) a set of web page documents, such as articles, blog posts or social media pages. The tool consists of four modules. First, it reads and prepares the data. Next, it generates a list of relation triples (subject, relation, object) per document, for example (James, lives in, New York), where “James” was provided as the subject-of-interest. From the triples returned for each document, a graph is created. Finally, these document graphs are embedded and their similarity is compared to distinguish the various persons mentioned in the documents. The tool returns a set of clusters, where each cluster represents a single person. Based on these clusters, irrelevant documents can be filtered out and information on the person of interest can be found more quickly.
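
The last two steps (comparing per-document graphs and grouping documents by person) can be illustrated with a minimal Python sketch. It is not the DEFRAUDify implementation: the triples are written by hand instead of being extracted by an NLP model, and a plain Jaccard overlap of the subject's (relation, object) edges stands in for the graph embedding and similarity comparison described above. The function names, the 0.3 threshold, the document identifiers and the example facts other than (James, lives in, New York) are all hypothetical.

```python
from itertools import combinations


def edge_set(triples, subject):
    """Return the (relation, object) edges attached to the subject-of-interest."""
    return {
        (rel.lower(), obj.lower())
        for subj, rel, obj in triples
        if subj.lower() == subject.lower()
    }


def jaccard(a, b):
    """Jaccard overlap of two edge sets (assumed stand-in for embedding similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(doc_triples, subject, threshold=0.3):
    """Group documents whose subject edges overlap by at least `threshold`.

    Documents are linked pairwise and merged with a small union-find,
    so each resulting cluster is assumed to describe one person.
    """
    edges = {doc: edge_set(triples, subject) for doc, triples in doc_triples.items()}
    parent = {doc: doc for doc in edges}

    def find(doc):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    for a, b in combinations(edges, 2):
        if jaccard(edges[a], edges[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for doc in edges:
        clusters.setdefault(find(doc), set()).add(doc)
    return sorted(clusters.values(), key=sorted)


# Hypothetical, hand-written triples for three documents mentioning "James".
docs = {
    "doc_a": [("James", "lives in", "New York"), ("James", "works at", "Acme Bank")],
    "doc_b": [
        ("James", "lives in", "New York"),
        ("James", "works at", "Acme Bank"),
        ("James", "born in", "1980"),
    ],
    "doc_c": [("James", "lives in", "London"), ("James", "plays for", "City FC")],
}

clusters = cluster_documents(docs, "James")
for number, members in enumerate(clusters, start=1):
    print(f"person {number}: {sorted(members)}")
```

In this toy input, doc_a and doc_b share two of three distinct edges and end up in one cluster, while doc_c shares none and forms its own cluster, so a reviewer looking for the New York James could set doc_c aside. A real system would need a more robust similarity measure than exact edge matching, since the same fact is rarely phrased identically across documents.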

Integration constraints

These can be tailored to the customer's needs.

Targeted customer(s)

Any organization dealing with unstructured data (text) that needs to identify the entities described in that data.

Conditions for reuse

License conditions will be agreed on a case-by-case basis.

Confidentiality
Public
Publication date
30-09-2023
Involved partners
TNO (NLD)
