Deduplication of entities based on Natural Language Processing

Project: 18007 DEFRAUDify
Type: New product
Description: In the financial industry, client screening (a “know your customer” or KYC process) is required to verify identities and mitigate risks. OSINT (open-source intelligence) is an integral part of this process, yet it is often still performed manually. Automation is needed to eliminate manual handling and to reduce the number of false positives. NLP analysis that combines named entity recognition with relation extraction can be used to search unstructured text for natural persons and their attributes.
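To illustrate what searching text for "persons and their attributes" yields, the sketch below pulls (subject, relation, object) triples about a named subject out of raw text. It is a toy, rule-based stand-in only: the `extract_triples` function, the `RELATION_PATTERNS` table and the capitalised-words heuristic for objects are hypothetical and are not the tool's actual method, which is described as based on named entity recognition and relation extraction.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical, hand-written relation patterns. A real pipeline would use
# trained NER and relation-extraction models instead of rules like these.
RELATION_PATTERNS = {
    "lives in": r"lives in",
    "works for": r"works (?:for|at)",
    "was born in": r"was born in",
}


def extract_triples(text: str, subject: str) -> List[Triple]:
    """Return (subject, relation, object) triples from sentences naming `subject`."""
    triples: List[Triple] = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if subject not in sentence:
            continue
        for relation, pattern in RELATION_PATTERNS.items():
            # Toy heuristic: the object is the run of capitalised words
            # directly after "<subject> <relation phrase>".
            match = re.search(
                re.escape(subject) + r"\s+" + pattern + r"\s+((?:[A-Z][\w-]*\s?)+)",
                sentence,
            )
            if match:
                triples.append((subject, relation, match.group(1).strip()))
    return triples


if __name__ == "__main__":
    doc = "James lives in New York. James works for Acme Bank. Anna lives in Paris."
    print(extract_triples(doc, "James"))
```

Run on the sample sentence, only the two statements about "James" produce triples; the sentence about "Anna" is ignored because it does not mention the subject-of-interest.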
Contact: Kit Buurman
Email: kit.buurman@tno.nl
Research area(s): Natural Language Processing
Technical features: The NLP identification tool requires two inputs: 1) the name of the subject-of-interest and 2) a set of web page documents, such as articles, blog posts or social media pages. The tool consists of four modules. First, it reads and prepares the data. Second, it generates a list of relation triples (subject, relation, object) per document, for example (James, lives in, New York), where “James” is the given subject-of-interest. Third, it builds a graph per document from the returned triples. Fourth, it embeds these document graphs and compares their similarity to distinguish the different persons mentioned across the documents. The output is a set of clusters, each representing a single person. Based on these clusters, irrelevant documents can be filtered out and information on the person of interest can be found more quickly.
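The graph, embedding and clustering steps above can be sketched as follows, assuming each document has already been reduced to a list of triples. This is a simplified illustration, not the tool's implementation: the document graph is a set of labelled edges, the "embedding" is a plain bag of (relation, object) edge labels rather than a learned graph embedding, similarity is cosine similarity, and clustering is single-linkage at a fixed threshold (connected components via union-find). The function names and the `threshold=0.3` default are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def build_graph(triples: List[Triple]) -> FrozenSet[Triple]:
    """Document graph as a set of labelled edges: subject -relation-> object."""
    return frozenset(triples)


def embed(graph: FrozenSet[Triple]) -> Counter:
    # Hypothetical embedding: count (relation, object) edge labels. The subject
    # node is dropped because every graph shares the same subject name.
    return Counter(f"{rel}={obj.lower()}" for _, rel, obj in graph)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_documents(doc_triples: Dict[str, List[Triple]], threshold: float = 0.3) -> List[List[str]]:
    """Group documents so that each cluster (ideally) describes one person."""
    ids = sorted(doc_triples)
    parent = {d: d for d in ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    vectors = {d: embed(build_graph(doc_triples[d])) for d in ids}
    # Link every pair of documents whose similarity reaches the threshold.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for d in ids:
        clusters.setdefault(find(d), []).append(d)
    return sorted(clusters.values())


if __name__ == "__main__":
    docs = {
        "doc1": [("James", "lives in", "New York"), ("James", "works for", "Acme Bank")],
        "doc2": [("James", "works for", "Acme Bank"), ("James", "was born in", "Boston")],
        "doc3": [("James", "lives in", "London"), ("James", "plays for", "Chelsea")],
    }
    print(cluster_documents(docs))
```

In this made-up example, doc1 and doc2 share the edge (works for, Acme Bank) and land in one cluster, while doc3 describes a different James and forms its own cluster. A learned graph embedding would replace `embed`, and the threshold would need tuning on real data.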
Integration constraints: These can be tailored to specific needs.
Targeted customer(s): Any organization dealing with unstructured data (text) that needs to identify the entities described in that text.
Conditions for reuse: License conditions will be agreed on a case-by-case basis.
Confidentiality: Public
Publication date: 30-09-2023
Involved partners: TNO (NLD)