Methodology of evaluating duplicate question detection for a GenAI based chatbot service

Project
21016 DAIsy
Type
Enhancement
Description

The method evaluates question caching: each new query is matched against a question database to determine whether a validated answer can be safely reused. Duplicate detection is strict and requires exact intent equivalence; any change in intent or detail is treated as a non-match to protect patient safety. The evaluation uses a synthetic dataset derived from existing questions, with positive and negative duplicate candidates generated by multiple methods. Performance is measured with Mean Reciprocal Rank (MRR), Hit@K, and precision/recall, complemented by a review of the results to judge retrieval quality and reliability. Subjectivity in the labels is acknowledged as a source of noise.
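The retrieval and classification metrics named above can be sketched as follows. This is a minimal illustration using the standard definitions, not the project's evaluation code; the function names and the toy question IDs are assumptions made for the example.

```python
from typing import Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any relevant item appears in the top k results, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean(values: Iterable[float]) -> float:
    """Average a collection of per-query scores (0.0 for an empty collection)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for binary duplicate / non-duplicate decisions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (hypothetical IDs): ranked candidates per query and the true duplicates.
runs = [
    (["q3", "q1", "q7"], {"q1"}),  # true duplicate found at rank 2
    (["q2", "q9"], {"q2"}),        # true duplicate found at rank 1
    (["q4", "q5"], {"q8"}),        # true duplicate not retrieved
]
mrr = mean(reciprocal_rank(ranked, rel) for ranked, rel in runs)
hit1 = mean(hit_at_k(ranked, rel, k=1) for ranked, rel in runs)
```

MRR and Hit@K describe the retrieval stage (is the true duplicate among the top candidates?), while precision and recall describe the final accept/reject decision. In a safety-critical setting, precision is the figure to watch, since a false positive means reusing an answer for a question that was not actually a duplicate.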

Contact
Martijn Krans
Email
martijn.krans@philips.com
Research area(s)
Generative AI and LLMs
Technical features

The system uses a two-stage pipeline. First, a bi-encoder semantic search using all-MiniLM-L6-v2 retrieves candidate questions by computing cosine similarity between embeddings. Second, an LLM prompt performs the final duplicate detection by assigning a discrete score (1–5) to each pair of questions. Only exact matches (score 5) are accepted for answer reuse. In our internal evaluation, this prompt-based approach outperforms baselines that apply a fixed cosine-similarity threshold, an advantage that is particularly important from a safety standpoint.
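The two stages can be sketched as below. This is an illustration under stated assumptions rather than the project's implementation: embeddings are assumed to be precomputed (in the described system they would come from all-MiniLM-L6-v2; here they are toy 2-dimensional vectors), and `judge` is a hypothetical callable standing in for the LLM prompt that returns a score from 1 to 5. The names `retrieve_candidates` and `find_reusable_question` and the `top_k` parameter are also assumptions.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(
    query_vec: Vector, index: Sequence[Tuple[str, Vector]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Stage 1: rank stored questions by cosine similarity and keep the top k."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def find_reusable_question(
    query_text: str,
    query_vec: Vector,
    index: Sequence[Tuple[str, Vector]],
    judge: Callable[[str, str], int],  # hypothetical stand-in for the LLM prompt
    top_k: int = 3,
) -> Optional[str]:
    """Stage 2: accept a candidate only if the judge assigns the top score of 5."""
    for candidate, _similarity in retrieve_candidates(query_vec, index, top_k):
        if judge(query_text, candidate) == 5:
            return candidate
    return None  # no exact match: fall back to generating a fresh answer


# Toy index with made-up questions and embeddings.
index = [
    ("How often should I replace the filter?", [1.0, 0.0]),
    ("How do I clean the filter?", [0.0, 1.0]),
    ("How frequently must the filter be replaced?", [0.9, 0.1]),
]


def stub_judge(query: str, candidate: str) -> int:
    """Stub scorer for the example; a real system would call an LLM here."""
    return 5 if candidate == "How frequently must the filter be replaced?" else 3


match = find_reusable_question(
    "When do I need to swap the filter?", [1.0, 0.0], index, stub_judge, top_k=2
)
```

Note that the most similar candidate by cosine score is not the one accepted here: the stub judge rejects it with a score of 3 and accepts the second candidate. This mirrors the design choice in the text, where embedding similarity only narrows the search and the strict score-5 gate makes the reuse decision.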

Integration constraints

Solutions that use LLMs

Targeted customer(s)

Philips and any industry developing chatbots

Conditions for reuse

Originally intended for internal use; licensing to be considered

Confidentiality
Public
Publication date
27-01-2026
Involved partners
Philips Electronics Nederland BV (NLD)