ITEA is the Eureka Cluster on software innovation

IoT Datasets

Project
20020 ENTA
Type
New library
Description

Realistic, properly labelled datasets are needed to develop, train, validate, and test AI models. Generating such datasets takes a considerable amount of time and effort, so making more of them available will expedite the development of IoT analytics. They can also be used to cross-validate solutions that were developed with disparate datasets. Five labelled IoT datasets have been generated for use in training and testing AI-based IoT device discovery and threat detection models.

Contact
Biswajit Nandy
Email
bnandy@solananetworks.com
Research area(s)
Datasets for developing, training, validating, and testing AI models that analyze encrypted IoT traffic for purposes such as IoT discovery and IoT behaviour analytics.
Technical features

Multiple datasets are being made available. They are realistically generated PCAPs covering various IoT device types and operating conditions, for both consumer and industrial IoT devices. They can be used to train, test, and evaluate models that identify IoT device types and operating states (power-up, idle, active), differentiate IoT from non-IoT devices, determine security state (normal or attack), and identify attack types.

  1. The NIMS BENIGN DATASET 2024-2 comprises data captured from consumer IoT devices, depicting three primary real-life states (Power-up, Idle, and Active) experienced by everyday users. By Dec 2024, the dataset had been accessed by 71+ users. (https://ieee-dataport.org/documents/nims-benign-dataset-2024-2)

  2. The DALHOUSIE NIMS LAB IOT 2024 DATASET, generated under use case two, is available on the IEEE DataPort open library. This dataset presents real-world IoT device traffic captured under a scenario termed "Active," reflecting typical usage patterns encountered by everyday users. By Dec 2024, the dataset had been accessed by 630+ users. (https://ieee-dataport.org/documents/dalhousie-nims-lab-iot-2024-dataset)

  3. The Encrypted Network Traffic Analysis: IoT/Non-IoT and cyberattack dataset consists of PCAP (Packet Capture) files containing network traffic from IoT devices. The traffic may include normal activity as well as cyberattacks originating from one or more IoT sensors. The dataset is designed to support research in network security, intrusion detection, and anomaly detection in IoT environments. (https://zenodo.org/records/14802737)

Integration constraints

Ability to read PCAP files in standard format for use in AI model development.
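
As an illustration of this constraint, the following is a minimal sketch of reading a PCAP file with only the Python standard library and deriving a few simple per-capture statistics that could feed a model. It assumes the classic libpcap file layout (24-byte global header, 16-byte per-packet record header); captures stored as pcapng would need a different reader or a conversion step first. The names `Packet`, `read_pcap`, and `summarize` are hypothetical helpers introduced for this example and are not part of the datasets or the ENTA project.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple


class Packet(NamedTuple):
    timestamp: float  # seconds since the epoch
    orig_len: int     # packet length on the wire
    data: bytes       # captured bytes (may be shorter than orig_len)


# Magic numbers of the classic pcap format, as they appear in the file.
# Each maps to (struct byte-order prefix, seconds per timestamp tick).
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e-6),  # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1e-6),  # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1e-9),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1e-9),  # big-endian, nanoseconds
}


def read_pcap(stream: BinaryIO) -> Iterator[Packet]:
    """Yield packets from a binary stream in classic pcap format."""
    header = stream.read(24)  # global header is always 24 bytes
    if len(header) < 24 or header[:4] not in _MAGICS:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    endian, tick = _MAGICS[header[:4]]
    # Record header: ts_sec, ts_frac, captured length, original length.
    record = struct.Struct(endian + "IIII")
    while True:
        raw = stream.read(record.size)
        if len(raw) < record.size:
            return  # clean end of file
        ts_sec, ts_frac, incl_len, orig_len = record.unpack(raw)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record, stop quietly
        yield Packet(ts_sec + ts_frac * tick, orig_len, data)


def summarize(stream: BinaryIO) -> dict:
    """Compute simple capture-level statistics in a single pass."""
    count = total = 0
    first = last = None
    for pkt in read_pcap(stream):
        count += 1
        total += pkt.orig_len
        if first is None:
            first = pkt.timestamp
        last = pkt.timestamp
    return {
        "packets": count,
        "bytes": total,
        "mean_size": total / count if count else 0.0,
        "duration": (last - first) if count else 0.0,
    }
```

With a downloaded capture, usage would look like `summarize(open("capture.pcap", "rb"))`, where `capture.pcap` is a placeholder file name. For protocol-level features (addresses, ports, TLS metadata), a dedicated packet-parsing library is likely a better fit than hand-written decoding.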

Targeted customer(s)

AI model developers and researchers working on IoT traffic analysis

Conditions for reuse

Dalhousie, MTP, and Solana Networks will make the datasets available during the ENTA project and after it ends.

Confidentiality
Public
Publication date
16-12-2024
Involved partners
Solana Networks (CAN)
Metodos y Tecnologia (ESP)
Dalhousie University (CAN)