Requirements for Training and Evaluation Dataset of Network and Host Intrusion Detection System

In the cyber domain, situational awareness of critical assets is extremely important. Achieving comprehensive situational awareness requires accurate sensor information. An important class of sensors is Intrusion Detection Systems (IDS), especially anomaly-based intrusion detection systems that apply artificial intelligence or machine learning to anomaly detection.

This millennium has seen industries transformed by developments in data-based modelling methods. The most crucial bottleneck in modelling IDS is the absence of publicly available datasets that reflect modern equipment, system design standards and the cyber threat landscape. The predominant dataset, the KDD Cup 1999, is still actively used in IDS modelling research despite the criticism expressed against it. Other, more recent datasets tend to record data either only from the perimeter of the testbed environment's network traffic or only from the effects that malware has on a single host machine.

Our study focuses on forming a set of requirements for a holistic Network and Host Intrusion Detection System (NHIDS) dataset by reviewing existing and previously studied datasets within the field of IDS modelling. As a result, the requirements for a state-of-the-art NHIDS dataset are presented, to be utilised in the research and development of NHIDS applying machine learning and artificial intelligence.


Petteri Nevavuori, Tero Kokkonen

Cite as

Nevavuori P., Kokkonen T. (2019) Requirements for Training and Evaluation Dataset of Network and Host Intrusion Detection System. In: Rocha Á., Adeli H., Reis L., Costanzo S. (eds) New Knowledge in Information Systems and Technologies. WorldCIST’19 2019. Advances in Intelligent Systems and Computing, vol 931. Springer, Cham