What is the CF-KNN version of TDTC?

The Naïve Bayes classifier is used to classify anomalous behavior, which is then refined to normal instances using the Certainty-Factor version of K-Nearest Neighbor (CF-KNN).

What is the certainty factor in the classification module?

The certainty-factor similarity measure in the classification module is based on the distribution proportion of classes in the training dataset, which addresses the imbalanced dataset issue.

What is the performance of TDTC two dimension reduction module?

The two-layer dimension reduction module of TDTC runs as an offline task, which is applied once to obtain the transform vectors for incoming samples.

How many new attack types are included in the training set?

Since the test set contains 17 new attack types not included in the training set, the authors can evaluate the effectiveness of TDTC in detecting unknown or uncommon attacks.

What is the correlation coefficient assessment of the final features?

The Correlation Coefficient assessment of the final features shows that the transferred features at the two layers of the dimension reduction module are mostly independent, since ρ ≈ 0.

What is the role of TDTC in detecting residual attacks?

TDTC can also be deployed as an auxiliary service for digital forensics in the IoT ecosystem, such as those discussed in [56], to detect residual attack patterns at the IoT network layer.

What is the computational complexity of the Naïve Bayes classifier?

The computational complexity of the Naïve Bayes classifier of the classification module is determined as O(e × f), where e is the count of samples in the dataset and f represents the number of features.

What is the way to determine which eigenvalues are more useful?

In TDTC, one may decide which eigenvalues are more useful; thus, the ideal feature mapping matrix W can be derived and used for the linear transformation of the training and test datasets.

What is the optimal projection matrix for the TDTC dataset?

The projection matrix $W$ is calculated to maximize $S_B$ (see Eq. 6) and minimize $S_W$ (see Eq. 7), where $S_B = \sum_{c} (\mu_c - \bar{x})(\mu_c - \bar{x})^T$ (Eq.6) and $S_W = \sum_{c} \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^T$ (Eq.7).

How many features are used in TDTC?

Therefore at this level, due to LDA optimum transformation, the first classifier of TDTC is equipped with only four features instead of 35.

What is the result value of each feature?

The result value of each feature is mapped into an integer number, to avoid any bias, as shown in Eq.13 for each continuous-valued z.

(Open Access) A Two-Layer Dimension Reduction and Two-Tier Classification Model for Anomaly-Based Intrusion Detection in IoT Backbone Networks (2019) | Hamed Haddad Pajouh

Q: What are the contributions mentioned in the paper "A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks"?

In this paper, the authors present a novel model for intrusion detection based on two-layer dimension reduction and two-tier classification module, designed to detect malicious activities such as User to Root ( U2R ) and Remote to Local ( R2L ) attacks.

Q: What are the future works mentioned in the paper "A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks"?

Future research includes exploring the potential of nonparametric methods such as dimension reduction module and fuzzy clustering to achieve a better classification against U2R, R2L and other attacks. Another interesting future work could be extension of the proposed model to detect intrusions at other layers of the IoT architecture such as application and support layers, as well as other protocols running in the network layer.

Q: What is the supervised anomaly detection approach?

The supervised anomaly detection approach in [25] leverages both distance measure and density of clusters for intrusion detection.

Q: What is the way to detect anomalous activity?

Existing intrusion detection and prevention models generally use statistical approaches [15] such as Hidden Markov Model (HMM) [15], Bayes theory [16], cluster analysis [17], signal processing [18] and distance measuring [19] to detect anomalous activities.

Q: What is the description of the model?

Despite a high accuracy rate in identifying normal behaviors and detecting simpler attacks such as DoS attacks and probe, the model performs poorly in detecting low frequency and distribution attacks such as R2L.

Q: Why is the dimension reduction module deployed to address limitations?

The dimension reduction module is deployed to address limitations due to dimensionality that may lead to making wrong decisions while increasing the computational complexity of the classifier.

A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks

Authors: Hamed HaddadPajouh, Reza Javidan, Raouf Khayami, Ali Dehghantanha and Kim-Kwang Raymond Choo

DOI: http://dx.doi.org/10.1109/TETC.2016.2633228

Publication title: IEEE Transactions on Emerging Topics in Computing

Publisher: IEEE

Type: Article

USIR URL: This version is available at: http://usir.salford.ac.uk/id/eprint/40937/

Published Date: 2019





Abstract: With increasing reliance on Internet of Things (IoT) devices and services, the capability to detect intrusions and malicious activities within IoT networks is critical for resilience of the network infrastructure. In this paper, we present a novel model for intrusion detection based on two-layer dimension reduction and two-tier classification module, designed to detect malicious activities such as User to Root (U2R) and Remote to Local (R2L) attacks. The proposed model uses principal component analysis and linear discriminant analysis in its dimension reduction module to reduce the high-dimensional dataset to a lower-dimensional one with fewer features. We then apply a two-tier classification module utilizing Naïve Bayes and the Certainty Factor version of K-Nearest Neighbor to identify suspicious behaviors. The experimental results using the NSL-KDD dataset show that our model outperforms previous models designed to detect U2R and R2L attacks.

Index Terms: Anomaly Detection, CF-KNN, Intrusion Detection System, IoT, Multi-layer Classification

H. H. Pajouh, R. Javidan, and R. Khayami are with the Department of Computer Engineering and Information Technology, Shiraz University of Technology, Iran (e-mail: hp@sutech.ac.ir, reza.javidan@sutech.ac.ir, khayami@sutech.ac.ir).

A. Dehghantanha is with the School of Computing, Science and Engineering, University of Salford, UK (e-mail: A.Dehghantanha@salford.ac.uk).

K.-K. R. Choo is with the Department of Information Systems and Cyber Security, University of Texas at San Antonio, USA (e-mail: raymond.choo@fulbrightmail.org).

I. INTRODUCTION

Internet of Things (IoT) technologies are becoming increasingly prevalent across different industry sectors such as health care, personal and social domains, and smart cities [1]. Similar to most consumer technologies, IoT technologies are not designed with security in mind, and the resulting security issues are now emerging as a key barrier in the wider adoption of IoT networks and services [2]. Intrusion detection is one of several security mechanisms for managing security intrusions [3], which can be detected in any of the four layers of the IoT architecture shown in Figure 1 [4]. The Network layer not only serves as a backbone for connecting different IoT devices, but also provides opportunities for deploying network-based security defense mechanisms such as Network Intrusion Detection Systems (NIDS) [5],[6],[7]. According to the analysis of KDD99 [3] and its later version NSL-KDD [9], malicious behaviors (attacks) in network-based intrusions can be classified into the following four main categories [7]:

• Probe: when an attacker seeks only to gain information about the target network through network and host scanning activities (e.g. port scanning).

• DoS (denial of service): when an attacker interrupts legitimate users' access to the given service or machine.

• U2R (User to Root): when an attacker attempts to escalate a limited user's privilege to super user or root access (e.g. via malware infection or stolen credentials).

• R2L (Remote to Local): when an attacker gains remote access to a victim machine by imitating existing local users.

User to Root (U2R) and Remote to Local (R2L) attacks are among the most challenging attacks to detect as they mimic normal users' behavior [10] [11].

IDSes are categorized into signature-based and anomaly-based detection based on their technique in detecting an intrusion [12]. Signature-based IDS relies on a set of pre-defined malicious activity patterns and attack signatures to detect intrusions, while anomaly-based IDS relies on deviations from normal behaviors to detect intrusions [6]. Signature-based IDSes generally outperform anomaly-based IDSes in detecting previously known attacks, but the former are ineffective against unknown or polymorphic attacks [13]. On the other hand, anomaly-based IDSes are capable of detecting unknown attacks in the absence of a predefined pattern. Due to the diversity of devices deployed in IoT networks, it would be unrealistic and impractical to rely on pre-defined attack patterns for intrusion detection, which limits signature-based IDS utilization in IoT networks [14].

In this paper, we present a network anomaly-based model for intrusion detection, hereafter referred to as the Two-layer Dimension Reduction and Two-tier Classification (TDTC) model.


Fig 1. IoT Network Security Architecture [4]

The proposed model, designed for anomaly-based intrusion detection in IoT backbone networks, uses two-layer dimension reduction and two-tier classification detection techniques to detect "hard-to-detect" intrusions, such as U2R and R2L attacks. We also demonstrate that the proposed model has the following characteristics:

• Higher overall detection rates due to the deployment of a multi-layer classifier

• Lower false positive rate due to the deployment of a refinement feature

• Accurate detection of U2R and R2L attacks, without reducing performance

• Lower computational complexity due to the deployment of dimension reduction in the two layers.

In the next section, we present related work. The proposed model is presented in Section III, and evaluation of the model is presented in Section IV. Section V concludes this paper and outlines future research topics.

II. RELATED WORK

Existing intrusion detection and prevention models generally use statistical approaches [15] such as Hidden Markov Model (HMM) [15], Bayes theory [16], cluster analysis [17], signal processing [18] and distance measuring [19] to detect anomalous activities. Anomaly detection approaches can be broadly categorized into supervised and unsupervised learning [6]. In the supervised anomaly detection approach, the normal behavior of a system or network is modeled using a labeled dataset [20]. The unsupervised technique assumes that normal behaviors are more frequent and builds the model on this assumption; thus, no training data is required [21].

Casas et al. [22] proposed an unsupervised NIDS based on subspace clustering and outlier detection and demonstrated that their approach performs well against unknown attacks. In [23], a feature selection filter module is proposed, which utilizes Principal Component Analysis and Fisher Dimension Reduction to filter noise. In that approach, a Self-Organizing Maps (SOMs) neural model is also used to filter out normal activities. However, this approach has a high false positive rate. Bostani and Sheikhan [24] proposed an unsupervised framework based on the Optimum-path forest algorithm and the K-Means clustering technique. This framework models malicious and normal behavior of networks.

The supervised anomaly detection approach in [25] leverages both distance measure and density of clusters for intrusion detection. Zhaung et al. [26] proposed a model based on the random forest algorithm to discover anomaly patterns with a high accuracy yet low false negative rate.

Guo et al. [27] proposed a two-level intrusion detection approach which first detects misuse and then uses the KNN algorithm to reduce false alarms. Toosi et al. [28] proposed a multi-attack classifier model, which implements a mix of fuzzy neural network, fuzzy inference approach, and genetic algorithms for intrusion detection. Despite a high accuracy rate in identifying normal behaviors and detecting simpler attacks such as DoS attacks and probe, the model performs poorly in detecting low frequency and distribution attacks such as R2L. Horng et al. [29] proposed a multi-classification attack model consisting of support vector machines (SVM) and the BIRCH hierarchical clustering technique to extract significant attributes from the KDD99 dataset. Their proposed model has a high detection rate for DoS and Probe attacks, but is ineffective against U2R and R2L attacks.

Tan et al. [30] proposed a system for DoS detection using multivariate correlation analysis (MCA) to improve the accuracy of network traffic characterization. In [31], a two-layer classification module was used to detect U2R and R2L attacks with low computational complexity due to its optimized feature reduction. Osanaiye et al. [13] proposed an ensemble-based multi-filter feature selection method to detect distributed DoS attacks in cloud environments using four filter methods to achieve an optimum selection over the NSL-KDD dataset. Iqbal et al. [32] presented an attack taxonomy for cloud services and suggested a cloud-based intrusion detection system.

Ambusaidi et al. [33] proposed a mutual information based IDS that selects optimal features for classification based on a feature selection algorithm. Their approach was evaluated using three benchmark datasets (KDD Cup 99, NSL-KDD and Kyoto 2006+).

Intrusion detection systems have also been used for managing security risks in industrial control systems [14]. For example, Pan et al. [34] proposed a systematic and automated approach to build a hybrid IDS that learns temporal state-based specifications for electric power systems to accurately differentiate between disturbances, normal control operations, and cyber-attacks. Zhou et al. [35] presented an industrial anomaly and multi-model driven IDS based on Hidden Markov Model to filter attacks from actual faults.

Security issues can be a barrier to widespread adoption of IoT devices [36]. Whitmore et al. [37] showed that a wide range of techniques could mitigate cyber threats targeting IoT systems. Ning et al. [38] proposed a hierarchical authentication architecture to provide anonymous data transmission in IoT networks. Cao et al. [39] highlighted the impact and importance of ghost attacks on ZigBee based IoT devices. Chen et al. [40] proposed an autonomic model-driven cyber security management approach for IoT systems, which can be used to estimate, detect, and respond to cyberattacks with little or no human intervention. Teixeira et al. [41] proposed a scheme for thwarting insider attacks in IoT networks by crosschecking data transformation of every IoT node.

III. PROPOSED TDTC MODEL

The proposed model comprises a dimension reduction module and a classification module, to be discussed in sections III.A and III.B, respectively.

Fig 2. In PCA, linear transformation is used to reduce a high dimension dataset to a low dimension dataset

A. Dimension Reduction Module

The dimension reduction module is deployed to address limitations due to dimensionality that may lead to making wrong decisions while increasing the computational complexity of the classifier. We deployed both Linear Discriminant Analysis (LDA), a supervised dimension reduction technique, and Principal Component Analysis (PCA), an unsupervised dimension reduction technique, in order to address the high dimensionality issue. PCA can be used to perform feature selection and extraction [42]:

a) Feature selection: choose a subset of all features based on their effectiveness in improving classification (i.e. choosing more informative features)

b) Feature extraction: create a subset of new features by combining existing features.

In TDTC, we used PCA as a feature extraction mechanism to map the NSL-KDD dataset, which consists of 41 features, to one with a lower feature space by removing less significant features. Feature extraction techniques are commonly limited to linear transforms, $y = W^T x$, as shown in Figure 2.

Let $X$ be an $N$-dimensional random vector in the original dataset, and let the new feature space consist of $M$ dimensions ($M$ is the number of transformed features in the new dataset), where $M < N$. For the transformation operation, we need to compute Eq. 1 to Eq. 3:

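Covariance matrix:

$$\Sigma = \frac{1}{n} \sum_{i=1}^{n} (x_i - m)(x_i - m)^T \quad \text{(Eq.1)}$$

where $m$ (mean vector) of the $n$ samples $x_i$ is:

$$m = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{(Eq.2)}$$

Eigenvector-eigenvalue decomposition:

$$\Sigma v = \lambda v, \quad \text{where } v = \text{eigenvector}, \ \lambda = \text{eigenvalue} \quad \text{(Eq.3)}$$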

PCA then sorts the eigenvectors in descending order of their eigenvalues. Eigenvectors with lower eigenvalues carry the least information about the distribution of the data, and these are the eigenvectors we wish to drop. A common approach is to rank the eigenvectors from the highest to the lowest eigenvalue and choose the top eigenvectors based on their eigenvalues. Similarly, in TDTC, one may decide which eigenvalues are more useful; thus, the ideal feature mapping matrix $W$ can be derived and used for the linear transformation of the training and test datasets.
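The PCA steps described above (mean vector, covariance matrix, eigen-decomposition, ranking by eigenvalue, linear projection) can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names `pca_fit` and `pca_transform`, the toy data, and the 1/n normalisation of the covariance matrix are assumptions.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the mean vector, the mapping matrix W and the sorted eigenvalues."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)                      # mean vector m
    centered = X - mean
    cov = centered.T @ centered / X.shape[0]   # covariance matrix (1/n is an assumption)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigen-decomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]          # rank from highest to lowest eigenvalue
    W = eigvecs[:, order[:n_components]]       # keep the top eigenvectors as columns
    return mean, W, eigvals[order]


def pca_transform(X, mean, W):
    """Project samples onto the retained eigenvectors."""
    return (np.asarray(X, dtype=float) - mean) @ W


# Toy data (hypothetical): 200 samples, 5 features, most variance in the first two.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 0.5, 0.2, 0.1])

mean, W, eigvals = pca_fit(X_train, n_components=2)
Z_train = pca_transform(X_train, mean, W)
print(Z_train.shape)
```

The matrix `W` is fitted once on the training data and then reused unchanged for test samples, which mirrors the way the mapping matrix is applied to both the training and test datasets.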

At this layer of dimension reduction, the Imbedded Error Function (IEF) factor analysis measure [43] is used to select the principal components [44], as shown in Eq.4, where $l$ denotes the number of Principal Components (PCs) used to represent the data and $m$ denotes the number of dimensions. $N$ and $\lambda_j$ denote the number of samples and the eigenvalues, respectively.

$$IEF(l) = \sqrt{\frac{l \sum_{j=l+1}^{m} \lambda_j}{N m (m - l)}} \quad \text{(Eq.4)}$$

Cross Validation (CV) is used to evaluate the optimum principal components with minimum errors, as shown in Figure 3. Applying the selection criteria reduces some features and helps the next layer of the dimension reduction module to compute a lower dimension matrix and spreadable objects.


Fig 5. Imbedded Error Function measure of the NSL-KDD train data set to select the optimum number of dimensions with minimum error and information loss.

As observed in Figure 5, the Cumulative Percent Variance (CPV) measure with a 95% threshold is also examined to justify the selection of optimum dimensions.

$$CPV(l) = 100 \left[ \frac{\sum_{j=1}^{l} \lambda_j}{\sum_{j=1}^{m} \lambda_j} \right] \% \quad \text{(Eq.5)}$$
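The two selection criteria can be illustrated with a short Python sketch. It assumes that the IEF takes Malinowski's standard form, sqrt(l * (sum of eigenvalues l+1..m) / (N * m * (m - l))), and that CPV is the percentage of total variance captured by the first l eigenvalues. The eigenvalues and the sample count below are hypothetical, chosen only so that three large eigenvalues are followed by three small, equal ones.

```python
import math


def ief(eigvals, l, n_samples):
    """Imbedded Error Function when the first l components are kept."""
    m = len(eigvals)
    residual = sum(eigvals[l:])              # eigenvalues l+1 .. m
    return math.sqrt(l * residual / (n_samples * m * (m - l)))


def cpv(eigvals, l):
    """Cumulative Percent Variance captured by the first l components."""
    return 100.0 * sum(eigvals[:l]) / sum(eigvals)


# Hypothetical eigenvalues, already sorted in descending order.
eigvals = [4.0, 3.0, 2.0, 0.1, 0.1, 0.1]
n_samples = 1000

# IEF is undefined at l = m, so candidates run from 1 to m - 1.
ief_by_l = {l: ief(eigvals, l, n_samples) for l in range(1, len(eigvals))}
best_l_ief = min(ief_by_l, key=ief_by_l.get)

# Smallest l whose cumulative variance reaches the 95% threshold.
best_l_cpv = next(l for l in range(1, len(eigvals) + 1) if cpv(eigvals, l) >= 95.0)

print(best_l_ief, best_l_cpv)
```

On this toy spectrum both criteria select three components; on real data the two measures need not agree, which is one reason to examine them side by side.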

B. Linear Discriminant Analysis

Linear computation can be used to achieve a reasonable speed in intrusion detection systems [31].

Since objects (samples) in the PCA-transformed dataset are not ideal for classification, the proposed model utilizes another feature reduction module that uses the labeled data to apply an optimal transformation to new dimensions. LDA examines the class labels to reduce the dimension of large working datasets, and it is widely used in different domains such as image processing and stock analysis [45]. LDA chooses an optimal projection matrix to map a higher dimensional feature space to a new lower dimensional space while preserving the required information for data classification [46].

There are two scatter matrices that need to be obtained in LDA, namely $S_B$, the between-class scatter matrix, and $S_W$, the within-class scatter matrix. In TDTC, the LDA dimension reduction module transforms the NSL-KDD dataset to a lower dimension. It is assumed that there is a set of $n$ $d$-dimensional vectors $x_1, \ldots, x_n$ belonging to $k$ different class labels $C_c$, where each class $c = 1, 2, 3, \ldots, k$ has $n_c$ samples (in TDTC $k = 5$, i.e. normal, DoS, Probe, U2R, R2L).

The projection matrix $W$ is calculated to maximize $S_B$ (see Eq. 6) and minimize $S_W$ (see Eq. 7).

$$S_B = \sum_{c=1}^{k} (\mu_c - \bar{x})(\mu_c - \bar{x})^T \quad \text{(Eq.6)}$$

$$S_W = \sum_{c=1}^{k} \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^T \quad \text{(Eq.7)}$$

Here $\bar{x}$ is the mean of all samples and $\mu_c$ is the mean value of class $C_c$ samples, given by Eq.8.

$$\mu_c = \frac{1}{n_c} \sum_{x \in C_c} x \quad \text{(Eq.8)}$$

Since the ratio $J$ in Eq.9 depends only on $S_B$ and $S_W$, it can be maximized as an optimization problem over the projection matrix $W$ (see Eq.9).

$$J(W) = \frac{|W^T S_B W|}{|W^T S_W W|} \quad \text{(Eq.9)}$$

All these operations are conducted on the training dataset (see Section IV) to obtain an ideal transformation matrix that can be applied to future test sets or unknown instances.

After the transformation using LDA, the new mapped features have only four dimensions {lda1, ..., lda4}. Figure 4 shows two dimensions of the original dataset after transformation by LDA. In other words, the dataset has been converted into $C - 1$ dimensions, where $C$ is the number of class labels that exist in the original dataset.
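The LDA procedure can be sketched in NumPy under stated assumptions: the between-class scatter is the unweighted sum of outer products of (class mean minus overall mean), the within-class scatter sums outer products of (sample minus class mean), and the projection is taken from the leading eigenvectors of pinv(S_W) S_B, which is the standard way to maximize a Fisher-type ratio. The function name `lda_fit` and the three-class toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def lda_fit(X, y):
    """Return a projection matrix W with (number of classes - 1) columns."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in classes:
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)                 # class mean
        diff = (mu_c - overall_mean)[:, None]
        S_B += diff @ diff.T                    # between-class scatter
        centered = X_c - mu_c
        S_W += centered.T @ centered            # within-class scatter
    # Leading eigenvectors of pinv(S_W) @ S_B give the discriminant directions.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[: len(classes) - 1]].real


# Toy data (hypothetical): three classes in four dimensions, 50 samples each.
rng = np.random.default_rng(0)
means = [[0, 0, 0, 0], [4, 0, 0, 0], [0, 4, 0, 0]]
X = np.vstack([rng.normal(size=(50, 4)) + np.array(mu) for mu in means])
y = np.repeat([0, 1, 2], 50)

W = lda_fit(X, y)       # fitted on training data only
Z = X @ W               # the same W would be applied to unseen samples
print(W.shape, Z.shape)
```

With three classes the projection has two columns; with the five NSL-KDD classes named in the text the same procedure would yield four, matching the four LDA features reported.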

Table 1. Dependency of the transformed features of the training dataset after applying two levels of reduction, measured by the correlation coefficient.

features   LDA1        LDA2        LDA3        LDA4
LDA1                   -3.76E-17   4.73E-16    1.06E-16
LDA2       -3.76E-17               -6.69E-17   -3.52E-16
LDA3       4.73E-16    -6.69E-17               -1.65E-15
LDA4       1.06E-16    -3.52E-16   -1.65E-15
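A pairwise check of this kind can be sketched in plain Python, assuming the measure is the Pearson correlation coefficient between feature columns. The helper names and the three toy columns are hypothetical; the columns were constructed to be exactly uncorrelated so that the largest off-diagonal value is zero.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def max_abs_offdiag_correlation(columns):
    """Largest absolute correlation over all distinct pairs of features."""
    names = list(columns)
    return max(
        abs(pearson(columns[p], columns[q]))
        for i, p in enumerate(names)
        for q in names[i + 1:]
    )


# Hypothetical transformed features, built to be mutually uncorrelated.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [1.0, -1.0, -1.0, 1.0],
    "f3": [-1.0, 3.0, -3.0, 1.0],
}

worst = max_abs_offdiag_correlation(features)
print(worst)
```

A value of `worst` close to zero indicates that no pair of features is linearly dependent, which is how the near-zero off-diagonal entries of Table 1 are read.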

Fig 3. Imbedded Error Function measure of the NSL-KDD train data set to select the optimum number of dimensions with minimum error and information loss.
