A Two-Layer Dimension Reduction and Two-Tier Classification Model for Anomaly-Based Intrusion Detection in IoT Backbone Networks

Haddad Pajouh, H, Javadian, R, Khayami, R, Dehghantanha, A and Choo, R
http://dx.doi.org/10.1109/TETC.2016.2633228
Title A two-layer dimension reduction and two-tier classification model for
anomaly-based intrusion detection in IoT backbone networks
Authors Haddad Pajouh, H, Javadian, R, Khayami, R, Dehghantanha, A and Choo,
R
Publication title IEEE Transactions on Emerging Topics in Computing
Publisher IEEE
Type Article
USIR URL This version is available at: http://usir.salford.ac.uk/id/eprint/40937/
Published Date 2019
USIR is a digital collection of the research output of the University of Salford. Where copyright
permits, full text material held in the repository is made freely available online and can be read,
downloaded and copied for non-commercial private study or research purposes. Please check the
manuscript for any further copyright restrictions.
For more information, including our policy and submission procedure, please
contact the Repository Team at: library-research@salford.ac.uk.

> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) <
1
Abstract: With increasing reliance on Internet of Things (IoT) devices and services, the capability to detect intrusions and malicious activities within IoT networks is critical for the resilience of the network infrastructure. In this paper, we present a novel model for intrusion detection based on two-layer dimension reduction and a two-tier classification module, designed to detect malicious activities such as User to Root (U2R) and Remote to Local (R2L) attacks. The proposed model uses principal component analysis and linear discriminant analysis in its dimension reduction module to transform the high-dimensional dataset into a lower-dimensional one with fewer features. We then apply a two-tier classification module utilizing Naïve Bayes and a Certainty Factor version of K-Nearest Neighbor to identify suspicious behaviors. Experimental results on the NSL-KDD dataset show that our model outperforms previous models designed to detect U2R and R2L attacks.
Index Terms: Anomaly Detection, CF-KNN, Intrusion Detection System, IoT, Multi-layer Classification
H. H. Pajouh, R. Javidan, and R. Khayami are with the Department of Computer Engineering and Information Technology, Shiraz University of Technology, Iran (e-mail: hp@sutech.ac.ir, reza.javidan@sutech.ac.ir, khayami@sutech.ac.ir).
A. Dehghantanha is with the School of Computing, Science and Engineering, University of Salford, UK (e-mail: A.Dehghantanha@salford.ac.uk).
K.-K. R. Choo is with the Department of Information Systems and Cyber Security, University of Texas at San Antonio, USA (e-mail: raymond.choo@fulbrightmail.org).
I. INTRODUCTION
Internet of Things (IoT) technologies are becoming increasingly prevalent across different industry sectors such as health care, personal and social domains, and smart cities [1]. Similar to most consumer technologies, IoT technologies are not designed with security in mind, and security issues are now emerging as a key barrier to the wider adoption of IoT networks and services [2]. Intrusion detection is one of several security mechanisms for managing security intrusions [3]; such intrusions can be detected at any of the four layers of the IoT architecture shown in Figure 1 [4]. The network layer not only serves as a backbone for connecting different IoT devices, but also provides opportunities for deploying network-based security defense mechanisms such as Network Intrusion Detection Systems (NIDS) [5],[6],[7]. According to the analysis of KDD99 [3] and its later version NSL-KDD [9], malicious behaviors (attacks) in network-based intrusions can be classified into the following four main categories [7]:
• Probe: when an attacker seeks only to gain information about the target network through network and host scanning activities (e.g. port scanning).
• DoS (Denial of Service): when an attacker interrupts legitimate users' access to a given service or machine.
• U2R (User to Root): when an attacker attempts to escalate a limited user's privileges to superuser or root access (e.g. via malware infection or stolen credentials).
• R2L (Remote to Local): when an attacker gains remote access to a victim machine by impersonating existing local users.
U2R and R2L attacks are among the most challenging attacks to detect, as they mimic normal user behavior [10], [11].
IDSes are categorized into signature-based and anomaly-based detection according to the technique they use to detect an intrusion [12]. Signature-based IDSes rely on a set of predefined malicious activity patterns and attack signatures to detect intrusions, while anomaly-based IDSes rely on deviations from normal behavior [6]. Signature-based IDSes generally outperform anomaly-based IDSes in detecting previously known attacks, but they are ineffective against unknown or polymorphic attacks [13]. Anomaly-based IDSes, on the other hand, can detect unknown attacks without a predefined pattern. Due to the diversity of devices deployed in IoT networks, relying on predefined attack patterns for intrusion detection would be unrealistic and impractical, which limits the use of signature-based IDSes in IoT networks [14].
In this paper, we present a network anomaly-based model
for intrusion detection, hereafter referred to as Two-layer
Dimension Reduction and Two-tier Classification (TDTC)
model.
A Two-layer Dimension Reduction and Two-tier
Classification Model for Anomaly-Based
Intrusion Detection in IoT Backbone Networks
Hamed HaddadPajouh, Reza Javidan, Raouf Khayami, Ali Dehghantanha and Kim-Kwang Raymond Choo

Fig 1. IoT Network Security Architecture [4]
The proposed model, designed for anomaly-based intrusion
detection in IoT backbone networks, uses two-layer dimension
reduction and two-tier classification detection techniques to
detect “hard-to-detect” intrusions, such as U2R and R2L
attacks. We also demonstrate that the proposed model has the
following characteristics:
• Higher overall detection rates due to the deployment of a multi-layer classifier
• Lower false positive rate due to the deployment of a refinement feature
• Accurate detection of U2R and R2L attacks without reducing performance
• Lower computational complexity due to the deployment of dimension reduction in two layers.
In the next section, we present related work. The proposed model is presented in Section III, and its evaluation in Section IV. Section V concludes this paper and outlines future research topics.
II. RELATED WORK
Existing intrusion detection and prevention models generally use statistical approaches [15], such as the Hidden Markov Model (HMM) [15], Bayes theory [16], cluster analysis [17], signal processing [18] and distance measuring [19], to detect anomalous activities. Anomaly detection approaches can be broadly categorized into supervised and unsupervised learning [6]. In the supervised approach, the normal behavior of a system or network is modeled using a labeled dataset [20]. The unsupervised approach assumes that normal behaviors are far more frequent than anomalous ones and builds the model on this assumption, so no labeled training data is required [21].
Casas et al. [22] proposed an unsupervised NIDS based on subspace clustering and outlier detection and demonstrated that their approach performs well against unknown attacks. In [23], a feature selection filter module is proposed, which utilizes Principal Component Analysis and Fisher Dimension Reduction to filter noise. The approach also uses a Self-Organizing Map (SOM) neural model to filter out normal activities; however, it has a high false positive rate. Bostani and Sheikhan [24] proposed an unsupervised framework based on the Optimum-Path Forest algorithm and the K-Means clustering technique, which models both malicious and normal network behavior.
The supervised anomaly detection approach in [25] leverages both distance measure and density of clusters for intrusion detection. Zhaung et al. [26] proposed a model based on the random forest algorithm to discover anomaly patterns with high accuracy and a low false negative rate.
Guo et al. [27] proposed a two-level intrusion detection approach which first detects misuse and then uses the KNN algorithm to reduce false alarms. Toosi et al. [28] proposed a multi-attack classifier model, which combines a fuzzy neural network, a fuzzy inference approach, and genetic algorithms for intrusion detection. Despite a high accuracy rate in identifying normal behaviors and detecting simpler attacks such as DoS and probe, the model performs poorly in detecting attacks with low frequency and low distribution, such as R2L. Horng et al. [29] proposed a multi-class attack classification model consisting of support vector machines (SVM) and the BIRCH hierarchical clustering technique to extract significant attributes from the KDD99 dataset. Their model has a high detection rate for DoS and Probe attacks, but is ineffective against U2R and R2L attacks.
Tan et al. [30] proposed a system for DoS detection using
multivariate correlation analysis (MCA) to improve the
accuracy of network traffic characterization. In [31], a two-
layer classification module was used to detect U2R and R2L
attacks with low computational complexity due to its
optimized feature reduction. Osanaiye et al. [13] proposed an
ensemble-based multi-filter feature selection method to detect
distributed DoS attacks in cloud environments, using four filter
methods to achieve an optimal feature selection on the NSL-KDD
dataset. Iqbal et al. [32] presented an attack taxonomy for

cloud services and suggested a cloud-based intrusion detection
system.
Ambusaidi et al. [33] proposed a mutual-information-based IDS that uses a feature selection algorithm to select optimal features for classification. Their approach was evaluated using three benchmark datasets (KDD Cup 99, NSL-KDD and Kyoto 2006+).
Intrusion detection systems have also been used for
managing security risks in industrial control systems [14]. For
example, Pan et al. [34] proposed a systematic and automated
approach to build a hybrid IDS that learns temporal state-
based specifications for electric power systems to accurately
differentiate between disturbances, normal control operations,
and cyber-attacks. Zhou et al. [35] presented a multi-model-driven industrial anomaly IDS based on a Hidden Markov Model to distinguish attacks from actual faults.
Security issues can be a barrier to widespread adoption of IoT devices [36]. Whitmore et al. [37] showed that a wide range of techniques could mitigate cyber threats targeting IoT systems. Ning et al. [38] proposed a hierarchical authentication architecture to provide anonymous data transmission in IoT networks. Cao et al. [39] highlighted the impact and importance of ghost attacks on ZigBee-based IoT devices. Chen et al. [40] proposed an autonomic model-driven cyber security management approach for IoT systems, which can be used to estimate, detect, and respond to cyberattacks with little or no human intervention. Teixeira et al. [41] proposed a scheme for thwarting insider attacks in IoT networks by cross-checking the data transformation of every IoT node.
III. PROPOSED TDTC MODEL
The proposed model comprises a dimension reduction module and a classification module, discussed in Sections III.A and III.B, respectively.
Fig 2. In PCA, a linear transformation is used to reduce a high-dimensional dataset to a low-dimensional dataset
A. Dimension Reduction Module
The dimension reduction module is deployed to address the limitations of high dimensionality, which may lead to wrong decisions while increasing the computational complexity of the classifier. We deployed both Linear Discriminant Analysis (LDA), a supervised dimension reduction technique, and Principal Component Analysis (PCA), an unsupervised dimension reduction technique, to address the high dimensionality issue. Dimension reduction can be performed through feature selection or feature extraction [42]:
a) Feature selection: choose a subset of the original features based on their effectiveness for classification (i.e. keep the more informative features).
b) Feature extraction: create a set of new features by combining existing features.
In TDTC, we used PCA as a feature extraction mechanism to map the NSL-KDD dataset, which consists of 41 features, to a lower-dimensional feature space by discarding the less significant principal components. Feature extraction is commonly limited to linear transforms, $y = W^{T}x$, as shown in Figure 2.
Let $x$ be an N-dimensional random vector in the original dataset, and let the new feature space have M dimensions (M is the number of transformed features), where $M < N$. For the transformation, we compute Eq.1 to Eq.3:
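Covariance matrix:
$$\Sigma_x = E\left[(x - m)(x - m)^{T}\right]$$ (Eq.1)
where $m$ (the mean vector) is:
$$m = E[x]$$ (Eq.2)
Eigenvector-eigenvalue decomposition:
$$\Sigma_x v = \lambda v$$ (Eq.3)
where $v$ is an eigenvector and $\lambda$ the corresponding eigenvalue.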
PCA then sorts the eigenvectors in descending order of their eigenvalues. Eigenvectors with lower eigenvalues carry the least information about the distribution of the data, and these are the eigenvectors we wish to drop. A common approach is to rank the eigenvectors from the highest to the lowest eigenvalue and keep the top M. Similarly, in TDTC, one may decide which eigenvalues are more useful; the ideal feature mapping matrix $W$ can then be derived and used for the linear transformation of the training and test datasets.
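The eigendecomposition steps above (Eqs. 1-3 and the ranking of eigenvectors) can be sketched in NumPy. This is a minimal sketch under stated assumptions: the covariance of Eq.1 is estimated from samples with a 1/n factor, samples are centred before projection, and the function names `pca_fit` and `pca_transform` and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np

def pca_fit(X):
    """Estimate the mean vector and covariance matrix (Eqs. 1-2), then return
    its eigenpairs (Eq. 3) sorted from highest to lowest eigenvalue."""
    m = X.mean(axis=0)                      # Eq.2: mean vector
    Xc = X - m                              # centred samples
    cov = Xc.T @ Xc / X.shape[0]            # Eq.1: sample estimate of E[(x - m)(x - m)^T]
    eigvals, eigvecs = np.linalg.eigh(cov)  # Eq.3: cov v = lambda v (cov is symmetric)
    order = np.argsort(eigvals)[::-1]       # rank eigenvectors by descending eigenvalue
    return m, eigvals[order], eigvecs[:, order]

def pca_transform(X, m, eigvecs, M):
    """Keep the top-M eigenvectors as the N x M mapping matrix W and project: y = W^T (x - m)."""
    W = eigvecs[:, :M]
    return (X - m) @ W

# Toy data (assumption): 200 samples with N = 5 correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
m, eigvals, eigvecs = pca_fit(X)
Y = pca_transform(X, m, eigvecs, M=2)
print(Y.shape)  # (200, 2)
```

Because the covariance matrix is symmetric, `np.linalg.eigh` is used rather than the general `eig`; it returns eigenvalues in ascending order, hence the explicit reversal. The variance of each projected column equals the corresponding eigenvalue, which is why dropping small-eigenvalue directions loses little information.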
At this layer of dimension reduction, the Imbedded Error Function (IEF) factor analysis measure [43] is used to select the number of principal components [44], as shown in Eq.4, where l denotes the number of Principal Components (PCs) used to represent the data and m the number of dimensions. N and $\lambda_j$ denote the number of samples and the eigenvalues, respectively.
$$IEF(l) = \sqrt{\frac{l \sum_{j=l+1}^{m} \lambda_j}{N m (m - l)}}$$ (Eq.4)
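The IEF criterion can be evaluated for every candidate l with a few lines of NumPy, and the l with the smallest IEF is taken as the number of components. This is a sketch under stated assumptions: it uses the standard IEF expression from the factor-analysis literature, IEF(l) = sqrt(l · Σ_{j=l+1}^{m} λ_j / (N m (m - l))); the function name `imbedded_error`, the eigenvalue spectrum, and N = 1000 are hypothetical.

```python
import numpy as np

def imbedded_error(eigvals, N):
    """IEF(l) for l = 1..m-1 (standard IEF form, assumed to match Eq.4)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest eigenvalues first
    m = lam.size
    ief = []
    for l in range(1, m):                    # l = m is excluded (division by zero)
        tail = lam[l:].sum()                 # eigenvalues lambda_{l+1} .. lambda_m
        ief.append(np.sqrt(l * tail / (N * m * (m - l))))
    return np.array(ief)

# Hypothetical spectrum: 3 strong components above a flat noise floor
eigvals = [9.0, 5.0, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1]
ief = imbedded_error(eigvals, N=1000)
best_l = int(np.argmin(ief)) + 1             # index 0 corresponds to l = 1
print(best_l)  # 3
```

With a flat noise floor the tail sum shrinks in proportion to m - l, so IEF keeps rising once l passes the number of strong components; that is what makes its minimum a usable cut-off.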
Cross-validation (CV) is used to identify the optimum number of principal components with minimum error, as shown in Figure 3. Applying these selection criteria removes some features and helps the next layer of the dimension reduction module compute a lower-dimensional matrix with more separable objects.

Fig 5. Imbedded Error Function measure of NSL-KDD train
data set to select optimum number of dimension with minimum
error and information loss.
As observed in Figure 5, the Cumulative Percent Variance (CPV) measure with a 95% threshold is also examined to justify the selection of the optimum number of dimensions.
$$CPV(l) = 100 \times \frac{\sum_{j=1}^{l} \lambda_j}{\sum_{j=1}^{m} \lambda_j}\,\%$$ (Eq.5)
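The 95% CPV rule amounts to a cumulative sum over the sorted eigenvalues. A minimal sketch, assuming the eigenvalues are already available (for example from the covariance decomposition of Eq.3); `cpv`, `components_for_threshold`, and the eigenvalues below are illustrative, not values obtained from NSL-KDD.

```python
import numpy as np

def cpv(eigvals):
    """Cumulative Percent Variance for l = 1..m (Eq.5)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest eigenvalues first
    return 100.0 * np.cumsum(lam) / lam.sum()

def components_for_threshold(eigvals, threshold=95.0):
    """Smallest number of components l whose CPV reaches the threshold."""
    return int(np.argmax(cpv(eigvals) >= threshold)) + 1    # argmax returns the first True

# Hypothetical eigenvalues, for illustration only
eigvals = [9.0, 5.0, 3.0, 0.5, 0.3, 0.2]
print(components_for_threshold(eigvals))  # 4
```

Here the first three components explain about 94.4% of the variance, so a fourth is needed to cross the 95% threshold.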
B. Linear Discriminant Analysis
Linear computation can be used to achieve a reasonable speed
in intrusion detection systems [31].
Since the objects (samples) in the PCA-transformed dataset are not ideally separated for classification, the proposed model utilizes a second feature reduction module that uses the labeled data to find an optimal transformation to new dimensions. LDA examines the class labels to reduce the dimension of large working datasets, and it is widely used in domains such as image processing and stock analysis [45]. LDA chooses an optimal projection matrix to map a higher-dimensional feature space to a new lower-dimensional space while preserving the information required for data classification [46].
After the transformation using LDA, the new mapped features have only four dimensions {lda1, ..., lda4}. Figure 4 shows two dimensions of the original dataset after it has been mapped by LDA. In other words, the dataset has been converted into $C - 1$ dimensions, where C is the number of class labels in the original dataset.
There are two scatter matrices that need to be obtained in LDA, namely $S_B$, the between-class scatter matrix, and $S_W$, the within-class scatter matrix. In TDTC, the LDA dimension reduction module transforms the NSL-KDD dataset to a lower dimension. It is assumed that there is a set of n d-dimensional vectors $x_1, \ldots, x_n$ belonging to k different class labels $C_i$, where each class i = 1, 2, ..., k has $n_i$ samples (in TDTC, k = 5, i.e. Normal, DoS, Probe, U2R and R2L).
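The projection matrix $W$ is calculated to maximize $S_B$ (see Eq.6) and minimize $S_W$ (see Eq.7):
$$S_B = \sum_{c} (\mu_c - \bar{x})(\mu_c - \bar{x})^{T}$$ (Eq.6)
$$S_W = \sum_{c} \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^{T}$$ (Eq.7)
where $\bar{x}$ is the mean of all samples and $\mu_c$, the mean value of the $n_i$ samples of class $C_i$, is given by Eq.8:
$$\mu_c = \frac{1}{n_i} \sum_{x \in C_i} x$$ (Eq.8)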
Since the ratio J in Eq.9 is defined in terms of $S_B$ and $S_W$, it can be maximized as an optimization problem over the projection matrix $W_r$:
$$J = \frac{W_r^{T} S_B W_r}{W_r^{T} S_W W_r}$$ (Eq.9)
All these operations are conducted on the training dataset (see Section IV) to obtain an ideal transformation matrix that can then be applied to future test sets or unknown instances.
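Maximizing J in Eq.9 is a generalized eigenvalue problem: the columns of W_r are the leading eigenvectors of S_W^{-1} S_B, and at most C - 1 of them carry between-class information (four for the five NSL-KDD classes). The NumPy sketch below builds S_B and S_W as in Eqs. 6-8 and solves that problem by whitening with S_W^{-1/2}. Assumptions: the function name `lda_fit`, the small ridge term added to S_W, and the toy three-class data are illustrative; S_B is left unweighted as printed in Eq.6, although many formulations weight each class term by its sample count n_i.

```python
import numpy as np

def lda_fit(X, y, ridge=1e-6):
    """Fisher LDA sketch: scatter matrices (Eqs. 6-8), then the W_r that maximizes J (Eq. 9)."""
    classes = np.unique(y)
    d = X.shape[1]
    x_bar = X.mean(axis=0)                          # mean of all samples
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)                      # Eq.8: class mean
        diff = (mu_c - x_bar)[:, None]
        S_B += diff @ diff.T                        # Eq.6, unweighted as printed
        S_W += (Xc - mu_c).T @ (Xc - mu_c)          # Eq.7
    S_W += ridge * np.eye(d)                        # small ridge keeps S_W invertible (assumption)
    # Whiten with S_W^{-1/2}; eigenvectors of the symmetric matrix
    # S_W^{-1/2} S_B S_W^{-1/2}, mapped back, are eigenvectors of S_W^{-1} S_B.
    w_vals, w_vecs = np.linalg.eigh(S_W)
    S_W_inv_sqrt = w_vecs @ np.diag(1.0 / np.sqrt(w_vals)) @ w_vecs.T
    vals, U = np.linalg.eigh(S_W_inv_sqrt @ S_B @ S_W_inv_sqrt)
    top = np.argsort(vals)[::-1][: len(classes) - 1]   # keep C - 1 directions
    W_r = S_W_inv_sqrt @ U[:, top]
    return W_r, S_B, S_W

# Toy data (assumption): 3 classes, 4 features, 50 samples per class
rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]])
X = np.vstack([rng.normal(loc=mu, size=(50, 4)) for mu in means])
y = np.repeat([0, 1, 2], 50)
W_r, S_B, S_W = lda_fit(X, y)
Z = X @ W_r                                         # projected samples, C - 1 = 2 columns
print(Z.shape)  # (150, 2)
```

Whitening turns the problem into a symmetric eigendecomposition, so `np.linalg.eigh` can be used, and the returned W_r satisfies W_r^T S_W W_r = I.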
Table 1. Transformed Features Dependency of Train+ Data Set After Applying Two Levels of Reduction (Correlation Coefficient Measure).
features   LDA1        LDA3        LDA4
LDA1       1           4.73E-16    1.06E-16
LDA2       -3.76E-17   -6.69E-17   -3.52E-16
LDA3       4.73E-16    1           -1.65E-15
LDA4       1.06E-16    -1.65E-15   1
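Table 1 reports Pearson correlation coefficients between transformed features. A short sketch of how such a table can be produced with `np.corrcoef`; the helper name `correlation_table`, the feature names, and the toy data are assumptions. The toy data are projected onto their own covariance eigenvectors (a PCA-style stand-in, not the paper's LDA output) so that the off-diagonal entries come out near zero, as in Table 1.

```python
import numpy as np

def correlation_table(Z, names):
    """Print and return Pearson correlation coefficients between the columns of Z."""
    R = np.corrcoef(Z, rowvar=False)               # columns are features
    for name, row in zip(names, R):
        print(name, " ".join(f"{v: .2E}" for v in row))
    return R

# Toy stand-in (assumption): projecting correlated data onto its covariance
# eigenvectors yields features that are uncorrelated by construction.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)
_, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ V
R = correlation_table(Z, ["f1", "f2", "f3", "f4"])
```

Off-diagonal values on the order of 1E-15 or smaller, like those in Table 1, indicate the features are uncorrelated up to floating-point error.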
Fig 3. Imbedded Error Function measure of NSL-KDD train data set to select optimum number of dimension with minimum error and
information loss.

Citations
More filters
Journal ArticleDOI

A Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security

TL;DR: A comprehensive survey of ML methods and recent advances in DL methods that can be used to develop enhanced security methods for IoT systems and presents the opportunities, advantages and shortcomings of each method.
Journal ArticleDOI

Network Intrusion Detection for IoT Security Based on Learning Techniques

TL;DR: This survey classifies the IoT security threats and challenges for IoT networks by evaluating existing defense techniques and provides a comprehensive review of NIDSs deploying different aspects of learning techniques for IoT, unlike other top surveys targeting the traditional systems.
Journal ArticleDOI

Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches

TL;DR: Performances of several machine learning models have been compared to predict attacks and anomalies on the IoT systems accurately and other metrics prove that Random Forest performs comparatively better.
Journal ArticleDOI

Internet of Things security and forensics: Challenges and opportunities

TL;DR: This paper first introduces existing major security and forensics challenges within IoT domain and then briefly discusses about papers published in this special issue targeting identified challenges.

Future research includes exploring the potential of nonparametric methods, such as a nonparametric dimension reduction module and fuzzy clustering, to achieve better classification of U2R, R2L and other attacks. Another interesting direction for future work is extending the proposed model to detect intrusions at other layers of the IoT architecture, such as the application and support layers, as well as in other protocols running in the network layer.

The Naïve Bayes classifier is used to classify anomalous behavior, which is then refined to normal instances using the Certainty-Factor version of K-Nearest Neighbor (CF-KNN).

The certainty-factor similarity measure in the classification module is based on the proportion of each class in the training dataset, to address the imbalanced dataset issue.

In TDTC, the two-layer dimension reduction is an offline task, performed once to obtain the transformation vectors that are then applied to incoming samples.
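Because the reduction is fitted once, the PCA matrix and the LDA matrix can be multiplied into a single N x (C - 1) mapping, so each incoming sample costs one centring step and one matrix-vector product. A minimal sketch under assumptions: the class `TwoLayerReducer`, the `n_pca` and `ridge` parameters, and the toy data are hypothetical and not from the paper; LDA is solved with the same whitening approach as a standard Fisher LDA.

```python
import numpy as np

class TwoLayerReducer:
    """Hypothetical helper: fit PCA then LDA once on training data (offline)
    and keep a single mapping that is applied to each incoming sample (online)."""

    def fit(self, X, y, n_pca, ridge=1e-6):
        self.mean_ = X.mean(axis=0)
        Xc = X - self.mean_
        vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        W_pca = vecs[:, np.argsort(vals)[::-1][:n_pca]]    # layer 1: top principal components
        P = Xc @ W_pca
        classes, p_bar = np.unique(y), P.mean(axis=0)
        S_B = np.zeros((n_pca, n_pca))
        S_W = ridge * np.eye(n_pca)                         # ridge keeps S_W invertible
        for c in classes:
            Pc = P[y == c]
            mu = Pc.mean(axis=0)
            S_B += np.outer(mu - p_bar, mu - p_bar)
            S_W += (Pc - mu).T @ (Pc - mu)
        # layer 2: leading eigenvectors of S_W^{-1} S_B, via symmetric whitening
        w_vals, w_vecs = np.linalg.eigh(S_W)
        inv_sqrt = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T
        l_vals, U = np.linalg.eigh(inv_sqrt @ S_B @ inv_sqrt)
        W_lda = inv_sqrt @ U[:, np.argsort(l_vals)[::-1][: len(classes) - 1]]
        self.W_ = W_pca @ W_lda                             # one N x (C - 1) matrix
        return self

    def transform(self, X_new):
        """Online step: one centring and one matrix product per sample or batch."""
        return (np.atleast_2d(X_new) - self.mean_) @ self.W_

# Toy data (assumption): 3 classes, 6 raw features
rng = np.random.default_rng(3)
centers = rng.normal(scale=3.0, size=(3, 6))
X = np.vstack([rng.normal(loc=c, size=(40, 6)) for c in centers])
y = np.repeat([0, 1, 2], 40)
reducer = TwoLayerReducer().fit(X, y, n_pca=4)
print(reducer.transform(X[0]).shape)  # (1, 2)
```

Folding both layers into `W_` is valid because both are linear maps applied to the same centred input.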

Since the test set contains 17 new attack types not included in the training set, the authors can evaluate the effectiveness of TDTC in detecting unknown or uncommon attacks.

The correlation coefficient assessment of the final features shows that the features transformed by the two layers of the dimension reduction module are mostly uncorrelated, since ρ ≈ 0.

TDTC can also be deployed as an auxiliary service for digital forensics in the IoT ecosystem, such as those discussed in [56], to detect residual attack patterns at the IoT network layer.

The computational complexity of the Naïve Bayes classifier in the classification module is $O(e \times f)$, where e is the number of samples in the dataset and f is the number of features.
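The O(e × f) cost comes from a single counting pass in which each of the e training samples contributes one count per feature. The sketch below illustrates this with a count-based (categorical) Naive Bayes over integer-coded features. This variant, the Laplace smoothing parameter `alpha`, and the function names `fit_categorical_nb` and `predict_nb` are assumptions, chosen because feature values are mapped to integers before classification; the exact Naive Bayes variant used in TDTC is not specified here.

```python
import numpy as np

def fit_categorical_nb(X, y, alpha=1.0):
    """Count-based Naive Bayes for integer-coded features (assumed variant).
    Each of the e samples contributes one count per feature: O(e * f) counting."""
    classes = np.unique(y)
    n_values = X.max(axis=0) + 1                       # number of integer codes per feature
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    log_lik = []                                       # log_lik[k][j][v] = log P(feature j = v | class k)
    for c in classes:
        Xc = X[y == c]
        per_feature = []
        for j in range(X.shape[1]):
            counts = np.bincount(Xc[:, j], minlength=n_values[j]) + alpha   # Laplace smoothing
            per_feature.append(np.log(counts / counts.sum()))
        log_lik.append(per_feature)
    return classes, log_prior, log_lik

def predict_nb(model, X):
    """Pick the class with the highest log posterior; X must use codes seen in training."""
    classes, log_prior, log_lik = model
    scores = np.tile(log_prior, (X.shape[0], 1))
    for k in range(len(classes)):
        for j in range(X.shape[1]):
            scores[:, k] += log_lik[k][j][X[:, j]]
    return classes[np.argmax(scores, axis=1)]

# Toy integer-coded data (assumption): 2 features, 2 classes
X = np.array([[0, 0], [0, 1], [0, 0], [2, 2], [2, 1], [2, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit_categorical_nb(X, y)
print(predict_nb(model, np.array([[0, 0], [2, 2]])))  # [0 1]
```

Training touches each (sample, feature) pair once; the smoothing and normalization add work proportional to the number of classes, features, and integer codes, not to e.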

Therefore, at this level, owing to the optimal LDA transformation, the first classifier of TDTC is equipped with only four features instead of 35.

The resulting value of each feature is mapped to an integer, to avoid any bias, as shown in Eq.13 for each continuous-valued $z$.