A Strong Coreset Algorithm to Accelerate OPF as a Graph-based Machine Learning in Large-Scale Problems
Information Sciences, Accepted (with a minor revision) on 23 August 2020
Graph theory is an attractive branch of mathematics that is able to present powerful frameworks for data representation
and pattern recognition of real-world applications. Optimum-path forest (OPF) is one of the efficient graph-based frameworks
that can determine the patterns of input data set by extracting the optimal partitions of the graph obtained through encoding
data into a graph. OPF is a fast, simple, and parameter-independent machine that supports partial overlapping among the classes.
Since OPF was introduced based on simple assumptions without considering the requirements of large-scale problems, this machine
learning is an effective algorithm just for a reasonable size of input data sets. To provide a scalable OPF, this study
introduces a strong coreset for accelerating OPF algorithm. Coreset is a small set of some representative points that can
approximate the characteristics of the original points regarding a specific optimization problem. Applying this approach can
speed up OPF procedure, especially when it works on massive data sets. Regarding the aforementioned target, this work introduces
the definition of coreset in OPF problem for the first time. In this way, a novel algebra is developed to represent the problem
of OPF as a proper optimization problem for the proposed coreset definition. Eventually, a novel coreset construction algorithm
that can approximate the OPF solutions is proposed in order to improve the OPF construction speed. The simulation results of
diverse experiments on various benchmark data sets illustrate computation gain and superiority of the proposed algorithm in
terms of the construction and classification speeds of OPF algorithm as compared to the original algorithm while reliably
performs accurate. To put it briefly, regarding the obtained experimental results, the presented coreset construction algorithm
preforms the training and testing phases of OPF up to 6.1 and 4.9 times faster than before, respectively.
Developing a Fast Supervised Optimum-path Forest Based on Coreset
In Proc. 19th International Symposium on Artificial Intelligence and Signal Processing (AISP’2017), pp. 172-177, 2017
Optimum-path forest (OPF) is an effective graph-based machine learning that simplifies the pattern
recognition problems into the partitioning the corresponding derived graphs of the input datasets.
The amounts of the samples in the input datasets and, consequently the size of the node set of their
corresponding derived graphs has a major effect on the speed of OPF. In this study a novel version of
OPF is introduced which utilizes coreset approach for reducing the scale of the input dataset. From the
aspect of the computational geometry, coreset is a small set of points that includes the best representative
points of the original point set with regard to a geometric objective function. Our method finds the most
informative vertices (samples) by proposing a novel incremental coreset construction algorithm. The
experimental results of the proposed method reduces the input data samples, and the execution times of
the construction and the classification phases of OPF by 80%, 60%, and 12%, respectively, in contrast to
the traditional OPF.
Hybrid of Anomaly-Based and Specification-Based IDS for Internet of Things Using Unsupervised OPF based on Map-Reduce Approach
Computer Communications, vol. 98, pp. 52-71, 2017
Internet of Things (IoT) is a novel paradigm in computer networks in which resource-constrained
objects connect to unreliable Internet by using a wide range of technologies. The insecure nature of
the Internet and wireless sensor networks, that are the main components of IoT, make IoT vulnerable to
different attacks, especially routing attacks (as insider attacks). A novel real-time hybrid intrusion
detection framework is proposed in this study that consists of anomaly-based and specification-based
intrusion detection modules for detecting two well-known routing attacks in IoT called sinkhole and
selective-forwarding attacks. For this purpose, the specification-based intrusion detection agents,
that are located in the router nodes, analyze the behavior of their host nodes and send their local
results to the root node through normal data packets. In addition, an anomaly-based intrusion detection
agent, that is located in the root node, employs the unsupervised optimum-path forest algorithm for
projecting clustering models by using incoming data packets. This agent, which is based on the MapReduce
architecture, can work in a distributed platform for projecting clustering models and consequently parallel
detecting of anomalies as a global detection approach. The proposed method makes decision about suspicious
behavior by using a voting mechanism. Notably, the proposed method is also extended to detect wormhole
attack. The deployment of the hybrid proposed model is investigated in a smart-city scenario by an existing
platform, as well. The free network's scale and the ability to identify malicious nodes are two key features
of the proposed framework that are evaluated through different experiments in this study. The experimental
results of simulated scenarios showed that the proposed hybrid method can achieve true positive rate
of 76.19% and false positive rate of 5.92% when both sinkhole and selective-forwarding attacks were
launched simultaneously. These rates in detecting wormhole attack are 96.02% and 2.08%, respectively.
Modifying Supervised Optimum-Path Forest in Intrusion Detection Systems Using Social Network Approaches and Unsupervised Learning
Pattern Recognition, vol. 62, pp. 56-72, 2017
Optimum-path forest (OPF) is a graph-based machine learning method that can overcome some limitations of
the traditional machine learning algorithms that have been used in intrusion detection systems. This paper
presents a novel approach for intrusion detection using a modified OPF (MOPF) algorithm for improving the
performance of traditional OPF in terms of detection rate (DR), false alarm rate (FAR), and time of execution.
To address the problem of scalability in large datasets and also for achieving high attack recognition rates,
the proposed framework employs the k-means clustering algorithm, as a partitioning module, for generating
different homogeneous training subsets from original heterogeneous training samples. In the proposed MOPF
algorithm, the distance between unlabeled samples and the root (prototype) of every sample in OPF is also
considered in classifying unlabeled samples with the aim of improving the accuracy rate of traditional OPF
algorithm. Moreover, the centrality and the prestige concepts in the social network analysis are employed in
a pruning module for determining the most informative samples in training subsets to speed up the traditional
OPF algorithm. The experimental results on NSL-KDD dataset show that the proposed method performs better than
traditional OPF in terms of accuracy rate, DR, FAR, and cost per example (CPE) evaluation metrics.
A Security Mechanism for Detecting Intrusions in Internet of Things Using Selected Features Based on MI-BGSA
International Journal of Information & Communication Technology Research, vol. 9, no. 2, pp. 53-62, 2017
Internet of things (IoT) is a novel emerging approach in computer networks wherein all heterogeneous objects
around us, which usually are resource-constrained objects, can connect to each other and also the Internet by
using a broad range of technologies. IoT is a hybrid network which includes the Internet and also wireless sensor
networks (WSNs) as the main components of IoT; so, implementing security mechanisms in IoT seems necessary.
This paper introduces a novel intrusion detection architecture model for IoT that provides the possibility of
distributed detection. The proposed hybrid model uses anomaly and misuse intrusion detection agents based on
the supervised and unsupervised optimum-path forest models for providing the ability to detect internal and
externals attacks, simultaneously. The number of input features to the proposed classifier is reduced by a hybrid
feature selection algorithm, as well. The experimental results of simulated scenarios show the superior
performance of proposed security mechanism in multi-faceted detection.
Binary Gravitational Search Algorithm (BGSA): Improved Efficiency
Encyclopedia of Information Assurance, 2016
Today, detecting anomalous traffic and preventing it in computer networks has become increasingly
important for the community of security researchers. An intrusion detection system (IDS) is an effective
tool for reaching high security. This is a software tool for analyzing system behavior or network traffic as
input data to detect deviations from normal behavior. With the development of computer networks, highdimensional
input data analysis has become a huge problem in IDSs. One solution for overcoming this
problem is feature selection, which is a process for selecting an optimal subset of features. Populationbased
heuristic search algorithms have been widely used for this optimization problem. This entry presents
a novel feature selection method based on a binary gravitational search algorithm (BGSA). The proposed
method, which is called modified BGSA (MBGSA), uses BGSA for performing the global search to find
the best subset of features through the wrapper method. Moreover, for improving the efficiency of BGSA,
mutual information (MI) feature selector under the uniform information distribution (MIFS-U) method,
which works as a filter method, is integrated into BGSA as the inner optimization layer. In fact, with the
computation of the relevance between each selected feature and the target class and the redundancy between
the selected features (in the feature subset generated by the wrapper), MIFS-U will find more valuable
features that have maximum relevance to the target class and minimum redundancy to each other. The
experimental results on NSL-KDD dataset using different classifiers show that the proposed method can
find better subset features and achieve higher accuracy and an improved detection rate using fewer features
as compared to standard BGSA and binary particle swarm optimization (BPSO) feature selection methods.
Modification of Optimum-Path Forest using Markov Cluster Process Algorithm
In Proc. 2nd International Conference on Signal Processing and Intelligent Systems (ICSPIS’2016), pp. 1-5, 2016 (Winner of the Outstanding Paper Award)
Optimum-path forest (OPF) is a novel supervised graph-based classifier which reduces the
classification problem into partitioning of vertices in a graph derived from the data samples.
One of the main processes in OPF is identifying the optimum set of key samples named prototypes.
This process is based on creating a minimum spanning tree on a complete weighted graph which is
derived from the training samples; hence, it is much time-consuming for large-scale problems.
In this study, for overcoming this limitation, the process of finding the prototypes in
traditional OPF is modified by using Markov cluster (MCL) algorithm. The graph partitioning
in MCL is based on finding key samples named attractors, which attract other related samples;
so the obtained attractors can be selected as prototypes for generating optimum-path trees.
Experiments on public benchmark datasets show that the speed of proposed modified OPF is
improved considerably as compared to the traditional OPF.
A Hybrid Intrusion Detection Architecture for Internet of Things
In Proc. 8th International Symposium on Telecommunication (IST’2016), pp. 601-606, 2016 (Selected as one of the Best Paper)
In computer networks, Internet of things (IoT) is an emerging paradigm wherein smart and resource-constrained
objects can connect to Internet by using a wide range of technologies. Due to the insecure nature of
Internet and also wireless sensor networks (WSNs), which are the main components of IoT, implementing security
mechanisms in IoT seems necessary. To deal with intrusions which may occur in IoT, a novel intrusion detection
architecture model for IoT is proposed in this paper. This model is based on MapReduce approach with the aim of
distributed detection. To provide multi-faceted detection (from the Internet and WSNs sides), the proposed model
consists of anomaly-based and misuse-based intrusion detection agents that use supervised and unsupervised
optimum-path forest model for intrusion detection. The experimental results of simulated scenarios show the
superior performance of proposed method in intrusion detection for IoT.
Hybrid of Binary Gravitational Search Algorithm and Mutual Information for Feature Selection in Intrusion Detection Systems
Soft Computing, vol. 21, no. 9, pp. 2307-2324, 2017
Intrusion detection systems (IDSs) play an important role in the security of computer
networks. One of the main challenges in IDSs is the high-dimensional input data analysis.
Feature selection is a solution to overcoming this problem. This paper presents a hybrid
feature selection method using binary gravitational search algorithm (BGSA) and mutual
information (MI) for improving the efficiency of standard BGSA as a feature selection algorithm.
The proposed method, called MI-BGSA, used BGSA as a wrapper-based feature selection method for
performing global search. Moreover, MI approach was integrated into the BGSA, as a filter-based
method, to compute the feature–feature and the feature–class mutual information with the aim of
pruning the subset of features. This strategy found the features considering the least
redundancy to the selected features and also the most relevance to the target class.
A two-objective function based on maximizing the detection rate and minimizing the false
positive rate was defined as a fitness function to control the search direction of the
standard BGSA. The experimental results on the NSL-KDD dataset showed that the proposed
method can reduce the feature space dramatically. Moreover, the proposed algorithm found
better subset of features and achieved higher accuracy and detection rate as compared to
the some standard wrapper-based and filter-based feature selection methods.