A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis
Adil Fahad, Najlaa Alshatri, Zahir Tari, Abdullah Alamri, Ibrahim Khalil, Albert Y. Zomaya, Sebti Foufou, Abdelaziz Bouras
TLDR
This paper introduces concepts and algorithms related to clustering, gives a concise survey of existing clustering algorithms, and compares them from both a theoretical and an empirical perspective.
Abstract:
Clustering algorithms have emerged as a powerful alternative meta-learning tool for accurately analyzing the massive volume of data generated by modern applications. In particular, their main goal is to categorize data into clusters such that objects are grouped in the same cluster when they are similar according to specific metrics. There is a vast body of knowledge in the area of clustering, and there have been attempts to analyze and categorize clustering algorithms for a larger number of applications. However, one of the major issues in using clustering algorithms for big data, and one that causes confusion among practitioners, is the lack of consensus on the definition of their properties as well as the lack of a formal categorization. To alleviate these problems, this paper introduces concepts and algorithms related to clustering, presents a concise survey of existing clustering algorithms, and provides a comparison from both a theoretical and an empirical perspective. From a theoretical perspective, we developed a categorizing framework based on the main properties pointed out in previous studies. Empirically, we conducted extensive experiments in which we compared the most representative algorithm from each category using a large number of real (big) data sets. The effectiveness of the candidate clustering algorithms is measured through a number of internal and external validity metrics, as well as stability, runtime, and scalability tests. In addition, we highlighted the set of clustering algorithms that perform best for big data.
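As a rough illustration of the two kinds of validity metric the abstract mentions, the sketch below computes one internal metric (within-cluster sum of squared errors, which needs only the data and the cluster assignment) and one external metric (purity, which also needs ground-truth classes). The function names, the toy data, and the choice of these two metrics are assumptions made for illustration; the abstract does not say which metrics the paper uses.

```python
from collections import Counter, defaultdict


def within_cluster_sse(points, labels):
    """Internal metric: sum of squared distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for p, c in zip(points, labels):
        clusters[c].append(p)
    sse = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid is the coordinate-wise mean of the cluster's members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        sse += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return sse


def purity(labels, truth):
    """External metric: fraction of points that belong to the majority class of their cluster."""
    by_cluster = defaultdict(list)
    for c, t in zip(labels, truth):
        by_cluster[c].append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in by_cluster.values())
    return majority / len(labels)


# Hypothetical toy data: four 2-D points, a two-cluster assignment, and true classes.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
truth = ["a", "a", "b", "a"]

print(within_cluster_sse(points, labels))  # lower means tighter clusters
print(purity(labels, truth))               # 1.0 means every cluster is class-pure
```

Lower within-cluster error indicates more compact clusters, while purity rewards agreement with known classes, so the two can disagree on the same assignment.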
Citations
The Self-Organizing Map
TL;DR: An overview of the self-organizing map algorithm, on which the papers in this issue are based, is presented.
Journal Article
Big data analytics: a survey
Chun-Wei Tsai, Chin-Feng Lai, Han-Chieh Chao, Athanasios V. Vasilakos
TL;DR: The question that arises now is, how to develop a high performance platform to efficiently analyze big data and how to design an appropriate mining algorithm to find the useful things from big data.
Journal Article
Big Data Analytics in Operations Management
TL;DR: This study first explores the existing big data‐related analytics techniques, and identifies their strengths, weaknesses as well as major functionalities, and discusses various big data analytics strategies to overcome the respective computational and data challenges.
Journal Article
A survey towards an integration of big data analytics to big insights for value-creation
Mandeep Kaur Saggi, Sushma Jain
TL;DR: This article presents a comprehensive, well-informed examination, and realistic analysis of deploying big data analytics successfully in companies and presents a methodical analysis for the usage of Big Data Analytics in various applications such as agriculture, healthcare, cyber security, and smart city.
References
Journal Article
Maximum likelihood from incomplete data via the EM algorithm
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means process described in this paper partitions an N-dimensional population into k sets on the basis of a sample; the k-means concept generalizes the ordinary sample mean, and the process is shown to give partitions that are reasonably efficient in the sense of within-class variance.
Book
Data Mining: Concepts and Techniques
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Proceedings Article
A density-based algorithm for discovering clusters in large spatial databases with noise
TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
Proceedings Article
A density-based algorithm for discovering clusters in large spatial databases with noise
TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.
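The density-based notion of clusters summarized in the two entries above can be sketched roughly as follows: a point with enough neighbours inside a given radius starts a cluster, the cluster grows through other such dense points, and points reachable from no dense point are labelled noise. This is a minimal brute-force illustration, not the authors' implementation. The names `dbscan`, `region_query`, `eps`, and `min_pts`, the explicit passing of both a radius and a density threshold, and the toy data are all assumptions of this sketch.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is dense too, so keep growing through it
    return labels


# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Each neighbourhood query here scans every point, so the sketch is quadratic in the number of points; for large spatial data a spatial index would be the more realistic choice.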