Journal ISSN: 2364-1541

Data Science and Engineering 

Springer Science+Business Media
About: Data Science and Engineering is an academic journal published by Springer Science+Business Media. The journal publishes mainly in the areas of Computer science and Graph (abstract data type). Its ISSN is 2364-1541, and it is open access. Over its lifetime, the journal has published 199 papers, which have received 2978 citations.

Papers
Journal Article
TL;DR: This article presents a review of methods that are used for big data reduction, including network theory, big data compression, dimension reduction, redundancy elimination, data mining, and machine learning methods.
Abstract: Research on big data analytics is entering a new phase called fast data, in which multiple gigabytes of data arrive in big data systems every second. Modern big data systems collect inherently complex data streams due to the volume, velocity, value, variety, variability, and veracity of the acquired data, which together give rise to the 6Vs of big data. Reduced and relevant data streams are perceived to be more useful than raw, redundant, inconsistent, and noisy data. Another perspective on big data reduction is that big datasets with millions of variables cause the curse of dimensionality, which requires unbounded computational resources to uncover actionable knowledge patterns. This article presents a review of methods that are used for big data reduction. It also presents a detailed taxonomic discussion of big data reduction methods, including network theory, big data compression, dimension reduction, redundancy elimination, data mining, and machine learning methods. In addition, open research issues pertinent to big data reduction are highlighted.

138 citations

Journal Article
TL;DR: The challenges of medical big data handling are explored, the concept of the computer-aided diagnosis (CAD) system and how it works is introduced, and a survey of developed CAD methods in the area of neurological disease diagnosis is provided.
Abstract: Diagnosis of neurological diseases is a growing concern and one of the most difficult challenges for modern medicine. According to the World Health Organisation’s recent report, neurological disorders, ranging from epilepsy, Alzheimer’s disease and stroke to headache, affect up to one billion people worldwide. An estimated 6.8 million people die every year as a result of neurological disorders. Current diagnosis technologies (e.g. magnetic resonance imaging, electroencephalogram) produce huge quantities of data (in size and dimension) for detection, monitoring and treatment of neurological diseases. In general, analysis of this medical big data is performed manually by experts to identify and understand the abnormalities. It is a very difficult task for a person to accumulate, manage, analyse and assimilate such large volumes of data by visual inspection. As a result, experts have been demanding computerised diagnosis systems, called “computer-aided diagnosis (CAD)”, that can automatically detect neurological abnormalities using medical big data. Such a system improves the consistency of diagnosis, increases the success of treatment, saves lives and reduces cost and time. Recently, some research has been performed on the development of CAD systems for the management of medical big data for diagnosis assessment. This paper explores the challenges of medical big data handling and also introduces the concept of the CAD system and how it works. This paper also provides a survey of developed CAD methods in the area of neurological disease diagnosis. This study will help experts understand how a CAD system can assist them in this regard.

122 citations

Journal Article
TL;DR: The survey can help practitioners understand existing AQP techniques and select appropriate methods for their applications, and it presents research challenges and opportunities of AQP.
Abstract: Online analytical processing (OLAP) is a core functionality in database systems. The performance of OLAP is crucial for making online decisions in many applications. However, it is rather costly to support OLAP on large datasets, especially big data, and methods that compute exact answers cannot meet the high-performance requirement. To alleviate this problem, approximate query processing (AQP) has been proposed, which aims to efficiently find an approximate answer that is as close as possible to the exact answer. Existing AQP techniques can be broadly divided into two categories. (1) Online aggregation: select samples online and use these samples to answer OLAP queries. (2) Offline synopses generation: generate synopses offline based on a-priori knowledge (e.g., data statistics or query workload) and use these synopses to answer OLAP queries. We discuss the research challenges in AQP and summarize existing techniques to address these challenges. In addition, we review how to use AQP to support other complex data types, e.g., spatial data and trajectory data, and to support other applications, e.g., data visualization and data cleaning. We also introduce existing AQP systems and summarize their advantages and limitations. Lastly, we present research challenges and opportunities of AQP. We believe that the survey can help practitioners understand existing AQP techniques and select appropriate methods for their applications.

99 citations

Journal Article
TL;DR: This paper evaluates how well the two main privacy models used in anonymization meet the requirements of big data, namely composability, low computational cost and linkability.
Abstract: This paper explores the challenges raised by big data in privacy-preserving data management. First, we examine the conflicts raised by big data with respect to preexisting concepts of private data management, such as consent, purpose limitation, transparency and individual rights of access, rectification and erasure. Anonymization appears as the best tool to mitigate such conflicts, and it is best implemented by adhering to a privacy model with precise privacy guarantees. For this reason, we evaluate how well the two main privacy models used in anonymization (k-anonymity and \(\varepsilon \)-differential privacy) meet the requirements of big data, namely composability, low computational cost and linkability.

89 citations

Journal Article
TL;DR: This paper provides a comprehensive survey on traffic prediction, spanning from the spatio-temporal data layer to the intelligent transportation application layer, and splits the research scope into four parts from the bottom up: spatio-temporal data, preprocessing, traffic prediction and traffic application.
Abstract: Intelligent transportation (e.g., intelligent traffic lights) makes our travel more convenient and efficient. With the development of mobile Internet and positioning technologies, it is feasible to collect spatio-temporal data and then leverage these data to achieve the goal of intelligent transportation, and here, traffic prediction plays an important role. In this paper, we provide a comprehensive survey on traffic prediction, spanning from the spatio-temporal data layer to the intelligent transportation application layer. First, we split the whole research scope into four parts from the bottom up: spatio-temporal data, preprocessing, traffic prediction and traffic application. We then review existing work on the four parts. First, we summarize traffic data into five types according to their differences in spatial and temporal dimensions. Second, we focus on four significant data preprocessing techniques: map-matching, data cleaning, data storage and data compression. Third, we focus on three kinds of traffic prediction problems (i.e., classification, generation and estimation/forecasting). In particular, we summarize the challenges and discuss how existing methods address these challenges. Fourth, we list five typical traffic applications. Lastly, we present emerging research challenges and opportunities. We believe that the survey can help practitioners understand existing traffic prediction problems and methods, which can further encourage them to solve their intelligent transportation applications.

87 citations

Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2023    15
2022    29
2021    28
2020    28
2019    25
2018    23