Open Access Journal Article

Approximate Query Processing: What is New and Where to Go?: A Survey on Approximate Query Processing

Kaiyu Li and one co-author
Published 1 December 2018, Vol. 3, Issue 4, pp. 379-397
TL;DR
The survey can help practitioners understand existing AQP techniques and select appropriate methods for their applications, and it outlines research challenges and opportunities in AQP.
Abstract
Online analytical processing (OLAP) is a core functionality in database systems. The performance of OLAP is crucial for making online decisions in many applications. However, supporting OLAP on large datasets, especially big data, is costly, and methods that compute exact answers cannot meet the high-performance requirement. To alleviate this problem, approximate query processing (AQP) has been proposed, which aims to efficiently find an approximate answer that is as close as possible to the exact answer. Existing AQP techniques fall broadly into two categories. (1) Online aggregation: select samples online and use these samples to answer OLAP queries. (2) Offline synopses generation: generate synopses offline based on a priori knowledge (e.g., data statistics or query workload) and use these synopses to answer OLAP queries. We discuss the research challenges in AQP and summarize existing techniques that address them. In addition, we review how AQP can support other complex data types, e.g., spatial data and trajectory data, and other applications, e.g., data visualization and data cleaning. We also introduce existing AQP systems and summarize their advantages and limitations. Lastly, we present research challenges and opportunities in AQP. We believe that the survey can help practitioners understand existing AQP techniques and select appropriate methods for their applications.
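The abstract describes offline synopses only in general terms. As one illustrative synopsis, the Python sketch below builds an equi-width histogram offline and uses it to estimate a range COUNT. This is our choice of example; the survey covers several kinds of synopses. The sketch assumes values are spread uniformly inside each bucket. The class `EquiWidthHistogram`, its parameters, and `estimate_count` are hypothetical names.

```python
from typing import Iterable


class EquiWidthHistogram:
    """Offline synopsis (illustrative): per-bucket counts over [lo, hi) in equal-width buckets."""

    def __init__(self, values: Iterable[float], lo: float, hi: float, num_buckets: int):
        self.lo, self.hi, self.k = lo, hi, num_buckets
        self.width = (hi - lo) / num_buckets
        self.counts = [0] * num_buckets
        # Built once, offline; queries later touch only these k counters.
        for v in values:
            if lo <= v < hi:
                idx = min(int((v - lo) / self.width), num_buckets - 1)
                self.counts[idx] += 1

    def estimate_count(self, a: float, b: float) -> float:
        """Approximate COUNT(*) WHERE a <= x < b, assuming uniform spread within each bucket."""
        total = 0.0
        for i, c in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
            total += c * overlap / self.width  # fraction of the bucket covered by the range
        return total
```

The uniform-spread assumption is the source of approximation error. When data is skewed inside a bucket, the estimate drifts from the exact count. Finer or equi-depth buckets are common ways to reduce that drift.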



Citations
Journal Article

QTune: a query-aware database tuning system with deep reinforcement learning

TL;DR: QTune, a query-aware database tuning system built on a deep reinforcement learning (DRL) model, efficiently and effectively tunes database configurations based on both the query vector and database states, and outperforms state-of-the-art tuning methods.
Journal Article

A survey of data partitioning and sampling methods to support big data analysis

TL;DR: It is believed that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.
Journal Article

A Survey of Traffic Prediction: from Spatio-Temporal Data to Intelligent Transportation

TL;DR: Wang et al. provide a comprehensive survey of traffic prediction, spanning from the spatio-temporal data layer to the intelligent transportation application layer. They split the research scope into four parts, from bottom to top: spatio-temporal data, preprocessing, traffic prediction, and traffic application.
Journal Article

HeavyKeeper: An Accurate Algorithm for Finding Top-$k$ Elephant Flows

TL;DR: The proposed algorithm, HeavyKeeper, incurs a small, constant processing overhead per packet and thus supports high line rates. It achieves 99.99% precision with a small memory footprint and reduces the error by around 3 orders of magnitude on average compared to the state of the art.
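A minimal Python sketch of the count-with-exponential-decay idea behind HeavyKeeper follows. It assumes a two-row array of buckets, each holding a fingerprint and a counter. When a different flow hits an occupied bucket, the counter is decremented with probability b^(-count), so large (elephant) counters are almost never disturbed. Several details are assumptions for illustration: the depth, the width, the 16-bit fingerprint, the decay base 1.08, and the small dictionary that stands in for a min-heap-style top-k store. `HeavyKeeper`, `insert`, and `topk` are hypothetical names, not an official API.

```python
import random
import zlib


class HeavyKeeper:
    """Illustrative sketch of HeavyKeeper-style top-k tracking (parameters are assumptions)."""

    def __init__(self, k=3, depth=2, width=1024, decay_base=1.08, seed=0):
        self.k, self.d, self.w, self.b = k, depth, width, decay_base
        # Each bucket holds [fingerprint, count]; count 0 means the bucket is empty.
        self.rows = [[[0, 0] for _ in range(width)] for _ in range(depth)]
        self.rng = random.Random(seed)
        self.top = {}  # flow -> estimated size; stands in for a min-heap

    def _index(self, flow, row):
        # Row-specific hash: CRC32 salted with the row number.
        return zlib.crc32(f"{row}:{flow}".encode()) % self.w

    def insert(self, flow):
        fp = (zlib.crc32(flow.encode()) & 0xFFFF) or 1  # nonzero 16-bit fingerprint
        est = 0
        for r in range(self.d):
            bucket = self.rows[r][self._index(flow, r)]
            if bucket[1] == 0:
                bucket[0], bucket[1] = fp, 1
            elif bucket[0] == fp:
                bucket[1] += 1
            elif self.rng.random() < self.b ** (-bucket[1]):
                # Exponential decay: big counters are decremented with tiny probability.
                bucket[1] -= 1
                if bucket[1] == 0:
                    bucket[0], bucket[1] = fp, 1
            if bucket[0] == fp:
                est = max(est, bucket[1])
        self._update_top(flow, est)

    def _update_top(self, flow, est):
        if flow in self.top:
            self.top[flow] = max(self.top[flow], est)
        elif len(self.top) < self.k:
            self.top[flow] = est
        else:
            victim = min(self.top, key=self.top.get)
            if est > self.top[victim]:
                del self.top[victim]
                self.top[flow] = est

    def topk(self):
        return sorted(self.top.items(), key=lambda kv: kv[1], reverse=True)
```

Mouse flows that land on an elephant's bucket almost never erode its count, so memory stays fixed regardless of how many small flows pass through.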
Journal Article

Querying shortest paths on time dependent road networks

TL;DR: Proposes a novel height-balanced tree-structured index, called TD-G-tree, that supports fast route queries over time-dependent road networks (TDRNs). The authors also devise efficient algorithms for time-dependent shortest path (TDSP) queries and for time-interval based route planning, computing optimal solutions through dynamic programming and chronological divide-and-conquer.
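For context, a standard baseline for TDSP queries is time-dependent Dijkstra. Each edge's travel time is a function of the departure time. Under the FIFO (no-overtaking) property, settling nodes in order of earliest arrival remains correct. The Python sketch below is that baseline, not the TD-G-tree algorithm. The toy graph `ROAD` and its step-function travel times are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

TravelTime = Callable[[float], float]  # departure time -> travel time on the edge
Graph = Dict[str, List[Tuple[str, TravelTime]]]


def td_dijkstra(graph: Graph, source: str, target: str, depart: float) -> float:
    """Earliest arrival at `target` leaving `source` at `depart`; assumes FIFO edges."""
    best = {source: depart}
    heap = [(depart, source)]
    done = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in done:
            continue
        if u == target:
            return t
        done.add(u)
        for v, travel in graph.get(u, []):
            arrival = t + travel(t)  # edge cost depends on when we enter it
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return float("inf")


# Hypothetical network: the A->C road is fast before t = 8 and congested afterwards.
ROAD: Graph = {
    "A": [("B", lambda t: 10.0), ("C", lambda t: 5.0 if t < 8 else 30.0)],
    "B": [("D", lambda t: 5.0)],
    "C": [("D", lambda t: 2.0)],
}
print(td_dijkstra(ROAD, "A", "D", 0))  # 7.0 (via C)
print(td_dijkstra(ROAD, "A", "D", 9))  # 24.0 (via B, since A->C is congested)
```

The best route changes with the departure time, which is why time-dependent networks need more than a static shortest-path index.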
References
Journal Article

Answering queries using views: A survey

TL;DR: Surveys the state of the art on the problem of answering queries using views, describes the algorithms proposed to solve it, and synthesizes the disparate works into a coherent framework.
Journal Article

Probabilistic counting algorithms for data base applications

TL;DR: A class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data in a single pass using only a small additional storage and only a few operations per element scanned is introduced.
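One member of this class, probabilistic counting with stochastic averaging (PCSA) from Flajolet and Martin, can be sketched in Python as follows. Hash each element and use a few hash bits to pick one of m bitmaps. Then set the bit at the position of the lowest 1-bit of the remaining hash. The index R of the lowest unset bit in each bitmap grows roughly like log2 of the number of distinct elements that bitmap has seen. The estimate is (m / PHI) * 2^(mean R), where PHI ≈ 0.77351. The choice of BLAKE2b as the hash, m = 64, and the class name `PCSA` are assumptions.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant


def _hash64(item: str) -> int:
    # 64-bit hash from BLAKE2b; any well-mixed hash would do here.
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")


class PCSA:
    """Illustrative PCSA sketch: m bitmaps, one pass, a few operations per element."""

    def __init__(self, num_maps: int = 64):
        self.m = num_maps
        self.bitmaps = [0] * num_maps

    def add(self, item: str) -> None:
        h = _hash64(item)
        j, rest = h % self.m, h // self.m  # stochastic averaging: pick one bitmap
        # rho = position of the least-significant 1-bit of the remaining hash bits.
        rho = (rest & -rest).bit_length() - 1 if rest else 63
        self.bitmaps[j] |= 1 << rho

    def estimate(self) -> float:
        total = 0
        for bm in self.bitmaps:
            r = 0
            while (bm >> r) & 1:  # R = index of the lowest zero bit
                r += 1
            total += r
        return self.m / PHI * 2 ** (total / self.m)
```

Duplicates hash to the same bit, so re-adding elements never changes the estimate. With m bitmaps, the relative standard error is roughly 0.78/sqrt(m).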
Journal Article

Trajectory Data Mining: An Overview

TL;DR: A systematic survey of the major research in trajectory data mining that provides a panorama of the field and the scope of its research topics. It also introduces methods that transform trajectories into other data formats, such as graphs, matrices, and tensors.
Proceedings Article

Online aggregation

TL;DR: In this article, the authors propose an online aggregation interface that allows users to both observe the progress of their aggregation queries and control execution on the fly, and present a suite of techniques that extend a database system to meet these requirements.
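Online aggregation reports a running estimate that tightens as more rows are read. The Python sketch below computes a running AVG with a normal-approximation (CLT) confidence interval and a finite-population correction. It assumes rows are scanned in random order, so that every prefix is a random sample. The function name `online_avg`, the 95% z-value, and this particular interval are our assumptions, not necessarily the estimators used in the paper.

```python
import math
import random
from typing import Iterator, List, Tuple


def online_avg(values: List[float], z: float = 1.96, seed: int = 0) -> Iterator[Tuple[int, float, float]]:
    """Scan rows in random order, yielding (rows_seen, running AVG, CI half-width)."""
    N = len(values)
    order = list(range(N))
    random.Random(seed).shuffle(order)  # random order: every prefix is a random sample
    n, mean, m2 = 0, 0.0, 0.0
    for idx in order:
        x = values[idx]
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford update of the sum of squared deviations
        if n < 2:
            yield n, mean, math.inf  # no variance estimate from a single row
            continue
        var = m2 / (n - 1)
        fpc = (N - n) / (N - 1)  # finite-population correction (sampling without replacement)
        yield n, mean, z * math.sqrt(var / n * fpc)
```

An interactive front end could stop the scan as soon as the half-width drops below a user-chosen tolerance. Because of the correction term, the interval shrinks to zero once every row has been read.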
Proceedings Article

BlinkDB: queries with bounded errors and bounded response times on very large data

TL;DR: BlinkDB allows users to trade-off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
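BlinkDB is known for building stratified samples, so rare groups are not lost as they would be in a uniform sample. The Python sketch below shows only that basic idea. It caps each group at a hypothetical `cap` rows, then scales each stratum's sample sum back up by the group's full size. The helper names are hypothetical. The sketch omits how BlinkDB chooses which column sets to stratify on and how it computes error bars.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, float]  # (group key, measure value)
Sample = Dict[Hashable, Tuple[int, List[float]]]  # key -> (full group size, sampled values)


def stratified_sample(rows: List[Row], cap: int, seed: int = 0) -> Sample:
    """Keep at most `cap` rows per group and remember each group's full size."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, val in rows:
        groups[key].append(val)
    rng = random.Random(seed)
    sample: Sample = {}
    for key, vals in groups.items():
        picked = vals if len(vals) <= cap else rng.sample(vals, cap)
        sample[key] = (len(vals), picked)
    return sample


def approx_group_sum(sample: Sample) -> Dict[Hashable, float]:
    """Estimate SUM(value) GROUP BY key by scaling each stratum's sample sum."""
    return {key: sum(picked) * total / len(picked) for key, (total, picked) in sample.items()}
```

Groups with at most `cap` rows are kept whole, so their answers are exact; only large groups are approximated.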