Approximate Query Processing: What is New and Where to Go? A Survey on Approximate Query Processing
Kaiyu Li, Guoliang Li
TLDR
The survey helps practitioners understand existing AQP techniques and select appropriate methods for their applications, and it outlines research challenges and opportunities in AQP.
Abstract:
Online analytical processing (OLAP) is a core functionality in database systems, and its performance is crucial for making online decisions in many applications. However, supporting OLAP on large datasets, especially big data, is costly, and methods that compute exact answers cannot meet the high-performance requirement. To alleviate this problem, approximate query processing (AQP) has been proposed, which aims to efficiently find an approximate answer that is as close as possible to the exact answer. Existing AQP techniques can be broadly divided into two categories. (1) Online aggregation: select samples online and use these samples to answer OLAP queries. (2) Offline synopses generation: generate synopses offline based on a-priori knowledge (e.g., data statistics or query workload) and use these synopses to answer OLAP queries. We discuss the research challenges in AQP and summarize existing techniques that address them. In addition, we review how AQP can support other complex data types, e.g., spatial data and trajectory data, and other applications, e.g., data visualization and data cleaning. We also introduce existing AQP systems and summarize their advantages and limitations. Lastly, we outline research challenges and opportunities for AQP. We believe that the survey can help practitioners understand existing AQP techniques and select appropriate methods for their applications.
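As a rough illustration of the online aggregation category described in the abstract, the Python sketch below estimates an AVG over a growing uniform random sample and reports a 95% confidence interval based on the central limit theorem after each batch. The names `online_avg` and `approx_avg`, the batch size, and the stopping rule are assumptions made for illustration only. They are not taken from the survey or from any particular AQP system.

```python
import math
import random


def online_avg(values, batch_size=1000, z=1.96, seed=0):
    """Yield (samples_seen, estimate, half_width) after each batch.

    Hypothetical helper: visiting rows in a random order means every
    prefix of the scan is a uniform sample without replacement.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)

    n = 0
    total = 0.0
    total_sq = 0.0
    for idx in order:
        x = values[idx]
        n += 1
        total += x
        total_sq += x * x
        if n % batch_size == 0 or n == len(values):
            mean = total / n
            if n > 1:
                # Unbiased sample variance; clamp small negative rounding errors.
                var = max((total_sq - n * mean * mean) / (n - 1), 0.0)
            else:
                var = 0.0
            # CLT-based interval; the finite-population correction is
            # ignored, so the interval is conservative for large samples.
            half_width = z * math.sqrt(var / n)
            yield n, mean, half_width


def approx_avg(values, target_half_width, **kwargs):
    """Stop once the interval is narrow enough, or when data is exhausted."""
    result = (0, float("nan"), float("inf"))
    for result in online_avg(values, **kwargs):
        if result[2] <= target_half_width:
            break
    return result


if __name__ == "__main__":
    data = [float(i % 100) for i in range(100_000)]  # synthetic column
    n, mean, half_width = approx_avg(data, target_half_width=1.0)
    print(f"after {n} rows: AVG ~ {mean:.2f} +/- {half_width:.2f}")
```

The user-facing trade-off is the `target_half_width` argument: a looser bound lets the scan stop earlier, while a tighter bound costs more rows. An offline-synopsis approach would instead build the sample (or a histogram or sketch) ahead of time and answer queries from it directly.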