Journal ArticleDOI
The Unreasonable Effectiveness of Data
TLDR
A trillion-word corpus - along with other Web-derived corpora of millions, billions, or trillions of links, videos, images, tables, and user interactions - captures even very rare aspects of human behavior.Abstract:
At Brown University, there is excitement of having access to the Brown Corpus, containing one million English words. Since then, we have seen several notable corpora that are about 100 times larger, and in 2006, Google released a trillion-word corpus with frequency counts for all sequences up to five words long. In some ways this corpus is a step backwards from the Brown Corpus: it's taken from unfiltered Web pages and thus contains incomplete sentences, spelling errors, grammatical errors, and all sorts of other errors. It's not annotated with carefully hand-corrected part-of-speech tags. But the fact that it's a million times larger than the Brown Corpus outweighs these drawbacks. A trillion-word corpus - along with other Web-derived corpora of millions, billions, or trillions of links, videos, images, tables, and user interactions - captures even very rare aspects of human behavior. So, this corpus could serve as the basis of a complete model for certain tasks - if only we knew how to extract the model from the data.read more
Citations
More filters
Book
Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Book
Machine Learning : A Probabilistic Perspective
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Journal ArticleDOI
A survey on Image Data Augmentation for Deep Learning
TL;DR: This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing DataAugmentation, a data-space solution to the problem of limited data.
Book
Linked Data: Evolving the Web into a Global Data Space
TL;DR: This Synthesis lecture provides readers with a detailed technical introduction to Linked Data, including coverage of relevant aspects of Web architecture, as the basis for application development, research or further study.
Proceedings ArticleDOI
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
TL;DR: In this paper, the authors investigated how the performance of current vision tasks would change if this data was used for representation learning and found that the performance on vision tasks increases logarithmically based on volume of training data size.
References
More filters
Book
Computational analysis of present-day American English
Henry Kučera,W. Nelson Francis,W. F. Twaddell,Mary Lois Marckworth,Laura M. Bell,John Bissell Carroll +5 more
Book
A Comprehensive Grammar of the English Language
Randolph Quirk,David Crystal +1 more
TL;DR: A Comprehensive grammar of the English language as mentioned in this paper, a comprehensive grammar of English language, a Comprehensive grammar for English language, and a comprehensive grammars of English, is an example of such a grammar.
Journal ArticleDOI
The unreasonable effectiveness of mathematics in the natural sciences. Richard courant lecture in mathematical sciences delivered at New York University, May 11, 1959
Book
Introduction to statistical relational learning
Lise Getoor,Ben Taskar +1 more
TL;DR: In Introduction to Statistical Relational Learning, leading researchers in this emerging area of machine learning describe current formalisms, models, and algorithms that enable effective and robust reasoning about richly structured systems and data.