Open Access Journal Article

Remote Sensing Image Scene Classification Meets Deep Learning: Challenges, Methods, Benchmarks, and Opportunities

TLDR
This article provides a systematic survey of deep learning methods for remote sensing image scene classification, covering more than 160 papers, and discusses the main challenges of the task.
Abstract
Remote sensing image scene classification, which aims at labeling remote sensing images with a set of semantic categories based on their contents, has broad applications in a range of fields. Propelled by the powerful feature learning capabilities of deep neural networks, remote sensing image scene classification driven by deep learning has drawn remarkable attention and achieved significant breakthroughs. However, to the best of our knowledge, a comprehensive review of recent achievements regarding deep learning for scene classification of remote sensing images is still lacking. Considering the rapid evolution of this field, this article provides a systematic survey of deep learning methods for remote sensing image scene classification by covering more than 160 papers. To be specific, we discuss the main challenges of remote sensing image scene classification and survey three families of methods: first, autoencoder-based methods; second, convolutional neural network-based methods; and third, generative adversarial network-based methods. In addition, we introduce the benchmarks used for remote sensing image scene classification and summarize the performance of more than two dozen representative algorithms on three commonly used benchmark datasets. Finally, we discuss the promising opportunities for further research.

Citations
Journal Article

Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling

TL;DR: This paper presents MillionAID, a large-scale scene classification dataset that contains one million aerial images, and designs a hierarchical multi-task learning method that achieves state-of-the-art pixel-wise classification on the challenging GID, which is a profitable attempt to bridge tile-level scene classification toward pixel-wise semantic labeling for aerial image interpretation.
Journal Article

A Weakly Pseudo-Supervised Decorrelated Subdomain Adaptation Framework for Cross-Domain Land-Use Classification

TL;DR: This paper proposes a weakly pseudo-supervised decorrelated subdomain adaptation (WPS-DSA) framework for HSR cross-domain land-use classification.
Journal Article

MLFC-net: A multi-level feature combination attention model for remote sensing scene classification

TL;DR: This article proposes a multi-level semantic feature clustering attention model, based on deep convolutional neural networks (DCNNs), to extract more accurate feature information from remote sensing images.
Journal Article

Vision-Language Models for Vision Tasks: A Survey

TL;DR: This paper provides a systematic review of vision-language models (VLMs) for various visual recognition tasks, including: (1) the background that introduces the development of visual recognition paradigms; (2) the foundations of VLMs, summarizing the widely adopted network architectures, pre-training objectives, and downstream tasks; (3) the widely adopted datasets in VLM pre-training and evaluation; (4) the review and categorization of existing VLM pre-training methods, VLM transfer learning methods, and VLM knowledge distillation methods; (5) the benchmarking, analysis, and discussion of the reviewed methods; and (6) several research challenges and potential research directions that could be pursued in future VLM studies for visual recognition.
Journal Article

Multispectral Scene Classification via Cross-Modal Knowledge Distillation

TL;DR: A cross-modal knowledge distillation framework is proposed to improve the performance of multispectral scene classification by transferring the prior knowledge from teacher models pre-trained on RGB images to the student network with limited samples.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; residual networks built with it won 1st place on the ILSVRC 2015 classification task.
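
As a rough illustration of the residual learning idea summarized above, the sketch below computes a block output as y = F(x) + x, where the input is added back through an identity shortcut. The function names (`residual_block`, `zero_branch`, `scaled_branch`) and the toy list-based vectors are assumptions made for this example only; a real network would use learned convolutional layers for F.

```python
# Minimal sketch of a residual connection: output = F(x) + x.
# All names here are hypothetical and chosen for illustration.

def residual_block(x, residual_fn):
    """Return residual_fn(x) + x element-wise (identity shortcut)."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("residual branch must preserve the input size")
    return [a + b for a, b in zip(fx, x)]

def zero_branch(x):
    # A residual branch that has learned "nothing": F(x) = 0.
    return [0.0 for _ in x]

def scaled_branch(x, w=0.5):
    # A toy stand-in for a learned transformation: F(x) = w * x.
    return [w * v for v in x]

x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_branch)   # block reduces to the identity
scaled_out = residual_block(x, scaled_branch)   # x + 0.5 * x
```

When the residual branch outputs zeros, the block passes its input through unchanged, which is one common intuition for why stacking such blocks does not make a deeper network harder to fit than a shallower one.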
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The authors train a deep convolutional neural network that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art performance.
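
To make the layer layout in the summary above more concrete, the following sketch traces feature-map sizes through five convolutional layers, three of them followed by max pooling, and then lists the three fully-connected layer sizes. The 227x227 input and every kernel size, stride, padding, and channel count are assumptions not stated in the summary; they are values commonly quoted for this architecture and should be checked against the original paper.

```python
# Shape trace for a five-conv, three-FC network.
# Hyperparameters below are assumed for illustration, not taken from the summary.

def out_size(size, kernel, stride=1, pad=0):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, padding, output channels or None for pooling)
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

def trace(size=227, channels=3):
    """Return (name, channels, height, width) after each layer."""
    shapes = []
    for name, kernel, stride, pad, out_ch in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if out_ch is not None:  # pooling keeps the channel count
            channels = out_ch
        shapes.append((name, channels, size, size))
    return shapes

shapes = trace()
flat = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]  # flattened input to the first FC layer
fc_sizes = [flat, 4096, 4096, 1000]  # three FC layers; the last feeds the 1000-way softmax

for row in shapes:
    print(row)
print("fully-connected sizes:", fc_sizes)
```

Under these assumed settings the last pooling layer yields a 256-channel 6x6 map, so the first fully-connected layer would receive a 9216-dimensional vector.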
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings Article

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Journal Article

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.