Abstract: Transformer-based object detection models usually adopt an encoder-decoder architecture that mainly combines self-attention (SA) and a multilayer perceptron (MLP). Although this architecture ...
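The abstract is truncated, so the paper's exact design is not shown here; as a point of reference, the following is a minimal sketch of the generic SA + MLP encoder block the abstract refers to, written in PyTorch. The class name `TransformerEncoderBlock` and the default sizes (dim=256, 8 heads, MLP ratio 4) are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    """One encoder block: multi-head self-attention (SA) followed by an MLP,
    each sub-layer wrapped with layer normalization and a residual connection."""

    def __init__(self, dim=256, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        # Self-attention sub-layer with residual connection
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # MLP sub-layer with residual connection
        x = x + self.mlp(self.norm2(x))
        return x


# Usage: a batch of 2 feature sequences, 625 tokens each, 256 channels
tokens = torch.randn(2, 625, 256)
block = TransformerEncoderBlock()
print(block(tokens).shape)  # torch.Size([2, 625, 256])
```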
Abstract: Classifying tiny objects in remote sensing images (e.g., a 20x20-pixel target within a 1000x1000 image) is a significant challenge. This paper adopts a fused FPN (Feature Pyramid ...
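Because this abstract is also truncated, the paper's specific fusion scheme is not reproduced here; the sketch below shows a standard top-down FPN fusion, the general mechanism such a "fused FPN" builds on: lateral 1x1 convolutions align channel widths, coarser levels are upsampled and added into finer ones so the high-resolution map, which is where tiny targets survive, also carries semantic context. The class name `FusedFPN`, channel counts, and feature-map sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedFPN(nn.Module):
    """Minimal top-down FPN fusion over three backbone levels (fine to coarse)."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # Lateral 1x1 convs project each backbone level to a common width
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        # 3x3 convs smooth each fused map
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels
        )

    def forward(self, feats):
        # feats: backbone maps ordered fine to coarse, e.g. strides 8, 16, 32
        laterals = [lat(f) for lat, f in zip(self.laterals, feats)]
        # Top-down pathway: upsample the coarser level and fuse by addition
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest"
            )
        return [s(x) for s, x in zip(self.smooth, laterals)]


# Usage: maps a 1000x1000 input might yield at strides 8/16/32 (sizes assumed)
feats = [torch.randn(1, 256, 125, 125),
         torch.randn(1, 512, 63, 63),
         torch.randn(1, 1024, 32, 32)]
fpn = FusedFPN()
print([p.shape[-2:] for p in fpn(feats)])  # finest output keeps 125x125 resolution
```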