Abstract: In remote sensing (RS), convolutional neural networks (CNNs) are well recognized for their spatial–spectral feature extraction capabilities, whereas vision transformers (ViTs), which ...
Abstract: Self-supervised monocular depth estimation (MDE) typically employs convolutional neural networks (CNNs) or Transformers to predict scene depth. However, CNNs struggle with long-range ...