Cristin result ID: 2131882
Last modified: 21 November 2023, 11:18
NVI reporting year: 2023
Result
Academic chapter/article/conference paper
2023

Deformable and Structural Representative Network for Remote Sensing Image Captioning

Contributors:
  • Jaya Sharma
  • Divya Peketi
  • Vishnu C.
  • Linga Reddy Cenkeramaddi
  • Shekar B.H.
  • Krishna Mohan C.

Book

VISAPP 2023 : 18th International Conference on Computer Vision Theory and Applications
ISBN:
  • 978-989-758-634-7

Publisher

SciTePress
NVI level 1

Series

VISIGRAPP
ISSN 2184-5921
e-ISSN 2184-4321
NVI level 1

About the result

Academic chapter/article/conference paper
Publication year: 2023
Volume: 4
Issue: 2023
Pages: 56–64
ISBN:
  • 978-989-758-634-7

Classification

Subject field (NPI)

Subject field: ICT
- Discipline: Natural sciences and technology

Description

Title

Deformable and Structural Representative Network for Remote Sensing Image Captioning

Abstract

Remote sensing image captioning, which automatically generates textual descriptions of aerial images, is of great significance for image understanding. The majority of existing architectures work within an encoder-decoder framework. However, existing encoder-decoder based methods for remote sensing image captioning neglect fine-grained structural representations of objects and lack a deep encoding representation of the image. In this paper, we propose a novel structural representative network that captures fine-grained structures of remote sensing imagery to produce fine-grained captions. First, a deformable network is incorporated on intermediate layers of a convolutional neural network to extract spatially invariant features from an image. Second, a contextual network is incorporated in the last layers of the proposed network to produce multi-level contextual features. To extract dense contextual features, an attention mechanism is employed in the contextual network. Holistic representations of aerial images are thus obtained through the structural representative network by combining spatial and contextual features. Further, features from the structural representative network are fed to multi-level decoders to generate spatially semantic, meaningful captions. The textual descriptions produced by our approach are demonstrated on two standard datasets, namely the Sydney-Captions dataset and the UCM-Captions dataset. A comparative analysis with recently proposed approaches exhibits the performance of the proposed method and argues that it is well suited to remote sensing image captioning tasks.
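The fusion step described in the abstract (combining deformable spatial features with attention-weighted multi-level contextual features into a holistic representation) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `fuse_features`, the feature shapes, and scaled dot-product attention are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(spatial, contextual):
    """Hypothetical attention-weighted fusion of spatial (deformable)
    features with multi-level contextual features.

    spatial:    (n, d) region features from intermediate CNN layers
    contextual: (m, d) multi-level contextual features
    returns:    (d,)   a single holistic image representation
    """
    d = spatial.shape[1]
    # each spatial region attends over the contextual feature levels
    scores = spatial @ contextual.T / np.sqrt(d)   # (n, m)
    attn = softmax(scores, axis=1)                 # rows sum to 1
    attended = attn @ contextual                   # (n, d)
    # combine spatial and attended contextual features, pool over regions
    return (spatial + attended).mean(axis=0)       # (d,)

rng = np.random.default_rng(0)
holistic = fuse_features(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
print(holistic.shape)  # (8,)
```

In the paper's pipeline this vector would feed the multi-level decoders that emit the caption; the sketch only shows the feature-combination idea, not the deformable convolutions or the decoders.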

Contributors

Jaya Sharma

  • Affiliated:
    Author
    at Indian Institute of Technology Hyderabad

Divya Peketi

  • Affiliated:
    Author
    at Indian Institute of Technology Hyderabad

Vishnu Chalavadi

The contributor's name appears on this result as Vishnu C.
  • Affiliated:
    Author
    at Indian Institute of Technology Hyderabad

Linga Reddy Cenkeramaddi

  • Affiliated:
    Author
    at the Department of Information and Communication Technology, University of Agder

Sekhar B.H.

The contributor's name appears on this result as Shekar B.H.
  • Affiliated:
    Author
    at Mangalore University

The result is part of

VISAPP 2023 : 18th International Conference on Computer Vision Theory and Applications.

SciTePress, 2023. Academic anthology/Conference proceedings