University of Lincoln
Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning
journal contribution
posted on 2024-03-15, 15:35 authored by Deema Abdal Hafeth, Stefanos Kollias
Image captioning is the task of generating descriptive captions for images. Typically, a Convolutional Neural Network (CNN) serves as the encoder that extracts visual features, and a decoder, often based on Recurrent Neural Networks (RNNs), generates the captions. Recently, the self-attention mechanism has been widely adopted in this encoder–decoder architecture. However, the approach still faces challenges that require further research. One is that the extracted visual features do not fully exploit the available image information, primarily because they lack semantic concepts, which limits how well the content of the image can be understood. To address this issue, we present a new image-Transformer-based model boosted with image object semantic representation. Our model incorporates the semantic representation into the encoder attention, enhancing the visual features by integrating instance-level concepts. Additionally, we employ a Transformer as the decoder in the language generation module. By doing so, we achieve improved performance in generating accurate and diverse captions. We evaluated our model on the MS-COCO dataset and the novel MACE dataset. The results show that our model is in line with state-of-the-art approaches in terms of caption generation.

School affiliated with

  • School of Computer Science (Research Outputs)
  • College of Health and Science (Research Outputs)

Publication Title

Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning

Volume

24

Issue

6

Pages/Article Number

1796

Publisher

MDPI

Date Submitted

2023-12-14

Date Accepted

2024-03-03

Date of Final Publication

2024-03-11

Open Access Status

  • Open Access

Date Document First Uploaded

2024-03-14
