Open Access archive

Image Captioning and Comparison of Different Encoders

Ankit Pal, National Institute of Technology Delhi
Subasish Kar, National Institute of Technology Delhi
Anuveksh Taneja, National Institute of Technology Delhi
Vinay Kumar Jadoun, Manipal Institute of Technology

Document Type

Conference Proceeding

Publication Title

Journal of Physics: Conference Series

Abstract

Image captioning, the generation of a sentence describing a given image, has been one of the most intriguing topics in computer vision. It draws on both image processing and natural language processing. Most current approaches are built on neural networks: predefined convolutional neural network (CNN) models extract features from an image, and a uni-directional or bi-directional recurrent neural network (RNN) handles language modelling. This paper discusses models commonly used as image encoders, namely Inception-V3, VGG19, VGG16 and InceptionResNetV2, with uni-directional LSTMs used for text generation. The results are then compared using the Bilingual Evaluation Understudy (BLEU) score on the Flickr8k dataset.
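The BLEU comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes whitespace-tokenized captions, uniform n-gram weights, and no smoothing, and the function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped at
    the largest count of that n-gram in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing
    (an assumption of this sketch; it returns 0.0 if any precision is 0)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties: the shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    # Made-up captions for illustration only, not taken from Flickr8k.
    generated = "a dog runs on the grass".split()
    human = ["a dog runs across the grass".split(),
             "a brown dog is running on the grass".split()]
    print(bleu(generated, human, max_n=2))
```

In a comparison of encoders, a score of this kind would be computed for the captions produced with each CNN encoder against the same human reference captions, so that differences in the score reflect the encoder rather than the references.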

DOI

10.1088/1742-6596/1478/1/012004

Publication Date

5-13-2020

Recommended Citation

Pal, Ankit; Kar, Subasish; Taneja, Anuveksh; and Jadoun, Vinay Kumar, "Image Captioning and Comparison of Different Encoders" (2020). Open Access archive. 1493.
https://impressions.manipal.edu/open-access-archive/1493

This document is currently not available here.
