||Image Visual Description Model
||Rishabh Kanodiya;Smriti Mittal;Shikha Jain
||Computer vision; CNN; Deep learning; GRU; Image context; Image captioning; Visual description; Validity measures
||Image captioning is an area of keen interest for many researchers. With the evolution of machine learning and deep learning, various models have been applied to improve captioning accuracy and time complexity. However, further improvement in both respects remains an open research challenge. This paper's contribution is twofold. First, we propose an image captioning model (ImgCap) that uses a VGG16 Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) to generate captions. Second, we propose a similarity metric (SimM) to compare the generated captions with the expected ones. Furthermore, the proposed model is compared with an existing Long Short-Term Memory (LSTM)-based model, and we observe that it outperforms the existing one in terms of both accuracy and time complexity.
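The abstract describes scoring generated captions against expected ones with a similarity metric. The paper's SimM is not defined in this excerpt; as a minimal illustration of the general idea, the sketch below computes a token-overlap F1 score between two captions (the function name and metric choice are this example's assumptions, not the paper's).

```python
from collections import Counter

def caption_similarity(generated: str, expected: str) -> float:
    """Illustrative token-overlap F1 in [0, 1]; NOT the paper's SimM.

    Counts how many tokens the two captions share (with multiplicity),
    then combines precision and recall into a harmonic mean.
    """
    gen = Counter(generated.lower().split())
    exp = Counter(expected.lower().split())
    overlap = sum((gen & exp).values())  # shared tokens, multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(exp.values())
    return 2 * precision * recall / (precision + recall)

# Identical captions score 1.0; partial overlap scores between 0 and 1.
print(caption_similarity("a dog runs in the park", "a dog is running"))
```

A real metric for this task would typically also account for word order and synonymy (as BLEU- or METEOR-style measures do); plain token overlap is shown only to make the comparison step concrete.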