Every day, we encounter a large number of images from various sources such as the internet, news articles, document diagrams and advertisements.The main aim of this paper is to provide a comprehensive survey of deep learning for image captioning.In traditional machine learning, hand crafted features such as Local Binary Patterns (LBP) [107], Scale-Invariant Feature Transform (SIFT) [87], the Histogram of Oriented Gradients (HOG) [27], and a combination of such features are widely used.Image indexing is important for Content-Based Image Retrieval (CBIR) and therefore, it can be applied to many areas, including biomedicine, commerce, the military, education, digital libraries, and web searching.For example, Convolutional Neural Networks (CNN) [79] are widely used for feature learning, and a classifier such as Softmax is used for classification.CNN is generally followed by Recurrent Neural Networks (RNN) in order to generate captions.These survey papers mainly discussed template based, retrieval based, and a very few deep learning-based novel image caption generating models.Social media platforms such as Facebook and Twitter can directly generate descriptions from images.Generating well-formed sentences requires both syntactic and semantic understanding of the language [143].Since hand crafted features are task specific, extracting features from a large and diverse set of data is not feasible.On the other hand, in deep machine learning based techniques, features are learned automatically from training data and they can handle a large and diverse set of images and videos.Although the papers have presented a good literature survey of image captioning, they could only cover a few papers on deep learning because the bulk of them was published after the survey papers.To provide an abridged version of the literature, we present a survey mainly focusing on the deep learning-based papers on image captioning.