Lakhasly

Summarize result (50%)

Every day, we encounter a large number of images from various sources such as the internet, news
articles, document diagrams and advertisements. The main aim of this paper is to provide a
comprehensive survey of deep learning for image captioning. In traditional machine learning, hand
crafted features such as Local Binary Patterns (LBP) [107], Scale-Invariant Feature Transform
(SIFT) [87], the Histogram of Oriented Gradients (HOG) [27], and a combination of such features
are widely used. Image indexing is important for Content-Based Image Retrieval (CBIR) and therefore,
it can be applied to many areas, including biomedicine, commerce, the military, education, digital
libraries, and web searching. For example, Convolutional Neural Networks (CNN) [79] are widely used
for feature learning, and a classifier such as Softmax is used for classification. CNN is generally
followed by Recurrent Neural Networks (RNN) in order to generate captions. These survey papers
mainly discussed template based, retrieval based, and very few deep learning-based novel image
caption generating models. Social media platforms such as Facebook and Twitter can directly
generate descriptions from images. Generating well-formed sentences requires both syntactic and
semantic understanding of the language [143]. Since hand crafted features are task specific,
extracting features from a large and diverse set of data is not feasible. On the other hand, in deep
machine learning based techniques, features are learned automatically from training data and they
can handle a large and diverse set of images and videos. Although the papers have presented a good
literature survey of image captioning, they could only cover a few papers on deep learning because
the bulk of them were published after the survey papers. To provide an abridged version of the
literature, we present a survey mainly focusing on the deep learning-based papers on image captioning.


Original text

Every day, we encounter a large number of images from various sources such as the internet, news
articles, document diagrams and advertisements. These sources contain images that viewers would
have to interpret themselves. Most images do not have a description, but humans can largely
understand them without detailed captions. However, a machine needs to interpret some form
of image captions if humans need automatic image captions from it.
Image captioning is important for many reasons. For example, it can be used for automatic image
indexing. Image indexing is important for Content-Based Image Retrieval (CBIR) and therefore,
it can be applied to many areas, including biomedicine, commerce, the military, education, digital
libraries, and web searching. Social media platforms such as Facebook and Twitter can directly
generate descriptions from images. The descriptions can include where we are (e.g., beach, cafe),
what we wear and importantly what we are doing there.
Image captioning is a popular research area of Artificial Intelligence (AI) that deals with image
understanding and a language description for that image. Image understanding needs to detect and
recognize objects. It also needs to understand scene type or location, object properties and their
interactions. Generating well-formed sentences requires both syntactic and semantic understanding
of the language [143].
Understanding an image largely depends on obtaining image features. The techniques used for
this purpose can be broadly divided into two categories: (1) Traditional machine learning based
techniques and (2) Deep machine learning based techniques.
In traditional machine learning, hand crafted features such as Local Binary Patterns (LBP) [107],
Scale-Invariant Feature Transform (SIFT) [87], the Histogram of Oriented Gradients (HOG) [27],
and a combination of such features are widely used. In these techniques, features are extracted
from input data. They are then passed to a classifier such as Support Vector Machines (SVM) [17]
in order to classify an object. Since hand crafted features are task specific, extracting features from
a large and diverse set of data is not feasible. Moreover, real world data such as images and video
are complex and have different semantic interpretations.
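
As a rough illustration of the hand-crafted features mentioned above, the following sketch computes a basic 8-neighbour Local Binary Pattern (LBP) histogram for a tiny grayscale image. The function names (`lbp_code`, `lbp_histogram`), the neighbour ordering, and the toy image are assumptions made for this example and do not come from the paper; practical LBP variants (for instance rotation-invariant or uniform patterns) differ in detail. The resulting feature vector is what would then be passed to a classifier such as an SVM, a step omitted here.

```python
from collections import Counter


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left corner.
    # This ordering is a convention chosen for the sketch.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (d_row, d_col) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return a normalised 256-bin histogram of LBP codes (interior pixels only)."""
    height, width = len(image), len(image[0])
    counts = Counter(
        lbp_code(image, r, c)
        for r in range(1, height - 1)
        for c in range(1, width - 1)
    )
    total = sum(counts.values())
    return [counts.get(code, 0) / total for code in range(256)]


# Hypothetical 4x4 grayscale image: a bright square on a dark background.
toy_image = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
features = lbp_histogram(toy_image)  # fixed-length vector for a classifier
```

Because the comparison rule and the histogram size are fixed in advance by the designer, nothing in this feature adapts to the data, which is the limitation the paragraph above points to.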
On the other hand, in deep machine learning based techniques, features are learned automatically
from training data and they can handle a large and diverse set of images and videos. For example,
Convolutional Neural Networks (CNN) [79] are widely used for feature learning, and a classifier
such as Softmax is used for classification. CNN is generally followed by Recurrent Neural Networks
(RNN) in order to generate captions.
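
The encoder-decoder flow described above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: the image feature vector stands in for the output of a CNN, the weights are random and untrained (so the generated words are meaningless), and the vocabulary, the function names, and the greedy decoding strategy are all hypothetical choices for illustration rather than the method of any surveyed paper.

```python
import math
import random

# Hypothetical toy vocabulary; real systems learn one from a caption dataset.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "the", "beach"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def greedy_caption(image_features, max_len=8, hidden_size=4, seed=0):
    """Decode a word sequence from image features with a simple RNN."""
    rng = random.Random(seed)
    # Untrained random weights: an assumption that keeps the sketch runnable.
    w_img = random_matrix(hidden_size, len(image_features), rng)
    w_hid = random_matrix(hidden_size, hidden_size, rng)
    w_tok = random_matrix(hidden_size, len(VOCAB), rng)
    w_out = random_matrix(len(VOCAB), hidden_size, rng)

    # The image features initialise the hidden state of the decoder.
    hidden = [math.tanh(v) for v in matvec(w_img, image_features)]
    token = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        one_hot = [1.0 if i == token else 0.0 for i in range(len(VOCAB))]
        pre = [h + t for h, t in zip(matvec(w_hid, hidden), matvec(w_tok, one_hot))]
        hidden = [math.tanh(v) for v in pre]
        # Softmax over the vocabulary, then pick the most probable word.
        probs = softmax(matvec(w_out, hidden))
        token = max(range(len(VOCAB)), key=probs.__getitem__)
        if VOCAB[token] == "<end>":
            break
        caption.append(VOCAB[token])
    return caption


# Stand-in for a CNN feature vector (hypothetical values).
caption = greedy_caption([0.2, 0.7, 0.1])
```

In a trained system the weights would be learned jointly from image-caption pairs, and the simple recurrent step is commonly replaced by a gated unit; the data flow from image features through repeated softmax predictions stays the same.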
In the last 5 years, a large number of articles have been published on image captioning with deep
machine learning being popularly used. Deep learning algorithms can handle complexities and
challenges of image captioning quite well. So far, only three survey papers [8, 13, 75] have been
published on this research topic. Although the papers have presented a good literature survey of
image captioning, they could only cover a few papers on deep learning because the bulk of them were
published after the survey papers. These survey papers mainly discussed template based, retrieval
based, and very few deep learning-based novel image caption generating models. However, a
large amount of work has been done on deep learning-based image captioning. Moreover, the
availability of large and new datasets has made learning-based image captioning an interesting
research area. To provide an abridged version of the literature, we present a survey mainly focusing
on the deep learning-based papers on image captioning.
The main aim of this paper is to provide a comprehensive survey of deep learning for image
captioning. First,

