DOI: 10.35940/ijrte.C6439.0910321

Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., & Yuille, A. (2014). Deep captioning with multimodal recurrent neural networks (m-rnn). arXiv preprint arXiv:1412.6632.
Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3128-3137). doi:10.1109/cvpr.2015.7298932
Karpathy, A., Joulin, A., & Fei-Fei, L. (2014). Deep fragment embeddings for bidirectional image sentence mapping. arXiv preprint arXiv:1406.5679.
Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3156-3164). doi:10.1109/cvpr.2015.7298935
Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6077-6086). doi:10.1109/cvpr.2018.00636
Huang, L., Wang, W., Chen, J., & Wei, X. Y. (2019). Attention on attention for image captioning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4634-4643). doi:10.1109/iccv.2019.00473
Perez, L., & Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621.
Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. Journal of Big Data, 6(1), 1-48. doi:10.1186/s40537-019-0197-0
Inoue, H. (2018). Data augmentation by pairing samples for images classification. arXiv preprint arXiv:1801.02929.
Bujimalla, S., Subedar, M., & Tickoo, O. (2021). Data augmentation to improve robustness of image captioning solutions. arXiv preprint arXiv:2106.05437.
Aldabbas, H., Asad, M., Ryalat, M. H., Malik, K. R., & Qureshi, M. Z. A. (2019). Data Augmentation to Stabilize Image Caption Generation Models in Deep Learning. doi:10.14569/IJACSA.2019.0101074
Mitchell, M., Dodge, J., Goyal, A., Yamaguchi, K., Stratos, K., Han, X., … & Daumé III, H. (2012, April). Midge: Generating image descriptions from computer vision detections. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (pp. 747-756).
Yang, Y., Teo, C., Daumé III, H., & Aloimonos, Y. (2011, July). Corpus-guided sentence generation of natural images. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (pp. 444-454).
Kuznetsova, P., Ordonez, V., Berg, A., Berg, T., & Choi, Y. (2012, July). Collective generation of natural image descriptions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 359-368).
Farhadi, A., Hejrati, M., Sadeghi, M. A., Young, P., Rashtchian, C., Hockenmaier, J., & Forsyth, D. (2010, September). Every picture tells a story: Generating sentences from images. In European conference on computer vision (pp. 15-29). Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-15561-1_2
Devlin, J., Cheng, H., Fang, H., Gupta, S., Deng, L., He, X., … & Mitchell, M. (2015). Language models for image captioning: The quirks and what works. arXiv preprint arXiv:1505.01809. doi:10.3115/v1/P15-2017
You, Q., Jin, H., Wang, Z., Fang, C., & Luo, J. (2016). Image captioning with semantic attention. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4651-4659). doi:10.1109/cvpr.2016.503
Yao, T., Pan, Y., Li, Y., & Mei, T. (2017). Incorporating copying mechanism in image captioning for learning novel objects. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6580-6588). doi:10.1109/cvpr.2017.559
Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., … & Berg, T. L. (2013). Babytalk: Understanding and generating simple image descriptions. IEEE transactions on pattern analysis and machine intelligence, 35(12), 2891-2903. doi:10.1109/TPAMI.2012.162
Fang, H., Gupta, S., Iandola, F., Srivastava, R. K., Deng, L., Dollár, P., … & Zweig, G. (2015). From captions to visual concepts and back. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1473-1482). doi:10.1109/cvpr.2015.7298754
Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3156-3164). doi:10.1109/cvpr.2015.7298935
Chu, Y., Yue, X., Yu, L., Sergei, M., & Wang, Z. (2020). Automatic image captioning based on ResNet50 and LSTM with soft attention. Wireless Communications and Mobile Computing, 2020. doi:10.1155/2020/8909458

NOTE: All the pictures used in this study (Figs. 1-13) are taken from the Flickr8k dataset.