Self-Attention Based Visual Dialogue
Vaibhav Mathur1, Divyansh Jha2, Sunil Kumar3
1Vaibhav Mathur, Electrical and Electronics, Birla Institute of Technology and Science, Pilani, India.
2Divyansh Jha, Electronics and Communication, Maharaja Agrasen Institute of Technology, India.
3Sunil Kumar, Electronics and Communication, Maharaja Agrasen Institute of Technology, India.
Manuscript received on 05 August 2019. | Revised Manuscript received on 13 August 2019. | Manuscript published on 30 September 2019. | PP: 8792-8795 | Volume-8 Issue-3 September 2019 | Retrieval Number: C5306098319/2019©BEIESP | DOI: 10.35940/ijrte.C5306.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: We improve performance on the task of Visual Dialogue by integrating a mechanism called self-attention into the model reported in the original Visual Dialogue paper. Visual Dialogue differs from other downstream tasks and serves as a universal test of machine intelligence: the model must be adept enough in both vision and language to allow assessment of individual answers and to track progress. The dataset used in this paper is VisDial v0.9, which was collected at Georgia Tech. It contains approximately 1.2 million dialogue question-answer pairs, ten for each of ~120,000 images from COCO. We used the same train/test splits as the original paper to evaluate our results. To keep the comparison fair and simple, we used an encoder-decoder architecture, namely the Late-Fusion Encoder and the Discriminative Decoder, and included the self-attention module from the SAGAN paper in the encoder. The inclusion of the self-attention module was motivated by the observation that many answers from the visual dialogue model were based solely on the questions asked and not on the image. Our hypothesis is therefore that the self-attention module will make the model attend to the image while generating an answer.
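The abstract does not give implementation details, so the following is only a minimal NumPy sketch of a SAGAN-style self-attention layer applied to a grid of image features. It is not the authors' implementation. The function name `self_attention`, the weight names `w_f`, `w_g`, `w_h`, `w_v`, the scalar `gamma`, and all shapes (49 spatial locations, 16 channels, reduced width 2) are assumptions chosen for illustration. The 1x1 convolutions of SAGAN are written as plain matrix products over flattened locations.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x, w_f, w_g, w_h, w_v, gamma):
    """SAGAN-style self-attention over flattened image features.

    x      : (N, C) array, one row per spatial location (assumed layout).
    w_f/g/h: (C, C_r) projections, standing in for 1x1 convolutions.
    w_v    : (C_r, C) projection back to the input width.
    gamma  : scalar gate on the attention branch (SAGAN initialises it to 0).
    Returns the output features (N, C) and the attention map (N, N).
    """
    f = x @ w_f                      # (N, C_r)
    g = x @ w_g                      # (N, C_r)
    h = x @ w_h                      # (N, C_r)
    scores = g @ f.T                 # scores[j, i] = g(x_j) . f(x_i)
    beta = softmax(scores, axis=1)   # row j: weights over all locations i
    o = (beta @ h) @ w_v             # attended features, back to (N, C)
    return gamma * o + x, beta       # residual connection around attention


# Hypothetical sizes: a 7x7 feature grid flattened to 49 locations.
rng = np.random.default_rng(0)
n_locations, channels, reduced = 49, 16, 2
x = rng.normal(size=(n_locations, channels))
w_f, w_g, w_h = (rng.normal(size=(channels, reduced)) for _ in range(3))
w_v = rng.normal(size=(reduced, channels))

# With gamma = 0 the layer is the identity; a non-zero gamma mixes in attention.
y_init, beta = self_attention(x, w_f, w_g, w_h, w_v, gamma=0.0)
y_gated, _ = self_attention(x, w_f, w_g, w_h, w_v, gamma=0.5)
print(y_init.shape, beta.shape)
```

Because the attention branch is added through a residual connection gated by `gamma`, such a layer can be dropped into an existing image branch without changing its output shape. Under the hypothesis stated in the abstract, the attended image features would then be fused with the question and history encodings as in the Late-Fusion Encoder.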
Keywords: Performance Mechanism Called Self-Attention.
Scope of the Article: Performance Evaluation of Networks