Abstract: Visual Dialog is a typical AI-agent task on images, in which the agent interprets information from heterogeneous modalities and provides the correct answer. In this area, most approaches are ...