Dear Colleague,



On behalf of the organizing committee, we sincerely invite you to submit
papers to the MULEA Workshop (https://sites.google.com/view/mulea2019/home), in
conjunction with ACM Multimedia 2019 (https://www.acmmm.org/2019/). We
invite you to help define new directions in this broad research area!



The MULEA Workshop and the Conference will be held in France, from
2019/10/21 through 2019/10/25.  The full name of the Workshop is the
*1st International
Workshop on Multimodal Understanding and Learning for Embodied Applications*,
and it will cover many AI applications, such as robotics,
autonomous driving, multimodal chatbots, and simulated games, as well as
the emerging research areas behind them.



*Paper submission deadline: July 15, 2019.* This interdisciplinary Workshop
spans several research fields, including language, vision, and robotics. We
therefore encourage 2-page Abstracts and Cross-Submissions, in addition to
regular papers; please refer to the Workshop website for submission details.
The topics of the workshop include, but are not limited to:



1. Multimodal context understanding.  Context includes the environment,
task/goal states, dynamic objects in the scene, activities, etc. Relevant
research streams include visually grounded learning, context understanding,
and environmental modeling, including 3D environment modeling and
understanding. Language grounding is another topic of interest: connecting
the vision and language modalities is essential in applications such as
question answering and image captioning. Other relevant research areas
include multimodal understanding, context modeling, and grounded dialog
systems.



2. Knowledge inference. Knowledge in this multimedia scenario is
represented with knowledge graphs, scene graphs, memories, etc.  Representing
contextual knowledge has attracted much interest, and goal-driven knowledge
representation and reasoning are emerging research directions. Deep learning
methods are well suited to handling unstructured multimodal knowledge
signals.



3. Embodied learning. Building on context understanding and knowledge
representation, a learned policy generates actions for intelligent agents
to achieve goals or complete tasks. The input signals are multimodal, such
as images and dialogs, and the learned policies must not only produce
short-term reactions but also plan their actions to achieve long-term
goals. The actions may also involve navigation and localization, which are
mainstream problems in the robotics and self-driving vehicle fields. This
area is closely related to reinforcement learning, and the algorithms are
driven by industrial applications in robotics, self-driving vehicles,
simulated games, multimodal chatbots, etc.



Please feel free to forward this to colleagues who might be interested.
Suggestions and comments are welcome and appreciated! We apologize if you
receive duplicate copies of this email.



Best Regards,

-John and Tim (Co-Chairs)