Call for Papers

Special Issue of IEEE Transactions on Multimedia

“Weakly Supervised Learning for Image and Video Understanding”



Weakly supervised learning (WSL) aims to address fine-level image and video understanding tasks by learning from coarse-level human annotations. It is of particular importance in the big data era because it dramatically reduces the human labor needed to annotate structured visual and multimedia data, and thus enables machines to learn from much larger-scale data at the same annotation cost as conventional fully supervised learning methods. More importantly, for data from real-world application scenarios, such as medical imaging, remote sensing, and audio-visual data, fine-level manual annotations are very limited and difficult to obtain. Under these circumstances, WSL-based learning frameworks, especially WSL-based multi-modality/multi-task learning frameworks, would bring great benefits. Unfortunately, designing effective WSL systems is challenging because of two issues: “semantic unspecificity” and “instance ambiguity”. The former refers to the setting where the provided semantic label is at the image level rather than the instance level; the latter refers to the ambiguity in distinguishing an instance sample from an instance part or an instance cluster. Principled solutions to these problems remain under-studied. With the rapid development of advanced machine learning techniques, such as Graph Convolutional Networks, Capsule Networks, Transformers, Generative Adversarial Networks, and Deep Reinforcement Learning models, new opportunities have emerged for solving the problems in WSL and applying WSL to richer vision and multimedia tasks. This special issue aims to promote cutting-edge research in this direction and to offer a timely collection of works that benefits researchers and practitioners. We welcome high-quality original submissions addressing novel theoretical and practical aspects of WSL, as well as real-world applications based on WSL approaches.



Topics of interest include, but are not limited to:

-          Multi-modality weakly supervised learning theory and framework;

-          Multi-task weakly supervised learning theory and framework;

-          Robust learning theory and framework;

-          Audio-visual learning under weak supervision;

-          Weakly supervised spatial/temporal feature learning;

-          Self-supervised learning frameworks and applications;

-          Graph Convolutional Networks/Graph Neural Networks-based weakly supervised learning frameworks;

-          Deep Reinforcement Learning for weakly supervised learning;

-          Emerging vision and multimedia tasks with limited supervision.



Manuscript submission:           15th January 2021 

Preliminary results:                  15th April 2021

Revisions due:                          1st June 2021

Notification:                             15th July 2021

Final manuscripts due:             15th August 2021

Anticipated publication:           4th quarter of 2021




Papers should be formatted according to the IEEE Transactions on Multimedia guidelines for authors. By submitting or resubmitting your manuscript to these Transactions, you acknowledge that you accept the rules established for publication of manuscripts, including agreement to pay all over-length page charges, color charges, and any other charges and fees associated with publication of the manuscript. Manuscripts (both 1-column and 2-column versions are required) should be submitted electronically through the online IEEE manuscript submission system. When selecting a manuscript type, authors must choose “WSL Special Issue”. All submitted papers will go through the same review process as regular IEEE Transactions on Multimedia (TMM) submissions. Referees will consider originality, significance, technical soundness, clarity of exposition, and relevance to the special issue topics above.



Dingwen Zhang, Xidian University, China

Chuang Gan, MIT and MIT-IBM Watson AI Lab, USA

Enrico Magli, Politecnico di Torino, Italy

David Crandall, Indiana University, USA

Junwei Han, Northwestern Polytechnical University, China

Fatih Porikli, Australian National University, Australia

Dingwen Zhang
Xidian University
Carnegie Mellon University