Call for Participation
MediaEval 2015 Multimedia Benchmark Evaluation
Early registration deadline: 1 May 2015
MediaEval is a multimedia benchmark evaluation that offers tasks
promoting research and innovation in areas related to human and social
aspects of multimedia. MediaEval 2015 focuses on aspects of multimedia
including and going beyond visual content, such as language, speech,
music, and social factors. Participants carry out one or more of the
tasks offered and submit runs to be evaluated. They then write up their
results and present them at the MediaEval 2015 workshop.
For each task, participants receive a task definition, task data and
accompanying resources (dependent on task) such as shot boundaries,
keyframes, visual features, speech transcripts and social metadata. In
order to encourage participants to develop techniques that push forward
the state of the art, a “recommended reading” list of papers will be
provided for each task.
Participation is open to all interested research groups. To sign up,
please click the “MediaEval 2015 Registration” link at:
http://www.multimediaeval.org/mediaeval2015
The following tasks are available to participants at MediaEval 2015:
*QUESST: Query by Example Search on Speech Task*
The task involves searching FOR audio content WITHIN audio content USING
an audio content query. This task is particularly interesting for speech
researchers in the area of spoken term detection or
low-resource/zero-resource speech processing. The primary performance
metric will be the normalized cross entropy cost (Cnxe).
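For reference, one common formulation of Cnxe treats system scores as
log-likelihood ratios and normalizes their empirical cross entropy by
that of a prior-only system. The sketch below is illustrative only (the
target prior p_target is an assumed parameter; the official scoring
tools on the task page are authoritative):

import math

def cnxe(llrs, labels, p_target=0.5):
    # llrs: system log-likelihood ratio per trial;
    # labels: True for target trials, False for non-target trials.
    targets = [s for s, t in zip(llrs, labels) if t]
    nontargets = [s for s, t in zip(llrs, labels) if not t]
    # Empirical cross entropy (in bits) of the system's scores.
    c_xe = (p_target / len(targets)) \
        * sum(math.log2(1 + math.exp(-s)) for s in targets) \
        + ((1 - p_target) / len(nontargets)) \
        * sum(math.log2(1 + math.exp(s)) for s in nontargets)
    # Cross entropy of a system that only knows the prior.
    c_prior = -(p_target * math.log2(p_target)
                + (1 - p_target) * math.log2(1 - p_target))
    return c_xe / c_prior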
*Multimodal Person Discovery in Broadcast TV (New in 2015!)*
Given raw TV broadcasts, each shot must be automatically tagged with the
name(s) of people who can be both seen and heard in the shot. The
list of people is not known a priori and their names must be discovered
in an unsupervised way from provided text overlay or speech transcripts.
The task will be evaluated on a new French corpus (provided by INA) and
the AGORA Catalan corpus, using standard information retrieval metrics
based on a posteriori collaborative annotation of the corpus.
*C@merata: Querying Musical Scores*
The input is a natural language phrase referring to a musical feature
(e.g., ‘consecutive fifths’) together with a classical music score, and
the required output is a list of passages in the score which contain
that feature. Scores are in the MusicXML format, which can capture most
aspects of Western music notation. Evaluation is via versions of
Precision and Recall relative to a Gold Standard produced by the organisers.
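As an illustration, set-based precision and recall against the Gold
Standard could be computed as below; this is a minimal sketch, and the
official C@merata measures may score partial overlaps differently:

def precision_recall(returned, gold):
    # Passages are hashable identifiers, e.g. (start_bar, end_bar)
    # pairs extracted from the score.
    returned, gold = set(returned), set(gold)
    hits = len(returned & gold)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall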
*Affective Impact of Movies (including Violent Scenes Detection)*
In this task, participating teams are expected to classify short movie
scenes by their affective content according to two use cases: (1) the
presence of depicted violence, and (2) their emotional impact (valence,
arousal). The training data consists of short Creative Commons-licensed
movie scenes (both professional and amateur) together with human
annotations of violence and valence-arousal ratings. The results will be
evaluated using standard retrieval and classification metrics.
*Emotion in Music (An Affect Task)*
We aim to detect the emotional dynamics of music from its content. Given
a set of songs, participants are asked to automatically generate
continuous emotional representations in arousal and valence.
*Retrieving Diverse Social Images*
This task requires participants to refine a ranked list of Flickr
photos with location-related information, using the provided visual,
textual and user credibility information. Results are evaluated with
respect to their relevance to the query and how diversely they
represent it.
*Placing: Multimodal Geo-location Prediction*
The Placing Task requires participants to estimate the locations where
multimedia items (photos or videos) were captured solely by inspecting
the content and metadata of these items, and optionally exploiting
additional knowledge sources such as gazetteers. Performance is
evaluated using the distance to the ground truth coordinates of the
multimedia items.
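For intuition, the great-circle distance between predicted and ground
truth coordinates can be computed with the haversine formula. A minimal
sketch follows (details such as the Earth radius or the error
thresholds used in the official evaluation may differ):

import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km, using a mean Earth radius of 6371 km.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))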
*Verifying Multimedia Use (New in 2015!)*
For this task, the input is a tweet about an event with the potential
to be of interest to the international news, and the accompanying
multimedia item (image or video). Participants must build systems that
output a binary decision representing a verification of whether the
multimedia item reflects the reality of the event in the way purported
by the tweet. The task is evaluated using the F1 score. Participants are
also requested to return a short explanation or evidence for the
decision.
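For reference, the F1 score is the harmonic mean of precision and
recall over the binary verification decisions. A minimal sketch of the
standard definition (with True marking the positive class):

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

The Weighted F1 score used by the Context of Experience task below is
the average of per-class F1 scores weighted by class frequency.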
*Context of Experience: Recommending Videos Suiting a Watching Situation
(New in 2015!)*
This task develops multimodal techniques for automatically predicting
the suitability of multimedia content for a specific consumption
context. In particular, we focus on predicting movies that are suitable
to watch on airplanes. Inputs to the prediction methods are movie
trailers and
metadata from IMDb. Output is evaluated using the Weighted F1 score,
with expert labels as ground truth.
*Synchronization of Multi-User Event Media*
This task addresses the challenge of automatically creating a
chronologically ordered outline of multiple multimedia collections
corresponding to the same event. Given N media collections (galleries)
taken by different users/devices at the same event, the goal is to find
the best (relative) time alignment among them and detect the significant
sub-events over the whole gallery. Performance is evaluated using ground
truth time codes and actual event schedules.
*DroneProtect: Mini-drone Video Privacy Task (New in 2015!)*
The recent popularity of mini-drones and their rapidly increasing
adoption in various areas, including photography, news reporting,
cinema, mail delivery, cartography, agriculture, and the military,
raise concerns about privacy protection and personal safety. Input to
the task is drone video, and output is a version of the video that
protects privacy while
retaining key information about the event or situation recorded.
*Search and Anchoring in Video Archives*
The 2015 Search and Anchoring in Video Archives task consists of two
sub-tasks: search for multimedia content and automatic anchor selection.
In the “search for multimedia content” sub-task, participants use
multimodal textual and visual descriptions of content of interest to
retrieve potentially relevant video segments from within a collection.
In the “automatic anchor selection” sub-task, participants automatically
predict key elements of videos as anchor points for the formation of
hyperlinks to relevant content within the collection. The video
collection consists of professional broadcasts from the BBC or
semi-professional user-generated content. Participant submissions will
be assessed using professionally created anchors.
MediaEval 2015 Timeline
(dates vary slightly from task to task; see the individual task pages
for exact deadlines: http://www.multimediaeval.org/mediaeval2015)
Mid-March to May: Registration and return of usage agreements.
May-June: Release of development/training data.
June-July: Release of test data.
Mid-August: Participants submit their completed runs and receive results.
End of August: Participants submit their 2-page working notes papers.
14-15 September: MediaEval 2015 Workshop, Wurzen, Germany. The workshop
is held as a satellite event of Interspeech 2015, taking place nearby
in Dresden the previous week.
We ask you to register by 1 May (because of the timing of the first wave
of data releases). After that point, late registration will be possible,
but we encourage teams to register as early as they can.
For questions or additional information, please contact Martha Larson
or visit http://www.multimediaeval.org.
The ISCA SIG SLIM: Speech and Language in Multimedia
(http://slim-sig.irisa.fr) is a key supporter of MediaEval. This year,
the MediaEval workshop will be held as a satellite event of Interspeech
2015.
Many organizations and projects contribute to the organization of
MediaEval, including the following projects (alphabetical): Camomile
(http://www.chistera.eu/projects/camomile), CNGL (http://www.cngl.ie),
COMMIT/ (http://www.commit-nl.nl), CrowdRec (http://crowdrec.eu), PHENICX
(http://phenicx.upf.edu), Reveal (http://revealproject.eu), VideoSense
(http://www.videosense.eu), Visen (http://www.chistera.eu/projects/visen).