apologies for cross-posting
Concept Extraction Challenge @
the 3rd Workshop on Making Sense of Microposts (#MSM2013)
at WWW 2013
13th May 2013. Rio de Janeiro, Brazil
#MSM2013 will host a 'Concept Extraction Challenge', with a prize
sponsored by eBay, where participants must label Microposts in a
given dataset with the concepts referenced. Existing concept extraction
tools are intended for use over news corpora and similar collections of
relatively long documents. The aim of the challenge is to
foster research into novel, more accurate concept extraction for
(much shorter) Micropost data.
The goal of the challenge is to detect concepts contained in Microposts.
Concepts are defined as abstract notions of things; for this challenge we
are constraining the task to the extraction of entity concepts
characterised by an entity type and an entity value. We consider four
entity types defined as follows:
1. Person (PER) - references in the Micropost to a full or partial person name.
Obama responds to diversity criticism
2. Location (LOC) - references in the Micropost to full or partial location
names including: cities, provinces or states, countries, continents and
Finally on the train to London ahhhh
3. Organisation (ORG) - references in the Micropost to full or partial
organisation names including academic, state, governmental, military and
business or enterprise organisations.
NASA's Donated Spy Telescopes May Aid Dark Energy Search
4. Miscellaneous (MISC) - references in the Micropost to a concept not covered
by any of the categories above, but limited to one of the entity types:
film/movie, entertainment award event, political event, programming language,
sporting event, TV show, nationality, and (spoken or written) language.
Okay, now this is getting seriously bizarre. Like a Monty Python script
Two datasets covering a variety of topics of discussion have been provided:
one for training and one for testing. The complete dataset (both training and
testing data) contains 4265 microposts, manually annotated using the above
definitions. The dataset is split 60%/40% between training and testing.
The training dataset is tab-separated data with the following elements per micropost:
- Element 1: The numeric ID of the micropost
- Element 2: The concepts found within the micropost, described by an entity
type and an entity instance. These are semi-colon separated values
- Element 3: The content of the micropost - this is what the concepts were
detected and extracted from.
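
As a rough illustration of the three-element layout above, here is a minimal Python sketch that splits one training row into its parts. The function name and the sample row are hypothetical, and the sketch assumes each concept is written as TYPE/value (as in the 'PER/Obama;ORG/NASA' example given for submissions); check the downloaded files before relying on it.

```python
from typing import List, Tuple


def parse_training_line(line: str) -> Tuple[str, List[Tuple[str, str]], str]:
    """Split one training row into (id, [(entity type, entity value), ...], text)."""
    # Split on the first two tabs only, so tabs inside the content are kept.
    post_id, concepts_field, text = line.rstrip("\n").split("\t", 2)
    concepts = []
    for item in concepts_field.split(";"):
        item = item.strip()
        if not item:
            continue  # row without any annotated concept
        # Assumption: type and value are joined by the first '/'.
        entity_type, _, value = item.partition("/")
        concepts.append((entity_type, value))
    return post_id, concepts, text


# Invented sample row, not taken from the dataset.
sample = "1\tPER/Obama;ORG/NASA\tObama visits NASA _URL_"
print(parse_training_line(sample))
```
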
The test dataset is also tab-separated data, but unlike the training dataset
the concepts have not been extracted:
- Element 1: The numeric ID of the micropost
- Element 2: The content of the micropost - this is what you must use to detect
and extract the concepts contained.
Anonymisation and Special Terms
To ensure anonymity all username mentions in the microposts have been replaced
with '_Mention_', and all URLs with '_URL_'.
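
If you bring in additional micropost text of your own, you may want it to use the same placeholders. The following Python sketch is one possible approximation; the regular expressions are assumptions and not the procedure the organisers used to anonymise the data.

```python
import re

# Assumed patterns: http(s) links, and '@' followed by word characters.
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def apply_placeholders(text: str) -> str:
    """Replace URLs and username mentions with the challenge placeholders."""
    text = URL_PATTERN.sub("_URL_", text)
    # Note: this simple pattern also rewrites the '@domain' part of e-mail addresses.
    return MENTION_PATTERN.sub("_Mention_", text)


print(apply_placeholders("@bob see http://example.org/a now"))
```
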
The datasets can be downloaded from: http://oak.dcs.shef.ac.uk/msm2013/ie_challenge
In order to evaluate your submissions we require you to submit (along with a
paper describing your approach) a tab-separated value (TSV) file with the
following format for the microposts in the test dataset:
- Element 1: The numeric ID of the micropost.
- Element 2: The entity type and entity instance detected in each micropost. These
are semi-colon separated values (e.g. PER/Obama;ORG/NASA).
For instance, your results would be formatted as:
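
To illustrate the two-element layout, here is a minimal Python sketch that builds such rows. The IDs and entities are invented placeholders, not rows from the dataset, and leaving the second element empty for a micropost with no detected concept is an assumption.

```python
from typing import Dict, List, Tuple


def format_submission(results: Dict[str, List[Tuple[str, str]]]) -> str:
    """Render {micropost id: [(entity type, entity value), ...]} as TSV text."""
    lines = []
    for post_id, concepts in results.items():
        # Join TYPE/value pairs with semi-colons, e.g. PER/Obama;ORG/NASA
        field = ";".join(f"{etype}/{value}" for etype, value in concepts)
        lines.append(f"{post_id}\t{field}")
    return "\n".join(lines) + "\n"


# Invented placeholder results.
demo = {"101": [("PER", "Obama"), ("ORG", "NASA")], "102": []}
print(format_submission(demo), end="")
```

The returned string can be written directly to the .tsv file named after your system.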
This file will be parsed and the accuracy of each approach computed. Accuracy
will be judged using the f-measure (with beta = 1 so precision and recall are weighted
equally). This will be computed on a per entity-type/entity-instance pair basis and
then averaged across the four entity types. We will also provide entity-type specific
f-measure values for each team to assess how each approach fares across the different
entity types.
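
To check your system against the training split, the measure described above can be approximated as in the Python sketch below. It assumes exact matching of (micropost ID, entity type, entity value) triples and scores an entity type with no correct predictions as 0; the organisers' evaluation script may differ in such details, and all names here are hypothetical.

```python
from typing import Dict, Set, Tuple

ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")
Annotation = Tuple[str, str, str]  # (micropost id, entity type, entity value)


def f1(tp: int, fp: int, fn: int) -> float:
    """F-measure with beta = 1, returning 0.0 when it is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: Set[Annotation], predicted: Set[Annotation]) -> Dict[str, float]:
    """Per-type F1 plus their unweighted mean over the four entity types."""
    scores = {}
    for etype in ENTITY_TYPES:
        g = {a for a in gold if a[1] == etype}
        p = {a for a in predicted if a[1] == etype}
        scores[etype] = f1(len(g & p), len(p - g), len(g - p))
    scores["macro"] = sum(scores[t] for t in ENTITY_TYPES) / len(ENTITY_TYPES)
    return scores


# Invented toy annotations.
gold = {("1", "PER", "Obama"), ("1", "ORG", "NASA"), ("2", "LOC", "London")}
pred = {("1", "PER", "Obama"), ("2", "LOC", "Paris")}
print(evaluate(gold, pred))
```
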
The best submission to the Micropost Concept Extraction Challenge will receive
an award of (US)$1500, generously sponsored by eBay. Information extraction
challenges associated with treating eBay items, often of short textual content, are
very similar to those encountered with other short textual microposts. By teaming up with
eBay to make the challenge possible, the MSM workshop organisers wish to highlight this
aspect of the micropost extraction research question.
The Challenge Committee will judge submissions based on the outcome of the evaluation
procedure described above, and a review of the extended abstracts, to obtain insight
into the quality and applicability of the approaches taken. A selection of the submissions
accepted will be presented at the challenge. All accepted submissions will be published in
a separate CEUR compendium and made available from the workshop website.
Submit a zip file using your system name as the file name (e.g. 'awesomeo9000.zip'), containing:
1. a TSV file with your system name (e.g. 'awesomeo9000.tsv').
2. an extended abstract of 2 pages describing your approach and how you tuned/tested it using
the training split.
Written submissions should be prepared according to the ACM SIG Proceedings Template
(see http://www.acm.org/sigs/publications/proceedings-templates), and should include author
names and affiliations, and 3-5 keywords. Submission is via the EasyChair Conference System,
Challenge Data release: 17 Jan 2013
Intent to submit to challenge: 03 Mar 2013
Challenge Submission deadline: 17 Mar 2013
Challenge Notification: 31 Mar 2013
Challenge camera-ready deadline: 07 Apr 2013
(all deadlines 23:59 Hawaii Time)
Workshop program issued: 09 Apr 2013
Challenge proceedings to be published via CEUR
Workshop - 13 May 2013 (Registration open to all)
E-mail: [log in to unmask]
Facebook Group: http://www.facebook.com/#!/home.php?sk=group_180472611974910
Facebook Public Event page: http://www.facebook.com/events/116134955169543
Twitter hashtag: #msm2013
W3C Microposts Community Group: http://www.w3.org/community/microposts
Matthew Rowe, Lancaster University, UK
Milan Stankovic, Université Paris-Sorbonne, France
Aba-Sah Dadzie, The University of Sheffield, UK
A. Elizabeth Cano, KMi, The Open University, UK
Steering Committee & Local Chair:
Bernardo Pereira Nunes, PUC-Rio, Brazil / L3S Research Center, Germany
Naren Chittar, eBay, USA
Peter Mika, Yahoo! Research, Spain
Andrea Varga, OAK Group, University of Sheffield, UK