ACM SIGCHI General Interest Announcements (Mailing List)


Thu, 17 Jan 2013 23:01:08 +0000
ElizabethCanoBasave <[log in to unmask]>
"ACM SIGCHI General Interest Announcements (Mailing List)" <[log in to unmask]>
apologies for cross-posting


 		          Concept Extraction Challenge @
  		the 3rd Workshop on Making Sense of Microposts (#MSM2013)
               				at WWW 2013

    		13th May 2013. Rio de Janeiro, Brazil


#MSM2013 will host a 'Concept Extraction Challenge', with a prize 
sponsored by eBay, where participants must label Microposts in a 
given dataset with the concepts referenced. Existing concept extraction 
tools are intended for use over news corpora and similar document-based 
corpora of relatively long texts. The aim of the challenge is to 
foster research into novel, more accurate concept extraction for 
(much shorter) Micropost data.

The goal of the challenge is to detect concepts contained in Microposts. 
Concepts are defined as abstract notions of things; for this challenge we 
are constraining the task to the extraction of entity concepts 
characterised by an entity type and an entity value. We consider four 
entity types defined as follows:

1. Person (PER) - references in the Micropost to a full or partial person name.
	Obama responds to diversity criticism 
Extracted instances:
	PER/Obama;

2. Location (LOC) - references in the Micropost to full or partial location 
names including: cities, provinces or states, countries, continents and 
(physical) facilities.
	Finally on the train to London ahhhh
Extracted instances:
	LOC/London;

3. Organisation (ORG) - references in the Micropost to full or partial 
organisation names including academic, state, governmental, military and 
business or enterprise organisations.
	NASA's Donated Spy Telescopes May Aid Dark Energy Search
Extracted instances:
	ORG/NASA;

4. Miscellaneous (MISC) - references in the Micropost to a concept not covered 
by any of the categories above, but limited to one of the entity types: 
film/movie, entertainment award event, political event, programming language, 
sporting event, TV show, nationality, and (spoken or written) language.
	Okay, now this is getting seriously bizarre. Like a Monty Python script 
	gone wrong.
Extracted instances:
	MISC/Monty Python;

Two datasets covering a variety of topics of discussion have been provided: 
one for training and one for testing. The complete dataset (both training and 
testing data) contains 4265 manually annotated microposts using the above 
definitions. The dataset is split by 60%/40% for training and testing. 

Training Dataset
Tab-separated data with the following elements per micropost:
- Element 1: The numeric ID of the micropost
- Element 2: The concepts found within the micropost, described by an entity 
type and an entity instance. These are semi-colon separated values 
(e.g. PER/Obama;ORG/NASA).
- Element 3: The content of the micropost - this is what the concepts were 
detected and extracted from.
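
As an illustration of the three-element layout above, the following is a minimal Python sketch of a line parser. The names `AnnotatedPost`, `parse_concepts` and `parse_training_line` are hypothetical, and the sample micropost in the usage note is made up; the sketch assumes one micropost per line and that an entity instance may itself contain a '/'.

```python
from typing import List, NamedTuple, Tuple


class AnnotatedPost(NamedTuple):
    post_id: str
    concepts: List[Tuple[str, str]]  # (entity type, entity instance) pairs
    text: str


def parse_concepts(field: str) -> List[Tuple[str, str]]:
    """Split a field such as 'PER/Obama;ORG/NASA' into (type, instance) pairs."""
    concepts = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate an empty field or a trailing ';'
        # Split on the first '/' only, so the instance may contain '/' itself.
        entity_type, _, instance = item.partition("/")
        concepts.append((entity_type, instance))
    return concepts


def parse_training_line(line: str) -> AnnotatedPost:
    """Parse one tab-separated training line: ID, concepts, content."""
    # maxsplit=2 keeps any further tabs inside the micropost content.
    post_id, concept_field, text = line.rstrip("\n").split("\t", 2)
    return AnnotatedPost(post_id, parse_concepts(concept_field), text)
```

For example, `parse_training_line("2560\tPER/Obama;ORG/NASA\tObama visits NASA _URL_")` yields the ID, the two concept pairs and the content as separate fields.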

Test Dataset
Also tab-separated data, but unlike the training dataset the concepts have 
not been extracted:
- Element 1: The numeric ID of the micropost
- Element 2: The content of the micropost; this is what you must use to detect 
and extract the concepts contained.

Anonymisation and Special Terms
To ensure anonymity all username mentions in the microposts have been replaced 
with '_Mention_', and all URLs with '_URL_'.
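
Because '_Mention_' and '_URL_' stand in for removed text, a system will usually want to treat them as single tokens and never propose them as entity values. The sketch below shows one way to do that in Python; the tokenisation rule and the names `tokenize` and `candidate_tokens` are assumptions for illustration, not part of the challenge specification.

```python
import re
from typing import List

# The two placeholder strings named in the announcement.
PLACEHOLDERS = {"_Mention_", "_URL_"}

# Placeholders first, then words (optionally with an apostrophe suffix such
# as "NASA's"), then any single non-space punctuation character.
TOKEN_RE = re.compile(r"_Mention_|_URL_|\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split a micropost into tokens, keeping placeholders intact."""
    return TOKEN_RE.findall(text)


def candidate_tokens(text: str) -> List[str]:
    """Tokens that may belong to an entity, i.e. everything but placeholders."""
    return [token for token in tokenize(text) if token not in PLACEHOLDERS]
```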

Data Access
The datasets can be downloaded from: 

In order to evaluate your submissions we require you to submit (along with a 
paper describing your approach) a tab-separated value (TSV) file with the 
following format for the microposts in the test dataset:
- Element 1: The numeric ID of the micropost.
- Element 2: The entity type and entity instance detected in each micropost. These 
are semi-colon separated values (e.g. PER/Obama;ORG/NASA).

For instance, your results would be formatted as:
2560     PER/Obama;ORG/NASA
2562     ORG/FDA;
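
A results file in this format could be produced along the following lines. This is a sketch only: `format_result_line` and `write_results` are hypothetical helper names, and the sketch assumes that a trailing ';' (as in the second example line above) is optional and that a micropost with no detected concepts is written with an empty second element.

```python
from typing import Iterable, List, Tuple

Concept = Tuple[str, str]  # (entity type, entity instance)


def format_result_line(post_id: str, concepts: List[Concept]) -> str:
    """Build one line: micropost ID, a tab, then ';'-separated TYPE/instance."""
    field = ";".join(f"{entity_type}/{instance}" for entity_type, instance in concepts)
    return f"{post_id}\t{field}"


def write_results(path: str, results: Iterable[Tuple[str, List[Concept]]]) -> None:
    """Write one result line per micropost to a TSV file."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        for post_id, concepts in results:
            handle.write(format_result_line(post_id, concepts) + "\n")
```

For example, `write_results("awesomeo9000.tsv", [("2560", [("PER", "Obama"), ("ORG", "NASA")])])` writes a single line for micropost 2560.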

This file will be parsed and the accuracy of each approach computed. Accuracy 
will be judged using the f-measure (with beta = 1 so precision and recall are weighted 
equally). This will be computed on a per entity-type/entity-instance pair basis and 
then averaged across the four entity types. We will also provide entity-type specific 
f-measure values for each team to assess how each approach fares across the different 
entity types.
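
To help participants sanity-check their output on the training split, here is a rough Python sketch of one plausible reading of this measure: exact matching of (micropost ID, entity type, entity instance) triples, F1 per entity type, then an unweighted mean over the four types. The official scorer may differ in its matching rules and in how it treats types with no gold or predicted instances (scored as 0 here); `f_measure` and `evaluate` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")
Annotations = Dict[str, Set[Tuple[str, str]]]  # post ID -> {(type, instance)}


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; with beta = 1 precision and recall weigh equally."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def evaluate(gold: Annotations, predicted: Annotations):
    """Return (per-type F1 dict, mean F1 over the four entity types)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for post_id in set(gold) | set(predicted):
        g = gold.get(post_id, set())
        p = predicted.get(post_id, set())
        for entity_type, _ in g & p:
            tp[entity_type] += 1  # pair found in both gold and prediction
        for entity_type, _ in p - g:
            fp[entity_type] += 1  # predicted but not in gold
        for entity_type, _ in g - p:
            fn[entity_type] += 1  # in gold but missed
    per_type = {}
    for t in ENTITY_TYPES:
        precision = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        recall = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        per_type[t] = f_measure(precision, recall)
    return per_type, sum(per_type.values()) / len(ENTITY_TYPES)
```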

The best submission to the Micropost Concept Extraction Challenge will receive 
an award of (US)$1500, generously sponsored by eBay. Information extraction 
challenges associated with treating eBay items, often of short textual content, are 
very similar to those encountered when treating other short textual microposts. By teaming up with 
eBay to make the challenge possible, the MSM workshop organisers wish to highlight this 
aspect of the micropost extraction research question.

The Challenge Committee will judge submissions based on the outcome of the evaluation 
procedure described above, and a review of the extended abstracts, to obtain insight 
into the quality and applicability of the approaches taken. A selection of the submissions 
accepted will be presented at the challenge. All accepted submissions will be published in 
a separate CEUR compendium and made available from the workshop website.

Submissions should be made as a zip file using your system name as the file name, containing:
1. a TSV file with your system name (e.g. 'awesomeo9000.tsv').
2. an extended abstract of 2 pages describing your approach and how you tuned/tested it using 
the training split.

Written submissions should be prepared according to the ACM SIG Proceedings Template 
and should include author names and affiliations, and 3-5 keywords. Submission is via 
the EasyChair Conference System.


Challenge Data release: 17 Jan 2013
Intent to submit to challenge: 03 Mar 2013
Challenge Submission deadline: 17 Mar 2013
Challenge Notification: 31 Mar 2013
Challenge camera-ready deadline: 07 Apr 2013

(all deadlines 23:59 Hawaii Time)

Workshop program issued: 09 Apr 2013
Challenge proceedings to be published via CEUR
Workshop - 13 May 2013 (Registration open to all)


E-mail: [log in to unmask]
Facebook Group:!/home.php?sk=group_180472611974910
Facebook Public Event page:
Twitter hashtag: #msm2013
W3C Microposts Community Group:


Matthew Rowe, Lancaster University, UK
Milan Stankovic, Université Paris-Sorbonne, France
Aba-Sah Dadzie, The University of Sheffield, UK

Challenge Chair:
A. Elizabeth Cano, KMi, The Open University, UK

Steering Committee & Local Chair:
Bernardo Pereira Nunes, PUC-Rio, Brazil / L3S Research Center, Germany

Evaluation Committee:
Naren Chittar, eBay, USA
Peter Mika, Yahoo! Research, Spain
Andrea Varga, OAK Group, University of Sheffield, UK

