*Postdoctoral researcher in Computational Diachronic Semantics*
Labex EFL (Empirical Foundations of Linguistics, Paris,
https://en.labex-efl.fr/)
Strand 5: Computational Semantic Analysis
Research area: interpretable computational models for the automatic
detection and monitoring of semantic change, combining contextual
embeddings and pattern mining approaches
Contract duration: 18 months
Location: Paris
Research Laboratory: Sorbonne Paris Nord University, LIPN UMR7030 CNRS
Application deadline: November 15, 2021
Interview period: November 15-30, 2021
Starting date: from January 1, 2022
Context, issues and research directions
Languages are constantly evolving, driven by the need to adapt to
socio-cultural and technological developments and to make communication
more efficient and expressive. In particular, new words are coined or
borrowed from other languages, some words become obsolete, and others
acquire new meanings or lose existing ones.
In NLP, the study of language dynamics, especially from the lexical
point of view, has attracted growing attention in recent years,
complementing synchronic approaches. The field is taking shape, with
recent surveys (Montariol et al., 2021; Tahmasebi et al., 2021) and
several scientific events (International Workshop on Computational
Approaches to Historical Language Change 2019 and 2021, ACL 2019 and
2020). A first shared task has been organised (SemEval-2020 Task 1:
Unsupervised Lexical Semantic Change Detection, with two subtasks), and
reference datasets have been built for four languages (English, Latin,
Swedish and German).
Lexical change detection systems have followed advances in NLP methods:
after the first systems, essentially based on frequency changes (for
example Gulordava & Baroni, 2011), systems used static word embeddings
(Kim et al., 2014; Schlechtweg et al., 2019) and, more recently,
contextual embeddings (Hu et al., 2019; Martinc et al., 2019; Giulianelli
et al., 2020). These latter systems generally group the contextual
vector representations of the different uses of a word into meaning
clusters, then detect changes with various metrics (Montariol et al.,
2021).
Current systems still face many limitations. Chief among them, the
opacity of neural models makes it hard to characterize the changes they
detect: it is difficult, if not impossible, to link semantic changes to
morphological, syntactic or lexico-syntactic features, or to categorize
the types of change (extension, restriction, metaphor, metonymy, etc.).
To address this, one avenue would be to combine neural approaches with
pattern mining (Béchet et al., 2015) or with collocation extraction
approaches from corpus linguistics (for example Gries, 2012), which can
extract the most salient lexico-syntactic patterns of a given meaning
from a corpus of occurrences and thus help identify how it evolves. It
would also be interesting to use the contextual metadata of the
occurrences (date, type of source, domain, diatopic and diastratic
features, etc.) to characterize and track the evolution of usages.
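As a rough illustration of the clustering-based pipeline described above, the Python sketch below groups the usage vectors of a word from two time periods into meaning clusters with k-means. It then scores change as the Jensen-Shannon divergence (JSD) between the two periods' cluster distributions. This is a minimal sketch resting on several assumptions. The toy 2-D vectors stand in for real contextual embeddings. K-means with deterministic farthest-point initialisation is only one possible clustering choice, and JSD is one of several metrics used in this literature. The function names are hypothetical.

```python
import math
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means returning one cluster label per point."""
    # Deterministic farthest-point initialisation, so results are reproducible.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def sense_distribution(labels, k):
    """Relative frequency of each meaning cluster among a set of uses."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(k)]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, so the result lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with zero probability contribute nothing to the KL divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return (kl(p, m) + kl(q, m)) / 2


def semantic_change_score(old_uses, new_uses, k=2):
    """Cluster the uses of both periods jointly, then compare sense distributions."""
    labels = kmeans(old_uses + new_uses, k)
    old_labels, new_labels = labels[: len(old_uses)], labels[len(old_uses):]
    return jensen_shannon(sense_distribution(old_labels, k), sense_distribution(new_labels, k))


if __name__ == "__main__":
    # Toy 2-D "embeddings": all old uses share one sense; most new uses a second one.
    old_uses = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    new_uses = [(0.0, 0.05), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(round(semantic_change_score(old_uses, new_uses), 3))  # 0.549
```

Clustering both periods jointly keeps cluster identities comparable across time. A graded score such as JSD can then be thresholded if a binary "changed / not changed" decision is needed.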
The main objective of the position is therefore to build a system that
combines these approaches to characterize semantic change automatically.
The first step will consist of experimenting with state-of-the-art
change detection models. The second step will combine contextual
embeddings with pattern mining or collocation extraction approaches to
highlight the linguistic characteristics of each meaning cluster and of
its evolution. The corpora studied will be mainly in English and French.
The postdoctoral fellow will work in collaboration with computer
scientists and linguists from the Labex who are currently building a
reference corpus of semantic change for French (following the DURel
methodology: Schlechtweg et al., 2018).
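The sketch below shows, in its simplest form, what highlighting the linguistic characteristics of each meaning cluster could look like. It ranks context words by pointwise mutual information (PMI) with each cluster. This is an assumption-laden stand-in. PMI over bag-of-words contexts is a basic collocation measure. It is not the constrained sequence mining of Béchet et al. (2015), nor Gries's behavioral profiles. The toy contexts, the cluster labels and the function `salient_collocates` are hypothetical.

```python
import math
from collections import Counter


def salient_collocates(contexts, labels, top_n=3, min_count=2):
    """Rank context words by PMI with each meaning cluster.

    contexts: one token list per occurrence of the target word.
    labels: the meaning-cluster id of each occurrence.
    """
    n = len(contexts)
    cluster_counts = Counter(labels)
    word_counts = Counter()
    joint_counts = Counter()
    for tokens, label in zip(contexts, labels):
        for word in set(tokens):  # count each word at most once per occurrence
            word_counts[word] += 1
            joint_counts[(word, label)] += 1
    result = {}
    for cluster in sorted(cluster_counts):
        scored = []
        for (word, label), joint in joint_counts.items():
            # Skip other clusters and rare pairs, which PMI tends to overrate.
            if label != cluster or joint < min_count:
                continue
            p_joint = joint / n
            p_word = word_counts[word] / n
            p_cluster = cluster_counts[cluster] / n
            scored.append((math.log2(p_joint / (p_word * p_cluster)), word))
        # Highest PMI first; ties broken alphabetically for reproducibility.
        scored.sort(key=lambda item: (-item[0], item[1]))
        result[cluster] = [word for _, word in scored[:top_n]]
    return result


if __name__ == "__main__":
    # Hypothetical contexts of "mouse": cluster 0 = animal, cluster 1 = device.
    contexts = [
        ["the", "cat", "caught", "a", "small"],
        ["a", "cat", "and", "a", "small", "field"],
        ["the", "small", "grey", "cat"],
        ["click", "the", "left", "button"],
        ["click", "and", "drag", "the", "button"],
        ["a", "wireless", "usb", "click"],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    print(salient_collocates(contexts, labels, top_n=2))
    # {0: ['cat', 'small'], 1: ['button', 'click']}
```

Frequent function words such as "the" get a PMI near zero because they occur equally in both clusters. This is why association measures, rather than raw counts, are the usual starting point for characterizing a sense.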
The recruited person may also address other issues, in particular:
- current systems do not take into account the gradual nature of
change, as they are generally limited to comparing two synchronic
states of a language;
- to obtain the vector representation of a lexical item in context, one
can use a single hidden layer of the model or a combination of layers,
and there is currently no consensus on which layer(s) yield the most
adequate semantic representation.
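The layer question can be made concrete with a small helper that builds a token vector from per-layer hidden states. It can use a single layer or average, sum or concatenate several layers. This sketch assumes that the hidden states are already available as one vector per layer. One example is the `hidden_states` tuple that Hugging Face Transformers returns when a model is called with `output_hidden_states=True`, whose first element is the embedding layer. The helper name and the toy values are hypothetical. Which strategy works best is precisely the open question.

```python
from typing import List, Sequence


def combine_layers(
    hidden_states: Sequence[Sequence[float]],
    layers: Sequence[int],
    mode: str = "mean",
) -> List[float]:
    """Build one token vector from selected layers.

    hidden_states: one vector per layer for a single token (index 0 = embedding layer).
    layers: indices of the layers to use; negative indices count from the top.
    mode: "mean", "sum" or "concat".
    """
    if not layers:
        raise ValueError("select at least one layer")
    selected = [list(hidden_states[i]) for i in layers]
    if mode == "concat":
        # Keeps every layer's information but multiplies the dimensionality.
        return [value for vector in selected for value in vector]
    # Element-wise sum across the selected layers.
    summed = [sum(values) for values in zip(*selected)]
    if mode == "sum":
        return summed
    if mode == "mean":
        return [value / len(selected) for value in summed]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Toy hidden states for one token: embedding layer + 3 transformer layers.
    states = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(combine_layers(states, [-1]))        # top layer only: [5.0, 6.0]
    print(combine_layers(states, [-2, -1]))    # mean of the top two: [4.0, 5.0]
```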
The recruited person will join strand 5 (“Computational Semantics”) of
the Labex, specifically the research team working on the “Semantic
Variation and Change” operation, which aims to:
* develop new models and methods for the automatic detection of lexical
semantic changes and for a typology of these changes from intra- and
extra-linguistic points of view;
* develop a reference dataset of semantic changes in contemporary
French, based on available diachronic corpora.
Candidate profile
- PhD in computer science, specialising in computational linguistics and
machine learning
- proven training and experience in deep learning methods and language
models
- working language: French and/or English
Application
Please send:
• a cover letter;
• a description of a research project related to the research
questions above;
• a CV with a list of publications and 3 representative
publications (PDF or link);
• letters of recommendation or the names of two referees,
to [log in to unmask] and
[log in to unmask] before November 15, 2021. Interviews of
shortlisted candidates will take place at the end of November 2021.
References
Béchet, N., Cellier, P., Charnois, T., & Crémilleux, B. (2015). “Sequence
mining under multiple constraints”. In Proceedings of the 30th Annual
ACM Symposium on Applied Computing (SAC 2015), ACM Press, Salamanca,
Spain, pages 908–914.
Giulianelli, M., Del Tredici, M., & Fernández, R. (2020). “Analysing
Lexical Semantic Change with Contextualised Word Representations”.
Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics, pages 3960–3973, July 5–10, 2020.
https://www.aclweb.org/anthology/2020.acl-main.365.pdf
Gries, S. Th. (2012). “Behavioral Profiles: a fine-grained and
quantitative approach in corpus-based lexical semantics”. In Gonia
Jarema, Gary Libben, Chris Westbury (eds.), Methodological and analytic
frontiers in lexical research, pages 57–80. Amsterdam/Philadelphia: John
Benjamins.
Montariol, S. (2021). Models of diachronic semantic change using word
embeddings (Modèles diachroniques à base de plongements de mot pour
l'analyse du changement sémantique). PhD thesis, Paris-Saclay, 223 pages.
https://tel.archives-ouvertes.fr/tel-03199801/document
Montariol, S., Doucet, A., & Allauzen, A. (2021). “Etat de l’art du
changement sémantique à partir de plongements contextualisés” [State of
the art of semantic change with contextualised embeddings]. In CORIA
2021. http://coria.asso-aria.org/2021/articles/court_27/main.pdf
Montariol, S., Martinc, M., & Pivovarova, L. (2021). “Scalable and
Interpretable Semantic Change Detection”. Proceedings of the 2021
Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, pages 4642–4652,
June 6–11, 2021.
https://www.aclweb.org/anthology/2021.naacl-main.369.pdf
Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., &
Tahmasebi, N. (2020). “SemEval-2020 Task 1: Unsupervised Lexical
Semantic Change Detection”. Proceedings of the 14th International
Workshop on Semantic Evaluation, pages 1–23, Barcelona, Spain (Online),
December 12, 2020. https://www.aclweb.org/anthology/2020.semeval-1.1.pdf
Schlechtweg, D., & Schulte im Walde, S. (2020). “Simulating Lexical
Semantic Change from Sense-Annotated Data”. In Ravignani, A., Barbieri,
C., Martins, M., Flaherty, M., Jadoul, Y., Lattenkamp, E., Little, H.,
Mudd, K., & Verhoef, T. (Eds.), The Evolution of Language: Proceedings
of the 13th International Conference (EvoLang13).
http://brussels.evolang.org/proceedings/paper.html?nr=9
Tahmasebi, N., Borin, L., & Jatowt, A. (2018). “Survey of Computational
Approaches to Lexical Semantic Change”. Computational Linguistics, vol.
1, no. 1. https://arxiv.org/pdf/1811.06278.pdf
Tahmasebi, N., Borin, L., Jatowt, A., Xu, Y., & Hengchen, S. (Eds.)
(2021). Computational approaches to semantic change. Language Science
Press, 396 pages. https://langsci-press.org/catalog/book/303
Schlechtweg, D., Schulte im Walde, S., & Eckmann, S. (2018). “Diachronic
usage relatedness (DURel): A framework for the annotation of lexical
semantic change”. In Proceedings of the 2018 Conference of the North
American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 2 (Short Papers), pages 169–174, New
Orleans, Louisiana. Association for Computational Linguistics.
https://www.aclweb.org/anthology/N18-2027.pdf
--
Emmanuel Cartier
Lecturer and researcher in computational linguistics
LIPN - RCLN UMR7030 CNRS / Pléiade EA 7338
Université Sorbonne Paris Nord
99 avenue Jean-Baptiste Clément
93430 Villetaneuse
+33 (0)6 46 79 12 86
[log in to unmask]