This thesis is available (in French) on TEL archives.
My PhD Thesis: “Structuration in Named Entities”
This PhD thesis was carried out in collaboration with the company Expert System France (formerly TEMIS).
Thesis examiners
- Isabelle Tellier (supervisor)
- Marco Dinarelli (co-supervisor)
- Christian Lautier (co-supervisor)
- Agata Savary (reviewer)
- François Yvon (reviewer)
- Frédéric Landragin (examiner)
- Pascale Sébillot (examiner)
- Patrick Watrin (examiner)
Defence
PhD defended on 23 November 2017 at 14:00. Defence notice (in French)
Abstract
Named entity recognition is a crucial task in NLP. It underpins the extraction of relations between named entities, which in turn enables the construction of knowledge bases (Surdeanu and Ji, 2014), automatic summarisation (Nobata et al., 2002), and so on. This thesis focuses on the structuring phenomena that surround named entities.
We distinguish two kinds of structural elements in named entities. The first is recurrent substrings, which we call the “characteristic affixes” of a named entity. The second is tokens with good discriminative power, which we call the “trigger tokens” of named entities. We describe the algorithm we propose to extract such affixes and compare it to Morfessor (Creutz and Lagus, 2005b). We then apply the same algorithm to extract trigger tokens, which we use for French named entity recognition and postal address extraction.
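To give an intuition of what a token with good discriminative power looks like, here is a minimal sketch. It is not the extraction algorithm of the thesis: it assumes a simple criterion (the share of a token's occurrences that fall inside entities of one type), and the function name, thresholds and toy corpus are all hypothetical.

```python
from collections import Counter


def trigger_tokens(sentences, min_count=2, min_precision=0.8):
    """Return tokens that occur mostly inside entities of a single type.

    sentences: list of sentences, each a list of (token, label) pairs,
    where label is an entity type or "O" for tokens outside any entity.
    The thresholds are illustrative assumptions, not values from the thesis.
    """
    total = Counter()   # occurrences of each (lowercased) token
    inside = Counter()  # occurrences of each token inside a given entity type
    for sentence in sentences:
        for token, label in sentence:
            key = token.lower()
            total[key] += 1
            if label != "O":
                inside[(key, label)] += 1

    triggers = {}
    for (token, label), count in inside.items():
        precision = count / total[token]
        if total[token] >= min_count and precision >= min_precision:
            triggers[token] = (label, precision)
    return triggers


# Toy annotated corpus (hypothetical data, for illustration only).
corpus = [
    [("rue", "LOC"), ("de", "LOC"), ("Rivoli", "LOC")],
    [("rue", "LOC"), ("de", "LOC"), ("la", "LOC"), ("Paix", "LOC")],
    [("la", "O"), ("ville", "O"), ("de", "O"), ("Paris", "LOC")],
    [("M.", "PER"), ("Dupont", "PER")],
    [("M.", "PER"), ("Martin", "PER"), ("habite", "O"),
     ("rue", "LOC"), ("Victor", "LOC"), ("Hugo", "LOC")],
]

triggers = trigger_tokens(corpus)
print(triggers)
```

On this toy corpus, “rue” and “m.” are kept because they are frequent enough and always appear inside one entity type, while “de” is rejected because it also occurs outside entities.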
Another form of structuring for named entities is syntactic in nature: entities typically have a tree structure. We propose a novel kind of linear tagger cascade that has not previously been used for structured named entity recognition. It generalises earlier methods, which either recognise only named entities of a fixed depth or cannot model certain characteristics of the structure; our approach overcomes both limitations.
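As an illustration of how tree-structured entities can be made amenable to linear taggers, the sketch below flattens nested entity spans into one BIO label sequence per nesting depth. This only shows a common representation that a cascade could predict level by level; it is not the cascade proposed in the thesis. The function name and the labels are hypothetical, and the code assumes properly nested spans with distinct boundaries.

```python
def nested_spans_to_bio_levels(n_tokens, spans):
    """Turn nested entity spans into one BIO sequence per nesting depth.

    spans: list of (start, end, label) with end exclusive.
    Assumes spans are properly nested and no two spans share the same
    boundaries. Returns a list of label sequences, outermost level first.
    """
    # The depth of a span is the number of other spans that contain it.
    depths = []
    for start, end, _ in spans:
        depth = sum(
            1 for s, e, _ in spans
            if s <= start and end <= e and (s, e) != (start, end)
        )
        depths.append(depth)

    n_levels = max(depths, default=-1) + 1
    levels = [["O"] * n_tokens for _ in range(n_levels)]
    for (start, end, label), depth in zip(spans, depths):
        levels[depth][start] = "B-" + label
        for i in range(start + 1, end):
            levels[depth][i] = "I-" + label
    return levels


# Hypothetical example: a person with name components, and a function
# that contains a location.
tokens = ["Jean", "Dupont", ",", "maire", "de", "Paris"]
spans = [
    (0, 2, "PER"), (0, 1, "FIRST"), (1, 2, "LAST"),
    (3, 6, "FUNC"), (5, 6, "LOC"),
]

levels = nested_spans_to_bio_levels(len(tokens), spans)
for depth, labels in enumerate(levels):
    print(depth, list(zip(tokens, labels)))
```

In such a setup, each tagger in the cascade would typically receive the tokens plus the labels predicted at the previous levels as input.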
Throughout this thesis, we compare two machine learning methods, CRFs and neural networks, and discuss their respective advantages and drawbacks.
Keywords
- named entity recognition
- structured named entities
- machine learning
- conditional random fields
- neural networks