
This thesis is available (in French) on TEL archives.

My PhD Thesis: “Structuration in Named Entities”

This PhD thesis was carried out in collaboration with the company Expert System France (formerly TEMIS).

Thesis examiners

Defence

PhD defended on 23 November 2017 at 14:00. Defence notice (in French)

Abstract

Named entity recognition is a crucial task in NLP. It is used to extract relations between named entities, which in turn allows the construction of knowledge bases (Surdeanu and Ji, 2014), automatic summarisation (Nobata et al., 2002) and so on. This thesis focuses on the structuring phenomena that surround named entities.

We distinguish two kinds of structural elements in named entities. The first are recurrent substrings, which we call the characteristic affixes of a named entity. The second are tokens with good discriminative power, which we call the trigger tokens of named entities. We present the algorithm we designed to extract such affixes and compare it to Morfessor (Creutz and Lagus, 2005b). We then apply the same algorithm to extract trigger tokens, which we use for French named entity recognition and postal address extraction.

Another form of structuring for named entities is syntactic in nature: entities typically have a tree structure. We propose a novel kind of linear tagger cascade that has not previously been used for structured named entity recognition. It generalises earlier methods, which can only recognise named entities of a fixed depth or cannot model certain characteristics of the structure; our cascade overcomes both limitations.

Throughout this thesis, we compare two machine learning methods, CRFs and neural networks, and discuss their respective advantages and drawbacks.

Keywords