Abeer Almutairi, a student under the supervision of Professor Robert Hoehndorf, defended her Master's thesis on November 4, 2019.
Diseases play a central role in biomedical research, and many studies aim to make disease information accessible by designing named entity recognition models that exploit the available literature. Disease recognition has been tackled with a variety of approaches, of which the best known are lexical and supervised approaches. However, these approaches have drawbacks: their performance depends on the amount of human-annotated data available. Moreover, lexical approaches cannot distinguish genuine mentions of diseases from mentions of other entities that share the same name or acronym. The challenge of this project is to design a named entity recognizer that combines the strengths of lexical and supervised approaches. We demonstrate that our model can accurately identify disease name mentions in text by using word embeddings to capture the context of each mention, which enables the model to decide whether a mention genuinely refers to a disease. We evaluate our model on a gold standard data set, where it achieves a precision of 84%. Finally, we compare the performance of our model with that of different statistical named entity recognition models; the results show that our model outperforms the unsupervised lexical approaches.
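The idea of pairing a lexical match with context captured by word embeddings can be illustrated with a minimal sketch. This is not the thesis implementation: the lexicon, the two-dimensional word vectors, the "disease context" centroid, the window size, and the similarity threshold are all hypothetical values chosen for illustration, and a real system would more likely train a classifier on pretrained embeddings than apply a fixed threshold.

```python
import math

# Hypothetical toy lexicon and 2-d word vectors, invented for illustration only.
LEXICON = {"cold", "flu"}
VECTORS = {
    "patient": (1.0, 0.1),
    "symptoms": (0.9, 0.2),
    "fever": (0.95, 0.05),
    "caught": (0.8, 0.3),
    "weather": (0.1, 1.0),
    "outside": (0.05, 0.9),
    "wind": (0.1, 0.95),
}
# Assumed centroid of the contexts that surround genuine disease mentions.
DISEASE_CONTEXT = (1.0, 0.0)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def context_vector(tokens, i, window=3):
    """Average the vectors of known words within `window` tokens of position i."""
    neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    vecs = [VECTORS[t] for t in neighbours if t in VECTORS]
    if not vecs:
        return None
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def find_disease_mentions(text, threshold=0.8):
    """Return (index, token) pairs for lexicon hits whose context looks disease-like."""
    tokens = text.lower().split()
    mentions = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue  # lexical step: only dictionary terms are candidates
        ctx = context_vector(tokens, i)
        # context step: keep the candidate only if its surroundings resemble disease contexts
        if ctx is not None and cosine(ctx, DISEASE_CONTEXT) >= threshold:
            mentions.append((i, tok))
    return mentions


clinical = find_disease_mentions("the patient caught a cold with fever")
weather = find_disease_mentions("the weather outside is cold and wind")
```

In this toy setting the word "cold" is kept in the clinical sentence and rejected in the weather sentence, which is the kind of ambiguity a purely lexical matcher cannot resolve.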