The life sciences have invested significant resources in developing ontologies that can be combined in a modular way to characterize biological data. To allow these ontologies to be combined and to interoperate, considerable effort has gone into formalizing them and adding natural language labels, synonyms, and definitions. The axioms in life science ontologies are used to check and maintain consistency within and between ontologies, find unsatisfiable classes, guide ontology extension through the application of axiom-based design patterns, and encode domain background knowledge. As a consequence, biomedical ontologies form large formalized domain knowledge bases and have the potential to improve biomedical data analysis by providing background knowledge and relations between biological entities that are not otherwise connected.
Recent advances in machine learning methods allow ontologies to be used as constraints, and thereby to incorporate background knowledge into optimization problems. These methods operate in a data-driven way and can provide a means to evaluate ontologies empirically and in an application-specific manner. I will discuss how to use machine learning to evaluate ontology modules in the Gene Ontology across a variety of tasks. The results show that different modules contribute to machine learning tasks in a context-specific manner and that richer axioms generally improve predictive performance. The methods I will describe can provide a framework for evaluating the quality of ontology modules for specific tasks and suggest ways to improve them.
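
To make the idea of an axiom acting as a constraint concrete, the following is a minimal sketch and not the method used in the talk. It assumes that each subclass axiom is turned into a soft penalty requiring a model's score for a class to be no higher than its score for the superclass. The class names, the helper functions, and the squared-error data term are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy: (subclass, superclass) pairs standing in for
# SubClassOf axioms. The names are illustrative, not real GO identifiers.
SUBCLASS_AXIOMS: List[Tuple[str, str]] = [
    ("kinase_activity", "catalytic_activity"),
    ("catalytic_activity", "molecular_function"),
]


def axiom_penalty(scores: Dict[str, float],
                  axioms: List[Tuple[str, str]] = SUBCLASS_AXIOMS) -> float:
    """Sum of hinge violations max(0, score(sub) - score(super)) over all axioms."""
    return sum(max(0.0, scores[sub] - scores[sup]) for sub, sup in axioms)


def squared_error(scores: Dict[str, float], labels: Dict[str, float]) -> float:
    """Data-fit term: squared difference between scores and observed labels."""
    return sum((scores[c] - labels[c]) ** 2 for c in labels)


def constrained_loss(scores: Dict[str, float],
                     labels: Dict[str, float],
                     weight: float = 1.0) -> float:
    """Objective to minimize: data-fit term plus weighted axiom penalty."""
    return squared_error(scores, labels) + weight * axiom_penalty(scores)


if __name__ == "__main__":
    labels = {"kinase_activity": 0.0, "catalytic_activity": 1.0,
              "molecular_function": 1.0}
    consistent = {"kinase_activity": 0.2, "catalytic_activity": 0.5,
                  "molecular_function": 0.9}
    violating = {"kinase_activity": 0.8, "catalytic_activity": 0.5,
                 "molecular_function": 0.9}
    print(constrained_loss(consistent, labels))
    print(constrained_loss(violating, labels))
```

In this sketch, scores that respect the hierarchy incur no penalty, while a subclass scored above its superclass adds to the loss in proportion to the size of the violation. Comparing task performance with the penalty on and off (or with different sets of axioms) is one simple way such a setup could be used to gauge what a given ontology module contributes.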