Next Generation Knowledge Mining Technologies


This program develops methods and technologies to efficiently extract meaningful associations/links between entities from free text and data repositories in the life science domain and beyond.

Associations mining

This project aims at identifying and extracting associations between entities/concepts from disparate resources. It combines deep text-mining and deep data-mining of Big Data and, through Big Data analytics, infers associations that are not present in any individual information source, enriches concepts and associations, and estimates trends. The technical solutions are generic and applicable to different types of data (news, web, social networks, published literature, databases, ...). For specific associations, the systems estimate trends, predict their development over time, and assess potential interactions with and effects on other associations. The systems use machine learning models at different stages of information processing. The project contributes partly to the CBRC CCF Flagship Program.
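As an illustration of inferring associations that no single source states, the following minimal sketch links two entities when they share an intermediate entity across the pooled sources. It is not the project's actual method. The function name `infer_indirect_associations`, the source names, and the entities `drug_X`, `gene_Y`, and `disease_Z` are all hypothetical, and a real system would add entity normalization and statistical scoring.

```python
from collections import defaultdict
from itertools import combinations


def infer_indirect_associations(sources):
    """Propose links A-C that no source states directly.

    `sources` maps a source name to an iterable of (entity_a, entity_b)
    pairs. A pair A-C is proposed when some bridge entity B is linked to
    both A and C, possibly in different sources.
    """
    neighbors = defaultdict(set)  # entity -> entities it is directly linked to
    direct = set()                # every link stated in at least one source
    for pairs in sources.values():
        for a, b in pairs:
            direct.add(frozenset((a, b)))
            neighbors[a].add(b)
            neighbors[b].add(a)

    inferred = defaultdict(set)   # (a, c) -> set of bridge entities
    for bridge, linked in neighbors.items():
        for a, c in combinations(sorted(linked), 2):
            if frozenset((a, c)) not in direct:
                inferred[(a, c)].add(bridge)
    return dict(inferred)


# Hypothetical toy input: each source holds only one half of the chain.
toy_sources = {
    "literature": [("drug_X", "gene_Y")],
    "database": [("gene_Y", "disease_Z")],
}
for pair, bridges in infer_indirect_associations(toy_sources).items():
    print(pair, "via", sorted(bridges))
```

The number of distinct bridge entities supporting a proposed pair can serve as a simple first ranking signal for candidate associations.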


Topic-specific knowledgebases on demand

This project aims at combining various resources that hold different types of information and at providing an efficient way to find information of interest together with all of its associated concepts. The technical solutions are generic and applicable to many domains beyond life science. This project includes the development of a platform for generating "topic-specific biological/biomedical knowledgebases on demand", as well as knowledge exploration systems for the biology, biomedical, and biotechnology fields. These systems rank concepts and associations between concepts based on various criteria that users can select. They also provide a convenient platform for hypothesis generation. The compiled knowledgebases integrate information from a large number of resources such as PubMed, ChEBI, Entrez Gene, GO, KOBAS, KEGG, UniPathway, BioGRID, etc. This project contributes to the CBRC CCF Flagship Program.
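Ranking by user-selected criteria can be pictured as a weighted sum over per-criterion scores. The sketch below is only an assumption about how such a ranking might work, not a description of the platform. The function `rank_concepts`, the criterion names `frequency` and `recency`, and the example concepts are hypothetical.

```python
def rank_concepts(concepts, weights):
    """Order concepts by a weighted sum of the criteria the user selected.

    `concepts` is a list of dicts with a "name" and a "scores" mapping of
    criterion -> value in [0, 1]; `weights` maps each selected criterion
    to its weight. Criteria missing for a concept count as 0.
    """
    def total(concept):
        return sum(
            weight * concept["scores"].get(criterion, 0.0)
            for criterion, weight in weights.items()
        )

    return sorted(concepts, key=total, reverse=True)


# Hypothetical concepts with made-up scores, for illustration only.
example_concepts = [
    {"name": "concept_A", "scores": {"frequency": 0.9, "recency": 0.1}},
    {"name": "concept_B", "scores": {"frequency": 0.4, "recency": 0.8}},
]
ranked = rank_concepts(example_concepts, {"frequency": 1.0, "recency": 2.0})
print([concept["name"] for concept in ranked])
```

Changing the weights changes the ordering without recomputing the underlying scores, which is one way to let users switch criteria interactively.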

Drug repurposing, diagnostic biomarker discovery and text-mining

This project focuses on solutions for the health/medical domain that infer knowledge about drugs and diagnostic biomarkers that is not explicitly present in any single information source, or that is present only in a complex form in free text and data. It is heavily interlinked with the projects on "Associations mining" and "Topic-specific knowledgebases on demand".

Medical informatics

This project integrates methods from ontologies, text-mining, phenotype characterization and data-mining to provide information that could assist physicians in diagnosis and treatment management. It uses machine learning models at different stages of information processing.