Words and phrases associated with the symptoms of common diseases can be used to identify complex concepts in medical texts and to reveal significant patterns that point to the genes and pathways underlying clusters of diseases.
Researchers from KAUST worked with scientists in the United Kingdom to develop a specialized word search of scientific literature called semantic text-mining. The work identifies shared traits in rare, common and infectious diseases for the first time at scale. It also provides, notes Robert Hoehndorf of the Computer, Electrical and Mathematical Science and Engineering Division at KAUST, “a tantalizing overview of the phenotypic structure of the human ‘diseasome.’”
Researchers routinely catalog the signs and symptoms of genetically based diseases in electronic resources such as the Online Mendelian Inheritance in Man (OMIM) and Orphanet databases. However, extending similar methods to common and infectious diseases has proved challenging because no comparable infrastructure provides the huge number of phenotypes associated with them.
“To take on this task, we needed very large computational capacity,” explains Hoehndorf. “Using ‘ontologies’ (formal representations of the concepts and relations within a domain), we designed a method that identifies concepts referring to phenotypes of common and rare diseases within millions of published papers and abstracts, and used these concepts to establish the phenotypic similarity between a large number of common and rare diseases.”
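To make the idea concrete, here is a minimal Python sketch of the two steps Hoehndorf describes: spotting phenotype concepts in text, then comparing diseases by the concepts they share. It is an illustration only, not the team's actual pipeline. The lexicon, the `PHENO:` identifiers, the disease names and the sample sentences are all invented for this example, and plain substring matching with a Jaccard score stands in for whatever ontology-aware matching and similarity measure the researchers used.

```python
from itertools import combinations

# Hypothetical mini-lexicon mapping phrases to made-up concept IDs.
# These are placeholders, not identifiers from any real ontology.
LEXICON = {
    "fever": "PHENO:0001",
    "cough": "PHENO:0002",
    "seizure": "PHENO:0003",
    "muscle weakness": "PHENO:0004",
}

# Invented sample sentences standing in for published abstracts.
ABSTRACTS = {
    "disease_a": ["Patients presented with fever and a persistent cough."],
    "disease_b": ["Fever, cough and muscle weakness were reported."],
    "disease_c": ["Recurrent seizure episodes with muscle weakness."],
}


def find_concepts(text):
    """Return the concept IDs whose phrase occurs in the text (crude substring match)."""
    lowered = text.lower()
    return {concept for phrase, concept in LEXICON.items() if phrase in lowered}


def disease_profile(texts):
    """Merge the concepts found across all texts about one disease."""
    profile = set()
    for text in texts:
        profile |= find_concepts(text)
    return profile


def jaccard(a, b):
    """Shared concepts divided by all concepts seen in either profile."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


profiles = {name: disease_profile(texts) for name, texts in ABSTRACTS.items()}
similarity = {
    (x, y): jaccard(profiles[x], profiles[y])
    for x, y in combinations(sorted(profiles), 2)
}

for pair, score in similarity.items():
    print(pair, round(score, 2))
```

In this toy data, the two diseases that share fever and cough score highest, while the pair with no symptoms in common scores zero. A real system would also need to handle synonyms, negation and the hierarchy of the ontology itself, none of which this sketch attempts.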