Genome-wide association studies (GWAS) have assessed the effect of common variants on human disease and uncovered thousands of disease-associated variants. However, research on the contribution of rare variants remains limited. The UK Biobank, which contains detailed medical records and genetic information for nearly 500,000 individuals, offers a great opportunity for genetic association studies of rare variants. Here we focus on the role of rare protein-coding variants in UK Biobank phenotypes. We selected three diseases for analysis: breast cancer, hypothyroidism, and type II diabetes. We defined criteria for qualifying variants and pruned the control group to reduce interfering signal from similar phenotypes. We identified most of the known biomarkers for these diseases, such as the BRCA1 and BRCA2 genes for breast cancer, the TG and TSHR genes for hypothyroidism, and GCK for type II diabetes. These results support the validity of the model and clarify the contribution of rare variants to disease. Moreover, we applied a gene-set-based collapsing method, which aggregates information across genes to strengthen the signal from rare variants, and built a diagnosis model that relies only on genetic information. The model achieved an AUC above 70% for all three diseases.
Yang Liu is an M.S./Ph.D. candidate in the KAUST Bio-Ontology research group under the supervision of Professor Robert Hoehndorf. Before joining KAUST, Yang obtained a bachelor's degree from Harbin Institute of Technology, China.
Yang's research interests include machine learning and bioinformatics. She is interested in cancer genomics and data, in building models for human disease diagnosis and prognosis, especially for cancer, and in applying statistical methods to analyze human omics data.