Chakravarthi Kanduri (skanduri@ifi.uio.no), Christoph Bock (cbock@cemm.oeaw.ac.at), Sveinung Gundersen (sveinungu@ifi.uio.no), Eivind Hovig (ehovig@ifi.uio.no), Geir Kjetil Sandve (geirksa@ifi.uio.no)
(This page provides examples of pitfalls in co-localization analysis of genomic features. All examples are provided through the Genomic HyperBrowser, which is tightly integrated with the Galaxy framework. Users who are not familiar with Galaxy can get started with the short introductory tutorial at https://galaxyproject.org/tutorials/g101/.)
As discussed in the null models section, Monte Carlo (MC)-based hypothesis testing involves repeated random shuffling of genomic locations to obtain the distribution of a test statistic under the null model. To support an unbiased conclusion, the null model should preserve the essential geometric and biological properties of the genomic locations. Notably, genomic elements and sequence properties are distributed along the DNA sequence in a non-uniform and mutually dependent fashion, which leads to local heterogeneity. A null model that does not preserve this local genomic structure may therefore produce both false positives and false negatives. Here, we demonstrate this with an example.
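The effect of ignoring local heterogeneity can be illustrated with a small simulation. The sketch below is a toy example, not the HyperBrowser implementation: the genome length, the "feature-rich" region, the coverage levels, and all function names are assumptions made for illustration. A point track and a segment track are generated independently of each other, but both are concentrated in the same region. Shuffling points uniformly over the whole genome ignores that shared structure, while shuffling each point only within its own region preserves it.

```python
import random

GENOME_LEN = 100_000   # toy genome length (assumption)
RICH_END = 10_000      # positions [0, RICH_END) form a feature-rich region (assumption)

def make_mask(rng):
    """Toy segment track: about 50% coverage in the rich region, 1% elsewhere."""
    return [rng.random() < (0.5 if pos < RICH_END else 0.01)
            for pos in range(GENOME_LEN)]

def make_points(rng, n_rich=160, n_poor=40):
    """Toy point track, independent of the mask but clustered in the rich region."""
    return ([rng.randrange(0, RICH_END) for _ in range(n_rich)] +
            [rng.randrange(RICH_END, GENOME_LEN) for _ in range(n_poor)])

def overlap(points, mask):
    """Test statistic: number of points that fall on covered positions."""
    return sum(mask[p] for p in points)

def shuffle_uniform(points, rng):
    """Null model that ignores local structure: uniform over the genome."""
    return [rng.randrange(GENOME_LEN) for _ in points]

def shuffle_within_region(points, rng):
    """Null model that preserves local structure: each point stays in its region."""
    return [rng.randrange(0, RICH_END) if p < RICH_END
            else rng.randrange(RICH_END, GENOME_LEN) for p in points]

def mc_p_value(points, mask, shuffle, rng, n_perm=500):
    """Fraction of shuffled tracks whose overlap is at least the observed overlap."""
    observed = overlap(points, mask)
    extreme = sum(overlap(shuffle(points, rng), mask) >= observed
                  for _ in range(n_perm))
    return extreme / n_perm

rng = random.Random(0)
mask = make_mask(rng)
points = make_points(rng)

p_uniform = mc_p_value(points, mask, shuffle_uniform, rng)
p_local = mc_p_value(points, mask, shuffle_within_region, rng)
print("uniform null p-value:", p_uniform)
print("region-preserving null p-value:", p_local)
```

Although the two toy tracks are independent within each region, the uniform null reports an extreme p-value, which is a false positive caused purely by the shared regional clustering. The region-preserving null does not systematically produce such a result. The opposite error (a false negative) can arise in the same way when the null model distorts local structure in the other direction.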
In the above section, we tested the association between Schizophrenia GWAS SNPs and an enhancer activation mark (H3K4ME1) using null models that differed in the geometric properties they preserved. However, none of those null models preserved the local genomic structure, and none of them provided strong evidence against the null hypothesis. Here, we test the same relation again, this time preserving the local genomic structure. Previous case studies (reference 36 in the main article) have shown that failing to match SNPs by linkage disequilibrium and by other properties, such as minor allele frequency, gene density, and distance to the nearest gene, can lead to spurious conclusions. Because we have prior knowledge of the local genomic properties that should be matched, we sampled matched locations for the Schizophrenia GWAS SNPs 1000 times (1000 permutations). When we tested whether any of the resulting 1000 tracks overlapped the H3K4ME1 track more than the Schizophrenia GWAS SNPs track did, none of them had a test statistic (here, the Forbes coefficient) higher than that of the GWAS SNPs track (history element 10). This gives an empirical p-value of approximately 0 (0 of 1000 tracks had a test statistic at least as extreme). Compared with the null models that did not account for local heterogeneity (history elements 11-15), preserving local heterogeneity in this way gave stronger evidence against the null hypothesis.
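For readers who want to see how such an empirical p-value is derived from a set of matched tracks, here is a minimal sketch. It assumes the commonly used definition of the Forbes coefficient as the ratio of observed to expected overlap under independence; whether the HyperBrowser computes it in exactly this form is an assumption. The function names and all numbers are hypothetical and are not taken from the analysis in the history.

```python
def forbes(n_both, n_a, n_b, n_total):
    """Forbes coefficient: observed overlap divided by the overlap expected
    under independence, i.e. n_both * n_total / (n_a * n_b)."""
    return n_both * n_total / (n_a * n_b)

def empirical_p_values(observed, null_stats):
    """Return (raw, corrected) empirical p-values.

    raw       = k / n, the fraction of null statistics >= observed
    corrected = (k + 1) / (n + 1), a common convention that avoids
                reporting a p-value of exactly zero
    """
    n = len(null_stats)
    k = sum(stat >= observed for stat in null_stats)
    return k / n, (k + 1) / (n + 1)

# Hypothetical counts: 30 shared units, 100 in track A, 200 in track B,
# out of 1000 units in total.
observed = forbes(n_both=30, n_a=100, n_b=200, n_total=1000)

# Hypothetical null statistics from 1000 matched tracks, all below the observed value.
null_stats = [1.0 + 0.0004 * i for i in range(1000)]

raw_p, corrected_p = empirical_p_values(observed, null_stats)
print(observed, raw_p, corrected_p)
```

When none of the 1000 sampled tracks reaches the observed statistic, the raw fraction is 0/1000. With the corrected convention the same result would be reported as 1/1001, which makes explicit that the resolution of the p-value is limited by the number of permutations.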