In this example we investigate the genomic co-localisation of a subset of transcription
factors with epigenetic marks known as histone modifications, using data from the K562
leukemia cell line generated by the ENCODE consortium [1]. The epigenetic marks
tri-methylation of histone 3 at lysine 4 (H3K4me3) and acetylation of the same histone at
lysines 9 and 27 (H3K9ac and H3K27ac) correlate with active transcription, while
tri-methylation of lysines 9 and 27 (H3K9me3 and H3K27me3) correlates with repression [2-4].
The same workflow can be used to investigate the relationship between other
genome-interacting proteins and genomic features, such as DNA methylation and genomic mutations.
The data used in the example are peaks, that is, genomic regions enriched for the
transcription factor or histone modification in question, generated by chromatin
immunoprecipitation coupled with next-generation sequencing (ChIP-Seq).
[1] ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I,
Green ED, Gunter C, et al. (2012) An integrated encyclopedia of DNA elements in the human genome.
Nature 489: 57-74.
[2] Barski A, Cuddapah S, Cui K, Roh T-Y, Schones DE, Wang Z,
et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell
129: 823-837.
[3] Mikkelsen TS, Xu Z, Zhang X, Wang L, Gimble JM, Lander ES,
et al. (2010) Comparative epigenomic analysis of murine and human adipogenesis.
Cell 143: 156-169.
[4] Barth T, Imhof A (2010) Fast signals and slow marks:
the dynamics of histone modifications. Trends Biochem Sci 35: 618-626.
An annotated analysis history (including all the data and operations used) is
available on a separate page. (All results mentioned below refer
to elements from this history page.)
Two initial GSuites, one for the subset of histone modifications and one for the
subset of transcription factors, were created using
"Create a GSuite from an integrated catalog of genomic datasets". Tracks satisfying
the following criteria were selected:
Histones:
Track category: Histone variants and modifications (search by cell/tissue type)
Sub-category: K562 Leukemia cell line
Database: All original tracks
File type: narrowPeak
Data type: Others
A subset of the tracks was then selected manually
Transcription factors:
Track category: Transcription factor binding sites (search by cell/tissue type)
Sub-category: K562 Leukemia cell line
Database: All original tracks
File type: narrowPeak
Data type: narrowPeak
A subset of the tracks was then selected manually
For downstream analysis the downloaded files were used.
To count the number of times each transcription factor overlaps
a histone modification, the GSuite containing the peaks for the transcription factors
was modified to contain only the midpoint of each peak, using "Modify primary tracks referred
to in a GSuite":
Operation: Expand all points and segments equally
Parameter: typed: "middle"
Change file suffix: "Yes", with the suffix changed to "bed"
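Outside the HyperBrowser, the same reduction of peaks to their midpoints can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's implementation: the function names `parse_bed_lines` and `peak_midpoints` are hypothetical, coordinates are assumed to be 0-based and half-open as in BED and narrowPeak files, and the midpoint is rounded down, which may differ from how the tool rounds.

```python
from typing import Iterable, Iterator, Tuple

# (chromosome, start, end); 0-based, half-open, as in BED/narrowPeak files
Interval = Tuple[str, int, int]


def parse_bed_lines(lines: Iterable[str]) -> Iterator[Interval]:
    """Read the first three columns of tab-separated BED-like lines."""
    for line in lines:
        # Skip blank lines and the optional header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), int(fields[2])


def peak_midpoints(peaks: Iterable[Interval]) -> Iterator[Interval]:
    """Replace each peak by a 1-bp interval at its centre.

    Assumption: the centre is (start + end) // 2, i.e. rounded down.
    """
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        yield chrom, mid, mid + 1


if __name__ == "__main__":
    example = ["chr1\t100\t200\tpeak1\n", "chr2\t10\t15\tpeak2\n"]
    for chrom, start, end in peak_midpoints(parse_bed_lines(example)):
        print(f"{chrom}\t{start}\t{end}")
```

Each output line is again a valid three-column BED record, which matches the change of file suffix to "bed" in the step above.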
To investigate how the transcription factors co-localize with the
different histone modifications, the two GSuites were first preprocessed using
"Preprocess a GSuite for analysis". A co-localization analysis was then set up using
"Determine coinciding track combinations from two GSuites", with the following parameters:
Query track: GSuite with transcription factors, containing the midpoint of each peak
Reference track: GSuite with histone modifications
Genome: hg19
Similarity measure: Forbes coefficient: ratio of observed to expected overlap.
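To make the similarity measure concrete, the following minimal sketch computes a Forbes coefficient for one pair of tracks: a set of points (transcription factor peak midpoints) and a set of segments (histone modification peaks). It assumes that the expected overlap is the number of points multiplied by the fraction of the genome covered by the segments. The exact null model used by the HyperBrowser is not described here and may differ, and the names `merge_segments`, `forbes_coefficient` and `genome_size` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Segments = Dict[str, List[Tuple[int, int]]]  # chromosome -> half-open (start, end)
Points = Dict[str, List[int]]                # chromosome -> point positions


def merge_segments(segs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort segments and merge overlapping or adjacent ones."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(segs):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def forbes_coefficient(points: Points, segments: Segments, genome_size: int) -> float:
    """Ratio of observed to expected number of points inside segments.

    Assumption: expected = n_points * covered_bp / genome_size.
    """
    merged = {chrom: merge_segments(s) for chrom, s in segments.items()}
    n_points = sum(len(p) for p in points.values())
    covered = sum(end - start for segs in merged.values() for start, end in segs)
    if n_points == 0 or covered == 0:
        raise ValueError("both tracks must be non-empty")

    observed = 0
    for chrom, positions in points.items():
        segs = merged.get(chrom, [])
        starts = [start for start, _ in segs]
        for pos in positions:
            # Last segment starting at or before pos, if any.
            i = bisect_right(starts, pos) - 1
            if i >= 0 and pos < segs[i][1]:
                observed += 1

    expected = n_points * covered / genome_size
    return observed / expected


if __name__ == "__main__":
    tf_midpoints = {"chr1": [150, 500, 900]}
    histone_peaks = {"chr1": [(100, 200), (180, 300)]}
    print(forbes_coefficient(tf_midpoints, histone_peaks, genome_size=1000))
```

Under this definition a value above 1 means the points fall inside the segments more often than expected by chance, and a value below 1 means less often. Running the function for every transcription factor against every histone modification would give one value per track combination.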