Create GTrack file from unstructured tabular data
This tool structures unformatted tabular data. You specify the necessary metadata through simple selection boxes, and the tool infers further properties of the data where possible.
To do this, you select a name for each column of the table. This lets the GTrack header expander fill in the remaining headers automatically, which converts the file into a GTrack file.
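The header expander can infer a track type because, in GTrack, the type follows from which core columns are present. The sketch below illustrates that idea for the seven unlinked track types only. The column names `start`, `end` and `value` and the type labels reflect our reading of the GTrack specification (the "GTrack specification" tool is the authoritative source), and the function `infer_track_type` is hypothetical, not part of the HyperBrowser.

```python
# Hypothetical sketch: infer an (unlinked) GTrack track type from column names.
# Assumption: the type depends only on which of the core columns "start",
# "end" and "value" are present; "seqid", "strand" and custom columns such
# as "sample" do not affect it. Linked types (with "edges") are not covered.

TRACK_TYPES = {
    # (has start, has end, has value) -> track type
    (True, False, False): "Points",
    (True, False, True): "Valued points",
    (True, True, False): "Segments",
    (True, True, True): "Valued segments",
    (False, True, False): "Genome partition",
    (False, True, True): "Step function",
    (False, False, True): "Function",
}


def infer_track_type(column_names):
    """Return the track type for the given columns, or None if no unlinked type matches."""
    cols = set(column_names)
    key = ("start" in cols, "end" in cols, "value" in cols)
    return TRACK_TYPES.get(key)


# The columns chosen in this tutorial: sample (custom), seqid, strand, start
print(infer_track_type(["sample", "seqid", "strand", "start"]))  # Points
```

Since the HPV data only has a start coordinate per site, the selection used below corresponds to a points track.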
In this example, the raw Human Papillomavirus (HPV) data used to create the GTrack file were generated and published by Kraus et al. (2008). The dataset is available at http://hyperbrowser.uio.no/test/static/hyperbrowser/files/tutorial/HPV_sites.xls.
The spreadsheet contains the columns Sample ID, Chromosome ID, Strand ("+" or "-") and Coordinates.
Part 1 - Create a GTrack file
- In the "HyperBrowser track processing" tool menu, select "Format and convert track"
- Select "Create GTrack file from unstructured tabular data"
- Copy the raw dataset from the Excel document and paste it into the "Type or paste in tabular file:" box
- Select "Tab" as "Character to use to split lines into columns"
- Select "1" as "Number of lines to skip (from front)"
- This is done to remove the header row from the tabular file
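The two settings above amount to splitting each pasted line on tab characters and dropping the first line. The sketch below shows that parsing step in Python; the function name `parse_tabular` is hypothetical, and the sample rows are made up for illustration and are not taken from the HPV dataset.

```python
# Minimal sketch of the split/skip settings: split pasted text on tabs and
# skip the first (header) line. Blank lines are ignored.

def parse_tabular(text, sep="\t", skip=1):
    """Split text into rows of fields, dropping the first `skip` lines and blank lines."""
    lines = text.splitlines()[skip:]
    return [line.split(sep) for line in lines if line.strip()]


# Made-up example input (NOT the real HPV data), including a header row
pasted = (
    "Sample ID\tChromosome ID\tStrand\tCoordinates\n"
    "S1\tchr3\t+\t1000\n"
    "S2\tchr8\t-\t2500\n"
)
rows = parse_tabular(pasted)
print(rows)  # [['S1', 'chr3', '+', '1000'], ['S2', 'chr8', '-', '2500']]
```

If the header row were not skipped, "Sample ID", "Chromosome ID" and so on would be read as a data row.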
Next, specify the column names of the table:
- Select "Select individual columns" as "Column selection method"
- Select "--custom--" as "Select the name for column #1"
- In the "Type in a custom name for column #1" write "sample"
- This column is not a reserved GTrack column, but rather a custom column that can contain any text
- Select "seqid" as "Select the name for column #2"
- This column is the sequence ID, in this case the chromosome IDs
- Select "strand" as "Select the name for column #3"
- Select "start" as "Select the name for column #4"
- Notice that the field "Current track type" is updated dynamically according to the columns that have been selected, using the mapping defined in the GTrack specification document (see the tool "GTrack specification" under "GTrack tools")
- Choose "Yes" for "Select a specific genome?"
- Select "Human Mar. 2006 (hg18/NCBI36)" as "Genome build"
- Select "1-indexed, end inclusive" as "Indexing standard used for start and end coordinates"
- This is usually the case for coordinates in tabular data whose indexing convention is not otherwise specified
- Select "Yes, auto-correct to the best match in the genome build"
- Press the "Execute" button
- Click the eye icon to see the result data
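The steps above name the columns, choose a coordinate convention and produce a GTrack file. The sketch below imitates that result in a simplified way. It assumes that GTrack's default convention is 0-indexed and end-exclusive, so a 1-indexed start is shifted down by one (a 1-indexed inclusive end would stay unchanged). The real tool may instead record the convention in header lines, and the three header lines written here are a simplified subset, as we recall it, of what the real header expander produces. The function `gtrack_lines` and the example rows are hypothetical.

```python
# Hypothetical sketch: turn parsed rows into a minimal GTrack-like text.
# Assumptions: GTrack defaults to 0-indexed, end-exclusive coordinates;
# "##" marks header lines and "###" marks the column specification line.

COLUMNS = ["sample", "seqid", "strand", "start"]


def gtrack_lines(rows, columns=COLUMNS, track_type="points"):
    """Return header lines followed by tab-separated data lines."""
    header = [
        "##gtrack version: 1.0",
        f"##track type: {track_type}",
        "###" + "\t".join(columns),
    ]
    start_idx = columns.index("start")
    body = []
    for row in rows:
        fields = list(row)
        # 1-indexed start -> 0-indexed start
        fields[start_idx] = str(int(fields[start_idx]) - 1)
        body.append("\t".join(fields))
    return header + body


# Made-up rows in the column order chosen above (NOT the real HPV data)
rows = [["S1", "chr3", "+", "1000"], ["S2", "chr8", "-", "2500"]]
lines = gtrack_lines(rows)
print("\n".join(lines))
```

Compare this with the file shown by the eye icon: the real output is likely to carry additional header lines.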
Part 2 - Do the analysis
To see where in the human genome the HPV sites are located, we will run a HyperBrowser analysis:
- Expand the "History" element by clicking its name, then click the "perform HyperBrowser analysis" button
- Select "Human Mar. 2006 (hg18/NCBI36)" as "Genome build", if not already selected
- In the "First track" box, select "From history" and then the newly created GTrack file
- In the "Second track" box, select "Genes and gene subsets" as the first level, then "Flanks" as the second level and finally "refseq exons upstream 1kb"
- In the "Analysis" box, select "Hypothesis testing" as "Category" and the question "Located inside?"
- Click the "Start analysis" button
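The question "Located inside?" asks whether the points of the first track fall inside the segments of the second track more often than expected by chance. The exact null model and randomization scheme used by the HyperBrowser are not described here, so the sketch below is a generic Monte Carlo version under simplifying assumptions: a single sequence, non-overlapping half-open segments sorted by start, and points placed uniformly at random under the null hypothesis. The function names and the toy numbers are hypothetical.

```python
import bisect
import random


def count_inside(points, segments):
    """Count points lying in any segment.

    Segments are sorted, non-overlapping, half-open intervals [start, end).
    """
    starts = [s for s, _ in segments]
    inside = 0
    for p in points:
        i = bisect.bisect_right(starts, p) - 1  # last segment starting at or before p
        if i >= 0 and p < segments[i][1]:
            inside += 1
    return inside


def monte_carlo_p(points, segments, seq_len, n_perm=1000, seed=1):
    """One-sided p-value: how often do uniformly placed points hit segments at least as often?"""
    rng = random.Random(seed)
    observed = count_inside(points, segments)
    hits = 0
    for _ in range(n_perm):
        random_points = [rng.randrange(seq_len) for _ in points]
        if count_inside(random_points, segments) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: segments cover 2% of a 10,000 bp sequence; 4 of 5 points fall inside
segments = [(100, 200), (500, 600)]
points = [150, 120, 550, 580, 9000]
observed, p = monte_carlo_p(points, segments, seq_len=10_000)
print(observed, p)
```

A small p-value suggests enrichment of points inside the segments; with real genome-wide data, the choice of null model (for example, whether positions are randomized within each chromosome) matters a great deal.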
You may import the history by clicking the "import history" button below. You will then see an overview of the files and the parameter settings used in the tools.
References
Kraus, I., Driesch, C., Vinokurova, S., Hovig, E., Schneider, A., von Knebel Doeberitz, M., & Dürst, M. (2008). The Majority of Viral-Cellular Fusion Transcripts in Cervical Carcinomas Cotranscribe Cellular Sequences of Known or Predicted Genes. Cancer Research, 68(7), 2514–2522. doi:10.1158/0008-5472.CAN-07-2776