Create GTrack file from unstructured tabular data

This tool is used to create GTrack files from any tabular input file. The user must select the column names for the table, enabling the GTrack header expander to automatically expand the headers, effectively converting the file to a GTrack file. Custom column names are also supported.

The following column names are part of the GTrack specification:

seqid: An identifier of the underlying sequence of the track element (i.e. the row). Example: 'chr1'

start: The start position of the track element

end: The end position of the track element

value: A value associated with the track element. The tool detects the value type automatically.

strand: The strand of the track element, either '+', '-' or '.'

id: A unique identifier of the track element, e.g. 'a'

edges: A semicolon-separated list of ids, representing edges from this track element to other elements. Weights may also be specified. Example: 'a=1.0;b=0.9'

See the 'Show GTrack specification' tool for more information.
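
As an illustration of the 'edges' format described above, the following minimal Python sketch parses such a value into a dictionary. The function name parse_edges is hypothetical and not part of the tool, and treating '.' as "no edges" is an assumption made for this sketch.

```python
def parse_edges(edges_field):
    """Parse a GTrack 'edges' value such as 'a=1.0;b=0.9'.

    Returns a dict mapping each target id to its weight (float),
    or to None when no weight is given for that edge.
    """
    edges = {}
    # Assumption for this sketch: an empty value or '.' means "no edges".
    if edges_field in ("", "."):
        return edges
    for item in edges_field.split(";"):
        # 'a=1.0' -> ('a', '=', '1.0'); 'a' -> ('a', '', '')
        target, sep, weight = item.partition("=")
        edges[target] = float(weight) if sep else None
    return edges


print(parse_edges("a=1.0;b=0.9"))
```

Edges without weights, such as 'a;b', map each id to None in this sketch.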

Column selection method

The tool supports two ways of selecting column names. First, you can select the column names manually. The other option is to select a GTrack file in the history. The tool will then use the same column names as that file (using only the first columns if the current tabular file has fewer columns than the GTrack file).

Genome

Some GTrack files require a specified genome to be valid, e.g. if bounding regions are specified without explicit end coordinates. A genome build must therefore be selected if such a GTrack file is to be used as a template file for column specification. Auto-correction of the sequence id ('seqid') column also requires the selection of a genome build. The resulting GTrack file in the history will be associated with the selected genome.

Track type

Based on the columns selected, the tool automatically determines the corresponding track type as defined in the GTrack specification. Note that dense track types are not yet supported by this tool.
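
The mapping from columns to track type can be sketched as follows. This is an illustration only, not the tool's actual code: the function name infer_track_type is hypothetical, and the rules assume that the presence of the 'start', 'end', 'value' and 'edges' columns determines the type and that every non-dense type has a 'start' column.

```python
def infer_track_type(columns):
    """Return the non-dense GTrack track type implied by the column names."""
    cols = set(columns)
    if "start" not in cols:
        # Assumption: a track without a 'start' column is one of the dense
        # types, which this tool does not support.
        raise ValueError("dense track types are not supported")
    base = "segments" if "end" in cols else "points"
    name = ("valued " if "value" in cols else "") + base
    if "edges" in cols:
        name = "linked " + name
    return name


print(infer_track_type(["seqid", "start", "id", "something"]))
```

Under these assumptions, the columns of the example at the end of this page (seqid, start, id, something) give the track type 'points', which matches the '##track type' header line shown there.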

Indexing standard

Two standards of coordinate indexing are common in bioinformatics. A track element covering the first 10 base pairs of chr1 is represented differently in each:

0-indexed, end exclusive: seqid=chr1, start=0, end=10

1-indexed, end inclusive: seqid=chr1, start=1, end=10

The GTrack format supports both standards, but the user must inform the system which standard is used for each particular case.
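
The relationship between the two standards can be shown with a small Python sketch. The function names are hypothetical and only serve to illustrate the arithmetic: the start coordinate shifts by one, while the end coordinate keeps the same number.

```python
def to_one_indexed_inclusive(start, end):
    """Convert (0-indexed, end exclusive) to (1-indexed, end inclusive)."""
    return start + 1, end


def to_zero_indexed_exclusive(start, end):
    """Convert (1-indexed, end inclusive) to (0-indexed, end exclusive)."""
    return start - 1, end


# The first 10 base pairs of chr1, as in the two representations above.
print(to_one_indexed_inclusive(0, 10))
print(to_zero_indexed_exclusive(1, 10))
```

In both conventions the element covers 10 base pairs: end - start in the 0-indexed, end-exclusive case, and end - start + 1 in the 1-indexed, end-inclusive case.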

Auto-correction of sequence id

The tool supports auto-correction of the sequence id ('seqid') column. If this is selected, each value is searched for among the sequence ids defined for the current genome build. The nearest match, if unique, is inserted in the new GTrack file. If no unique match is found, the original value is used. The algorithm also handles Roman numerals. Example: 'IV' -> 'chr4'
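
The tool's actual matching algorithm is not described here, so the following Python sketch shows only one possible way to get the behaviour described above. All function names are hypothetical. It assumes that "nearest match" means equality after lowercasing and removing a leading 'chr', and that a literal match is tried before a Roman numeral interpretation, so that 'X' resolves to 'chrX' rather than 'chr10'.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Convert a Roman numeral to an int; return None for other strings."""
    if not text or any(ch not in ROMAN_VALUES for ch in text):
        return None
    values = [ROMAN_VALUES[ch] for ch in text]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (the I in IV).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def normalize(seqid):
    """Lowercase and drop a leading 'chr' so 'Chr1' and '1' compare equal."""
    key = seqid.strip().lower()
    return key[3:] if key.startswith("chr") else key


def autocorrect_seqid(raw, genome_seqids):
    """Return the unique matching seqid of the genome, else raw unchanged."""
    key = normalize(raw)
    candidates = [key]  # literal interpretation first
    number = roman_to_int(key.upper())
    if number is not None:
        candidates.append(str(number))  # then the Roman numeral reading
    for candidate in candidates:
        matches = [s for s in genome_seqids if normalize(s) == candidate]
        if len(matches) == 1:
            return matches[0]
    return raw


genome = ["chr1", "chr2", "chr4", "chr10", "chrX"]
print(autocorrect_seqid("IV", genome))
```

Values with no unique match, such as 'scaffold_9' for the genome above, are returned unchanged, mirroring the fallback to the original value.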

Example

Input table

start		id	something	seqid
100	.	a	yes	chr1
250	.	b	yes	chr1
120	.	c	no	chr2

Output file

##gtrack version: 1.0
##track type: points
##uninterrupted data lines: true
##sorted elements: true
##no overlapping elements: true
###seqid	start	id	something
chr1	100	a	yes
chr1	250	b	yes
chr2	120	c	no
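
The column handling in this example can be sketched in Python as follows. The sketch rests on assumptions read off the example rather than on the tool's code: columns without a name are dropped, columns from the GTrack specification come first in the order listed at the top of this page, and custom columns follow in input order. The function name to_gtrack_lines is hypothetical, and the '##' header lines are not generated.

```python
CORE_ORDER = ["seqid", "start", "end", "value", "strand", "id", "edges"]


def to_gtrack_lines(column_names, rows):
    """Build the '###' column line and the data lines of a GTrack file."""
    # Keep (index, name) pairs for named columns only (assumption: an empty
    # name means the column is ignored).
    named = [(i, name) for i, name in enumerate(column_names) if name]

    def sort_key(item):
        index, name = item
        if name in CORE_ORDER:
            return (0, CORE_ORDER.index(name))  # core columns, spec order
        return (1, index)  # custom columns, input order

    ordered = sorted(named, key=sort_key)
    lines = ["###" + "\t".join(name for _, name in ordered)]
    for row in rows:
        lines.append("\t".join(row[i] for i, _ in ordered))
    return lines


columns = ["start", "", "id", "something", "seqid"]
table = [
    ["100", ".", "a", "yes", "chr1"],
    ["250", ".", "b", "yes", "chr1"],
    ["120", ".", "c", "no", "chr2"],
]
print("\n".join(to_gtrack_lines(columns, table)))
```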

Example

See the full example of how to use this tool.