Labelling
GW is designed to make manually labelling 100s - 1000s of variants as pain free as possible. Using the command-line interface, labels can be saved to a tab-separated file and opened at a later date to support labelling over multiple sessions. GW can also write a modified VCF, updating the filter column with curated labels.
Use the UP and DOWN arrows to increase or decrease the number of image tiles, or LEFT and RIGHT to navigate through images. Below is a view of 196 variant images.
To use labelling correctly in GW, ensure all variant IDs in your input VCF/BCF are unique - this should normally be the case anyway.
When you open a VCF file, GW will parse the ‘filter’ column and display this as a label in the bottom left-hand corner of image tiles. Other labels can be parsed from the VCF using the –parse-label option. For example, the SU
tag can be parsed from the info column using:
gw hg38.fa -b your.bam -v variants.vcf --parse-label info.SU
Image tiles can then be clicked-on using the mouse to modify the label, choosing between PASS/FAIL by default. To provide a list of alternate labels, use the --labels
option:
gw hg38.fa -b your.bam -v variants.vcf --labels Yes,No,Maybe
Now, when you left-click on a tiled image, you can cycle through this list.
To save or open a list of annotations, use the --in-labels
and --out-labels
options. This makes it straightforward to keep track of labelling progress between sessions. Only variants that have been displayed to screen will be appended to the results in --out-labels
file. The same file can be used for in and out labels, if desired:
gw hg38 -b your.bam -v variants.vcf --in-labels labels.tsv --out-labels labels.tsv
Labels are output as a tab-separated file with the following format:
chrom | pos | variant_ID | label | var_type | labelled_date | variant_filename |
---|---|---|---|---|---|---|
chr1 | 200000 | 27390 | PASS | DEL | test.vcf | |
chr1 | 250000 | 2720 | FAIL | SNP | 14-10-2022 16-05-46 | test.vcf |
The labelled_date column is only filled out if one of the tiled images was manually clicked - if this field is blank then the --parsed-label
was used. This feature allows you to keep track of which variants were user-labelled over multiple sessions. Additionally, the --out-labels
file is auto-saved every minute for safe keeping.
GW can also write labels to a VCF file. We recommend using this feature to finalise your annotation - the whole VCF file will be written to --out-vcf
. The final label will appear in the ‘filter’ column in the vcf. Additionally, the date and previous filter label are kept in the info column under GW_DATE
, GW_PREV
:
gw hg38.fa -b your.bam -v variants.vcf --in-labels labels.tsv --out-vcf final_annotations.vcf
Note, the --in-labels
option is not required here, but could be used if labelling over multiple sessions, for example. Also, a GW window will still pop-up here, but this could be supressed using the --no-show
option.