API
class dysgu.DysguSV(ref_genome, bam, sample_name='sample', **kwargs)
This class is the main interface for calling structural variants using dysgu. To initialize DysguSV, provide a reference genome and a bam file, both opened with the pysam library. Providing a sample_name is also recommended, as it is used when saving results to a vcf file.
Parameters
- ref_genome (pysam.FastaFile) – A reference genome object from the pysam library
- bam (pysam.AlignmentFile) – An alignment file from the pysam library
- sample_name (str) – The sample name to use in the vcf file
- kwargs (dict) – Keyword arguments to modify default options of dysgu
import pysam
from dysgu import DysguSV
# open a reference genome and alignment file using pysam
bam = pysam.AlignmentFile('test.bam', 'rb')
genome = pysam.FastaFile('ref.fasta')
# Initialise dysgu
dysgu = DysguSV(genome, bam)
# Call SVs at a genomic location
df = dysgu("chr1:1-1000000")
Options can be provided during initialisation as keyword arguments:
dysgu = DysguSV(genome, bam, min_support=5, mq=20)
__call__(region, sort_df=True)
Call SVs using dysgu
Parameters
- region (str or pysam iterator) – The genomic region to call SVs from
- sort_df (bool) – Sort the returned dataframe
Returns
Dataframe of called SVs
Return type
pandas.DataFrame
dysgu = DysguSV(ref, bam)
df = dysgu("chr1:10000-50000")
# Using a pysam iterator
df = dysgu(bam.fetch("chr1", 0, 500000))
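When a chromosome is processed window by window, region strings in the chrom:start-end form used above can be generated programmatically. The following is a minimal sketch; the helper name tile_regions, the window size, and the assumption of 1-based inclusive coordinates are illustrative and not part of dysgu.

```python
def tile_regions(chrom, length, window=1_000_000):
    """Yield 'chrom:start-end' strings covering positions 1..length.

    Hypothetical helper, not part of dysgu. Coordinates are assumed to be
    1-based and inclusive, as in samtools-style region strings.
    """
    if length < 1 or window < 1:
        raise ValueError("length and window must be positive")
    for start in range(1, length + 1, window):
        # The last window is truncated at the chromosome end
        end = min(start + window - 1, length)
        yield f"{chrom}:{start}-{end}"


regions = list(tile_regions("chr1", 2_500_000))
print(regions)
# ['chr1:1-1000000', 'chr1:1000001-2000000', 'chr1:2000001-2500000']
```

Each string could then be passed to a DysguSV instance, e.g. dysgu(region). Variants that span a window boundary may need extra care, so the window size should be chosen with the expected SV sizes in mind.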
apply_model(df)
Apply a machine learning model to the dataframe. The model configuration is determined by the options set on the DysguSV class. For example, to use a non-diploid model, first set diploid=False.
Parameters
df (pandas.DataFrame) – The input dataframe to apply the machine learning model to
Returns
Dataframe with a modified ‘prob’ column
Return type
pandas.DataFrame
dysgu = DysguSV(ref, bam)
dysgu.set_option("diploid", False)
df = dysgu("chr1:10000-50000")
call_bed_regions(regions)
Call SVs from a list of bed regions. Note that bed regions should be sorted by genomic start position and be non-overlapping. To create suitable input for this function, see the dysgu.load_bed() and dysgu.merge_intervals() functions.
Parameters
regions (iterable) – An iterable of bed regions
Returns
Dataframe of called SVs
Return type
pandas.DataFrame or None
from dysgu import load_bed, merge_intervals
# load, merge and sort intervals
bed = load_bed('test.bed')
bed = merge_intervals(bed, srt=True)
dysgu = DysguSV(ref, bam)
df = dysgu.call_bed_regions(bed)
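Because call_bed_regions expects sorted, non-overlapping regions, it may be useful to verify a region list before passing it in. The sketch below is one possible check; the helper name is hypothetical, and it assumes each region starts with (chromosome, start, end) in half-open bed coordinates, so that regions which merely touch are accepted.

```python
def is_sorted_non_overlapping(regions):
    """Return True if regions are sorted by start and do not overlap.

    Hypothetical helper, not part of dysgu. Only the first three fields
    (chromosome, start, end) of each region are inspected.
    """
    last_end = {}  # chromosome -> end of the previous region seen on it
    for region in regions:
        chrom, start, end = region[:3]
        if start >= end:
            return False  # empty or inverted interval
        prev_end = last_end.get(chrom)
        if prev_end is not None and start < prev_end:
            return False  # out of order, or overlaps the previous region
        last_end[chrom] = end
    return True


print(is_sorted_non_overlapping([("chr1", 1, 5), ("chr1", 5, 9), ("chr2", 0, 3)]))  # True
print(is_sorted_non_overlapping([("chr1", 1, 5), ("chr1", 4, 9)]))  # False
```

If the check fails, merging the regions with merge_intervals(bed, srt=True), as in the example above, is the route the documentation suggests.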
set_option(option, value=None)
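Change one or more options for dysgu. Either pass a single option name with its value, or pass a mapping of option names to values.
Parameters
- option (str or dict) – The name of the option, or a mapping of option names to values
- value (object) – The value of the option
Returns
None
Return type
None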
dysgu.set_option("min_support", 10)
# Or provide a mapping of arguments:
dysgu.set_option({"min_support": 10, "mq": 20, "min_size": 100})
to_vcf(dataframe, output_file)
Save dysgu SV calls to a vcf file
Parameters
- dataframe (pandas.DataFrame) – A dataframe of called SVs from dysgu
- output_file (file) – The file handle to write the vcf file to
Returns
None
Return type
None
# path is the output file path; passed is a dataframe of dysgu SV calls
with open(path, "w") as out:
    dysgu.to_vcf(passed, out)
dysgu.dysgu_default_args()
Returns the default arguments used by dysgu
Parameters
None
Returns
A dict of available arguments
Return type
dict
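The dict of default arguments can be used to catch misspelled option names before they are passed to DysguSV or set_option. The sketch below assumes that the keys of the returned dict are the option names. The helper unknown_options is hypothetical, and the stand-in dict holds placeholder values, not dysgu's real defaults.

```python
def unknown_options(defaults, overrides):
    """Return the sorted names in overrides that are absent from defaults.

    Hypothetical helper, not part of dysgu.
    """
    return sorted(name for name in overrides if name not in defaults)


# Stand-in for dysgu_default_args(); the values are placeholders only
example_defaults = {"min_support": None, "mq": None, "min_size": None}

print(unknown_options(example_defaults, {"min_support": 5, "mq": 20}))  # []
print(unknown_options(example_defaults, {"min_suport": 5}))  # ['min_suport']
```

With dysgu installed, example_defaults would be replaced by the result of dysgu_default_args(), and an empty list would indicate that every override names a known option.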
dysgu.load_dysgu_vcf(path, drop_na_columns=True)
Load a vcf file from dysgu
Parameters
- path (str) – The path to the vcf file
- drop_na_columns (bool) – Drop columns in which every value is NaN
Returns
A dataframe of SVs
Return type
pandas.DataFrame
dysgu.merge_dysgu_df(*dataframes, merge_distance=500, pick_best=True, add_partners=True)
Merge calls from dysgu. Input is one or more dataframes with dysgu calls.
Parameters
- dataframes (pandas.DataFrame) – The input dataframes of dysgu calls to merge
- merge_distance (int) – The merging distance. SVs closer than this spacing are candidates for merging
- pick_best (bool) – Choose a single best SV for each cluster
- add_partners (bool) – Add information to the output detailing which SVs were merged
Returns
The merged data
Return type
pandas.DataFrame
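load_dysgu_vcf and merge_dysgu_df can be combined to merge calls stored in several vcf files. The wrapper below is a sketch using only the signatures documented above; the name merge_vcfs is hypothetical, and dysgu is imported inside the function so that defining it does not require dysgu to be installed.

```python
def merge_vcfs(paths, merge_distance=500):
    """Load dysgu vcf files and merge their calls into one dataframe.

    Hypothetical wrapper around the documented dysgu functions.
    """
    paths = list(paths)
    if not paths:
        raise ValueError("at least one vcf path is required")
    # Imported lazily; requires dysgu to be installed when called
    from dysgu import load_dysgu_vcf, merge_dysgu_df

    frames = [load_dysgu_vcf(path) for path in paths]
    # merge_dysgu_df accepts one or more dataframes as positional arguments
    return merge_dysgu_df(*frames, merge_distance=merge_distance)
```

A call such as merge_vcfs(["a.vcf", "b.vcf"]), with placeholder file names, would return a single pandas.DataFrame of merged calls.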
dysgu.merge_intervals()
Merge a list of intervals. Each interval is expected to be a 3-tuple, e.g. (chromosome, start, end). If add_indexes is set to True, merge_intervals expects 4-tuples, with the last item being an index variable.
Parameters
- intervals (iterable) – The list of intervals to merge
- srt (bool) – Sort the intervals by chromosome and start position
- pad (int) – Add padding to intervals before merging, e.g. pad=10 subtracts 10 from the start and adds 10 to the end of each interval
- add_indexes (bool) – Add the indexes of merged intervals to the output
Returns
List of merged intervals
Return type
list
>>> merge_intervals( [('chr1', 1, 4), ('chr1', 2, 5), ('chr2', 3, 5)] )
[['chr1', 1, 5], ['chr2', 3, 5]]
>>> a = [("chr1", 1, 10, 0), ("chr1", 9, 11, 1), ("chr1", 20, 30, 2)]
>>> merge_intervals(a, add_indexes=True)
[('chr1', 1, 11, [0, 1]), ['chr1', 20, 30, [2]]]
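To make the merging and padding behaviour concrete, here is a small pure-Python sketch of interval merging. It is an illustrative re-implementation under stated assumptions, not dysgu's code: the name merge_intervals_sketch is hypothetical, touching intervals are treated as mergeable, padded starts are clamped at 0, and the container types of the output may differ from those returned by dysgu.

```python
def merge_intervals_sketch(intervals, pad=0):
    """Merge overlapping (chromosome, start, end) intervals.

    Hypothetical stand-in for illustration only, not part of dysgu.
    """
    merged = []
    # Sort by chromosome, then start, so overlaps are always adjacent
    for chrom, start, end in sorted(intervals):
        start, end = max(0, start - pad), end + pad
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged


print(merge_intervals_sketch([("chr1", 1, 4), ("chr1", 2, 5), ("chr2", 3, 5)]))
# [['chr1', 1, 5], ['chr2', 3, 5]]
print(merge_intervals_sketch([("chr1", 5, 10), ("chr1", 25, 30)], pad=10))
# [['chr1', 0, 40]]
```

In the second call the two intervals do not overlap on their own, but padding each by 10 makes them overlap, so they are merged into one.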