epiout.dataset
Module Contents
Classes
EpiOutDataset – Dataset object to read BED and BAM files and count reads for each peak.
- class epiout.dataset.EpiOutDataset(bed, alignments, njobs=1, slack=200, subset_chrom=False)
Dataset object that reads BED and BAM files and counts the reads for each peak.
- Parameters
bed – path to a BED file or a pyranges.PyRanges object.
alignments – path to a metadata file, a list of paths to BAM files, or a dict mapping sample names to BAM file paths.
njobs – number of jobs to run in parallel during counting.
slack – maximum distance in bp between peaks for them to be merged.
subset_chrom – if True, keep only chromosomes present in the BAM files.
- read_bed(self, bed, slack=200, subset_chrom=False)
- Read a BED file, merge peaks that overlap or lie within slack bp of each other (200 bp by default),
subset to chromosomes chr1, chr2, …, chrX, chrY, chrM if subset_chrom is True, and sort by chromosome and start position.
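The merging step in read_bed can be illustrated without pyranges. The pure-Python sketch below is not the library's implementation: it assumes peaks are `(chrom, start, end)` tuples, treats a peak as mergeable when its start is at most `slack` bp past the current merged peak's end, and uses a hypothetical regex (`chr` followed by digits, X, Y or M) to stand in for the chromosome subset. Chromosomes are sorted lexicographically here, so `chr10` sorts before `chr2`.

```python
import re
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end)

# Hypothetical rule for "standard" chromosomes: chr<number>, chrX, chrY, chrM.
STANDARD_CHROM = re.compile(r"^chr(\d+|X|Y|M)$")


def merge_peaks(peaks: List[Peak], slack: int = 200,
                subset_chrom: bool = False) -> List[Peak]:
    """Merge peaks that overlap or lie within `slack` bp of each other."""
    if subset_chrom:
        # Drop contigs such as chrUn_* or *_random.
        peaks = [p for p in peaks if STANDARD_CHROM.match(p[0])]
    merged: List[Peak] = []
    # Tuples sort by chromosome name (lexicographically), then start, then end.
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + slack:
            # Close enough to the previous peak: extend it instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

With the default slack of 200, peaks at 100-200 and 350-400 on chr1 collapse into a single 100-400 peak, while a peak starting at 1000 stays separate.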
- read_alignments(self, alignments)
- Read the alignments input and return a dict mapping sample names
to BAM file paths.
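read_alignments accepts three input shapes. The following is a hypothetical sketch of how they could be normalized into one dict. It assumes that, for a list of paths, the sample name is the BAM file name without `.bam`, and that the metadata file is a CSV with `sample` and `bam` columns. Neither convention is documented here.

```python
import csv
import os
from typing import Dict, List, Union

Alignments = Union[str, List[str], Dict[str, str]]


def normalize_alignments(alignments: Alignments) -> Dict[str, str]:
    """Return a dict mapping sample name -> BAM path (hypothetical sketch)."""
    if isinstance(alignments, dict):
        # Already in the target shape; copy so the caller's dict is untouched.
        return dict(alignments)
    if isinstance(alignments, (list, tuple)):
        # Assumption: the sample name is the BAM file name without ".bam".
        return {os.path.basename(p).removesuffix(".bam"): p for p in alignments}
    if isinstance(alignments, str):
        # Assumption: the metadata file is a CSV with "sample" and "bam" columns.
        with open(alignments, newline="") as fh:
            return {row["sample"]: row["bam"] for row in csv.DictReader(fh)}
    raise TypeError(f"unsupported alignments type: {type(alignments).__name__}")
```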
- static _valid_alignments(alignments: dict)
- static _valid_bed(bed: pyranges.PyRanges)
- static count_reads(gr, bam, mapq=10)
Read a BAM file and count the reads for each peak.
- Parameters
gr – pyranges.PyRanges object of peaks.
bam – path to a BAM file or a pysam.AlignmentFile object.
mapq – minimum mapping quality (MAPQ) a read needs to be counted.
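The counting rule can be sketched in pure Python. This sketch assumes reads are given as `(chrom, start, end, mapq)` tuples with half-open coordinates, that a read counts toward every peak it overlaps, and that reads whose MAPQ equals `mapq` are kept. A real implementation would more likely query an indexed BAM file (for example with pysam's `AlignmentFile.fetch`) than loop over all reads. The nested loop below only keeps the logic visible.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]       # (chrom, start, end), half-open
Read = Tuple[str, int, int, int]  # (chrom, start, end, mapq), half-open


def count_reads(peaks: List[Peak], reads: List[Read], mapq: int = 10) -> List[int]:
    """Count reads with MAPQ >= mapq that overlap each peak."""
    counts = [0] * len(peaks)
    for chrom, r_start, r_end, r_mapq in reads:
        if r_mapq < mapq:
            continue  # drop low-quality alignments
        for i, (p_chrom, p_start, p_end) in enumerate(peaks):
            # Half-open intervals overlap when each starts before the other ends.
            if chrom == p_chrom and r_start < p_end and p_start < r_end:
                counts[i] += 1
    return counts
```

A read spanning 150-250 on chr1 with MAPQ 30 is counted for a 100-200 peak, while the same read with MAPQ 5 is skipped under the default threshold of 10.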
- _count_samples(self, mapq=10)
- static _filters(df_raw, min_count=100, min_percent_sample=0.5)
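Filter peaks in the raw count table df_raw.
- Parameters
min_count – minimum read count that a peak must reach in at least one sample.
min_percent_sample – minimum fraction of samples in which the peak has at least one read.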
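The two filters can be sketched as follows. The sketch assumes `min_count` is compared against the largest per-sample count and `min_percent_sample` is a fraction between 0 and 1 of samples with a nonzero count. To stay dependency-free, it uses a plain dict of per-sample counts keyed by hypothetical peak IDs instead of the DataFrame `df_raw`.

```python
from typing import Dict, List


def filter_peaks(counts: Dict[str, List[int]], min_count: int = 100,
                 min_percent_sample: float = 0.5) -> List[str]:
    """Return the IDs of peaks that pass both filters.

    `counts` maps a hypothetical peak ID to its (non-empty) list of per-sample counts.
    """
    kept = []
    for peak_id, row in counts.items():
        # Filter 1: at least one sample reaches min_count reads.
        has_high_count = max(row) >= min_count
        # Filter 2: the fraction of samples with at least one read.
        frac_nonzero = sum(c > 0 for c in row) / len(row)
        if has_high_count and frac_nonzero >= min_percent_sample:
            kept.append(peak_id)
    return kept
```

For example, a peak with counts `[120, 0, 0, 0]` passes the count filter but is detected in only 25% of samples, so it is dropped at the default `min_percent_sample=0.5`.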
- count(self, mapq=10, min_count=100, min_percent_sample=0.5)
- Count reads for each peak, then keep only peaks that reach min_count reads in at least one sample
and have at least one read in at least a min_percent_sample fraction of samples.
- Parameters
mapq – minimum mapping quality.
min_count – minimum read count that a peak must reach in at least one sample.
min_percent_sample – minimum fraction of samples in which the peak has at least one read.