epiout.dataset

Module Contents

Classes

EpiOutDataset

Dataset object to read bed and bam files and count reads for each peak.

class epiout.dataset.EpiOutDataset(bed, alignments, njobs=1, slack=200, subset_chrom=False)

Dataset object to read bed and bam files and count reads for each peak.

Parameters
  • bed – path to bed file or pyranges object.

  • alignments – path to a metadata file, a list of paths to bam files, or a dict mapping sample names to bam file paths.

  • njobs – number of jobs to run in parallel during counting.

  • slack – distance in bp within which neighboring peaks are merged (200 by default).

  • subset_chrom – subset chromosomes to only those in the bam files.
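
A minimal usage sketch built only from the signatures listed on this page. The helper name count_peaks and the file paths in the trailing comment are hypothetical, and the return type of count() is not documented here, so the helper simply passes it through.

```python
def count_peaks(bed_path, bam_by_sample, njobs=1):
    """Build an EpiOutDataset and return whatever count() produces."""
    # Imported lazily so this sketch can be loaded without epiout installed.
    from epiout.dataset import EpiOutDataset

    dataset = EpiOutDataset(
        bed_path,
        bam_by_sample,      # dict of sample name -> path to bam file
        njobs=njobs,
        slack=200,
        subset_chrom=True,
    )
    # The return type of count() is not stated on this page.
    return dataset.count(mapq=10, min_count=100, min_percent_sample=0.5)


# Hypothetical paths, for illustration only:
# counts = count_peaks("peaks.bed", {"sample_a": "a.bam", "sample_b": "b.bam"})
```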

read_bed(self, bed, slack=200, subset_chrom=False)
Read the bed file, merge overlapping peaks with a slack of 200bp by default, subset to chromosomes chr1, chr2, …, chrX, chrY, chrM if subset_chrom is True, and sort by chromosome and start position.
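
The merge, subset and sort steps can be illustrated in plain Python. This is a sketch of the idea, not the library's implementation: the names merge_peaks and chrom_key are hypothetical, peaks are plain (chrom, start, end) tuples rather than a pyranges object, and treating a gap of exactly slack bp as mergeable is an assumption.

```python
import re

# Assumption: "standard" chromosomes are chr<number>, chrX, chrY and chrM.
STANDARD_CHROM = re.compile(r"^chr(\d+|X|Y|M)$")


def chrom_key(chrom):
    """Sort key: numbered chromosomes first, then X, Y, M, then the rest."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    return (1, {"X": 0, "Y": 1, "M": 2}.get(name, 3), name)


def merge_peaks(peaks, slack=200, subset_chrom=False):
    """Merge peaks on the same chromosome that lie within `slack` bp."""
    if subset_chrom:
        peaks = [p for p in peaks if STANDARD_CHROM.match(p[0])]
    ordered = sorted(peaks, key=lambda p: (chrom_key(p[0]), p[1], p[2]))

    merged = []
    for chrom, start, end in ordered:
        same_chrom = merged and merged[-1][0] == chrom
        if same_chrom and start <= merged[-1][2] + slack:
            # Extend the previous peak instead of starting a new one.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


example = [
    ("chr2", 100, 200),
    ("chr1", 1000, 1100),
    ("chr1", 100, 300),
    ("chr1", 450, 600),
    ("chrUn_x", 5, 10),
]
merged_example = merge_peaks(example, slack=200, subset_chrom=True)
```

Here the two chr1 peaks ending at 300 and starting at 450 are 150bp apart, so they collapse into one peak, while the peak starting at 1000 stays separate.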

read_alignments(self, alignments)
Read the alignments file and return a dict of sample name and path to bam file.
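
A rough sketch of how the list and dict forms of alignments could be normalized to one dict. The function name alignments_to_dict is hypothetical, and deriving the sample name from the file name is an assumption. The metadata file format is not described on this page, so that case is left out.

```python
from pathlib import Path


def alignments_to_dict(alignments):
    """Return a dict of sample name -> path to bam file."""
    if isinstance(alignments, dict):
        return {str(name): str(path) for name, path in alignments.items()}
    if isinstance(alignments, (list, tuple)):
        # Assumption: the sample name is the file name without its extension.
        return {Path(path).stem: str(path) for path in alignments}
    # A path to a metadata file is also accepted by the real class, but its
    # layout is not documented here, so this sketch does not guess at it.
    raise NotImplementedError("metadata file parsing is not covered by this sketch")


from_list = alignments_to_dict(["/data/a.bam", "b.bam"])
from_dict = alignments_to_dict({"liver": "/data/liver.bam"})
```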

static _valid_alignments(alignments: dict)
static _valid_bed(bed: pyranges.PyRanges)
static count_reads(gr, bam, mapq=10)

Read bam file and count reads for each peak.

Parameters
  • gr – pyranges object of peaks.

  • bam – path to bam file or pysam.AlignmentFile object.

  • mapq – minimum mapping quality.
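
To make the role of mapq concrete, here is a stand-in that counts reads per peak without touching a bam file. It is only an illustration under stated assumptions: count_reads_sketch is a hypothetical name, reads are plain (chrom, start, end, mapq) tuples with half-open coordinates, and a read whose mapping quality equals mapq is assumed to be counted.

```python
def count_reads_sketch(peaks, reads, mapq=10):
    """Count reads overlapping each peak, skipping low-quality alignments."""
    counts = []
    for chrom, start, end in peaks:
        n = sum(
            1
            for r_chrom, r_start, r_end, r_mapq in reads
            if r_chrom == chrom
            and r_mapq >= mapq                  # minimum mapping quality
            and r_start < end and r_end > start  # half-open interval overlap
        )
        counts.append(n)
    return counts


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [
    ("chr1", 150, 180, 60),  # inside the first peak, counted
    ("chr1", 190, 240, 5),   # overlaps, but mapping quality is too low
    ("chr1", 199, 250, 10),  # overlaps by one base, quality at the threshold
    ("chr1", 200, 250, 60),  # starts where the first peak ends, no overlap
    ("chr2", 150, 180, 60),  # different chromosome
]
read_counts = count_reads_sketch(peaks, reads, mapq=10)
```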

_count_samples(self, mapq=10)
static _filters(df_raw, min_count=100, min_percent_sample=0.5)

min_count: minimum read count a peak must reach in at least one sample. min_percent_sample: minimum fraction of samples in which the peak has at least one read.

count(self, mapq=10, min_count=100, min_percent_sample=0.5)
Count reads for each peak, then keep only peaks that reach the minimum count in at least one sample and have at least one read in the required fraction of samples.

Parameters
  • mapq – minimum mapping quality.

  • min_count – minimum read count a peak must reach in at least one sample.

  • min_percent_sample – minimum fraction of samples in which the peak has at least one read.
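
The two filter criteria can be sketched on a small table of counts. This is a reading of the parameter descriptions rather than the library's code: filter_peaks is a hypothetical name, counts are held in a plain dict, and interpreting min_percent_sample as a fraction between 0 and 1 is an assumption based on its default of 0.5.

```python
def filter_peaks(counts, min_count=100, min_percent_sample=0.5):
    """counts: dict of peak id -> list of read counts, one per sample."""
    kept = {}
    for peak, row in counts.items():
        if not row:
            continue  # no samples, nothing to evaluate
        # Criterion 1: at least one sample reaches min_count reads.
        has_high_count = max(row) >= min_count
        # Criterion 2: enough samples have at least one read.
        frac_with_reads = sum(c > 0 for c in row) / len(row)
        if has_high_count and frac_with_reads >= min_percent_sample:
            kept[peak] = row
    return kept


raw = {
    "peak_1": [150, 0, 3, 0],     # kept: max is 150, 2 of 4 samples have reads
    "peak_2": [150, 0, 0, 0],     # dropped: only 1 of 4 samples has reads
    "peak_3": [50, 40, 30, 20],   # dropped: no sample reaches 100 reads
}
filtered = filter_peaks(raw)
```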