Skipping of Indels in CountSNPASE #15

@petercombs

So I get why we want to skip over indels: reads covering them are much more likely to have trouble mapping correctly. But the way it works now, it actually seems more likely to introduce a bias. The code looks in the CIGAR string for indels, and an indel only shows up in the CIGAR string for reads from one of the alleles, leaving reads from the other allele alone. An indel itself never gets counted as a SNP, because presumably we were smart enough not to include indels in the bed file, but if there's an indel next to a SNP, that might introduce bias.

In this toy example, red is the reference allele and blue has a 1bp deletion. In the "truth" the expression is unbiased, but ASEr would call it 2:1, since the blue read that falls across both the SNP and the indel gets thrown out for having a deletion, while the red read covering the same positions doesn't.

[Image: screen shot 2016-06-20 at 3 40 16 pm]
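To make the arithmetic concrete, here is a minimal sketch of the toy example. The reads, CIGAR strings, and helper names (`has_indel`, `count_alleles`) are all made up for illustration and are not taken from the ASEr code; the only assumption is that reads are dropped whenever their CIGAR string contains an `I` or `D` operation.

```python
import re

# Hypothetical toy reads: (allele seen at the SNP, CIGAR string).
# Reads from the deletion-carrying haplotype ("alt") only show a D in the
# CIGAR when they span the deletion; reference reads never do.
reads = [
    ("ref", "50M"),
    ("ref", "50M"),       # covers the SNP and the indel site, but no D in CIGAR
    ("alt", "50M"),       # covers the SNP but not the deletion
    ("alt", "30M1D20M"),  # covers both the SNP and the 1bp deletion
]

INDEL_OPS = re.compile(r"\d+[ID]")


def has_indel(cigar):
    """True if the CIGAR string contains an insertion or deletion operation."""
    return INDEL_OPS.search(cigar) is not None


def count_alleles(reads, skip_indel_reads):
    """Count reads per allele, optionally dropping reads with an indel CIGAR."""
    counts = {"ref": 0, "alt": 0}
    for allele, cigar in reads:
        if skip_indel_reads and has_indel(cigar):
            continue
        counts[allele] += 1
    return counts


unfiltered = count_alleles(reads, skip_indel_reads=False)
filtered = count_alleles(reads, skip_indel_reads=True)
print("truth:   ", unfiltered)
print("filtered:", filtered)
```

The true counts are balanced, but the CIGAR-based filter only ever removes reads from the allele carrying the deletion, which is where the apparent 2:1 ratio comes from.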

My intuition is that the way to fix this is to come up with a list of indels, then check whether each read falls across one, and throw it out if so, regardless of which allele it came from. However, some quick thought while procrastinating on writing a fellowship proposal did not suggest any obvious way to do this efficiently.
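One possible way to make that check cheap, sketched under assumptions: sort the indels per chromosome by start position, store a running maximum of their end positions, and answer each read's "does this overlap any indel?" query with a binary search. The `IndelIndex` class and the coordinates below are hypothetical, not part of ASEr. It assumes 0-based, half-open intervals that are at least 1bp long (for example, an insertion recorded at its anchor base). If reads come from pysam, the `reference_start` and `reference_end` attributes of an aligned segment should supply coordinates in that convention.

```python
from bisect import bisect_left
from collections import defaultdict


class IndelIndex:
    """Hypothetical helper: answers 'does [start, end) overlap any indel?'.

    Intervals are assumed to be 0-based, half-open, and non-empty.
    """

    def __init__(self, indels):
        # indels: iterable of (chrom, start, end) tuples.
        by_chrom = defaultdict(list)
        for chrom, start, end in indels:
            by_chrom[chrom].append((start, end))

        self._starts = {}
        self._max_ends = {}
        for chrom, intervals in by_chrom.items():
            intervals.sort()
            self._starts[chrom] = [s for s, _ in intervals]
            # max_ends[i] is the largest end among the first i + 1 intervals,
            # so overlapping or nested indels are still handled correctly.
            running, max_ends = 0, []
            for _, end in intervals:
                running = max(running, end)
                max_ends.append(running)
            self._max_ends[chrom] = max_ends

    def overlaps(self, chrom, read_start, read_end):
        """True if the read span [read_start, read_end) touches any indel."""
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Number of indels whose start lies before the end of the read.
        n = bisect_left(starts, read_end)
        # One of them overlaps if it also ends after the read starts.
        return n > 0 and self._max_ends[chrom][n - 1] > read_start


# Usage with made-up coordinates: a 1bp deletion and a 3bp deletion.
index = IndelIndex([("chr2L", 100, 101), ("chr2L", 500, 503)])
print(index.overlaps("chr2L", 60, 110))   # read spanning the first deletion
print(index.overlaps("chr2L", 101, 150))  # read starting just after it
```

Building the index costs one sort per chromosome, and each read then costs a single binary search. Because the test uses the read's reference span rather than its CIGAR string, reads from both alleles that cross an indel site are discarded together.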
