GSOC Week 8: Benchmarking II

Finally, some plots are ready!
I finished the Python script that executes the paper workflow. This is where I did the FDR-to-MinProb transition I wrote about in my last post. I then wrote another Python script to execute the OpenMS workflow. (That workflow can be found in my project plan.) Both of these scripts can be found here.
With both scripts finished, I followed up with a quick plotting script and done! The first comparison is ready:
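For illustration, here is a minimal sketch of what such a plotting script could look like. Everything in it is a placeholder: the database names and suitability values are invented and only show the structure of the script, not the actual results.

```python
import matplotlib.pyplot as plt

# Placeholder values only -- NOT the real results, just dummy numbers to show
# the structure of the script. The database names are made up as well.
paper_results = {"species A": 0.9, "species B": 0.7, "species C": 0.4, "species D": 0.1}
openms_results = {"species A": 0.85, "species B": 0.6, "species C": 0.3, "species D": 0.05}

databases = list(paper_results)      # keep a fixed plotting order
positions = range(len(databases))

plt.plot(positions, [paper_results[db] for db in databases], "o-", label="paper workflow")
plt.plot(positions, [openms_results[db] for db in databases], "s-", label="OpenMS workflow")
plt.xticks(positions, databases, rotation=45, ha="right")
plt.xlabel("database")
plt.ylabel("suitability")
plt.legend()
plt.tight_layout()
plt.savefig("suitability_comparison.png")
```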


This looks promising! The more related the species are, the higher the suitability. It would be nice to increase the resolution at the top end of the suitability scale to better differentiate the "right" database. We are not sure how, or even if, this is possible: despite the different genomes of related species, their proteomes can be even more similar. It would be nice to have a measure of proteome similarity for this.

When comparing the paper results to the OpenMS results, it can be noted that the paper workflow consistently scores above OpenMS. The differences are very small when databases of highly related species were tested and a lot higher when "bad" databases were used. The only differences between the two workflows are PeptideProphet vs. target-decoy FDR and the re-ranking with alternative peptides.

To explain the latter: OpenMS is more generous towards the 'database' when classifying hits. A hit is only considered a 'de novo' hit when it doesn't have any 'database' protein hits at all. The paper script just checks whether a 'de novo' protein appears as a protein hit and doesn't look for alternatives.
The paper doesn't provide a script for counting, so I wrote a function myself which checks for these alternative proteins. Therefore this difference only comes into play during re-ranking. One would expect OpenMS to re-rank more hits than the paper script does, which would explain a higher suitability for OpenMS. That isn't the case, though. Hence the different suitabilities have to be a result of PeptideProphet vs. target-decoy FDR.
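To make the difference between the two counting rules concrete, here is a minimal sketch. The "_DENOVO" suffix and the accessions are pure assumptions for this example; the real workflows mark de novo proteins differently.

```python
def is_denovo_paper(protein_accessions):
    """Paper-style rule: a hit counts as 'de novo' as soon as a de novo
    protein appears among its protein hits."""
    return any(acc.endswith("_DENOVO") for acc in protein_accessions)


def is_denovo_openms(protein_accessions):
    """OpenMS-style rule: a hit only counts as 'de novo' if it has no
    'database' protein hit at all."""
    return all(acc.endswith("_DENOVO") for acc in protein_accessions)


# A peptide explained by both a de novo protein and a database protein:
accessions = ["sp|P00000|SOME_PROTEIN", "PEPTIDE_42_DENOVO"]
print(is_denovo_paper(accessions))   # True  -> counted as 'de novo'
print(is_denovo_openms(accessions))  # False -> counted as 'database'
```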

The probability I use for filtering in the paper workflow is calculated from the FDR. This calculation isn't perfect: if PeptideProphet doesn't provide a probability for exactly this FDR, the next smaller one is used. But a stricter FDR would result in a lower suitability, right?
Technically yes, but in this case no. Both 'database' and 'de novo' hits are filtered. In theory this should affect the database hits more because they tend to score worse, but that is only true for higher FDRs. The difference between an FDR of 0.01 and a slightly lower one isn't that big. All the bad 'database' hits were already filtered out at an FDR of 0.01, and lowering it further should have a similar effect on both kinds of hits.
The reason for the different results has to be the different FDR calculations. Sure, I convert my wanted FDR into a probability, but that probability is not calculated based on a target-decoy search. PeptideProphet uses way more information. This may result in more high-scoring database hits.
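To illustrate the FDR-to-probability conversion, here is a minimal sketch. It assumes the (error, min_prob) pairs have already been pulled out of the PeptideProphet output; the parsing itself is omitted, and the numbers below are invented.

```python
def min_prob_for_fdr(error_points, wanted_fdr):
    """Return the probability cutoff belonging to the largest error rate
    that is still <= wanted_fdr (i.e. the 'next smaller' FDR)."""
    usable = [(err, prob) for err, prob in error_points if err <= wanted_fdr]
    if not usable:
        raise ValueError("no error point at or below the wanted FDR")
    # take the error point closest to (but not above) the wanted FDR
    _, prob = max(usable, key=lambda point: point[0])
    return prob


# hypothetical (error, min_prob) pairs, just for illustration
points = [(0.0, 0.99), (0.008, 0.95), (0.01, 0.90), (0.05, 0.50)]
print(min_prob_for_fdr(points, 0.01))   # -> 0.90
print(min_prob_for_fdr(points, 0.009))  # -> 0.95 (falls back to the next smaller FDR)
```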

We can also compare my results with the results published in the paper. I only tried to recreate their workflow. This is the plot taken directly from the paper:


This looks way better than what I was able to reproduce. I don't know which settings of which search engine they changed to get results like this. For example, they don't say how they used PeptideProphet to get an FDR.

----------------------------------------

One plot I have the data for, but just didn't manage to add in time, plots the suitability against the number of top 'database' hits. This matters because a high suitability doesn't really mean much when the number of 'database' hits is low.


I still wasn't able to start with the merged FASTA file approach, but, as it turns out, that was kind of good. After re-thinking the idea, we (my mentors and I) figured that it would not result in the test we wanted. Since merging a "bad" database into the "right" database essentially doubles the number of decoy peptides, this wouldn't test how the suitability reacts to a smaller database. Hence we decided on a sampling approach, where we test different percentages of the "right" database to see what impact the database size has.
Since I will draw random entries from the database for this, it will probably be necessary to repeat the sampling multiple times to get a good average value.
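A minimal sketch of what this sampling could look like, using only the standard library. The file names, the 50 % fraction, and the number of replicates are just placeholders.

```python
import random


def read_fasta(path):
    """Read a FASTA file into a list of (header, sequence) tuples."""
    entries, header, seq = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    entries.append((header, "".join(seq)))
                header, seq = line, []
            else:
                seq.append(line)
    if header is not None:
        entries.append((header, "".join(seq)))
    return entries


def sample_database(entries, fraction, seed):
    """Randomly draw the given fraction of entries (without replacement)."""
    rng = random.Random(seed)
    k = max(1, round(len(entries) * fraction))
    return rng.sample(entries, k)


def write_fasta(entries, path):
    with open(path, "w") as out:
        for header, seq in entries:
            out.write(f"{header}\n{seq}\n")


# e.g. 5 replicates of a 50 % sample (file names are made up)
entries = read_fasta("right_database.fasta")
for seed in range(5):
    subset = sample_database(entries, 0.5, seed)
    write_fasta(subset, f"right_database_50pct_rep{seed}.fasta")
```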

This is what I will be starting to do next week.
