GSOC Week 5: Documentation With Dot

I focused on documentation and tests this week, since the code part of the tool is pretty much done.

Because the metrics this tool implements depend heavily on the correct set-up of the precursor tools, it would be very useful to provide a sample workflow in the documentation. Currently this is achieved with a table of potential predecessor and successor tools, which looks like this:


For more complicated workflows a table isn't reasonable. The other option would be to embed a picture of the workflow in the doxygen documentation. This would suffice if there wasn't another option: DOT!

DOT is one of the tools provided by graphviz and can be used in doxygen with the @dot command (closed by @enddot). Doxygen mainly uses it to illustrate class hierarchies, but since it does this by drawing a directed graph, it can be used to draw workflows too. To learn how to do this I read a lot of documentation. For details like how to color or label the graph, the graphviz website was very useful. For learning the general structure of the code for a dot graph, graphviz provides a very insightful dot guide. The doxygen page about dot also helped me out a little.
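As a minimal sketch of what goes into such a doxygen block: the graph below is plain dot, and the comment at the top marks where the doxygen commands would sit. The node names and the "\ref" targets (ToolA, ToolB) are hypothetical placeholders, not real OpenMS pages.

```dot
// In a doxygen comment this graph would be wrapped like this:
//   @dot
//   digraph ... { ... }
//   @enddot
digraph doxygen_sketch {
  node [ shape=box, fontname=Helvetica, fontsize=10 ];

  // URL="\ref ..." lets doxygen turn the node into a link to another doc page.
  // ToolA and ToolB are made-up names for illustration only.
  tool_a [ label="ToolA" URL="\ref ToolA" ];
  tool_b [ label="ToolB" URL="\ref ToolB" ];

  tool_a -> tool_b;
}
```

The "\ref" links are resolved by doxygen when it generates the documentation.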

In general the dot language isn't that complicated, and most of the commands can be used just like you think they would. The following code creates the workflow I want, and I will go through the most important commands.
1:  digraph sample_workflow {
2: node [ style="solid,filled", color=black, fillcolor=grey90, width=1.5, fixedsize=true, shape=square, fontname=Helvetica, fontsize=10 ];
3: edge [ arrowhead="open", style="solid" ];
4: rankdir="LR";
5: splines=ortho;
6: mzml [ label="mzML file(s)" shape=oval fillcolor=white group=1];
7: novor [ label="NovorAdapter" URL="\ref OpenMS::NovorAdapter" group=2];
8: id_filter [ label="IDFilter" URL="\ref OpenMS::IDFilter" group=2];
9: id_convert [ label="IDFileConverter" URL="\ref OpenMS::IDFileConverter" group=2];
10: decoy_db [ label="DecoyDatabase" URL="\ref OpenMS::DecoyDatabase" group=2];
11: comet [ label="CometAdapter" URL="\ref OpenMS::CometAdapter" group=1];
12: pep_ind [ label="PeptideIndexer" URL="\ref OpenMS::PeptideIndexer" group=1];
13: fdr [ label="FalseDiscoveryRate" URL="\ref OpenMS::FalseDiscoveryRate" group=1];
14: db_suit [ label="DatabaseSuitability" fillcolor="#6F42C1" fontcolor=white group=3];
15: tsv [ label="optional\ntsv output" shape=oval fillcolor=white group=3];
16: {rank = same; db_suit; decoy_db;}
17: mzml -> novor;
18: mzml -> comet;
19: comet -> pep_ind;
20: pep_ind -> fdr;
21: fdr -> db_suit [ xlabel="in_id" ];
22: novor -> id_filter;
23: id_filter -> id_convert;
24: id_convert -> decoy_db;
25: decoy_db -> comet;
26: mzml -> db_suit [ xlabel="in_spec" ];
27: novor -> db_suit [ xlabel="in_novor" ];
28: db_suit -> tsv;
29: }
To draw a graph with dot you basically only need to define the nodes and the edges. In the code above, lines 6 to 15 define the nodes and lines 17 to 28 define the edges. Nodes have names, and to define an edge you just type a -> between two node names. Pretty straightforward.
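Stripped of all styling, the idea can be sketched as a three-node chain. The node names here are made up for illustration.

```dot
digraph minimal {
  // Nodes: a bare name is enough to declare one.
  input;
  tool;
  output;

  // Edges: "->" between two node names draws a directed edge.
  input -> tool;
  tool -> output;
}
```

Strictly speaking, dot also creates a node the first time its name appears in an edge, so the explicit declarations are only needed once a node gets attributes of its own.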

After defining all of these you can give labels, change colors and shapes, and link to other parts of the documentation. If you want to specify default attributes for all nodes (or all edges), you just type node (or edge) followed by the attributes in square brackets. This is done in lines 2 and 3. If specific nodes need values other than the default, these can simply be overridden by providing another value for the attribute in the node definition. This is done in line 6, for example, where the shape is overridden to be "oval" instead of the default "square" defined in line 2.
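Reduced to two nodes, the default-and-override pattern might look like the sketch below. The labels are placeholders and the attribute values are only picked to make the difference visible.

```dot
digraph defaults {
  // Defaults that apply to every node and edge declared after these lines.
  node [ shape=square, style=filled, fillcolor=grey90 ];
  edge [ arrowhead=open ];

  // Overrides the default shape and fill color for this one node.
  file [ label="input file" shape=oval fillcolor=white ];

  // Keeps all defaults; only the label is set.
  tool [ label="SomeTool" ];

  file -> tool;
}
```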

The only thing left is the positioning of the nodes. Of course this isn't necessary, since dot has a default positioning system, but it's still useful.
To customize it, x and y positions can be specified manually by giving coordinates, but this is obviously a lot of work and there is a simpler way. If you assign groups to the nodes, dot tries to place nodes with the same group in a straight vertical line. Sometimes this isn't possible and dot will ignore the group. For me this was more or less trial and error.
Horizontal alignment is done with the rank attribute (example in line 16). This, on the other hand, worked perfectly.

Note: The alignment attributes only work like this if the graph is drawn from top to bottom or the reverse. If the graph is drawn from left to right or right to left, rank defines the vertical alignment and group the horizontal alignment.
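A small sketch of both attributes, assuming the default top-to-bottom layout. The node names are made up, and how strictly dot honors group depends on the rest of the graph.

```dot
digraph layout {
  // Same group: dot tries to keep the edges between these nodes straight,
  // which stacks a, b and c in one vertical line.
  a [ group=1 ];
  b [ group=1 ];
  c [ group=1 ];
  d [ group=2 ];

  a -> b;
  b -> c;
  a -> d;

  // Without this line d would sit next to b (one step below a).
  // rank=same forces d onto the same horizontal level as c instead.
  { rank=same; c; d; }
}
```

In a left-to-right layout the same code would line up a, b and c horizontally and put c and d in the same column.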

To control the direction in which the graph is drawn, rankdir needs to be defined (the default is top to bottom). This is done in line 4.
All this together results in this graph:
This is pretty cool, I think!
If dot isn't installed, the graph simply will not be drawn in the documentation. So it would be good to have a back-up or an error message for this case, for example simply writing "Dot plot could not be drawn." or something like this. But there isn't a good way of doing this in doxygen. That's a problem for another time.

Now the only thing left is some testing. To get the test data I ran the whole workflow with a relatively small mzML input. Once I had the test input data for the tool, I ran the tool with a few different settings to get the test outputs. I then added the tests to the CMakeLists.txt and that's it. I pushed the tests and documentation to my git repository and with that added them to the PR. Now the PR is ready for review.
Next week I'll see what to change!







