TreeBASE visualization using JSON/NeXML and HTML5 canvas

This is a simple drawer for TreeBASE trees that is meant to demonstrate two points:

  • HTML5 canvas is neat. Fingers crossed that it gets universally adopted, because then we might be able to do away with applets and with plugins (and the uncertainty of whether they are installed).
  • NeXML is not to be feared. If you don't like pointy braces for some reason or another you can always use its JSON incarnation instead.

Actually there's a third point, which is that Yahoo! Pipes is really nice for gluing things together on the internet.

Here's how it all is going to fit together:

  1. First, we fetch a tree from TreeBASE from within this Yahoo! pipe. JSON is one of the output options available with pipes, and a callback function can be specified for it.
  2. That callback function is a bit of JavaScript code (shown below) that passes the pipe's output into a little NeXML/JSON library, which wraps the output in an API for easy access to the data. This simplifies tree traversal.
  3. We then traverse the tree object from the API to compute the Y-coordinates and, for every node, the largest number of nodes on a path from that focal node down to a tip, counting both ends, so that a tip gets the value 1 (I'm calling it the "depth" in the code below).
  4. In a second traversal we then compute the X-coordinates based on the node depths and draw the branches and tip labels on the HTML5 canvas.
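
To make step 3 concrete, here is a minimal sketch of the same post-order idea on a toy tree. It does not use the nexml.js API: the plain node objects with a `children` array, the `layout` function and the `state` object are all hypothetical stand-ins chosen for illustration.

```javascript
// Hypothetical stand-in for the tree API: every node is a plain object
// with a 'children' array, which is empty for tips.
function layout(node, state) {

    // post-order: visit the children before the node itself
    for (var i = 0; i < node.children.length; i++) {
        layout(node.children[i], state);
    }
    if (node.children.length === 0) {

        // tips are spread apart by a fixed spacing and have depth 1
        node.y = ++state.tips * state.spacing;
        node.depth = 1;
    }
    else {

        // internal nodes sit at the average y of their children and
        // are one deeper than their deepest child
        var ySum = 0;
        var maxDepth = 0;
        for (var j = 0; j < node.children.length; j++) {
            ySum += node.children[j].y;
            maxDepth = Math.max(maxDepth, node.children[j].depth);
        }
        node.y = ySum / node.children.length;
        node.depth = maxDepth + 1;
    }
    return node;
}

// toy tree ((A,B),C)
var A = { label: 'A', children: [] };
var B = { label: 'B', children: [] };
var C = { label: 'C', children: [] };
var AB = { children: [A, B] };
var toyRoot = { children: [AB, C] };

layout(toyRoot, { tips: 0, spacing: 15 });
// A.y is 15, B.y is 30, C.y is 45
// AB.y is 22.5 and AB.depth is 2
// toyRoot.y is 33.75 and toyRoot.depth is 3
```

Passing the tip counter in a `state` object rather than keeping it in a global is just a choice made for this sketch, so that the function can be run more than once without leftover state.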

To make these steps all work we have to put things in our HTML in the right order so that nothing is undefined when we get to it. The following snippets can all go in the body of an HTML document, in the order I show them here:

This is the canvas element into which we're drawing the tree. We need to set its width so that we can compute the branch lengths relative to it (the root of the tree will be on the left, the tips on the right). The height of the canvas is variable: it depends on the number of tips and is adjusted by the JavaScript.
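
As a rough illustration of how the canvas width turns into X-coordinates, the sketch below isolates the arithmetic from the drawing code further down. The function name `xForDepth` and the example width of 600 are assumptions made for this example; the label margin of 200 matches the value used in the drawing code.

```javascript
// Hypothetical helper that isolates the x-coordinate arithmetic.
// width:      width of the canvas element
// labelSpace: room reserved on the right for tip labels
// maxDepth:   depth of the root (tips have depth 1)
// depth:      depth of the node we want to place
function xForDepth(width, labelSpace, maxDepth, depth) {

    // length of one cladogram branch unit
    var unit = (width - labelSpace) / maxDepth;

    // the root (depth == maxDepth) ends up at x = 0,
    // shallower nodes end up further to the right
    return unit * (maxDepth - depth);
}

var rootX = xForDepth(600, 200, 4, 4); // 0
var midX  = xForDepth(600, 200, 4, 2); // 200
var tipX  = xForDepth(600, 200, 4, 1); // 300
```

Because tips have depth 1 rather than 0, they all line up one unit short of the label margin (at 300 rather than 400 in this example).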

The second snippet is the JavaScript for steps 2-4, which should go inside <script type="text/javascript"></script> tags:

// This variable is used to deal with different XML2JSON mappings. 
// By default, the nexml.js library expects the badgerfish mapping,
// where XML attributes become object properties with an '@' prefix.
// Yahoo! pipes emits a different XML2JSON mapping, without a prefix
// for attributes. We need to configure this here.
var NeXMLAttributePrefix = '';  

// Increments every time we visit a tip in post-order.
// Used to compute Y-coordinates.
var tipcounter = 0;

// Sets the distance between tips.
var verticalDistance = 15;

// Callback that is executed by the output of the Yahoo! pipe.
function processJson(json) {

    // the nexml.js library expects an object with a 
    // field nex$nexml that subtends the document
    var nexml = { 'nex$nexml' : json.value.items[0] };
    var nexmlDoc = new NeXML.Document(nexml); 
    
    // there can be multiple tree blocks...
    var treesList = nexmlDoc.getTreesList()[0];
    
    // ...with multiple trees
    var tree = treesList.getTreeList()[0];
    var root = tree.getRootNode();
    
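    // compute the y coordinates and depths for all nodes,
    // once we've computed the root node's depth we can
    // divide the canvas width by that to compute the x
    // coordinates. We reset the tip counter first so that
    // a repeated call of this callback starts at the top again
    tipcounter = 0;
    computeCoordinates(tree,root);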
    
    // html5 canvas
    var canvas  = document.getElementById('MyCanvas');
    canvas.height = ( tipcounter + 1 ) * verticalDistance;
    var context = canvas.getContext('2d');
    drawTree(tree,root,context,root.depth,canvas.width);
}

function drawTree(tree,node,ctx,maxdepth,width) {
    var children = tree.getChildNodes(node);
    for ( var i = 0; i < children.length; i++ ) {
        drawTree(tree,children[i],ctx,maxdepth,width);
    }   
    var parent = tree.getParentNode(node);
    if ( null != parent ) {
        var y1 = parent.y;
        var y2 = node.y;
        
        // 'width' is that of the canvas element. We subtract 200 to
        // leave space for tip labels, then divide the rest by the
        // depth of the root so we know the length of one cladogram
        // branch unit. A node's x position is that unit times
        // ( maxdepth - depth ), so the root sits at x = 0 and all
        // tips line up on the right.
        var x1 = ( ( width - 200 ) / maxdepth ) * ( maxdepth - parent.depth );
        var x2 = ( ( width - 200 ) / maxdepth ) * ( maxdepth - node.depth );
        
        // Starts a fresh path so that stroke() only paints this
        // branch, not every segment traced so far
        ctx.beginPath();
        
        // Traces the vertical path segment starting at the parent
        ctx.moveTo(x1,y1);
        ctx.lineTo(x1,y2);
        
        // Traces the horizontal path segment from the end of the
        // vertical segment to the child, then paints both segments
        ctx.lineTo(x2,y2);
        ctx.stroke();
        if ( children.length == 0 ) {
        
            // Writes the tip label
            ctx.fillText(node.getLabel(), x2 + 5, y2 + 5 );
        }
    }
}

function computeCoordinates(tree,node) {
    var children = tree.getChildNodes(node);
    
    // we do post-order traversal so that we first compute
    // child nodes' coordinates because their parents are
    // relative to them
    for ( var i = 0; i < children.length; i++ ) {
        computeCoordinates(tree,children[i]);
    }
    
    // processing tips is easy: they're just spread apart
    // by verticalDistance, and they're the shallowest ones
    if ( children.length == 0 ) {
        node.y = ++tipcounter * verticalDistance;
        node.depth = 1;
    }
    
    // for internal nodes we take as the y coordinate the
    // average of their immediate children. Of those children
    // we need to know which one is deepest (i.e. farthest 
    // away from the tips) and go one deeper than that
    else {
        var y_sum = 0;
        var max_depth = 0;
        for ( var i = 0; i < children.length; i++ ) {
            y_sum += children[i].y;
            if ( children[i].depth > max_depth ) {
                max_depth = children[i].depth;
            }
        }
        node.y = y_sum / children.length;
        node.depth = max_depth + 1;
    }
}

The next thing that needs to happen is the import of the nexml library. In this instance the import has to come after our hand-coded JavaScript because we need to set NeXMLAttributePrefix first, to deal with the pipe's way of mapping XML to JSON. By default the library expects the "badgerfish" mapping, which prefixes XML attributes with an '@' to distinguish them from element names. The pipe's output is a bit more simplistic and omits such a prefix, hence we set NeXMLAttributePrefix to an empty string. After that we can import the library:
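
To make the difference between the two mappings concrete, here is a small sketch of how a configurable prefix can be used to look up an XML attribute in either kind of JSON. The helper `getAttribute` and the two example objects are hypothetical and are not taken from nexml.js; they only illustrate the idea.

```javascript
// Hypothetical helper, not part of nexml.js: looks up an XML attribute
// on a JSON object, given the prefix that the XML2JSON mapping uses.
function getAttribute(element, name, prefix) {
    return element[prefix + name];
}

// the same node in the badgerfish mapping ('@' prefix for attributes)...
var badgerfishNode = { '@id': 'n1', '@label': 'Homo sapiens' };

// ...and in a mapping without an attribute prefix
var prefixlessNode = { 'id': 'n1', 'label': 'Homo sapiens' };

var fromBadgerfish = getAttribute(badgerfishNode, 'label', '@');
var fromPrefixless = getAttribute(prefixlessNode, 'label', '');
// both are 'Homo sapiens'
```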

Then, finally, we can call the pipe:
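
Calling the pipe relies on the JSONP pattern: a script whose URL names the callback function, so that the response arrives as a call to `processJson`. The sketch below shows that general mechanism only. The helpers `withCallback` and `loadJsonp`, the placeholder URL on example.org and the parameter names `_render` and `_callback` are assumptions for illustration, not the actual pipe address.

```javascript
// Hypothetical helper: appends a JSONP callback parameter to a URL.
function withCallback(url, paramName, callbackName) {
    var separator = url.indexOf('?') === -1 ? '?' : '&';
    return url + separator +
        encodeURIComponent(paramName) + '=' +
        encodeURIComponent(callbackName);
}

// Hypothetical helper: loads a URL as a script element, so that the
// response is executed and calls the named callback. Browser only.
function loadJsonp(url) {
    var script = document.createElement('script');
    script.src = url;
    document.body.appendChild(script);
    return script;
}

// placeholder URL, NOT the real pipe address
var exampleUrl = withCallback(
    'https://example.org/pipe?_render=json', '_callback', 'processJson'
);
// exampleUrl is
// 'https://example.org/pipe?_render=json&_callback=processJson'
```

In a browser, `loadJsonp(exampleUrl)` would then trigger the request. The callback has to be defined before the script loads, which is why the order of the snippets matters.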

If you have a reasonable browser (I tested Chrome, Firefox and Safari), the resulting tree should show up below:

There are some follow-ups to this post:

  1. An implementation that generates SVG nodes instead of HTML5 canvas
  2. An extension of the SVG example with links to the NCBI taxonomy from RDFa annotations

5 comments:

  1. Nice, but I prefer SVG, partly because it is text-based (you can see the SVG in the browser DOM) and things like making labels interactive are trivial in SVG (see http://iphylo.blogspot.com/2010/05/drawing-phylogeny-in-web-browser-using.html ). Then there is also the ability to zoom SVG, which could lead to some cool ways to navigate large trees.

    Oh, and for some reason Safari 5.1.2 wouldn't display the tree, but Chrome does.

    Replies
    1. Second the vote for SVG. RaphaëlJS is a great library for it.

      This is really cool, though.

    2. >Oh, and for some reason Safari 5.1.2 wouldn't display the tree, but Chrome does

      For HTML5-canvas or SVG ? My Safari 5.1.2 sees the tree (above) just fine.

    3. Here it is, using SVG: http://biophylo.blogspot.com/2012/01/treebase-visualization-using-jsonnexml_25.html

  2. Here's another (better?) one that also demonstrates SVG interactivity by making the tips clickable using the NeXML RDFa-annotations that link taxa/OTUs to the NCBI taxonomy: http://biophylo.blogspot.com/2012/01/visualizing-rdfa-semantic-annotations.html
