Introduction

Contents

What is a tree data structure?

I guess that if you are reading this, you probably know more than me. A tree is a hierarchical data structure where, where the data are organized in nodes, that can have 0 or several children nodes. The topmost node is called the root of the tree, and has no parent and no siblings. The "bottom" nodes, that have no children, are called the leaves.

Contrary to real trees, it is common to represent data trees with the root at the top, like this:

            EMS
   +---------+----+
   |              |
  MS              E
 +-+--+      +----+----+
 |    |      |         |
MSa  MSp    Ea        Ep
           +-+--+    +-+--+
           |    |    |    |
          Eal  Ear  Epl  Epr

Here, EMS is the tree root, it has two children named MS and E, and the nodes named MSa, MSp, Eal, Ear, Epl and Epr are the leaves.

A bit of off-topic culture: The example depicted here is a subset of a C.elegans embryo cell lineage. C.elegans are small worms that are used a lot in Biology. Some research is made following a C.elegans embryo from the very first cell in the egg. It divides in two, each daughter subsequently divide in two, and so on. It turns out that for C.elegans, the dividing scheme and the cell fate is totally reproducible and predictable in healthy embryo, a rare feature amongst living species. Each cell has therefore a standardized name, and the above tree is a part of the lineage.

A tree is typically a good structure to hold any measure made on each cell. You can store a value in such a tree, and be able to compare it to the children, the siblings or parent value. A linear array does not offer this structure, and is more suited for instance to time series.

Note that another data structure would actually be more suited for this specific example: Since each cell has always exactly two daughter, a binary tree would be a better. A binary tree is a more specialized kind of tree, where each node has exactly two children, and their order is determined by their identity, which can be left or right. This tree class is a generic tree, where each node can have any number of children. The siblings order is not specific, and simply correspond to the addition order.

We will use anyway the C.elegans lineage data as an example. The tree class ships with examples that include a partial C.elegans lineage from the first cell:

lineage = tree.example;
disp(lineage.tostring)
                                                                Zygote                                                                 
                               +----------------------------------+--------------------------------+                                   
                               |                                                                   |                                   
                              AB                                                                  P1                                   
               +---------------+---------------+                                 +----------------+-----------------+                  
               |                               |                                 |                                  |                  
             AB.a                            AB.p                               P2                                 EMS                 
       +-------+-------+               +-------+-------+             +----------+-----+                 +-----------+-----+            
       |               |               |               |             |                |                 |                 |            
     AB.al           AB.ar           AB.pl           AB.pr          P3                C                MS                 E            
   +---+---+       +---+---+       +---+---+       +---+---+       ++----+      +-----+-----+        +--+--+        +-----+-----+      
   |       |       |       |       |       |       |       |       |     |      |           |        |     |        |           |      
AB.ala  AB.alp  AB.ara  AB.arp  AB.pla  AB.plp  AB.pra  AB.prp    P4     D     C.a         C.p     MS.a  MS.p      E.a         E.p     
                                                                 +-+-+       +--+--+     +--+--+                 +--+--+     +--+--+   
                                                                 |   |       |     |     |     |                 |     |     |     |   
                                                                Z2  Z3     C.aa  C.ap  C.pa  C.pp              E.al  E.ar  E.pl  E.pr  

Basic information on a tree

A tree object has simply two members:

lineage %#ok<NOPTS>
lineage = 

  tree with properties:

      Node: {41x1 cell}
    Parent: [41x1 double]

Parent is an integer array that stores the parent of the node at this index. Node is a cell array that stores the content of this node. We are between gentlemen, so you can read and even modify these fields at your will. But it is safer to use the methods described below. They also make it easier to manipulate the tree.

The nnodes method returns the total number of nodes in a tree:

nnodes(lineage)
ans =

    41

And the tostring method we have seen above generates a nice char matrix picturing the tree itself.

Back to main page.