Nested set in MySQL
There seems to be massive confusion about the concept behind nested set and when/how it should be used. In this article I will attempt to remove some of the mystery behind nested set, explain some of the magic and provide a few easy to understand examples for implementing and working with nested set in MySQL.
Before I get started I should note that there is a reasonably extensive article at MySQL on this topic. What I am trying to achieve here is an easier to understand explanation to the same problem.
What is nested set? When should I use it?
The nested set model is a way of efficiently storing and retrieving hierarchical data in a database. The model requires “sequence” data to be stored with each record to indicate it’s order and depth within a hierarchy. That stored sequence data can then be used to retrieve the data or parts of the data along with the associated depth information without the need to use multiple queries or sub-queries.
Nested set can be used whenever you need to store and retrieve any kind of hierarchical data. Regardless of whether it is single parented or multi-parented data. For example, if you have a “family tree” that you want to store in a database or if you need to store an organisational structure.
What is “sequence” data?
The sequence data stored with each record is simply two numbers stored in two separate fields (define as int unsigned in MySQL). The two numbers represent the “left” sequence number and the “right” sequence number that indicate the position of the record in the hierarchy relative to the other records.
If a record is a child of another record it’s left and right sequence will be between the left and right sequence values of the parent record.
1 Parent1 6 / \ / \ / \ 2 Child1 3 4 Child2 5
If a a record is “after” a sibling record at the same level of the hierarchy it’s left sequence value will be greater than the right sequence value of the previous sibling record.
1 Sibling1 2 – 3 Sibling2 4
The down side to using the nested set model is that whenever the order or depth of existing records are update or new records are added the “sequence” data for ALL records must be updated. Unfortunately this may mean that the nested set model will not work for you, particularly if you have a large amount of data to be stored (~1000 records).
An alternate solution to this drawback is to use nested intervals instead.
Organisational structure example
This example shows an example of a single parented organisational structure. In this example all of the data and the left and right sequence values can be stored in a single table.
- Senior managers
- Technical team leader
- Sales team leader
- Customer service team leader
- Technical team
- Sales team
- Customer service team
The SQL for the nested set schema/data:
DROP TABLE IF EXISTS organisation; CREATE TABLE organisation ( id int unsigned NOT NULL auto_increment, Name varchar(255) NOT NULL, CachedLeft int unsigned NOT NULL, CachedRight int unsigned NOT NULL, PRIMARY KEY (id) ); INSERT INTO organisation VALUES (NULL, 'CEO', 1, 16); INSERT INTO organisation VALUES (NULL, 'Senior managers', 2, 15); INSERT INTO organisation VALUES (NULL, 'Technical team leader', 3, 6); INSERT INTO organisation VALUES (NULL, 'Technical team', 4, 5); INSERT INTO organisation VALUES (NULL, 'Sales team leader', 7, 10); INSERT INTO organisation VALUES (NULL, 'Sales team', 8, 9); INSERT INTO organisation VALUES (NULL, 'Customer service team leader', 11, 14); INSERT INTO organisation VALUES (NULL, 'Customer service team', 12, 13);
The data above is a fairly simple, flat structure. In this structure you can select all of the child of a particular record by looking for all records with a CachedLeft value between the CachedLeft and CachedRight of the parent record. For example to select all children of the Senior managers record:
SELECT * FROM organisation WHERE CachedLeft BETWEEN 2 AND 15;
SELECT o.* FROM organisation o JOIN organisation p ON o.CachedLeft BETWEEN p.CachedLeft AND p.CachedRight WHERE p.Name = 'Senior managers';
Both of the above queries will return the same results. The first query can be used if you know the left and right sequence values of your parent. The second query can be used if you do not.
Family tree example
This example shows a very simple family tree. The particular difficulty with this data is that it is multi-parented. The main difference between single parented and multi-parented data is that you will need three tables instead of just one. The queries are also made much more complex as they require you to take much more consideration with how you want to data to be displayed. Note that in this model a single record must have a left and right sequence value for each parent it has.
- Mary Blogs (Grandmother)
- Henry Blogs (Grandfather)
- Sue Blogs (Aunty)
- Fred Blogs (Father) & Julie Blogs (Mother)
- Sam Blogs (Son)
The SQL for the nested set schema/data:
-- family table for storing data records DROP TABLE IF EXISTS family; CREATE TABLE family ( id int unsigned NOT NULL auto_increment, Name varchar(255), PRIMARY KEY (id) ); -- family_family table for storing the associations -- between records DROP TABLE IF EXISTS family_family; CREATE TABLE family_family ( FamilyID int unsigned NOT NULL, ParentID int unsigned NOT NULL, PRIMARY KEY (FamilyID, ParentID) ); -- familycache table to store the left and right -- sequences for each record/parent association DROP TABLE IF EXISTS familycache; CREATE TABLE familycache ( FamilyID int unsigned NOT NULL, CachedLeft int unsigned NOT NULL, CachedRight int unsigned NOT NULL, PRIMARY KEY (FamilyID, CachedLeft) ); -- insert the family records INSERT INTO family VALUES (1, 'Mary Blogs (Grandmother)'); INSERT INTO family VALUES (2, 'Henry Blogs (Grandfather)'); INSERT INTO family VALUES (3, 'Sue Blogs (Aunty)'); INSERT INTO family VALUES (4, 'Fred Blogs (Father) & Julie Blogs (Mother)'); INSERT INTO family VALUES (5, 'Sam Blogs (Son)'); -- insert family relations/record associations INSERT INTO family_family VALUES (3, 1); INSERT INTO family_family VALUES (3, 2); INSERT INTO family_family VALUES (4, 1); INSERT INTO family_family VALUES (4, 2); INSERT INTO family_family VALUES (5, 5); -- insert the cache sequences INSERT INTO familycache VALUES (1, 1, 8); INSERT INTO familycache VALUES (2, 1, 13); INSERT INTO familycache VALUES (3, 2, 3); INSERT INTO familycache VALUES (4, 4, 7); INSERT INTO familycache VALUES (5, 5, 6); INSERT INTO familycache VALUES (3, 9, 10); INSERT INTO familycache VALUES (4, 11, 12);
Updating the sequence values
There is a couple of ways you can update the the sequence values when a record changes or a new record is inserted. You can either do this by having a recursive stored procedure in MySQL or by using a similar recursive function/method in PHP (or whatever language you are using). The details of how to implement each of these options is beyond the scope of this article and may be covered in a future posting.
The main consideration you need to be aware of when updating you sequences is the ordering of your records within in level or the hierarchy. For example will it be sufficient to order each level by the Name column in the examples above? If not you may need to add a “Rank” column to store the required order of the records in each level.
Additionally, if you are updating the cache record via a PHP function or MySQL stored procedure (rather than manually specifying the values) you will also need to store a “ParentID” value for each record that has a parent.
As mentioned previously, this article was intended to provide a simple overview for nested set in MySQL. Hopefully I have managed to shed some degree of light on this topic although if you are still a little fuzzy on the subject please let me know.
Also, If you have any comments on anything in this article or require information on any other topics please feel free to contact me.