This package contains the infrastructure for a transparent nested set overlay on an adjacency list representation of hierarchical data structures.
The nested set approach allows extremely scalabe and fast querying of data represented in a hierarchy, a directed acyclic graph. On the other hand, a nested set approach has extra cost when the tree is modified, so it's only appropriate for read-mostly trees. Consider the following representation of hierarchical data:
A / \ B C / \ D E /\ F G
(Note: This particular graph is a binary tree, however, it doesn't have to be binary or balanced.)
This is traditionally implemented with an adjacency list in a SQL DBMS:
NODE --------------- | ID | PARENT | --------------- | A | NULL | | B | A | | C | A | | D | C | | E | C | | F | D | | G | D | ---------------
You can now query for the whole subtree of node C (or any other node) - this is called a bill of materials explosion - with the following strategies:
Manual Recursion: Select all the children of node C, and if these nodes have children, recursively query until the whole subtree is loaded. This can be implemented with a stored procedure in SQL or by recursively exeucting SELECT statements in the application language. This strategy does not scale.
Proprietary Recursion: Oracle offers a CONNECT BY ... PRIOR extension to standard SQL that executes a recursive query inside the database, so you only have to execute one SELECT in the application language. This extension is proprietary to Oracle DBMS and has several flaws (conceptually), as documented here: Practical Issues in Database Management: A Reference for the Thinking Practitioner by Fabian Pascal.
Standardized Recursion: The SQL:1999 standard allows a recursive SELECT syntax using the WITH clause (subquery factoring). This however also has numerous flaws (see the Pascal book) and is not implemented by many SQL DBMSs. Furthermore, the implementation is often suboptimal, with global temporary tables. As a stopgap measure, please remind your SQL DBMS vendor to implement it (or a proper explode() operator as explained by Pascal), so that workarounds like the nested set or materialized paths are no longer needed.
Materialized Path: An additional column named PATH is added to the table and the values are concatenated strings such as /A/C/D for the tuple (F, D). This path value has to be manipulated whenever a node is added, deleted, or moved in the tree. You can query for a subtree by using string operations such as where NODES.PATH like "/A/C/%", which would return all children of C. The performance depends on the query optimizer and index usage for such an operation. The cost of each tree modification is high, although, it can be implemented with stored procedures in the DBMS.
Nested Set: A nested set approach is often the most flexible and portable strategy.
Consider the following addition of LEFT and RIGHT values to each node:
NODE ------------------------------ | ID | PARENT | LEFT | RIGHT | ------------------------------ | A | NULL | 1 | 14 | | B | A | 2 | 3 | | C | A | 4 | 13 | | D | C | 5 | 10 | | E | C | 11 | 12 | | F | D | 6 | 7 | | G | D | 8 | 9 | ------------------------------
These values have been created by traversing the tree from top-down from left to right. You can now query for all children of C as follows:
select count(n1.ID) as NODE_LEVEL, n1.ID from NODE n1, NODE n2 where n1.LEFT between n2.LEFT and n2.RIGHT and n2.LEFT => 4 and n2.RIGHT <=13 group by n1.ID order by n1.LEFT
Which returns the following result:
RESULT ------------------- | NODE_LEVEL | ID | ------------------- | 1 | C | | 2 | D | | 3 | F | | 3 | G | | 2 | E | -------------------
The disadvantage of the nested set model is the high cost of tree modifications, which require, depending on the actual data, significant recalculation of left/right values of nodes. This is where the Hibernate implementation in this package comes into the picture, it can transparently renumber a nested set tree when you modify your parent/child relationships in Java application code, and it can support you when you query for whole subtrees.
(Note: The PARENT column of the adjacency list is no longer needed if you have the LEFT and RIGHT values of each node - which can also be used to identify the parent of each node. However, the following examples and the implementation in this package assumes that you keep this column intact and that you use it for non-nested traversal "up the tree". In other words, this implementation is an overlay on the adjacency list structure with nested set tree traversal and queries.)
Assuming that you have an existing adjacency list implementation with a typical parent/child relationship represented with associations in Java:
@Entity @Table("ITEM") class Item { @Id @GeneratedValue @Column(name = "ID") private Long id; @ManyToOne @JoinColumn(name = "PARENT") private Item parent; @OneToMany(mappedBy = "parent") private Set<Item> children; // Constructor, getters and setters ... }
This implementation is based on mix-in and delegates. The ITEM table of the Item class will does not carry the left, right, and thread values. This job is delegated to an additional ItemNestedSetDelegate class:
@Entity @Table("ITEM") class Item implements NestedSetDelegateOwner { @Id @GeneratedValue @Column(name = "ID") private Long id; @ManyToOne @JoinColumn(name = "PARENT") private Item parent; @OneToMany(mappedBy = "parent") @org.hibernate.annotations.OnDelete(action = org.hibernate.annotations.OnDeleteAction.CASCADE) private Set<Item> children; @OneToOne(mappedBy="owner", fetch = FetchType.EAGER, cascade = {CascadeType.PERSIST, CascadeType.REMOVE}) private ItemNestedSetDelegate nestedSetDelegate; // Constructor, getters and setters ... }
The delegate class maintains the nested set information transparently and stores it in a separate table:
@Entity @Table("ITEM_NESTED_SET") class ItemNestedSetDelegate extends AbstractNestedSetDelegate<Item> { protected ItemNestedSetDelegate() {} public ItemNestedSetDelegate(Item owner) { super(owner); } }
It is recommended that you enable ON DELETE CASCADE as a foreign key option on the join column of the adjacency list. With this option, you can easily delete a node in the tree and have the guarantee that all its children are deleted as well (note that this might conflict with in-memory state if you continue using the persistence context after deletion or if you have the second-level cache enabled).
You need to call entityManager.persist(item) in the right order, e.g.: You need to call persist(A) before you call persist(B) and persist(C) if B and C are children of A. In any case, the order in which B and C are inserted is undefined, this is a Set in the example - and it doesn't matter. However, parents need to be inserted before children.
The tree is manipulated through the parent property and children collection of each node. Remember to always set both references if you link a node to a parent. If you want to save a new item, create it, link it to its parent with addChild(), persist it with the EntityManager and flush the persistence context. If you want to remove an item from the tree, unlink it with removeChild() from its parent, remove it with the EntityManager, then flush the persistence context.
The nested set tree table is automatically updated by the event listeners in this package, which you have to add to your Hibernate configuration, here for JPA with persistence.xml:
<persistence-unit ...> ... <properties> <property name="hibernate.ejb.event.post-insert" value="nestedset.NestedSetPostInsertEventListener"/> <property name="hibernate.ejb.event.post-delete" value="nestedset.NestedSetPostDeleteEventListener"/> </properties> </persistence-unit>
Consult the Hibernate documentation if you want to configure them in a different environment. Note that these new listeners extend the Hibernate EntityManager listeners and that all classes require JDK 5.0. You can however rewrite the code easily to make it work with plain Hibernate Core in JDK 1.4.
To query for a subtree, use the NestedSetWrapper and NestedSetResultTransformer convenience classes. An example, loading the whole subtree starting at startNode (which would be an instance of Item you have already loaded):
NestedSetQueryBuilder nsQuery = new NestedSetQueryBuilder(startNode); Query nestedSetQuery = session.createQuery(nsQuery.getSimpleQuery()); // Bind parameters nestedSetQuery.setParameter("nsThread", startNode.getNestedSetDelegate().getNsThread()); nestedSetQuery.setParameter("nsLeft", startNode.getNestedSetDelegate().getNsLeft()); nestedSetQuery.setParameter("nsRight", startNode.getNestedSetDelegate().getNsRight()); // Apply transformer that marshalls flat table result into an in-memory tree NestedSetNodeWrapper<ItemNestedSetDelegate> startNodeWrapper = new NestedSetNodeWrapper<ItemNestedSetDelegate>(startNode); nestedSetQuery.setResultTransformer( new NestedSetResultTransformer<ItemNestedSetDelegate>(startNodeWrapper) ); nestedSetQuery.list();
You can now traverse the tree by accessing startNodeWrapper.getWrappedParent(), startNodeWrapper.getWrappedChildren(), startNodeWrapper.getLevel(), and startNodeWrapper.getWrappedNode(). These wrappers wrap nested set delegates, so you get the real node by calling getOwner() if necessary. All sub-children as well as their owners are initialized with this single query.
Note that moving of nodes between parents is not yet supported by the even listeners. If you remove a node from a parent, and add it to another parent, the behavior is undefined.
Consult the Javadoc of each interface and class for more information. This implementation is licensed under the LGPL, any modification and distribution of modifications requires distribution of any modified source under the LGPL.