Jump to content United States-English
HP.com Home Products and Services Support and Drivers Solutions How to Buy
» Contact HP
More options
HP.com home
Programming with Judy: C LanguageJudy Version 4.0 > Chapter 2 The Judy Advantage

Judy Data Structures

» 

Technical documentation

Complete book in PDF
» Feedback
Content starts here

 » Table of Contents

 » Glossary

 » Index

Digital tree (trie) data structures

A digital tree or trie (pronounced try) is a multi-way or M-way tree whose nodes are vectors of M elements. Each element in a node is either a pointer to another node or to a leaf value. The tree is populated (and searched) using each digit of a key as an index into the nodes as they are traversed. A digit is determined by M. If M is 16, each digit is 4 bits. If M is 256 (as in Judy), each digit is 8 bits.

For example, assume we are storing a set of zip codes in 10-way digital tree. The zip codes are 87303, 92515, 92523, and 96503. Each level of the tree decodes one base-10 digit until a unique suffix remains. Zip code 87303 is stored at the top level of the tree hierarchy because the leading digit 8 is unique. Its position in the root node is node[8], as shown in Figure 2-1 “A ten-way digital tree ”.

Since more than one zip code starts with 9, root node[9] points to a second level node. The second level node[6] contains a unique suffix 6503 (the leading 9 is decoded in the root level). The second level node[2] points to the third level.

The third level node[5] points to the last level with the remaining suffixes. Therefore, zip code 92515 is represented in the array by level0[9]->level1[2]->level2[5] and level 3 contains the unique suffix 15. Zip code 92523 is represented in the array by level0[9]->level1[2]->level2[5] and level 3 contains the unique suffix 23.

Unlike an analog tree (B-tree, binary tree, etc.) no comparisons are necessary to search a digital tree until you reach the value. The index is divided into a series of array indexes until a unique index suffix is obtained.

Figure 2-1 A ten-way digital tree

A ten-way digital tree

Judy: a digital tree hybrid

Pure digital trees, like the one described above, typically exhibit very fast insertion and retrieval times; however, a digital tree often achieves fast performance at the cost of large memory use. A computation of 2564 uses over four billion indexes, while a computation of 2563 uses over 16 million indexes or at least 67 MB of storage (4 bytes/index).

Judy is a hybrid 256-way digital tree. Like a pure digital tree, each level in a Judy tree decodes one or more (8 bit) digits. Unlike a pure digital tree, each node is not necessarily a 256-way vector of elements.

Judy achieves scalability and excellent memory characteristics by choosing data structures for each node appropriate for the population below the node. For example, if you need a small associative array with 6 indexes, you would probably create a linear list to hold them. Judy does the same. If you need 600 million indexes, you would do something different. The same goes for Judy. As a Judy tree is populated, each node gets a data structure appropriate to the population under it.

To be this dynamic, Judy must know the population under any node. To do this, Judy keeps internal population counts at each node. These counts are also the reason why you can call Judy1Count or JudyLCount to find all the indexes between x and y and get an answer back much faster than you can count a linear array of the same population.

Here is what you might find in an element of a Judy node:

Pointers

Like a traditional tree, Judy provides pointers to the next tree level.

Indexes

Because it is based on a digital tree, Judy always stores indexes in sorted order or infers them from their position if the node is an M-way vector. The order in which values are stored is based on numeric value.

Type

Judy uses type information to track the kind of structure being pointed to in the next level.

Count

The index population stored below this node element.

Since Judy inherits the advantages of a digital tree structure, a Judy tree with 32-bit indexes is a maximum of 4 levels deep. A Judy tree with 64-bit indexes is a maximum of 8 levels deep. When you consider that a 64-bit tree could have 1.845 x 1019 indexes (if you had enough memory), this is an efficient search and retrieval storage technique.

With in-memory data structures like the current version of Judy, the depth of the tree is only an indication the CPU cycles necessary to access the data. However, the time required to retrieve that data is even more important. Many search and retrieval algorithms have problems because memory cache-line fills frequently dominate in memory search and retrieval times. Modern processors have become so fast that it is advantageous to trade CPU cycles to avoid cache-line fills whenever possible. Judy is designed with this trade-off in mind. (See Appendix B “Caching and Memory Management”.)

Printable version
Privacy statement Using this site means you accept its terms Feedback to webmaster
© Hewlett-Packard Development Company, L.P.