Jump to content United States-English
HP.com Home Products and Services Support and Drivers Solutions How to Buy
» Contact HP
More options
HP.com home
Programming with Judy: C LanguageJudy Version 4.0 > Chapter 2 The Judy Advantage

Hashing versus JudyL

» 

Technical documentation

Complete book in PDF
» Feedback
Content starts here

 » Table of Contents

 » Glossary

 » Index

The figures below show JudyL compared to a commonly used hashing algorithm. The hash algorithm is a simple static table sized to be a power of two with a linear (linked list) collision chain. The table is a fixed size of 1,048,576 buckets (220). Each bucket is a pointer to a chain. Each chain element is 3 words (key, value, next pointer). The key, value and next pointer are all 64-bit quantities; therefore, the hash memory used should asymptotically approach 24 bytes per inserted data element.

To insert into the hash table, the key is hashed and then ANDed with the table size - 1 to find a bucket. It is also common to have a prime-number-sized hash table and use the mod function to get a table index. However, mod is very slow (~1 microsecond on the system we were using to acquire the above data) so we used the power of two table method instead. Chains are linear searched for a match. If there is no match, a new chain element is created. Finally, the data value is incremented (the data value becomes a usage count to verify the number of unique keys inserted).

The hash table was seriously overloaded by a factor of 40:1. Overloading is very undesirable but all too common with hashing. The resulting collision chains were an average of 39.15 in length, showing that our hash distribution was very good.

The hash table creation time is not shown in the data. Judy table creation is an integral part of JudyLIns since a Judy array starts at zero length and grows dynamically.

The hashing code for this benchmark was all in-lined. This is common with hashing but not a practical option with Judy. Mallopt(3C) was used to speed up the allocation of each hash element. Measurements were done on a 2GB HP 9000 J5000 workstation running HP-UX 11i at 440 MHz. The raw data is shown below.

NOTE: The data was collected using random numbers as keys. Random keys provide the best-case data for hashing and the worst case data for Judy. Judy's performance and memory usage will both improve with any data clustering.

Figure 2-2 Judy versus hash insertion

Judy versus hash insertion

Figure 2-3 Judy versus hash retrieval

Judy versus hash retrieval

Figure 2-4 Judy versus hash memory performance

Judy versus hash memory performance

Most programmers understand that hashing with a static table can deliver excellent performance. However when you use hashing, you must remember to:

  • Size your hash table correctly.

  • Beware of operators like mod (i.e. %) on a PA-RISC system.

  • Use any available optimizations, like mallopt(3C).

  • In-line your code.

When you use Judy, you don't need to remember anything on this checklist! Judy takes care of all the optimizations for you.

In summary

Judy is a hybrid digital tree that chooses data structures for each node appropriate to the population below the node, thereby allowing tree-level compression and balanced memory usage.

Over large populations, the Judy advantage becomes clear when Judy is benchmarked against other search and retrieval methods, such as hashing.

Printable version
Privacy statement Using this site means you accept its terms Feedback to webmaster
© Hewlett-Packard Development Company, L.P.