| United States-English |
|
|
|
![]() |
Programming with Judy: C LanguageJudy Version 4.0Chapter 1 About Judy |
|
Table of Contents The Judy technology brings the benefits of exceptionally efficient, dynamic arrays to the C programmer.
What is Judy?The Judy technology is a C language library that provides an unbounded array capability. For example, suppose you could declare most arrays as:
Now imagine that the machine uses only RAM when data is stored into an array. Before you continue you need more information. How much RAM is required and how fast is storage and retrieval accomplished?Normally you would expect a ratio of at least three words per element stored: one word for the index (key), one word for the value, and one or more words for the overhead to manage the structure. A linked list takes three words. A well written binary tree or skip list requires more memory. By contrast, the Judy technology uses much less memory per element than traditional data structures (depending on the density and population of the indexes). It is remarkable that Judy often uses less than two words per element with very high populations. This is possible because Judy stores the indexes in sorted order allowing compression of the common digits in the index word. With binary-tree structures,
you expect the speed to be close to a log2 N function
of the number of elements stored. Judy's speed is closer to Scalability is Judy's greatest strength. However, the Judy technology has many more benefits, including fast search and retrieval, and built-in sorting and counting functions. Design goalsJudy is designed to replace many common data structures, such as arrays, sparse arrays, hash tables, B-trees, binary trees, linear lists, skiplists, and counting functions. Judy functions combine sparse arrays with innovative memory management and retrieval. Judy is implemented as a tree and dynamically optimizes each node based on the population under that node. Judy also optimizes memory access to be more efficient than other algorithms that ignore memory caching. As a result, Judy exhibits better than O(log256 N) retrieval behavior.[1]Judy has been used successfully for arrays with as little as one element (actually zero) to arrays with billions of elements. Since the technology is self-adapting, its design can support future machines with much larger memories than currently available. Like many other algorithms, such as B-trees but notably excluding hashing, Judy keeps indexes in sorted order.[2] Judy also includes a powerful counting function that returns the number of data elements between any two keys or counts to any ordinal (position) within an array.[3] For example, Judy can store the first million prime numbers. Then it can efficiently determine how many prime numbers exist between any random pair of numbers (prime or not). Because Judy maintains counts internally, the count function call can return an answer very quickly. [1] Big O notation is used to compare two algorithms that accomplish the same task. It is a theoretical measure of the execution of an algorithm, usually in terms of the time or memory needed. For more information, visit this web site: www.wiu.edu/users/mflll/cs350/onot.htm . [2] Because Judy necessarily defines the sorted order, the order cannot be modified to include user-defined sorting criteria (see “Sorting”). [3] Counting functions are available in Judy1 arrays (Judy1Count and Judy1ByCount) and in JudyL arrays (JudyLCount and JudyLByCount). |
|||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||