Skip to content

jhash in Linux kernel

Nisarg Joshi edited this page Sep 30, 2019 · 1 revision

Jenkins Hash function in the Linux kernel

Introduction

The Linux kernel has a hash function defined called the Jenkins hash function. This hash function was first developed by Bob Jenkins. Bob Jenkins went through many iterations of the function from one_at_a_time hash to lookup2 to lookup3. Currently, the Linux kernel(as of v5.3) is using the lookup3 version of the Jenkins hash. Here is the link to the original implementation of the hash function: Lookup3 original implementation.

Into the kernel

The hash function is defined in a file called jhash.h in /include/linux/jhash.h as well as /tools/include/linux/jhash.h. The hash function implemetation is almost a direct implementation of the jenkins original implementation except for a few minor changes.

Here is a code snippet of the function called jhash2 in the kernel

/* jhash2 - hash an array of u32's
 * @k: the key which must be an array of u32's
 * @length: the number of u32's in the key
 * @initval: the previous hash, or an arbitray value
 *
 * Returns the hash value of the key.
 */
static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
{
	u32 a, b, c;

	/* Set up the internal state */
	a = b = c = JHASH_INITVAL + (length<<2) + initval;

	/* Handle most of the key */
	while (length > 3) {
		a += k[0];
		b += k[1];
		c += k[2];
		__jhash_mix(a, b, c);
		length -= 3;
		k += 3;
	}

	/* Handle the last 3 u32's */
	switch (length) {
	case 3: c += k[2];	/* fall through */
	case 2: b += k[1];	/* fall through */
	case 1: a += k[0];
		__jhash_final(a, b, c);
	case 0:	/* Nothing left to add */
		break;
	}

	return c;
}

Why is it called jhash2 and not jhash? Because there is a function called jhash already that does the same thing but for input keys of arbitrary nature, i.e it can hash characters, numbers etc. For that jhash has to do some manipulations to align the bytes in the register(I don't understand this fully at the time of writing this). The above function takes a u32 array of integers and returns a u32 hash value.

Also the functions __jhash_mix(a,b,c) and __jhash_final(a,b,c) Are basically used to create something called an avalanche effect for the input bytes. In simple terms, these functions create an effect where each bit of input bytes affects each and every bit of output bit of the hash. This ensures that whenever the input bits change the output changes significantly and thus less collisions.

Here is a definition of these functions:

/* __jhash_mix -- mix 3 32-bit values reversibly. */
#define __jhash_mix(a, b, c)			\
{						\
	a -= c;  a ^= rol32(c, 4);  c += b;	\
	b -= a;  b ^= rol32(a, 6);  a += c;	\
	c -= b;  c ^= rol32(b, 8);  b += a;	\
	a -= c;  a ^= rol32(c, 16); c += b;	\
	b -= a;  b ^= rol32(a, 19); a += c;	\
	c -= b;  c ^= rol32(b, 4);  b += a;	\
}

/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
#define __jhash_final(a, b, c)			\
{						\
	c ^= b; c -= rol32(b, 14);		\
	a ^= c; a -= rol32(c, 11);		\
	b ^= a; b -= rol32(a, 25);		\
	c ^= b; c -= rol32(b, 16);		\
	a ^= c; a -= rol32(c, 4);		\
	b ^= a; b -= rol32(a, 14);		\
	c ^= b; c -= rol32(b, 24);		\
}

In case you are wondering rol32 does left rotation. It is defined in kernel as(/include/linux/bitops.h):

/**
 * rol32 - rotate a 32-bit value left
 * @word: value to rotate
 * @shift: bits to roll
 */
static inline __u32 rol32(__u32 word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

Thus there we have it. Jenkins hash implementation in the Linux kernel!

Clone this wiki locally