Go Map Hash

Thanh Pham / Sat 10 Jun 2023
Hash Function
Hash Function

maphash

Map implementation is one of the interesting features in Go, it accepts anything as key, from string to numbers or even complex structs. And the magic thing is it is able to calculate the hash code of the given key effectively without the need to provide any hash code function. I feel this is really interesting and it stimulates my curiosity every single time I use map.

I would like to know how it's implemented and whether I can use the same implementation for some other contexts. But unfortunately, Go doesn't expose the hash code implementation for us to use directly. If you have enough knowledge about Go compiler and Go runtime you still can investigate and copy the implementation. I actually tried but found it's quite challenging and requires a lot of effort. It requires a lot of understanding about the compiler and runtime and also requires understanding of assembly code - which I feel is a little bit scary [1].

Luckily, Go have another package that helps to simplify the implementation of hash code for map. It is the package `maphash`. With the `maphash` package, we can easily implement the hash code the primitive types like string or numbers:

var h maphash.Hash
h.SetSeed(maphash.MakeSeed())
h.WriteString("hello")
hashCode := h.Sum64() // 10811116414953202048
Playground
number := 12.3
var h maphash.Hash
h.SetSeed(maphash.MakeSeed())
binary.Write(&h, binary.BigEndian, number)
hashCode := h.Sum64() // 4236663090134902550
Playground

But what if we have a key of the map is a struct that includes fields of different types like below:

type Employee struct {
	Name      string
	Age       int
	Address   string
	BirthDate time.Time
}

How can we implement the hash code? You might think that we just need to convert everything into string and contact them together to form a key? But it will cause some problems, the 2 following employees will have the same key and hence the same hash code:

e1 := Employee{
	Name: "Jack",
	Age:  15,
}
// Key: Jack15

e2 := Employee{
	Name: "Jack1",
	Age:  5,
}
// Key: Jack15
Playground

You can argue that we can add a separator between the fields... but then we need to identify which ones are safe to use. And if the separator collides with the value of the key?

Because of those complexities, we need another simple mechanism to apply. But unfortunately there is no guideline from Go on how to implement the hash code. Hence the best way I can imagine is to use the similar mechanism that I used to use while working with Java. This mechanism uses 2 primes numbers, one as the init of the hash code and another one as a multiply factor for the hash code. It is quite popular in the Java community. The steps can be described as below:

  1. Let call the hash code we want to calculate is `sum`
  2. Init `sum` with value is a prime number. i.e. 7: `sum = 7`
  3. For every single field `n` of the given struct, calculate its hash code `sum_n`.
    1. If type of the field is kind of byte or string, write directly to the `maphash.Hash`
    2. If type of the field is boolean, number, or slice values of them, use `binary.Write` to write to the `maphash.Hash`
    3. If it's time, get its binary value by calling `MarshalBinary()` and write to the `maphash.Hash`. Or you can simply get the `UnixNano()` and uses `binary.Write` for it.
    4. If you have a slice or map of types, iterate over it and apply the same principle to calculate its sum.
  4. Combine the `sum_n` with the total sum by multiple it with the multiply factor (which is also another prime number), i.e 31: `sum = sum * 31 + sum_n`
  5. Perform step #3 and #4 for all the fields of the struct that should be used for comparing the struct.

Applying the above rules, we can have the hash code for the struct `Employee` as below:

var seed = maphash.MakeSeed()

func hashEmployee(e Employee) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	sum := uint64(7)

	h.WriteString(e.Name)
	sum = sum*31 + h.Sum64()

	h.Reset()
	binary.Write(&h, binary.LittleEndian, int64(e.Age))
	sum = sum*31 + h.Sum64()

	h.Reset()
	h.WriteString(e.Address)
	sum = sum*31 + h.Sum64()

	h.Reset()
	b, _ := e.BirthDate.MarshalBinary()
	h.Write(b)
	sum = sum*31 + h.Sum64()

	return sum
}

hashEmployee(e1) // 4034376227000349563
hashEmployee(e2) // 9233382178721626727
Playground

I think this mechanism is quite easy to use and its performance is acceptable for production code.

Unsafe Hack

If you are not happy with the performance of the above hash mechanism and still want to use the Go internal hash implementation, below is an unsafe hack that you might want to consider:

func UnsafeHashOf[T comparable]() func(seed maphash.Seed, v T) uint64 {
	h := getHashFunc[T](map[T]struct{}{})
	return func(seed maphash.Seed, v T) uint64 {
		return h.Sum64(seed, v)
	}
}

// internal/abi/type.go -> Type
type abiType struct {
	Size_       uintptr
	PtrBytes    uintptr // number of (prefix) bytes in the type that can contain pointers
	Hash        uint32  // hash of type; avoids computation in hash tables
	TFlag       uint8   // extra type information flags
	Align_      uint8   // alignment of variable with this type
	FieldAlign_ uint8   // alignment of struct field with this type
	Kind_       uint8   // enumeration for C
	// function for comparing objects of this type
	// (ptr to object A, ptr to object B) -> ==?
	Equal func(unsafe.Pointer, unsafe.Pointer) bool
	// GCData stores the GC type data for the garbage collector.
	// If the KindGCProg bit is set in kind, GCData is a GC program.
	// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
	GCData    *byte
	Str       int32 // string form
	PtrToThis int32 // type for pointer to this type, may be zero
}

// internal/abi/type.go -> MapType
type abiMapType[T comparable] struct {
	abiType
	Key    *abiType
	Elem   *abiType
	Bucket *abiType // internal type representing a hash bucket
	// function for hashing keys (ptr to key, seed) -> hash
	Hasher     hashFunc[T]
	KeySize    uint8  // size of key slot
	ValueSize  uint8  // size of elem slot
	BucketSize uint16 // size of bucket
	Flags      uint32
}

type hashFunc[T comparable] func(value *T, seed maphash.Seed) uintptr

func (h hashFunc[T]) Sum64(seed maphash.Seed, v T) uint64 { return uint64(h(&v, seed)) }

type emptyInterface struct{ typ *abiType }

func getHashFunc[T comparable](v interface{}) hashFunc[T] {
	return (*abiMapType[T])(unsafe.Pointer((*emptyInterface)(unsafe.Pointer(&v)).typ)).Hasher
}
Playground

Please note that the above hack is not safe and will be broken if Go change its internal types.

Performance

Performance of using maphash is much worse compared to the hack method but it's safe to use. Hence you can base on your use-case to decide which one is suitable.

goos: linux
goarch: amd64
pkg: github.com/pthethanh/cache
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkMapHashNormal-12        9833695               110.4 ns/op           184 B/op          3 allocs/op
BenchmarkMapHashHack-12         29772643                38.80 ns/op           64 B/op          1 allocs/op
PASS
ok      github.com/pthethanh/cache      2.411s
Playground

[1] You can take a look at the implementation of map in Go at map.gomap_fast32.go, map_fast64.go, map_faststr.go, alg.go and some relevant assembly code files can be found in the same run time package.  

Next In
golang
Go Private Module

Setting up Go private module behinds corporate proxy