Go Map Hash

maphash #

Map implementation is one of the interesting features in Go, it accepts anything as key, from string to numbers or even complex structs. And the magic thing is it is able to calculate the hash code of the given key effectively without the need to provide any hash code function. I feel this is really interesting and it stimulates my curiosity every single time I use map.

I would like to know how it's implemented and whether I can use the same implementation for some other contexts. But unfortunately, Go doesn't expose the hash code implementation for us to use directly. If you have enough knowledge about Go compiler and Go runtime you still can investigate and copy the implementation. I actually tried but found it's quite challenging and requires a lot of effort. It requires a lot of understanding about the compiler and runtime and also requires understanding of assembly code - which I feel is a little bit scary [1].

Luckily, Go have another package that helps to simplify the implementation of hash code for map. It is the package maphash. With the maphash package, we can easily implement the hash code the primitive types like string or numbers:

var h maphash.Hash
h.SetSeed(maphash.MakeSeed())
h.WriteString("hello")
hashCode := h.Sum64() // 10811116414953202048

Playground

number := 12.3
var h maphash.Hash
h.SetSeed(maphash.MakeSeed())
binary.Write(&h, binary.BigEndian, number)
hashCode := h.Sum64() // 4236663090134902550

Playground

But what if we have a key of the map is a struct that includes fields of different types like below:

type Employee struct {
Name string
Age int
Address string
BirthDate time.Time
}

How can we implement the hash code? You might think that we just need to convert everything into string and contact them together to form a key? But it will cause some problems, the 2 following employees will have the same key and hence the same hash code:

e1 := Employee{
Name: "Jack",
Age: 15,
}
// Key: Jack15

e2 := Employee{
Name: "Jack1",
Age: 5,
}
// Key: Jack15

Playground

You can argue that we can add a separator between the fields... but then we need to identify which ones are safe to use. And if the separator collides with the value of the key?

Because of those complexities, we need another simple mechanism to apply. But unfortunately there is no guideline from Go on how to implement the hash code. Hence the best way I can imagine is to use the similar mechanism that I used to use while working with Java. This mechanism uses 2 primes numbers, one as the init of the hash code and another one as a multiply factor for the hash code. It is quite popular in the Java community. The steps can be described as below:

  1. Let call the hash code we want to calculate is sum
  2. Init sum with value is a prime number. i.e. 7: sum = 7
  3. For every single field n of the given struct, calculate its hash code sum_n.
    1. If type of the field is kind of byte or string, write directly to the maphash.Hash
    2. If type of the field is boolean, number, or slice values of them, use binary.Write to write to the maphash.Hash
    3. If it's time, get its binary value by calling MarshalBinary() and write to the maphash.Hash. Or you can simply get the UnixNano() and uses binary.Write for it.
    4. If you have a slice or map of types, iterate over it and apply the same principle to calculate its sum.
  4. Combine the sum_n with the total sum by multiple it with the multiply factor (which is also another prime number), i.e 31: sum = sum * 31 + sum_n
  5. Perform step #3 and #4 for all the fields of the struct that should be used for comparing the struct.

Applying the above rules, we can have the hash code for the struct Employee as below:

var seed = maphash.MakeSeed()

func hashEmployee(e Employee) uint64 {
var h maphash.Hash
h.SetSeed(seed)
sum := uint64(7)

h.WriteString(e.Name)
sum = sum*31 + h.Sum64()

h.Reset()
binary.Write(&h, binary.LittleEndian, int64(e.Age))
sum = sum*31 + h.Sum64()

h.Reset()
h.WriteString(e.Address)
sum = sum*31 + h.Sum64()

h.Reset()
b, _ := e.BirthDate.MarshalBinary()
h.Write(b)
sum = sum*31 + h.Sum64()

return sum
}

hashEmployee(e1) // 4034376227000349563
hashEmployee(e2) // 9233382178721626727

Playground

I think this mechanism is quite easy to use and its performance is acceptable for production code.

Unsafe Hack #

If you are not happy with the performance of the above hash mechanism and still want to use the Go internal hash implementation, below is an unsafe hack that you might want to consider:

func UnsafeHashOf[T comparable]() func(seed maphash.Seed, v T) uint64 {
h := getHashFunc[T](map[T]struct{}{})
return func(seed maphash.Seed, v T) uint64 {
return h.Sum64(seed, v)
}
}

// internal/abi/type.go -> Type
type abiType struct {
Size_ uintptr
PtrBytes uintptr // number of (prefix) bytes in the type that can contain pointers
Hash uint32 // hash of type; avoids computation in hash tables
TFlag uint8 // extra type information flags
Align_ uint8 // alignment of variable with this type
FieldAlign_ uint8 // alignment of struct field with this type
Kind_ uint8 // enumeration for C
// function for comparing objects of this type
// (ptr to object A, ptr to object B) -> ==?
Equal func(unsafe.Pointer, unsafe.Pointer) bool
// GCData stores the GC type data for the garbage collector.
// If the KindGCProg bit is set in kind, GCData is a GC program.
// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
GCData *byte
Str int32 // string form
PtrToThis int32 // type for pointer to this type, may be zero
}

// internal/abi/type.go -> MapType
type abiMapType[T comparable] struct {
abiType
Key *abiType
Elem *abiType
Bucket *abiType // internal type representing a hash bucket
// function for hashing keys (ptr to key, seed) -> hash
Hasher hashFunc[T]
KeySize uint8 // size of key slot
ValueSize uint8 // size of elem slot
BucketSize uint16 // size of bucket
Flags uint32
}

type hashFunc[T comparable] func(value *T, seed maphash.Seed) uintptr

func (h hashFunc[T]) Sum64(seed maphash.Seed, v T) uint64 { return uint64(h(&v, seed)) }

type emptyInterface struct{ typ *abiType }

func getHashFunc[T comparable](v interface{}) hashFunc[T] {
return (*abiMapType[T])(unsafe.Pointer((*emptyInterface)(unsafe.Pointer(&v)).typ)).Hasher
}

Playground

Please note that the above hack is not safe and will be broken if Go change its internal types.

Performance #

Performance of using maphash is much worse compared to the hack method but it's safe to use. Hence you can base on your use-case to decide which one is suitable.

goos: linux
goarch: amd64
pkg: github.com/pthethanh/cache
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkMapHashNormal-12 9833695 110.4 ns/op 184 B/op 3 allocs/op
BenchmarkMapHashHack-12 29772643 38.80 ns/op 64 B/op 1 allocs/op
PASS
ok github.com/pthethanh/cache 2.411s

Playground


[1] You can take a look at the implementation of map in Go at map.go, map_fast32.go, map_fast64.go, map_faststr.go, alg.go and some relevant assembly code files can be found in the same run time package.