Go Map Hash
maphash #
Map implementation is one of the interesting features in Go, it accepts anything as key, from string to numbers or even complex structs. And the magic thing is it is able to calculate the hash code of the given key effectively without the need to provide any hash code function. I feel this is really interesting and it stimulates my curiosity every single time I use map.
I would like to know how it's implemented and whether I can use the same implementation for some other contexts. But unfortunately, Go doesn't expose the hash code implementation for us to use directly. If you have enough knowledge about Go compiler and Go runtime you still can investigate and copy the implementation. I actually tried but found it's quite challenging and requires a lot of effort. It requires a lot of understanding about the compiler and runtime and also requires understanding of assembly code - which I feel is a little bit scary [1].
Luckily, Go have another package that helps to simplify the implementation of hash code for map. It is the package maphash
. With the maphash
package, we can easily implement the hash code the primitive types like string or numbers:
var h maphash.Hash
h.SetSeed(maphash.MakeSeed())
h.WriteString("hello")
hashCode := h.Sum64()
Playground
number := 12.3
var h maphash.Hash
h.SetSeed(maphash.MakeSeed())
binary.Write(&h, binary.BigEndian, number)
hashCode := h.Sum64() // 4236663090134902550
Playground
But what if we have a key of the map is a struct that includes fields of different types like below:
type Employee struct {
Name string
Age int
Address string
BirthDate time.Time
}
How can we implement the hash code? You might think that we just need to convert everything into string and contact them together to form a key? But it will cause some problems, the 2 following employees will have the same key and hence the same hash code:
e1 := Employee{
Name: "Jack",
Age: 15,
}
e2 := Employee{
Name: "Jack1",
Age: 5,
}
Playground
You can argue that we can add a separator between the fields... but then we need to identify which ones are safe to use. And if the separator collides with the value of the key?
Because of those complexities, we need another simple mechanism to apply. But unfortunately there is no guideline from Go on how to implement the hash code. Hence the best way I can imagine is to use the similar mechanism that I used to use while working with Java. This mechanism uses 2 primes numbers, one as the init of the hash code and another one as a multiply factor for the hash code. It is quite popular in the Java community. The steps can be described as below:
- Let call the hash code we want to calculate is
sum
- Init
sum
with value is a prime number. i.e. 7: sum = 7
- For every single field
n
of the given struct, calculate its hash code sum_n
.- If type of the field is kind of byte or string, write directly to the
maphash.Hash
- If type of the field is boolean, number, or slice values of them, use
binary.Write
to write to the maphash.Hash
- If it's time, get its binary value by calling
MarshalBinary()
and write to the maphash.Hash
. Or you can simply get the UnixNano()
and uses binary.Write
for it. - If you have a slice or map of types, iterate over it and apply the same principle to calculate its sum.
- Combine the
sum_n
with the total sum by multiple it with the multiply factor (which is also another prime number), i.e 31: sum = sum * 31 + sum_n
- Perform step #3 and #4 for all the fields of the struct that should be used for comparing the struct.
Applying the above rules, we can have the hash code for the struct Employee
as below:
var seed = maphash.MakeSeed()
func hashEmployee(e Employee) uint64 {
var h maphash.Hash
h.SetSeed(seed)
sum := uint64(7)
h.WriteString(e.Name)
sum = sum*31 + h.Sum64()
h.Reset()
binary.Write(&h, binary.LittleEndian, int64(e.Age))
sum = sum*31 + h.Sum64()
h.Reset()
h.WriteString(e.Address)
sum = sum*31 + h.Sum64()
h.Reset()
b, _ := e.BirthDate.MarshalBinary()
h.Write(b)
sum = sum*31 + h.Sum64()
return sum
}
hashEmployee(e1)
hashEmployee(e2)
Playground
I think this mechanism is quite easy to use and its performance is acceptable for production code.
Unsafe Hack #
If you are not happy with the performance of the above hash mechanism and still want to use the Go internal hash implementation, below is an unsafe hack that you might want to consider:
func UnsafeHashOf[T comparable]() func(seed maphash.Seed, v T) uint64 {
h := getHashFunc[T](map[T]struct{}{})
return func(seed maphash.Seed, v T) uint64 {
return h.Sum64(seed, v)
}
}
type abiType struct {
Size_ uintptr
PtrBytes uintptr
Hash uint32
TFlag uint8
Align_ uint8
FieldAlign_ uint8
Kind_ uint8
Equal func(unsafe.Pointer, unsafe.Pointer) bool
GCData *byte
Str int32
PtrToThis int32
}
type abiMapType[T comparable] struct {
abiType
Key *abiType
Elem *abiType
Bucket *abiType
Hasher hashFunc[T]
KeySize uint8
ValueSize uint8
BucketSize uint16
Flags uint32
}
type hashFunc[T comparable] func(value *T, seed maphash.Seed) uintptr
func (h hashFunc[T]) Sum64(seed maphash.Seed, v T) uint64 { return uint64(h(&v, seed)) }
type emptyInterface struct{ typ *abiType }
func getHashFunc[T comparable](v interface{}) hashFunc[T] {
return (*abiMapType[T])(unsafe.Pointer((*emptyInterface)(unsafe.Pointer(&v)).typ)).Hasher
}
Playground
Please note that the above hack is not safe and will be broken if Go change its internal types.
Performance of using maphash is much worse compared to the hack method but it's safe to use. Hence you can base on your use-case to decide which one is suitable.
goos: linux
goarch: amd64
pkg: github.com/pthethanh/cache
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkMapHashNormal-12 9833695 110.4 ns/op 184 B/op 3 allocs/op
BenchmarkMapHashHack-12 29772643 38.80 ns/op 64 B/op 1 allocs/op
PASS
ok github.com/pthethanh/cache 2.411s
Playground
[1] You can take a look at the implementation of map in Go at map.go, map_fast32.go, map_fast64.go, map_faststr.go, alg.go and some relevant assembly code files can be found in the same run time package.