String Concatenation Performance in Go
Summary: strings.Builder gives very good performance in general, but we can improve on it even further.
String concatenation is a basic task that I do often as a developer, but I had never thought much about its performance. This article is the result of my curiosity about the topic.
Naive #
The approach I usually took when I first started with Go was naive concatenation like this:
func concatNaive(ss ...string) string {
	rs := ss[0]
	for i := 1; i < len(ss); i++ {
		rs += ss[i]
	}
	return rs
}
But this approach gives the worst performance. Because strings in Go are immutable, the statement rs += ss[i] allocates a new string on every iteration. Each allocation costs time and puts pressure on the garbage collector, so we should avoid it as much as we can.
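To see this in practice, here is a small check of my own (the input sizes are just an illustration) using testing.AllocsPerRun, which reports the average number of heap allocations per call:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// concatNaive is the naive += loop from above.
func concatNaive(ss ...string) string {
	rs := ss[0]
	for i := 1; i < len(ss); i++ {
		rs += ss[i]
	}
	return rs
}

func main() {
	// Ten 40-byte inputs, large enough to defeat small-string optimizations.
	parts := make([]string, 10)
	for i := range parts {
		parts[i] = strings.Repeat("x", 40)
	}
	// Each += in the loop allocates, so we expect roughly one
	// allocation per input string rather than one per call.
	allocs := testing.AllocsPerRun(1000, func() {
		_ = concatNaive(parts...)
	})
	fmt.Printf("allocations per call: %.0f\n", allocs)
}
```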
The simple way to avoid repeated allocations is to allocate the memory in advance, making sure it is large enough that no further allocation is needed during the call. My first attempt was to allocate the full buffer up front and fill it byte by byte:
func concatAssignIndex(ss ...string) string {
	length := len(ss)
	if length == 0 {
		return ""
	}
	n := 0
	for i := 0; i < length; i++ {
		n += len(ss[i])
	}
	b := make([]byte, n)
	idx := 0
	for i := 0; i < length; i++ {
		for j := 0; j < len(ss[i]); j++ {
			b[idx] = ss[i][j]
			idx++
		}
	}
	return string(b)
}
I thought this would be great, but it turns out to be another naive solution. Copying byte by byte in nested loops is far slower than copying each string in bulk. So I gave up and tried the solution most people in the community recommend: strings.Builder.
String Builder #
strings.Builder maintains an underlying byte buffer and exposes a Grow(int) method that we can use to pre-size that buffer. The code using the builder looks like this:
func concatStringBuilder(ss ...string) string {
	length := len(ss)
	if length == 0 {
		return ""
	}
	b := strings.Builder{}
	n := len(ss[0])
	for i := 1; i < length; i++ {
		n += len(ss[i])
	}
	b.Grow(n)
	for i := 0; i < length; i++ {
		b.WriteString(ss[i])
	}
	return b.String()
}
Based on my testing, strings.Builder gives very good performance and should be used in general cases. But one thing to note when using strings.Builder: you have to calculate the final buffer size and Grow it accordingly. Otherwise, the buffer has to grow repeatedly during the call, which slows it down.
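The difference is easy to measure. Here is a small experiment of my own (the function names are mine) comparing a builder that grows on demand with one that is pre-sized:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// concatNoGrow skips the Grow call, so the internal buffer
// reallocates repeatedly as it fills up.
func concatNoGrow(ss ...string) string {
	var b strings.Builder
	for _, s := range ss {
		b.WriteString(s)
	}
	return b.String()
}

// concatWithGrow pre-sizes the buffer to the exact output length,
// so a single allocation suffices.
func concatWithGrow(ss ...string) string {
	n := 0
	for _, s := range ss {
		n += len(s)
	}
	var b strings.Builder
	b.Grow(n)
	for _, s := range ss {
		b.WriteString(s)
	}
	return b.String()
}

func main() {
	parts := make([]string, 100)
	for i := range parts {
		parts[i] = "some reasonably long piece of text"
	}
	fmt.Printf("without Grow: %.0f allocs/call\n",
		testing.AllocsPerRun(1000, func() { _ = concatNoGrow(parts...) }))
	fmt.Printf("with Grow:    %.0f allocs/call\n",
		testing.AllocsPerRun(1000, func() { _ = concatWithGrow(parts...) }))
}
```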
Still, I doubted that strings.Builder is really the fastest option; I suspected it could be improved further. Let's see how strings.Builder is implemented.
Improved String Builder #
If we look at the implementation of strings.Builder, we can see that it maintains a byte slice, using make with a capacity to allocate memory in advance and append to write each string into the buffer:
func (b *Builder) grow(n int) {
	buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
	copy(buf, b.buf)
	b.buf = buf
}

func (b *Builder) WriteString(s string) (int, error) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
	return len(s), nil
}
This code is straightforward, and we can borrow from it to build our own version. In our case we don't need b.copyCheck(), since we have a plain function rather than a method on a struct. Dropping that check leaves us with a simple function that does essentially what strings.Builder does:
func concatFast(ss ...string) string {
	length := len(ss)
	if length == 0 {
		return ""
	}
	n := 0
	for i := 0; i < length; i++ {
		n += len(ss[i])
	}
	b := make([]byte, 0, n)
	for i := 0; i < length; i++ {
		b = append(b, ss[i]...)
	}
	return string(b)
}
The function looks good: we calculate the exact output size and allocate the buffer once. But there is still a problem. The final statement, string(b), copies the bytes into a new string, causing a second memory allocation and hurting performance. The benchmark below shows that concatFast makes 2 allocations instead of 1 and is slower than strings.Builder:
goos: windows
goarch: amd64
pkg: github.com
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkSmall/concat_fast_using_append__________-12 8743449 148.8 ns/op 576 B/op 2 allocs/op
BenchmarkSmall/concat_string_builder_____________-12 12188716 95.24 ns/op 288 B/op 1 allocs/op
Is there a way to avoid that allocation? Yes: the unsafe package, which is exactly what strings.Builder itself uses:
func (b *Builder) String() string {
	return unsafe.String(unsafe.SliceData(b.buf), len(b.buf))
}
This hack is pretty cool, and we can apply it to our function. The improved version looks like this:
func concatFastImproved(ss ...string) string {
	length := len(ss)
	if length == 0 {
		return ""
	}
	n := 0
	for i := 0; i < length; i++ {
		n += len(ss[i])
	}
	b := make([]byte, 0, n)
	for i := 0; i < length; i++ {
		b = append(b, ss[i]...)
	}
	return unsafe.String(unsafe.SliceData(b), n)
}
The above method gives better performance than strings.Builder. An alternative to append is copy, which performs about the same. So far, this is the fastest solution.
One lesson I learned from this is that it's a good idea to use built-in functions like copy and append wherever we can. Speaking of built-ins, I wondered whether the plain concatenation operator would give similar performance.
+ Operator #
This is one of the simplest solutions ever: just write s[0] + s[1] + s[2] + ... + s[n] and we get what we want. It performs very well, comparable to our improved version of strings.Builder.
I think the reason is that the Go runtime's string-concatenation routine also computes the total length up front and uses copy internally. As far as I can tell, it additionally has an optimization using a 32-byte stack buffer, but that only kicks in when the result fits in 32 bytes and does not escape, so longer strings don't benefit from it.
But note the challenge with this approach: you have to know the number of strings at compile time, so its usefulness is limited.
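When the arity is fixed, the operator version is as simple as it gets. A small sketch (the function name is mine):

```go
package main

import "fmt"

// join4 concatenates a fixed number of strings with +. The compiler
// lowers the whole expression into a single runtime call that computes
// the total length once and copies each operand into one buffer.
func join4(a, b, c, d string) string {
	return a + b + c + d
}

func main() {
	fmt.Println(join4("Go ", "string ", "concatenation ", "demo"))
}
```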
Other solutions? #
Other solutions, such as fmt.Sprintf or bytes.Buffer, perform much worse than our improved version of the string builder. But there may well be other approaches I haven't tried yet.
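For completeness, those two variants might look roughly like this (my own sketch of the shapes I benchmarked):

```go
package main

import (
	"bytes"
	"fmt"
)

// concatBytesBuffer uses bytes.Buffer. Unlike strings.Builder,
// Buffer.String() copies the bytes, costing an extra allocation.
func concatBytesBuffer(ss ...string) string {
	var buf bytes.Buffer
	for _, s := range ss {
		buf.WriteString(s)
	}
	return buf.String()
}

// concatSprintf routes everything through fmt's formatting machinery,
// paying for interface boxing and format parsing on top of the copy.
func concatSprintf(a, b, c string) string {
	return fmt.Sprintf("%s%s%s", a, b, c)
}

func main() {
	fmt.Println(concatBytesBuffer("a", "b", "c"))
	fmt.Println(concatSprintf("x", "y", "z"))
}
```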
The code and the benchmark #
I tested all the approaches with small, medium, and big slices with lengths of 10, 100, and 10,000 respectively to see the differences, and the winner is our improved version of strings.Builder.
You can see the code here in the gist. And below is the benchmark:
goos: windows
goarch: amd64
pkg: github.com
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkSmall/concat_naive______________________-12 2725071 449.6 ns/op 1520 B/op 9 allocs/op
BenchmarkSmall/concat_string_builder_without_grow-12 3993370 287.7 ns/op 984 B/op 5 allocs/op
BenchmarkSmall/concat_assign_index_______________-12 4681068 255.5 ns/op 288 B/op 1 allocs/op
BenchmarkSmall/concat_bytes_buffer_______________-12 6646387 179.3 ns/op 576 B/op 2 allocs/op
BenchmarkSmall/concat_fast_using_append__________-12 8743449 148.8 ns/op 576 B/op 2 allocs/op
BenchmarkSmall/concat_string_builder_____________-12 12188716 95.24 ns/op 288 B/op 1 allocs/op
BenchmarkSmall/concat_+_operator_________________-12 11912088 96.59 ns/op 288 B/op 1 allocs/op
BenchmarkSmall/concat_fast_improved______________-12 13002703 90.33 ns/op 288 B/op 1 allocs/op
BenchmarkSmall/concat_fast_improved_using_copy___-12 14131878 97.54 ns/op 288 B/op 1 allocs/op
BenchmarkMedium/concat_naive______________________-12 14560 82115 ns/op 509697 B/op 98 allocs/op
BenchmarkMedium/concat_string_builder_without_grow-12 172482 5974 ns/op 34240 B/op 12 allocs/op
BenchmarkMedium/concat_assign_index_______________-12 130524 8269 ns/op 9472 B/op 1 allocs/op
BenchmarkMedium/concat_bytes_buffer_______________-12 447615 2940 ns/op 18944 B/op 2 allocs/op
BenchmarkMedium/concat_fast_using_append__________-12 376473 2819 ns/op 18944 B/op 2 allocs/op
BenchmarkMedium/concat_string_builder_____________-12 655304 1619 ns/op 9472 B/op 1 allocs/op
BenchmarkMedium/concat_+_operator_________________-12 746950 1637 ns/op 9472 B/op 1 allocs/op
BenchmarkMedium/concat_fast_improved______________-12 749499 1613 ns/op 9472 B/op 1 allocs/op
BenchmarkMedium/concat_fast_improved_using_copy___-12 864777 1623 ns/op 9472 B/op 1 allocs/op
BenchmarkBig/concat_naive______________________-12 2 745434800 ns/op 4995658496 B/op 10072 allocs/op
BenchmarkBig/concat_string_builder_without_grow-12 1558 901401 ns/op 5241548 B/op 30 allocs/op
BenchmarkBig/concat_assign_index_______________-12 1155 941113 ns/op 999430 B/op 1 allocs/op
BenchmarkBig/concat_bytes_buffer_______________-12 2985 438088 ns/op 1998854 B/op 2 allocs/op
BenchmarkBig/concat_fast_using_append__________-12 2904 444509 ns/op 1998854 B/op 2 allocs/op
BenchmarkBig/concat_string_builder_____________-12 4792 245516 ns/op 999429 B/op 1 allocs/op
BenchmarkBig/concat_fast_improved______________-12 4142 242516 ns/op 999429 B/op 1 allocs/op
BenchmarkBig/concat_fast_improved_using_copy___-12 4478 260349 ns/op 999430 B/op 1 allocs/op
PASS
ok github.com 39.751s
Lessons Learned #
I think there are a couple of things we can learn from this:
- Avoid memory allocation as much as possible.
- Whenever a slice is used, try to allocate enough memory for it in advance.
- Take advantage of built-in functions like append and copy as much as possible.
- The unsafe package can provide some pretty good hacks for better performance, but use them with care.
- Run benchmarks with multiple cases (small, medium, and large inputs) to see the differences and improve the code from there.
- There is a lot to learn from the Go standard library, like strings.Builder in this case.
- Exploring the standard library's code is a great way to learn and become a better Gopher.