String Concatenation Performance in Go

Summary: strings.Builder gives very good performance in general, but we can do even better.


String concatenation is a basic task that I do all the time as a developer, but I had never thought much about its performance. This article is the result of my curiosity about the topic.

The approach I usually took when I first started with Go was naive concatenation like this:

func concatNaive(ss ...string) string {
    rs := ss[0]
    for i := 1; i < len(ss); i++ {
        rs += ss[i] // each += allocates a new string and copies into it
    }
    return rs
}

But this approach gives the worst performance. The statement rs += ss[i] allocates a new string and copies both operands every time it runs, so the cost grows with every iteration. Allocations slow a function down, so we want to avoid them as much as we can.

The simple way to avoid repeated allocations is to allocate enough memory up front, so that no further allocations happen during the call. My first idea was to allocate the memory in advance and then fill it byte by byte:

func concatAssignIndex(ss ...string) string {
    length := len(ss)
    if length == 0 {
        return ""
    }
    // create & allocate the memory in advance.
    n := 0
    for i := 0; i < length; i++ {
        n += len(ss[i])
    }
    b := make([]byte, n)
    idx := 0
    for i := 0; i < length; i++ {
        for j := 0; j < len(ss[i]); j++ {
            b[idx] = ss[i][j]
            idx++
        }
    }
    return string(b)
}

I thought this would be great, but it turns out to be another naive solution. Assigning byte by byte in the nested loops above is much slower than copying whole strings at once, as we will see shortly.

So I gave up on that and tried the solution most people in the community recommend: strings.Builder.

String Builder #

strings.Builder maintains an underlying byte buffer and exposes a Grow(n int) method that pre-allocates space in that buffer. Using it, the code looks like this:

func concatStringBuilder(ss ...string) string {
    length := len(ss)
    if length == 0 {
        return ""
    }
    var b strings.Builder
    // calculate the total buffer size.
    n := len(ss[0])
    for i := 1; i < length; i++ {
        n += len(ss[i])
    }
    // grow the buffer up front to avoid allocations while writing.
    b.Grow(n)
    for i := 0; i < length; i++ {
        b.WriteString(ss[i])
    }
    return b.String()
}

Based on my testing, strings.Builder gives very good performance and should be the default choice in general. One thing to keep in mind, though, is that you should calculate the total size and Grow the buffer up front; otherwise the buffer has to grow repeatedly while you write, which slows the call down.
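
For reference, the "without grow" variant that appears in the benchmarks below might look like this minimal sketch (the function name is my own, and it assumes import "strings"):

func concatStringBuilderNoGrow(ss ...string) string {
    var b strings.Builder
    for i := 0; i < len(ss); i++ {
        // without Grow, the underlying buffer has to reallocate as it fills up.
        b.WriteString(ss[i])
    }
    return b.String()
}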

But is strings.Builder really the best we can do? I think we can improve on it. Let's look at how strings.Builder is implemented.

Improved String Builder #

If we look at the implementation of strings.Builder, we can see that it maintains a byte slice, uses make with a capacity to allocate memory in advance, and uses append to add each string to the buffer:

// grow copies the buffer to a new, larger buffer so that there are at least n
// bytes of capacity beyond len(b.buf).
func (b *Builder) grow(n int) {
    buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
    copy(buf, b.buf)
    b.buf = buf
}

// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
    b.copyCheck()
    b.buf = append(b.buf, s...)
    return len(s), nil
}

This code is neat, and we can borrow the same idea for our own version. In our case we don't really need b.copyCheck(), since we have a plain function instead of a method on a struct. Dropping that check, we get a simple function that does essentially what strings.Builder does:

func concatFast(ss ...string) string {
    length := len(ss)
    if length == 0 {
        return ""
    }
    n := 0
    for i := 0; i < length; i++ {
        n += len(ss[i])
    }
    // create & allocate the memory in advance.
    b := make([]byte, 0, n)
    for i := 0; i < length; i++ {
        b = append(b, ss[i]...)
    }
    // the statement below causes a memory allocation.
    return string(b)
}

The function above looks good, since we calculate the exact output size and allocate the buffer once, but there is still a problem: the final string(b) copies the bytes and therefore costs one more allocation, which hurts performance. In the benchmark below, concatFast shows 2 allocations instead of 1 and is slower than strings.Builder:

goos: windows
goarch: amd64
pkg: github.com
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkSmall/concat_fast_using_append__________-12 8743449 148.8 ns/op 576 B/op 2 allocs/op
BenchmarkSmall/concat_string_builder_____________-12 12188716 95.24 ns/op 288 B/op 1 allocs/op

Is there any way to avoid that allocation, you might wonder? Yes: the unsafe package, which is exactly what strings.Builder itself uses in its String method:

func (b *Builder) String() string {
    return unsafe.String(unsafe.SliceData(b.buf), len(b.buf))
}

This hack is pretty neat, and we can apply it to our function. The improved version looks like this:

func concatFastImproved(ss ...string) string {
    length := len(ss)
    if length == 0 {
        return ""
    }
    // create & allocate the memory in advance.
    n := 0
    for i := 0; i < length; i++ {
        n += len(ss[i])
    }
    b := make([]byte, 0, n)
    for i := 0; i < length; i++ {
        b = append(b, ss[i]...)
    }
    return unsafe.String(unsafe.SliceData(b), n)
}

This version performs better than strings.Builder in my benchmarks. An alternative to append here is copy, which gives similar performance (a sketch follows below). So far, this is the fastest solution I have found.
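
For reference, the copy-based variant might look like the sketch below. The function name is my own and this is an illustration of the idea rather than the exact benchmarked code; it uses the same unsafe trick:

func concatFastCopy(ss ...string) string {
    length := len(ss)
    if length == 0 {
        return ""
    }
    n := 0
    for i := 0; i < length; i++ {
        n += len(ss[i])
    }
    // allocate the exact output size in advance.
    b := make([]byte, n)
    idx := 0
    for i := 0; i < length; i++ {
        // copy moves the whole string at once instead of byte by byte.
        idx += copy(b[idx:], ss[i])
    }
    return unsafe.String(unsafe.SliceData(b), n)
}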

One lesson I take from this is that it's usually a good idea to lean on built-ins like copy and append as much as possible. And speaking of built-ins, I wondered whether the plain + concatenation operator would give similar performance.

+ Operator #

This is the simplest solution of all: just write ss[0] + ss[1] + ss[2] + ... + ss[n-1] and we get what we want. It performs very well, on par with our improved version of strings.Builder.

I think the reason is that the Go runtime also uses copy in its implementation of +. It does have an extra optimization, but that only applies when the result is 32 bytes or less, so longer strings don't benefit from it.

The catch with this approach is that you have to know the number of strings at compile time (see the sketch below), so its usefulness is limited.
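
A minimal sketch of such a fixed-arity helper, with three arguments chosen purely for illustration:

func concatPlus(a, b, c string) string {
    // the compiler lowers this into a single runtime call that allocates
    // the result once and copies each operand into it.
    return a + b + c
}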

Other solutions? #

Other solutions, such as fmt.Sprintf or bytes.Buffer (sketched below), give noticeably worse performance than our improved version of the string builder. There may well be other approaches I haven't tried yet.
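
For reference, the bytes.Buffer variant from the benchmarks could be sketched roughly like this (the function name is my own, and it assumes import "bytes"):

func concatBytesBuffer(ss ...string) string {
    var b bytes.Buffer
    for i := 0; i < len(ss); i++ {
        b.WriteString(ss[i])
    }
    // Buffer.String copies the bytes into a new string, which is one
    // more allocation than the unsafe.String approach.
    return b.String()
}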

The code and the benchmark #

I tested all the approaches with small, medium, and big inputs of 10, 100, and 10,000 strings respectively to see the differences, and the winner is our improved version of strings.Builder.
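
As an illustration, the benchmarks follow the standard testing.B pattern. A minimal sketch for the small case might look like this; genStrings and its test data are my own stand-ins rather than the gist's actual code, and it assumes import "strings" and "testing":

func genStrings(n int) []string {
    ss := make([]string, n)
    for i := range ss {
        ss[i] = strings.Repeat("a", 10+i%20) // arbitrary test data
    }
    return ss
}

func BenchmarkSmall(b *testing.B) {
    ss := genStrings(10)
    b.Run("concat_fast_improved", func(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            _ = concatFastImproved(ss...)
        }
    })
}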

You can see the code here in the gist. And below is the benchmark:

goos: windows
goarch: amd64
pkg: github.com
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkSmall/concat_naive______________________-12 2725071 449.6 ns/op 1520 B/op 9 allocs/op
BenchmarkSmall/concat_string_builder_without_grow-12 3993370 287.7 ns/op 984 B/op 5 allocs/op
BenchmarkSmall/concat_assign_index_______________-12 4681068 255.5 ns/op 288 B/op 1 allocs/op
BenchmarkSmall/concat_bytes_buffer_______________-12 6646387 179.3 ns/op 576 B/op 2 allocs/op
BenchmarkSmall/concat_fast_using_append__________-12 8743449 148.8 ns/op 576 B/op 2 allocs/op
BenchmarkSmall/concat_string_builder_____________-12 12188716 95.24 ns/op 288 B/op 1 allocs/op
BenchmarkSmall/concat_+_operator_________________-12 11912088 96.59 ns/op 288 B/op 1 allocs/op
BenchmarkSmall/concat_fast_improved______________-12 13002703 90.33 ns/op 288 B/op 1 allocs/op
BenchmarkSmall/concat_fast_improved_using_copy___-12 14131878 97.54 ns/op 288 B/op 1 allocs/op

BenchmarkMedium/concat_naive______________________-12 14560 82115 ns/op 509697 B/op 98 allocs/op
BenchmarkMedium/concat_string_builder_without_grow-12 172482 5974 ns/op 34240 B/op 12 allocs/op
BenchmarkMedium/concat_assign_index_______________-12 130524 8269 ns/op 9472 B/op 1 allocs/op
BenchmarkMedium/concat_bytes_buffer_______________-12 447615 2940 ns/op 18944 B/op 2 allocs/op
BenchmarkMedium/concat_fast_using_append__________-12 376473 2819 ns/op 18944 B/op 2 allocs/op
BenchmarkMedium/concat_string_builder_____________-12 655304 1619 ns/op 9472 B/op 1 allocs/op
BenchmarkMedium/concat_+_operator_________________-12 746950 1637 ns/op 9472 B/op 1 allocs/op
BenchmarkMedium/concat_fast_improved______________-12 749499 1613 ns/op 9472 B/op 1 allocs/op
BenchmarkMedium/concat_fast_improved_using_copy___-12 864777 1623 ns/op 9472 B/op 1 allocs/op

BenchmarkBig/concat_naive______________________-12 2 745434800 ns/op 4995658496 B/op 10072 allocs/op
BenchmarkBig/concat_string_builder_without_grow-12 1558 901401 ns/op 5241548 B/op 30 allocs/op
BenchmarkBig/concat_assign_index_______________-12 1155 941113 ns/op 999430 B/op 1 allocs/op
BenchmarkBig/concat_bytes_buffer_______________-12 2985 438088 ns/op 1998854 B/op 2 allocs/op
BenchmarkBig/concat_fast_using_append__________-12 2904 444509 ns/op 1998854 B/op 2 allocs/op
BenchmarkBig/concat_string_builder_____________-12 4792 245516 ns/op 999429 B/op 1 allocs/op
BenchmarkBig/concat_fast_improved______________-12 4142 242516 ns/op 999429 B/op 1 allocs/op
BenchmarkBig/concat_fast_improved_using_copy___-12 4478 260349 ns/op 999430 B/op 1 allocs/op
PASS
ok github.com 39.751s

Lessons Learned #

I think there are a couple of things we can learn from this: