Go String Concat Performance

Thanh Pham / Mon 10 Apr 2023
Summary: strings.Builder gives very good performance in general, but we can still do a bit better.
Go string concatenation performance comparison

String concatenation is a basic task I do all the time as a developer, but I had never thought much about its performance. This article is the result of my curiosity about the topic.

Naive

The approach I took most often when I first started with Go was naive concatenation like this:

func concatNaive(ss ...string) string {
	rs := ss[0]
	for i := 1; i < len(ss); i++ {
		rs += ss[i] // causes a new allocation on every iteration
	}
	return rs
}

But this approach gives the worst performance. Every `rs += ss[i]` allocates a new string and copies both operands into it, and all of that allocation and copying slows the function down. Hence, we should avoid it as much as we can.
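If you want to see those allocations for yourself, here is a minimal sketch (the input strings and variable names are mine) that counts them with `testing.AllocsPerRun`, assuming concatNaive from above is in the same package:

package main

import (
	"fmt"
	"testing"
)

func main() {
	parts := []string{"go", "lang", " ", "is", " ", "fun"}
	// AllocsPerRun averages the number of heap allocations over 1000 calls.
	allocs := testing.AllocsPerRun(1000, func() {
		_ = concatNaive(parts...)
	})
	fmt.Println("allocations per call:", allocs) // roughly one per += iteration
}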

The simple way to avoid multiple allocations is to allocate enough memory up front, so that no additional allocations are needed during the call. I came up with what I thought was a brilliant idea: allocate the memory in advance and assign the output byte by byte:

func concatAssignIndex(ss ...string) string {
	length := len(ss)
	if length == 0 {
		return ""
	}
	// create & allocate the memory in advance.
	n := 0
	for i := 0; i < length; i++ {
		n += len(ss[i])
	}
	b := make([]byte, n)
	idx := 0
	for i := 0; i < length; i++ {
		for j := 0; j < len(ss[i]); j++ {
			b[idx] = ss[i][j]
			idx++
		}
	}
	return string(b)
}

I thought it would be great, but this turns out to be another naive solution: the nested loops copy one byte at a time, which is much slower than letting the runtime copy whole strings at once, as the benchmarks later show.

So, I gave up and tried one of the solutions recommended by most people in the community, `strings.Builder`.

String Builder

`strings.Builder` maintains an underlying byte buffer and exposes the `Grow(int)` method so that we can use it to grow the underlying buffer. The code using the string builder will look like this:

func concatStringBuilder(ss ...string) string {
	length := len(ss)
	if length == 0 {
		return ""
	}
	b := strings.Builder{}
	// calculate the buffer size
	n := len(ss[0])
	for i := 1; i < length; i++ {
		n += len(ss[i])
	}
	// grow the buffer once so no allocations happen while writing the strings.
	b.Grow(n)
	for i := 0; i < length; i++ {
		b.WriteString(ss[i])
	}
	return b.String()
}

Based on my testing, `strings.Builder` gives very good performance and is the right choice for general use. One thing to note, though, is that you should calculate the total size and call `Grow` up front. Otherwise the buffer has to grow repeatedly while writing, which slows the call down.
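For reference, this is roughly what the no-Grow variant looks like (a sketch of what the `concat_string_builder_without_grow` case in the benchmarks below measures):

func concatStringBuilderNoGrow(ss ...string) string {
	b := strings.Builder{}
	for i := 0; i < len(ss); i++ {
		// without Grow, the underlying buffer is reallocated
		// several times as it fills up.
		b.WriteString(ss[i])
	}
	return b.String()
}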

But is strings.Builder really the best we can do? I think we can improve on it. Let's look at how strings.Builder is implemented.

Improved String Builder

If we look at the implementation of strings.Builder, we can see that it maintains a byte slice, using `make` with a capacity to allocate memory in advance and `append` to add each string to the buffer:

// grow copies the buffer to a new, larger buffer so that there are at least n
// bytes of capacity beyond len(b.buf).
func (b *Builder) grow(n int) {
	buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
	copy(buf, b.buf)
	b.buf = buf
}

// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
	return len(s), nil
}

We can learn from this code and write our own version. In our case we don't really need `b.copyCheck()`, since we are writing a standalone function rather than a method on a reusable struct. Eliminating that check leaves us with a simple function that does essentially what strings.Builder does:

func concatFast(ss ...string) string {
	length := len(ss)
	if length == 0 {
		return ""
	}
	n := 0
	for i := 0; i < length; i++ {
		n += len(ss[i])
	}
	// create & allocate the memory in advance.
	b := make([]byte, 0, n)
	for i := 0; i < length; i++ {
		b = append(b, ss[i]...)
	}
	// below statement causes a memory allocation.
	return string(b)
}

The above function looks good, since we allocate the buffer to the exact size of the output string, but there is still a problem. The final conversion, `string(b)`, copies the bytes into a new allocation and so hurts performance. In the benchmark below, concatFast does 2 allocations instead of 1 and is slower than strings.Builder:

goos: windows
goarch: amd64
pkg: github.com
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkSmall/concat_fast_using_append__________-12             8743449               148.8 ns/op           576 B/op          2 allocs/op
BenchmarkSmall/concat_string_builder_____________-12            12188716                95.24 ns/op          288 B/op          1 allocs/op

Is there any way to avoid that allocation? Yes, there is: the `unsafe` package, and this is exactly what strings.Builder does:

func (b *Builder) String() string {
	return unsafe.String(unsafe.SliceData(b.buf), len(b.buf))
}
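As a side note, `unsafe.String` and `unsafe.SliceData` only exist since Go 1.20; on older versions, strings.Builder achieved the same result by reinterpreting the slice header, roughly like this:

// pre-Go 1.20 equivalent: reinterpret the []byte header as a string header.
// This is only safe because the buffer is never modified afterwards.
func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}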

This hack is pretty cool, and we can apply it to our function. The improved version looks like this:

func concatFastImproved(ss ...string) string {
	length := len(ss)
	if length == 0 {
		return ""
	}
	// create & allocate the memory in advance.
	n := 0
	for i := 0; i < length; i++ {
		n += len(ss[i])
	}
	b := make([]byte, 0, n)
	for i := 0; i < length; i++ {
		b = append(b, ss[i]...)
	}
	return unsafe.String(unsafe.SliceData(b), n)
}

The above function performs a bit better than strings.Builder. An alternative to `append` is `copy`, which gives similar performance. So far, this is the fastest solution.
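A sketch of that copy-based variant (my guess at what the `concat_fast_improved_using_copy` benchmark case looks like) could be:

func concatFastImprovedCopy(ss ...string) string {
	n := 0
	for i := 0; i < len(ss); i++ {
		n += len(ss[i])
	}
	if n == 0 {
		return ""
	}
	// allocate the exact output size up front and fill it with copy.
	b := make([]byte, n)
	idx := 0
	for i := 0; i < len(ss); i++ {
		idx += copy(b[idx:], ss[i])
	}
	return unsafe.String(unsafe.SliceData(b), n)
}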

One lesson I learned from this is that it is usually a good idea to lean on built-ins like `copy` and `append` as much as we can. And speaking of built-in things, I wondered whether the plain concatenation operator would give similar performance.

+ Operator

This is the simplest solution of all. Just write `s[0] + s[1] + s[2] + ... + s[n]` and we get what we want. It performs very well, roughly on par with our improved version of strings.Builder.
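For a slice of ten strings, the benchmark case presumably looks something like this (the function name and shape are my own illustration):

func concatPlus10(s []string) string {
	// the number of operands is fixed in the source code.
	return s[0] + s[1] + s[2] + s[3] + s[4] +
		s[5] + s[6] + s[7] + s[8] + s[9]
}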

I think the reason is that the Go runtime also uses a single allocation plus `copy` in its string concatenation. The runtime does have an additional optimization, but it seems to apply only to results of 32 bytes or less, so longer results don't benefit from it.

The catch with this approach is that you have to know the number of strings when you write the code, so its usefulness is limited.

Other solutions?

Other solutions, such as `fmt.Sprintf` or `bytes.Buffer`, perform much worse than our improved version of the string builder. There may well be other approaches I haven't tried yet.
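For completeness, these are my guesses at what those two variants look like; the exact code in the benchmark may differ:

func concatBytesBuffer(ss ...string) string {
	var b bytes.Buffer
	for i := 0; i < len(ss); i++ {
		b.WriteString(ss[i])
	}
	// Buffer.String copies the bytes, so this costs one extra allocation.
	return b.String()
}

func concatSprintf(ss ...string) string {
	rs := ""
	for i := 0; i < len(ss); i++ {
		// Sprintf allocates a new string on every iteration.
		rs = fmt.Sprintf("%s%s", rs, ss[i])
	}
	return rs
}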

The code and the benchmark

I tested all the approaches with small, medium, and big slices with lengths of 10, 100, and 10,000 respectively to see the differences, and the winner is our improved version of strings.Builder.
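The benchmarks were structured roughly like this (the helper name, filler string, and sizes here are my own illustration, not necessarily what the gist uses):

func benchmarkConcat(b *testing.B, size int, concat func(...string) string) {
	ss := make([]string, size)
	for i := range ss {
		ss[i] = "some reasonably short string"
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = concat(ss...)
	}
}

func BenchmarkSmall(b *testing.B) {
	b.Run("concat_fast_improved", func(b *testing.B) {
		benchmarkConcat(b, 10, concatFastImproved)
	})
}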

You can see the code here in the gist. And below is the benchmark:

goos: windows
goarch: amd64
pkg: github.com
cpu: 12th Gen Intel(R) Core(TM) i7-1255U
BenchmarkSmall/concat_naive______________________-12             2725071               449.6 ns/op          1520 B/op          9 allocs/op
BenchmarkSmall/concat_string_builder_without_grow-12             3993370               287.7 ns/op           984 B/op          5 allocs/op
BenchmarkSmall/concat_assign_index_______________-12             4681068               255.5 ns/op           288 B/op          1 allocs/op
BenchmarkSmall/concat_bytes_buffer_______________-12             6646387               179.3 ns/op           576 B/op          2 allocs/op
BenchmarkSmall/concat_fast_using_append__________-12             8743449               148.8 ns/op           576 B/op          2 allocs/op
BenchmarkSmall/concat_string_builder_____________-12            12188716                95.24 ns/op          288 B/op          1 allocs/op
BenchmarkSmall/concat_+_operator_________________-12            11912088                96.59 ns/op          288 B/op          1 allocs/op
BenchmarkSmall/concat_fast_improved______________-12            13002703                90.33 ns/op          288 B/op          1 allocs/op
BenchmarkSmall/concat_fast_improved_using_copy___-12            14131878                97.54 ns/op          288 B/op          1 allocs/op

BenchmarkMedium/concat_naive______________________-12              14560             82115 ns/op          509697 B/op         98 allocs/op
BenchmarkMedium/concat_string_builder_without_grow-12             172482              5974 ns/op           34240 B/op         12 allocs/op
BenchmarkMedium/concat_assign_index_______________-12             130524              8269 ns/op            9472 B/op          1 allocs/op
BenchmarkMedium/concat_bytes_buffer_______________-12             447615              2940 ns/op           18944 B/op          2 allocs/op
BenchmarkMedium/concat_fast_using_append__________-12             376473              2819 ns/op           18944 B/op          2 allocs/op
BenchmarkMedium/concat_string_builder_____________-12             655304              1619 ns/op            9472 B/op          1 allocs/op
BenchmarkMedium/concat_+_operator_________________-12             746950              1637 ns/op            9472 B/op          1 allocs/op
BenchmarkMedium/concat_fast_improved______________-12             749499              1613 ns/op            9472 B/op          1 allocs/op
BenchmarkMedium/concat_fast_improved_using_copy___-12             864777              1623 ns/op            9472 B/op          1 allocs/op

BenchmarkBig/concat_naive______________________-12                     2         745434800 ns/op        4995658496 B/op    10072 allocs/op
BenchmarkBig/concat_string_builder_without_grow-12                  1558            901401 ns/op         5241548 B/op         30 allocs/op
BenchmarkBig/concat_assign_index_______________-12                  1155            941113 ns/op          999430 B/op          1 allocs/op
BenchmarkBig/concat_bytes_buffer_______________-12                  2985            438088 ns/op         1998854 B/op          2 allocs/op
BenchmarkBig/concat_fast_using_append__________-12                  2904            444509 ns/op         1998854 B/op          2 allocs/op
BenchmarkBig/concat_string_builder_____________-12                  4792            245516 ns/op          999429 B/op          1 allocs/op
BenchmarkBig/concat_fast_improved______________-12                  4142            242516 ns/op          999429 B/op          1 allocs/op
BenchmarkBig/concat_fast_improved_using_copy___-12                  4478            260349 ns/op          999430 B/op          1 allocs/op
PASS
ok      github.com        39.751s

Lessons Learned

I think there are a couple of things we can learn from this:

  • Avoid memory allocation as much as we can.
  • Whenever a slice is used, try to allocate enough memory for it in advance.
  • Take advantage of built-in functions like `append` and `copy`.
  • The `unsafe` package can sometimes provide pretty good hacks for better performance, but it should be used with care.
  • Run benchmarks with multiple input sizes (small, medium, and large) to see the differences and improve the code from there.
  • There is a lot we can learn from the Go standard library, like strings.Builder in this case.
  • Exploring the standard library code is a good way to learn and become a better Gopher.