How to extract a substring in Go

Support for extracting a part of a string is built into Go, but there’s one important aspect to consider.

Although Go does not provide a variant of the substring() method that’s found in other programming languages, it’s easy enough to extract a subset of a string in the language. The way to do so is to use the slice syntax as shown below:

package main

import "fmt"

func main() {
	str := "This is a string"
	substr := str[0:4]
	fmt.Println(substr) // This
}

The substring returned is based on the start index and end index specified. In the above example, the start index is 0 and the end index is 4. The former is inclusive while the latter is exclusive (which means that the character in that position is not included in the substring).

One major thing to note is that string indexes in Go count bytes, not characters, so the above technique is only reliable for ASCII strings. That is, strings that contain only English letters, Arabic numerals and a few symbols. For strings that contain non-ASCII Unicode characters (such as emojis or characters from other languages), convert the string to a rune slice before taking a subslice. Otherwise, the slice may cut through the middle of a character and you will get inaccurate or corrupted results.

// Wrong way
func main() {
	str := "I ♥ emojis 😍"
	substr := str[2:3]
	fmt.Println(substr) // a single invalid byte (the first of the three bytes that encode ♥) instead of the ♥ character
}

// Right way
func main() {
	str := "I ♥ emojis 😍"
	runes := []rune(str) // convert string to rune slice
	substr := string(runes[2:3]) // take a subslice of the runes and convert it back to a string
	fmt.Println(substr) // ♥
}

The reason for this is that one ASCII character corresponds to exactly one byte, while a non-ASCII Unicode character is encoded in UTF-8 as two to four bytes. The ♥ character above takes three bytes, so str[2:3] returns only the first of them.
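To make the byte versus character difference concrete, here is a small sketch that compares the two counts and lists the byte offset at which each character starts. The helper names byteAndRuneCounts and runeOffsets are my own, made up for illustration, and the string is the same one as above, written with escape sequences so the exact code points are unambiguous.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts (hypothetical helper) returns the length of s
// in bytes and in runes (Unicode code points).
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

// runeOffsets (hypothetical helper) returns the byte offset at which
// each rune of s starts. Ranging over a string decodes it rune by rune.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	// "I ♥ emojis 😍" spelled with escapes: U+2665 and U+1F60D.
	str := "I \u2665 emojis \U0001F60D"

	b, r := byteAndRuneCounts(str)
	fmt.Println(b, r) // 17 12

	// The heart starts at byte 2 and the next rune at byte 5,
	// so the heart occupies bytes 2, 3 and 4.
	fmt.Println(runeOffsets(str)) // [0 1 2 5 6 7 8 9 10 11 12 13]
}
```

The jump from offset 2 to offset 5 is why a one-byte slice starting at index 2 cannot contain the whole heart character.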

func main() {
	str := "This is a string"
	substr := str[:4]
	fmt.Println(substr) // This
}

If you’re slicing from the start of the string, you may omit the start index as I’ve done above.

func main() {
	str := "This is a string"
	substr := str[10:]
	fmt.Println(substr) // string
}

And if you’re slicing to the end of the string, you may omit the end index as I’ve done above.

func main() {
	str := "This is a string"
	substr := str[0:4:10] // invalid operation str[0:4:10] (3-index slice of string)
	fmt.Println(substr)
}

When creating a slice from another slice, it’s possible to specify the maximum capacity of the new slice using the 3-index syntax shown above. This syntax is not valid for strings, however, and the compiler rejects it with the error shown in the comment.
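For contrast, here is a minimal sketch of the same 3-index expression applied to a byte slice, where it is allowed. The function name cappedPrefix is hypothetical, and the sketch assumes max does not exceed the capacity of the converted slice (otherwise the expression panics).

```go
package main

import "fmt"

// cappedPrefix (hypothetical helper) converts s to a byte slice and
// returns its first n bytes with the capacity limited to max, using
// the 3-index slice form. It assumes 0 <= n <= max <= len(s).
func cappedPrefix(s string, n, max int) []byte {
	b := []byte(s)
	return b[0:n:max]
}

func main() {
	p := cappedPrefix("This is a string", 4, 10)
	fmt.Println(string(p), len(p), cap(p)) // This 4 10
}
```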

func main() {
	str := "This is a string"
	substr := str[10:17] // slice bounds out of range [:17] with length 16
	fmt.Println(substr)
}

It’s also important to ensure that the indexes provided are within the bounds of the string. Otherwise, the program will panic at runtime with an error such as the one shown above.
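One way to avoid the panic is to check the indexes before slicing. The sketch below is only an illustration: safeSlice and errOutOfRange are names I made up, and returning an error is just one possible design (you could also clamp the indexes instead). Note that it still works on byte offsets, like the plain slice syntax.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfRange is a hypothetical sentinel error for this sketch.
var errOutOfRange = errors.New("substring indexes out of range")

// safeSlice (hypothetical helper) returns s[start:end], or an error
// instead of panicking when the byte indexes are out of bounds.
func safeSlice(s string, start, end int) (string, error) {
	if start < 0 || end > len(s) || start > end {
		return "", errOutOfRange
	}
	return s[start:end], nil
}

func main() {
	str := "This is a string"

	if _, err := safeSlice(str, 10, 17); err != nil {
		fmt.Println("error:", err) // error: substring indexes out of range
	}

	sub, _ := safeSlice(str, 10, 16)
	fmt.Println(sub) // string
}
```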

Wrap up

As you can see, extracting a part of a string is pretty straightforward in Go. My recommendation is that you always convert the string to a rune slice first before slicing, unless you can guarantee that the string contains ASCII characters only. In that case, slicing the string directly should be sufficient.
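If you find yourself repeating the rune conversion, it can be wrapped in a small helper. This is a sketch under my own assumptions: the name substring and the choice to clamp out-of-range indexes rather than panic are mine, not something built into Go. It counts Unicode code points, so a character that is made up of several code points would still be split.

```go
package main

import "fmt"

// substring (hypothetical helper) returns the runes of s in the range
// [start, end), counting Unicode code points rather than bytes.
// Out-of-range indexes are clamped instead of causing a panic.
func substring(s string, start, end int) string {
	runes := []rune(s)
	if start < 0 {
		start = 0
	}
	if end > len(runes) {
		end = len(runes)
	}
	if start >= end {
		return ""
	}
	return string(runes[start:end])
}

func main() {
	// "I ♥ emojis 😍" spelled with escapes: U+2665 and U+1F60D.
	str := "I \u2665 emojis \U0001F60D"

	fmt.Println(substring(str, 2, 3) == "\u2665")       // true
	fmt.Println(substring("This is a string", 0, 4))    // This
	fmt.Println(substring("This is a string", 10, 100)) // string
}
```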

Thanks for reading, and happy coding!