Go Rune
In Go, rune is an alias for int32, a 32-bit integer type. Rune values represent Unicode code points: a code point is an integer that uniquely identifies a character.
For example, the rune literal 'a' is actually the number 97.
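A minimal sketch showing that a rune is just an int32 value. The helper codePoint is hypothetical, added only to make the identity between rune and int32 explicit.

```go
package main

import "fmt"

// codePoint is a hypothetical helper: it returns the integer value of a rune.
// No conversion is needed because rune is an alias for int32.
func codePoint(r rune) int32 {
	return r
}

func main() {
	r := 'a'                    // an untyped rune constant; r gets type rune
	fmt.Println(codePoint(r))   // 97
	fmt.Printf("%T %c\n", r, r) // int32 a
}
```

Because rune is an alias rather than a distinct type, %T reports int32.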
Runes
In older programming languages such as C, there is no difference between a character and a byte: the char type is exactly one byte wide.
As a reminder, a byte is a group of 8 bits, so its value ranges from 0 to 255; in Go, byte is an alias for uint8.
Go, however, has the rune type, which represents a single character (a Unicode code point) that may need more than one byte when encoded. Go source code is UTF-8, so string literals are UTF-8 encoded, and in UTF-8 the character é (U+00E9) takes two bytes: 0xc3 and 0xa9. This multi-byte encoding is how UTF-8 represents characters that are not in the original ASCII table (the characters used by American computers in the 1960s).
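To see the two bytes of é directly, here is a small sketch using the standard unicode/utf8 package. The helper utf8Bytes is hypothetical; it simply wraps utf8.EncodeRune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Bytes is a hypothetical helper that returns the UTF-8 encoding of r.
func utf8Bytes(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax (4) is the longest possible encoding
	n := utf8.EncodeRune(buf, r)     // writes the encoding, returns its length
	return buf[:n]
}

func main() {
	fmt.Printf("% x\n", utf8Bytes('é')) // c3 a9
	fmt.Println(utf8.RuneLen('é'))      // 2: bytes needed to encode é
	fmt.Println(utf8.RuneLen('a'))      // 1: ASCII characters need one byte
}
```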
Thus a slice of bytes and a slice of runes are not the same thing: len() on a string counts bytes, while converting the string to []rune first counts characters.
package main
import "fmt"
func main() {
	s := "é"
	fmt.Println(len(s))         // 2: length in bytes
	fmt.Println(len([]rune(s))) // 1: length in runes
}
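Printing the two slices side by side makes the difference visible. The helper counts is hypothetical, introduced only for illustration.

```go
package main

import "fmt"

// counts is a hypothetical helper returning a string's length in bytes
// and in runes.
func counts(s string) (nBytes, nRunes int) {
	return len(s), len([]rune(s))
}

func main() {
	s := "é"
	fmt.Println([]byte(s)) // [195 169]: the raw UTF-8 bytes (0xc3, 0xa9)
	fmt.Println([]rune(s)) // [233]: the single decoded code point (U+00E9)
	fmt.Println(counts(s)) // 2 1
}
```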
Rune literals
A rune literal is a single character enclosed in single quotes, for example 'Ñ' or '€'.
You can also use the \u escape sequence inside the quotes to write a rune, for example '\u00A1' or '\u03B2', in case you don't have the character on your keyboard.
Rune literals can be assigned to a variable of type rune:
var r rune = 'Ñ'
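A short sketch checking that the \u escapes denote the same runes as the literal characters, and printing a rune in a few fmt formats. The helper describe is hypothetical.

```go
package main

import "fmt"

// describe is a hypothetical helper: the character, its decimal value,
// and its U+ notation.
func describe(r rune) string {
	return fmt.Sprintf("%c %d %U", r, r, r)
}

func main() {
	// A \u escape inside single quotes is the same rune as the character itself.
	fmt.Println('\u00A1' == '¡') // true
	fmt.Println('\u03B2' == 'β') // true

	var r rune = 'Ñ'
	fmt.Println(describe(r))   // Ñ 209 U+00D1
	fmt.Println(describe('€')) // € 8364 U+20AC
}
```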
The len() function
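Note that len() applied to a string returns the number of bytes, not the number of characters, so it only matches the character count for plain ASCII text. Converting to a []rune slice first, as in the example above, does count characters, but it copies the whole string into a new slice just to measure it.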
Finding the length of a string in runes
A more direct way is utf8.RuneCountInString from the standard unicode/utf8 package, which counts the runes in a string without building a slice:
package main
import (
	"fmt"
	"unicode/utf8"
)
func main() {
	fmt.Println(utf8.RuneCountInString("Hello"))      // 5
	fmt.Println(utf8.RuneCountInString("Hello, 世界")) // 9
}
Output
5
9