
A Character is a single 16-bit token from the Unicode Basic Multilingual Plane (BMP). It can be converted to and from an integer, where it occupies the integer's lowest 16 bits.

Each Unicode character belongs to a certain category, which we can inspect using getType():
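As a minimal sketch in Java, the category lookup might look like the following. The class name CharTypeDemo and the helper categoryName are hypothetical and name only a handful of the category constants; Character.getType and the constants themselves come from java.lang.Character.

```java
class CharTypeDemo {
    // Translates a few of the category constants returned by
    // Character.getType into readable names (hypothetical helper).
    static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:     return "UPPERCASE_LETTER";
            case Character.LOWERCASE_LETTER:     return "LOWERCASE_LETTER";
            case Character.DECIMAL_DIGIT_NUMBER: return "DECIMAL_DIGIT_NUMBER";
            case Character.SPACE_SEPARATOR:      return "SPACE_SEPARATOR";
            case Character.CURRENCY_SYMBOL:      return "CURRENCY_SYMBOL";
            case Character.SURROGATE:            return "SURROGATE";
            default:                             return "OTHER";
        }
    }

    public static void main(String[] args) {
        for (char c : new char[] {'A', 'a', '7', ' ', '$', '\uD800'}) {
            System.out.println(Integer.toHexString(c) + " -> " + categoryName(c));
        }
    }
}
```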

The surrogate category is divided into high surrogates (U+D800 to U+DBFF) and low surrogates (U+DC00 to U+DFFF). A Unicode supplementary character is represented by two Characters, the first a high surrogate and the second a low surrogate. Integers, known as code points, can represent every Unicode character, including supplementary ones. For basic-plane characters the code point equals the Character converted to an integer; for supplementary characters the values run from 0x10000 to 0x10FFFF. Only the lower 21 bits of the code point integer are used, so its upper 11 bits must be zero. Methods that accept only char values treat surrogate characters as undefined characters.
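To illustrate surrogate pairs, here is a small Java sketch. The class SurrogateDemo and the helper combine are hypothetical names; the example character U+1D11E (MUSICAL SYMBOL G CLEF) is chosen only because it lies outside the basic plane.

```java
class SurrogateDemo {
    // U+1D11E written as its high and low surrogate.
    static final String CLEF = "\uD834\uDD1E";

    // Combines two chars into a code point, or returns -1 when they do not
    // form a high/low surrogate pair in that order.
    static int combine(char first, char second) {
        return Character.isSurrogatePair(first, second)
                ? Character.toCodePoint(first, second)
                : -1;
    }

    public static void main(String[] args) {
        char high = CLEF.charAt(0);
        char low = CLEF.charAt(1);
        System.out.println(Character.isHighSurrogate(high));
        System.out.println(Character.isLowSurrogate(low));
        System.out.println(Integer.toHexString(combine(high, low)));
        // Code points above 0x10FFFF are rejected as invalid.
        System.out.println(Character.isValidCodePoint(0x10FFFF));
        System.out.println(Character.isValidCodePoint(0x110000));
    }
}
```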

To convert a Unicode character between a code point and a Character array:
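A round trip between the two representations could be sketched in Java as follows. The class CodePointConversionDemo and its encode and decode helpers are hypothetical; Character.toChars, Character.toCodePoint and Character.charCount are the library calls doing the work.

```java
class CodePointConversionDemo {
    // Code point -> one char (basic plane) or two chars (supplementary).
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    // One char or one surrogate pair -> code point.
    static int decode(char[] units) {
        if (units.length == 2 && Character.isSurrogatePair(units[0], units[1])) {
            return Character.toCodePoint(units[0], units[1]);
        }
        if (units.length == 1) {
            return units[0];
        }
        throw new IllegalArgumentException("expected one char or one surrogate pair");
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // MUSICAL SYMBOL G CLEF, a supplementary character
        char[] units = encode(clef);
        System.out.println(units.length + " chars, charCount = " + Character.charCount(clef));
        System.out.println(Integer.toHexString(decode(units)));
    }
}
```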

We can query the code points in a char array or string:
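The sketch below shows a few such queries in Java. The class CodePointQueryDemo and its helpers are hypothetical names, and String.codePoints() assumes Java 8 or later.

```java
class CodePointQueryDemo {
    // 'a', then U+1D11E as a surrogate pair, then 'b': 4 chars, 3 code points.
    static final String TEXT = "a\uD834\uDD1Eb";

    // Number of code points in a string.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // All code points of a string, in order.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        char[] chars = TEXT.toCharArray();
        System.out.println(TEXT.length() + " chars, " + countCodePoints(TEXT) + " code points");
        // Index 1 is the high surrogate, so the whole pair is read.
        System.out.println(Integer.toHexString(Character.codePointAt(chars, 1)));
        // codePointBefore looks backwards from the given index.
        System.out.println(Integer.toHexString(TEXT.codePointBefore(3)));
        // Char index reached after stepping two code points from index 0.
        System.out.println(TEXT.offsetByCodePoints(0, 2));
    }
}
```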

Every character also has a directionality:
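A minimal Java sketch of a directionality check follows. The class DirectionalityDemo and the helper isRightToLeft are hypothetical, and treating only the two strong right-to-left classes as "right to left" is a simplifying assumption.

```java
class DirectionalityDemo {
    // True for characters whose directionality is strong right-to-left
    // (Hebrew-style R or Arabic-style AL).
    static boolean isRightToLeft(int codePoint) {
        byte d = Character.getDirectionality(codePoint);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        System.out.println(isRightToLeft('A'));      // Latin letter
        System.out.println(isRightToLeft('\u05D0')); // Hebrew letter alef
        System.out.println(isRightToLeft('\u0627')); // Arabic letter alef
        System.out.println(Character.getDirectionality('1')
                == Character.DIRECTIONALITY_EUROPEAN_NUMBER);
    }
}
```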

Every character is part of a Unicode block:
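Block membership can be queried with Character.UnicodeBlock.of, as in this Java sketch. The class UnicodeBlockDemo and the helper sameBlock are hypothetical names.

```java
class UnicodeBlockDemo {
    // True when both code points fall in the same, defined Unicode block.
    // UnicodeBlock.of returns null for code points outside any known block.
    static boolean sameBlock(int a, int b) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(a);
        return block != null && block == Character.UnicodeBlock.of(b);
    }

    public static void main(String[] args) {
        System.out.println(Character.UnicodeBlock.of('a'));
        System.out.println(Character.UnicodeBlock.of('\u03B1')); // Greek small alpha
        System.out.println(Character.UnicodeBlock.of(0x1D11E));  // supplementary character
        System.out.println(sameBlock('a', 'Z'));
        System.out.println(sameBlock('a', '\u00E9'));            // e with acute accent
    }
}
```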

Character helps convert between digits and integers in different radixes:
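As a sketch of the radix helpers in Java, the hypothetical method parseDigits below rebuilds an integer from its digits using Character.digit. It assumes a non-negative value that fits in an int; the class name RadixDemo is also an assumption.

```java
class RadixDemo {
    // Parses an unsigned number written in the given radix, one digit at a time.
    static int parseDigits(String text, int radix) {
        int value = 0;
        for (int i = 0; i < text.length(); i++) {
            int digit = Character.digit(text.charAt(i), radix);
            if (digit < 0) { // Character.digit returns -1 for an invalid digit
                throw new NumberFormatException("bad digit at index " + i);
            }
            value = value * radix + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseDigits("ff", 16));
        System.out.println(parseDigits("101", 2));
        System.out.println(Character.forDigit(11, 16)); // integer -> digit character
        System.out.println(Character.digit('9', 8));    // '9' is not an octal digit
        System.out.println(Character.MIN_RADIX + ".." + Character.MAX_RADIX);
    }
}
```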

We can look up a Unicode block by name, and the name may be loosely formatted:
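In Java this lookup is Character.UnicodeBlock.forName, which matches names case-insensitively and accepts the canonical name, the canonical name without spaces, or the constant's identifier. The sketch below wraps it in a hypothetical helper, lookupOrNull, that returns null instead of throwing for unknown names.

```java
class BlockNameDemo {
    // Returns the named block, or null when the name is not recognised.
    static Character.UnicodeBlock lookupOrNull(String name) {
        try {
            return Character.UnicodeBlock.forName(name);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Basic Latin", "BasicLatin", "basic_latin", "No Such Block"}) {
            System.out.println(name + " -> " + lookupOrNull(name));
        }
    }
}
```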

Constructing and Using Characters

We can't represent Characters directly in our programs, but must construct them from a string:

There are a number of Character utility methods, each accepting either a code point or a basic-plane character, that test some attribute of the character:
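A few of these tests are sketched in Java below. The class CharTestDemo and the helper isIdentifier are hypothetical; the helper walks the string by code point so that supplementary characters are tested as a whole rather than as two surrogates.

```java
class CharTestDemo {
    // True when the text is non-empty, starts with an identifier-start
    // character, and continues with identifier-part characters.
    static boolean isIdentifier(String text) {
        if (text.isEmpty()) {
            return false;
        }
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean ok = (i == 0)
                    ? Character.isJavaIdentifierStart(cp)
                    : Character.isJavaIdentifierPart(cp);
            if (!ok) {
                return false;
            }
            i += Character.charCount(cp); // step over one or two chars
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Character.isLetter('x'));
        System.out.println(Character.isDigit('7'));
        System.out.println(Character.isWhitespace('\t'));
        System.out.println(Character.isUpperCase('Q'));
        // The int overloads also accept supplementary code points.
        System.out.println(Character.isLetter(0x1D400));
        System.out.println(isIdentifier("foo_1") + " " + isIdentifier("1foo"));
    }
}
```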

We can use characters instead of numbers in arithmetic operations:
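In Java, a char in an arithmetic expression is promoted to int, so the result must be cast back to get a character again. The sketch below uses a hypothetical helper, shift, that rotates a lowercase ASCII letter; it assumes its input is in 'a'..'z' and that the offset is non-negative.

```java
class CharArithmeticDemo {
    // Rotates a lowercase ASCII letter forward by n places, wrapping at 'z'.
    static char shift(char letter, int n) {
        int index = letter - 'a';          // char - char gives an int offset
        return (char) ('a' + (index + n) % 26);
    }

    public static void main(String[] args) {
        System.out.println('a' + 1);          // int arithmetic on the char's value
        System.out.println((char) ('a' + 1)); // cast back to a character
        System.out.println('z' - 'a');        // distance between two characters
        System.out.println('7' - '0');        // digit character to its numeric value
        System.out.println(shift('y', 3));
    }
}
```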

We can auto-increment and -decrement characters:
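A short Java sketch of incrementing and decrementing follows. The class IncrementDemo and its helpers next and range are hypothetical; range assumes its upper bound is below the maximum char value, since the loop counter would otherwise wrap around.

```java
class IncrementDemo {
    // Increments a boxed Character; Java unboxes, adds one, and boxes again.
    static Character next(Character c) {
        c++;
        return c;
    }

    // Builds the run of characters from first to last, inclusive.
    static String range(char first, char last) {
        StringBuilder out = new StringBuilder();
        for (char c = first; c <= last; c++) {
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char c = 'b';
        c++;
        System.out.println(c);
        c--;
        c--;
        System.out.println(c);
        System.out.println(next('x'));
        System.out.println(range('a', 'e'));
    }
}
```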

Some miscellaneous methods:
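Which methods the page originally listed here is not known; the Java sketch below picks a few general-purpose ones from java.lang.Character as an assumption. The class MiscCharDemo and the helper capitalize are hypothetical names.

```java
class MiscCharDemo {
    // Upper-cases the first character of a word, leaving the rest untouched.
    static String capitalize(String word) {
        if (word.isEmpty()) {
            return word;
        }
        return Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(capitalize("character"));
        System.out.println(Character.toLowerCase('Q'));
        System.out.println(Character.compare('a', 'b') < 0);        // ordering by value
        System.out.println(Character.valueOf('a').equals('a'));     // boxed equality
        System.out.println(Character.toString('x').length());       // char -> String
        System.out.println(Integer.toHexString(Character.reverseBytes('\u1234')));
        System.out.println((int) Character.MIN_VALUE + ".." + (int) Character.MAX_VALUE);
    }
}
```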
