About
Text data is encoded/stored/transformed on the computer in bytes thanks to a character set that maps text to bytes.
Historically, the character sets were single-byte character sets that could hold 256 characters. It meant that one character needed only one byte. But with the globalization of the world, it was not enough anymore to hold all character's languages, and the multi-byte character sets were born.
The problems and Oracle error
The issues are that :
- when using multi-byte character sets, namely that a VARCHAR2(N) doesn't necessarily hold N characters, but rather N bytes.
- the maximum length in bytes:
Leadings to these Oracle errors :
- ORA-06502: PL/SQL: numeric or value error: character string buffer too small
- ORA-12899: value too large for column
How to specify the length unit (byte or char)?
With SQL
The VARCHAR2 and CHAR SQL data types support two methods of specifying lengths:
- In bytes to support up to 10 bytes of data (possibly as few as two characters in a multi-byte character sets).
VARCHAR2(10 byte)
- In characters to support to up 10 characters of data (and use 40 byte of storage)
VARCHAR2(10 char)
With the NLS_LENGTH_SEMANTICS setting
The session or system parameter NLS_LENGTH_SEMANTICS changes the default text length unit from BYTE to CHAR.
- Session
alter session set nls_length_semantics=char;
alter session set nls_length_semantics=byte;
- System (not recommended)
alter system set nls_length_semantics=char scope=both;
alter system set nls_length_semantics=byte scope=both;
With the Text functions
You can specify the length unit in the textual functions by choosing:
- INSTR, LENGTH and SUBSTR to specify characters
- INSTRB, LENGTHB, and SUBSTRB to specify byte.
For instance,
- with the substr function, the positions 3 and 4 are characters
SUBSTR('abcdefg',3,4)
- with the substrb function, the positions 3 and 4 are bytes. The oracle database looks up and works with the character set defines in the database to transform abcdefg in bytes.
SUBSTRB('abcdefg',3,4)