GoogleSQL for Bigtable supports collation. Collation defines rules to sort and compare strings in certain operations, such as conditional expressions, joins, and groupings.
By default, GoogleSQL sorts strings case-sensitively. This means that a
and
A
are treated as different letters, and Z
would come before a
.
Example default sorting: Apple, Zebra, apple
By contrast, collation lets you sort and compare strings case-insensitively or according to specific language rules.
Example case-insensitive collation: Apple, apple, Zebra
Collation is useful when you need fine-tuned control over how values are sorted, joined, or grouped in tables.
Operations that propagate collation
Collation can pass through some query operations to other parts of a query. When collation passes through an operation in a query, this is known as propagation. During propagation:
- If an input contains no collation specification or an empty collation specification and another input contains an explicitly defined collation, the explicitly defined collation is used for all of the inputs.
- All inputs with a non-empty explicitly defined collation specification must have the same type of collation specification, otherwise an error is thrown.
GoogleSQL has several functions, operators, and expressions that can propagate collation.
Functions
The following example functions propagate collation.
Function | Notes |
---|---|
ANY_VALUE |
|
ARRAY_AGG |
Collation on input arguments are propagated as collation on the array element. |
ARRAY_FIRST |
|
ARRAY_LAST |
|
ARRAY_OFFSET |
|
ARRAY_OFFSETS |
|
ARRAY_SLICE |
|
ARRAY_TO_STRING |
Collation on array elements are propagated to output. |
CONCAT |
|
FORMAT |
Collation from format_string to the returned string is propagated. |
FORMAT_DATE |
Collation from format_string to the returned string is propagated. |
FORMAT_DATETIME |
Collation from format_string to the returned string is propagated. |
FORMAT_TIME |
Collation from format_string to the returned string is propagated. |
FORMAT_TIMESTAMP |
Collation from format_string to the returned string is propagated. |
GREATEST |
|
LEAST |
|
LEFT |
|
LOWER |
|
LPAD |
|
MAX |
|
MIN |
|
NORMALIZE |
|
NORMALIZE_AND_CASEFOLD |
|
NULLIFERROR |
|
REPEAT |
|
REPLACE |
|
REVERSE |
|
RIGHT |
|
RPAD |
|
SOUNDEX |
|
SPLIT |
Collation on input arguments are propagated as collation on the array element. |
STRING_AGG |
|
SUBSTR |
|
UPPER |
Operators
The following example operators propagate collation.
Operator | Notes |
---|---|
|| concatenation operator |
|
Array subscript operator | Propagated to output. |
STRUCT field access operator |
When getting a STRUCT , collation on the STRUCT field is propagated as the output collation. |
UNNEST |
Collation on the input array element is propagated to output. |
Expressions
The following example expressions propagate collation.
Expression | Notes |
---|---|
ARRAY |
When you construct an ARRAY , collation on input arguments is propagated on the elements in the ARRAY . |
CASE |
|
CASE expr |
|
IF |
|
IFNULL |
|
NULLIF |
|
STRUCT |
When you construct a STRUCT , collation on input arguments is propagated on the fields in the STRUCT . |
Collation specification details
A collation specification determines how strings are sorted and compared in collation-supported operations. Binary collation is used by default and indicates that the operation should return data in Unicode code point order. You can't set binary collation explicitly.