Internet Engineering Task Force (IETF) M. Davis
Request for Comments: 6067 Google
Category: Informational A. Phillips
ISSN: 2070-1721 Lab126
Y. Umaoka
IBM
December 2010
BCP 47 Extension U
Abstract
This document specifies an Extension to BCP 47 that provides subtags
that specify language and/or locale-based behavior or refinements to
language tags, according to work done by the Unicode Consortium.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are a candidate for any level of Internet
Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6067.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Davis, et al. Informational [Page 1]
RFC 6067 BCP 47 Unicode Locale Extension December 2010
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . . 2
2. BCP 47 Required Information . . . . . . . . . . . . . . . . . . 2
2.1. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1. Canonicalization . . . . . . . . . . . . . . . . . . . 5
2.2. Registration Form . . . . . . . . . . . . . . . . . . . . . 6
3. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 6
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6
5. Security Considerations . . . . . . . . . . . . . . . . . . . . 7
6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7
6.1. Normative References . . . . . . . . . . . . . . . . . . . 7
6.2. Informative References . . . . . . . . . . . . . . . . . . 7
1. Introduction
[BCP47] permits the definition and registration of language tag
extensions "that contain a language component and are compatible with
applications that understand language tags". This document defines
an extension for identifying Unicode locale-based variations using
language tags. The "singleton" identifier for this extension is 'u'.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. BCP 47 Required Information
Language tags, as defined by [BCP47], are useful for identifying the
language of content. They are also used as locale identifiers (or
can be mapped to locales) in many operating environments and APIs.
However, many locale identifiers also require additional "tailorings"
or options for specific values within a language, culture, region, or
other variation. This extension provides a mechanism for using these
additional tailorings within language tags for general interchange.
The Unicode Consortium defines a standardized, structured set of
locale data and identifiers for locale data in the "Common Locale
Data Repository" or "CLDR". The maintaining authority for the
extension defined by this document is the Unicode Consortium:
Davis, et al. Informational [Page 2]
RFC 6067 BCP 47 Unicode Locale Extension December 2010
+---------------+---------------------------------------------------+
| Item | Value |
+---------------+---------------------------------------------------+
| Name | Unicode Consortium |
| Contact Email | cldr-contact@unicode.org |
| Discussion | cldr-users@unicode.org |
| List Email | |
| URL Location | cldr.unicode.org |
| Specification | Unicode Technical Standard #35 Unicode Locale |
| | Data Markup Language (LDML), |
| | http://unicode.org/reports/tr35/ |
| Section | Section 3 Unicode Language and Locale Identifiers |
+---------------+---------------------------------------------------+
The specification of extension subtags is provided by Section 3, Key
Type Definitions of Unicode Technical Standard #35: Unicode Locale
Data Markup Language [UTS35]. As required by BCP 47, subtags follow
the language tag ABNF and other rules for the formation of language
tags and subtags, are restricted to the ASCII letters and digits, are
not case sensitive, and do not exceed eight characters in length.
Note that any "well-formed" language tag (see RFC 5646, Section 2.2.9
[BCP47]) is also a well-formed locale identifier.
LDML [UTS35] specifies a canonical representation. LDML is available
over the Internet and at no cost, and is available via a royalty-free
license at http://unicode.org/copyright.html. LDML is versioned, and
each version of LDML is numbered, dated, and stable. Extension
subtags, once defined by LDML, are never retracted and never change
in meaning in a substantial way.
The structure of the Unicode locale extension is determined by the
Unicode CLDR Technical Committee, in accordance with the policies and
procedures in http://www.unicode.org/consortium/tc-procedures.html,
and subject to the Unicode Consortium Policies on
http://www.unicode.org/policies/policies.html.
Changes that can be made by successive versions of LDML [UTS35] by
the Unicode Consortium without requiring a new RFC include: the
allocation of new attributes, keywords, and types; clarifications or
non-material changes to an existing attribute, keyword, or type; and
compatible extensions to the overall syntactic structure of
attributes, keywords, and types. A new RFC would be required for
material changes to an existing attribute, keyword, or type, or an
incompatible change to the overall syntactic structure of attributes,
keywords, and types; however, such a change would be contrary to the
policies of the Unicode Consortium, and thus is not anticipated.
Davis, et al. Informational [Page 3]
RFC 6067 BCP 47 Unicode Locale Extension December 2010
2.1. Summary
The subtags available for use in the 'u' extension consist of a set
of attributes, keys, and types. Attributes, keys, types, and their
respective meanings are defined in Section 3 (Unicode Language and
Locale Identifiers) of [UTS35]. The following is a summary of that
definition:
o An 'attribute' is a subtag with a length of three to eight
characters following the singleton and preceding any 'keyword'
sequences. No attributes were defined at the time of this
document's publication.
o A 'keyword' is a sequence of subtags consisting of a 'key' subtag,
followed by zero or more 'type' subtags (so a 'key' might appear
alone and not be accompanied by a 'type' subtag). A 'key' MUST
NOT appear more than once in a language tag's extension string.
The order of the 'type' subtags within a 'keyword' is sometimes
significant to their interpretation.
A. A 'key' is a subtag with a length of exactly two characters.
Each 'key' is followed by zero or more 'type' subtags.
B. A 'type' is a subtag with a length of three to eight
characters following a 'key'. 'Type' subtags are specific to
a particular 'key' and the order of the 'type' subtags MAY be
significant to the interpretation of the 'keyword'.
For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
o The base language tag "de-DE" (German as used in Germany), exactly
as defined by [BCP47] using subtags from the IANA Language Subtag
Registry.
o The singleton 'u', identifying this extension.
o The attribute 'attr', which is an example for illustration (no
attributes were defined at the time this document was published).
o The keyword 'co-phonebk', consisting to the key 'co' (Collation)
and the type 'phonebk' (Phonebook collation order).
Only the first occurrence of an attribute or key conveys meaning in a
language tag. When interpreting tags containing the Unicode locale
extension, duplicate attributes or keywords are ignored in the
following way: ignore any attribute that has already appeared in the
tag and ignore any keyword whose key has already occurred in the tag.
Davis, et al. Informational [Page 4]
RFC 6067 BCP 47 Unicode Locale Extension December 2010
Successive versions of [UTS35] could define additional attributes,
keys, and types. Once defined, attributes, keys, and types will
never be removed.
Beginning with CLDR version 1.7.2, machine-readable files are
available listing the valid attributes, keys, and types for each
successive version of [UTS35]. These releases are listed on
http://cldr.unicode.org/index/downloads. Each release has an
associated data directory of the form
"http://unicode.org/Public/cldr/