[Ietf-not43] issues and questions on FIRS

Eric A. Hall ehall at ehsco.com
Sun Aug 17 14:58:47 EDT 2003


on 8/16/2003 11:17 AM Andrew Newton wrote:

> #1 - Naming Syntax of inetDnsDomainSyntax (Section 3 in 
> draft-ietf-crisp-firs-dns).
> 
> First, I do not understand why the domain name is being specified as 
> UTF-8.  I understand that this is being done in consideration of IDNs, 
> but this is a service about the actually registered domain names.  While 
> the UTF-8 equivalences are important, the key for the domain names (the 
> rdn in this case) should be the wire equivalence since this is about the 
> registration of these domain names.  In other words, I think the key 
> should be the 7-bit ASCII versions that show up in zone files and via 
> dig, etc... and not the 8-bit versions.

This is a complex issue but the simple answer is that the UTF-8 form will
be the preferred form eventually, and it will be simpler in the long run
to do UTF-8 now rather than try to accommodate two systems later. Now for
the complex answer:

1) I had to choose one. It is likely that clients will exist that only
accept seven-bit ASCII for input/output, but it is also likely that there
will be clients which will use UTF-8, and it is also likely that some
clients will require some other local encoding (eg, UCS-2 on Win32).
However, protocol efficiency requires that a single canonical form be used
for the message data. The "majority" is going to have to do some kind of
conversion no matter what. None of them are significantly cheaper at this
exact moment in time.

All systems are required to do validation anyway. The cost of local
conversion is likely to be relatively minor if it is done during the
validation process, so there's no significant incentive for any format
here either.
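To make the "convert during validation" point concrete, here is a minimal Python sketch. It assumes that Python's built-in "idna" codec (which implements the 2003 IDNA rules, including nameprep) is an acceptable validator, and `to_canonical_utf8` is a hypothetical helper name, not anything defined by FIRS. Input may arrive in either ACE or Unicode form; the output is the UTF-8 octet string that would be carried in the LDAP message.

```python
def to_canonical_utf8(name: str) -> bytes:
    """Validate a domain name and return its canonical UTF-8 form.

    Hypothetical helper: accepts either ACE ("xn--...") or Unicode input.
    Validation and conversion happen in the same pass, so the conversion
    adds little cost beyond the validation that is required anyway.
    """
    name = name.rstrip(".")  # ignore a trailing root dot
    if not name:
        raise ValueError("empty domain name")
    try:
        # Encoding applies nameprep and rejects empty or over-long labels.
        ace = name.encode("idna")
        # Decoding turns any ACE labels back into Unicode characters.
        unicode_form = ace.decode("idna")
    except UnicodeError as exc:
        raise ValueError(f"invalid domain name {name!r}: {exc}") from exc
    # Pure-ASCII labels keep their original case through the codec, so fold it here.
    return unicode_form.lower().encode("utf-8")


# to_canonical_utf8("xn--exmple-cua.com") and to_canonical_utf8("Exämple.COM.")
# both yield the UTF-8 octets of "exämple.com".
```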

The wire format for LDAP protocol is UTF-8. Since all LDAP libraries
already have to support UTF-8 conversion in order to render LDAP data for
local input and output (EG, a company name, or a street address), this
means that reusing the existing UTF-8 conversion logic is going to be the
cheapest for everybody, since the code would have to exist already.

Furthermore, non-trivial comparisons are easier using UTF-8 than ACE.
Sub-string searches, soundex searches, etc., any of which may already be
implemented in LDAP servers for UTF-8, can be reused pretty easily (or
extended partially to allow for things like dot-separators between
labels). With something like IDNA, however, the servers would need
separate searches for all of these, with the first step being to produce a
normalized (UCS) instance. Using UTF-8 as the default makes this stuff
much, much easier, and this is a very compelling argument.
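A toy Python comparison of the search point, assuming a naive in-memory list of stored values rather than a real LDAP server (the data and function names are made up for illustration). A substring match for "äm" works directly against Unicode-form values, finds nothing against ACE-form values, and only works there once every stored value has been decoded first.

```python
# Hypothetical stored values; a real server would index these, not scan them.
stored_unicode = ["exämple.com", "beispiel.de", "hämmer.example"]
stored_ace = [name.encode("idna").decode("ascii") for name in stored_unicode]


def substring_search(values, needle):
    """Naive case-insensitive substring match, like a (name=*needle*) filter."""
    needle = needle.lower()
    return [v for v in values if needle in v.lower()]


def substring_search_ace(values, needle):
    """The same search over ACE values: each value must be decoded first."""
    decoded = [v.encode("ascii").decode("idna") for v in values]
    return substring_search(decoded, needle)


# substring_search(stored_unicode, "äm") finds exämple.com and hämmer.example;
# substring_search(stored_ace, "äm") finds nothing, because ACE values are ASCII-only.
```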

2) i18n domain names have a canonical representation as UCS characters.
The IDNA (ASCII-compatible) representation is just one of many possible
UCS-to-<target> encodings for IDNs. The RFCs don't hammer this point
strongly enough for my tastes, but the references to different "slots" make
this clear: applications can use whatever representation of a canonical
IDN they wish, including IDNA but also including any other encoding that
the application and/or protocol supports.
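The "one canonical form, many encodings" point can be shown with Python's standard codecs. This is only an illustration: "utf-16-le" stands in for the UCS-2 form used on Win32, and the built-in "idna" codec stands in for the ACE slot.

```python
canonical = "exämple.com"  # the IDN as a sequence of UCS code points

# The same canonical name in three different "slots".
representations = {
    "utf-8": canonical.encode("utf-8"),          # LDAP wire format
    "utf-16-le": canonical.encode("utf-16-le"),  # e.g. a Win32 UCS-2 client
    "idna": canonical.encode("idna"),            # ACE, as used in DNS today
}

# Each representation decodes back to exactly the same code points.
for codec, data in representations.items():
    assert data.decode(codec) == canonical, codec
```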

At the CURRENT TIME, delegation entities are seeing IDNA as the dominant
encoding, but this is symptomatic rather than deliberate. For example, at
this particular moment DNS delegations and nameserver entries are limited
to the hostname syntax (letter-digit-hyphen), but that doesn't mean that
alternative encodings cannot be used in the future (I've done a fair bit
of work in this space, and my research shows that it is possible, and that
the problems are political not technological). Similarly, tools like dig
are currently constrained to ASCII output, but there is no reason to
believe that folks are not going to be developing IDN-aware tools that
perform IDNA-to-local conversions for input and output (this would
actually be pretty simple). I mean, there's nothing about dig that
prevents this; if we expect stuff like web browsers and email clients to
perform conversion, then certainly dig and hostname and the like can too.
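As a rough sketch of how small that conversion is, here is a hypothetical display routine in Python (not part of dig or any existing tool). It decodes ACE labels for output and leaves anything it cannot decode untouched, so the raw wire form is never hidden from the user.

```python
def display_name(wire_name: str) -> str:
    """Convert each ACE label of `wire_name` to Unicode for display."""
    shown = []
    for label in wire_name.split("."):
        if label.lower().startswith("xn--"):
            try:
                # The codec only recognizes a lowercase "xn--" prefix.
                label = label.lower().encode("ascii").decode("idna")
            except UnicodeError:
                pass  # malformed ACE: show the raw label unchanged
        shown.append(label)
    return ".".join(shown)


# display_name("www.xn--exmple-cua.com.") returns "www.exämple.com."
```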

Cumulatively this means that the use of ASCII-compatible sequences for
domain name ~management is an artifact of the CURRENT generation of
services and tools, and not a restriction of the technology. In all
likelihood, the tools and services will adapt, and then sooner or later
we'll have local encodings of the canonical IDNs being the default view,
rather than IDNA being the default.

3) If FIRS is chosen, and if the directory expands into the edges of the
user-to-user space, the use of LDAP tools to manipulate raw data will
become more common. This means that more and more people will eventually
be exposed to the protocol representation directly (modulo any wholesale
conversion that is needed for their platform).

People will want to work with the IDN they bought and/or see, not an
illegible encoding of that domain name. I mean, we certainly don't expect
that people will *want* to work with IDNA encodings in web pages or email
addresses, and I think that the CRISP user community is going to have the
same kinds of desires for the ~whois service too. With this in mind, the
smart play in terms of adoption is to give them what we already know
they're going to want, which is internationalized views. Ask your
marketing department if they are selling i18n domain names or encoded
representations of i18n domain names, and let that answer apply to all of
the other domain-based usages that might be plugged into the directory in
the long haul.

This is true from the user side too. When I get my first IDN spam, I'm
going to want to find details about the "exämple.com" domain, not
"xn--exmple-cua.com".

This is also where points 1 and 2 come together. The default UTF-8
encoding is mandatory in all cases and is therefore guaranteed to be
supported, while the tools which are currently used to manage domain names
suffer from version-specific restrictions and are likely to be eventually
upgraded so that they provide i18n representations by default. In other
words, the tools are likely to catch up to the user's desires and the
capabilities of the technology, rather than being a permanent restriction
on the entire system.

There is another minor point here, which is that the official IAB policy
calls for UTF-8 as the preferred encoding. Since we are implicitly hoping
for reuse of common data, this is a strong argument in favor of using
UTF-8 for (most) directory data regardless of its underlying definition.

So given all of that, using the UTF-8 representation makes the most sense
both in the short-term and the long-term.

On the other side of the coin is the operator convenience issue. Working
with a representation of the domain names which is different from the
representation used by the current generation of tools and services is
admittedly inconvenient. I don't think this is compelling enough to
justify forcing all users and layered applications into working with an
illegible encoding format by default. The tools will change eventually.
Furthermore, as was already stated, all names have to be validated anyway
and doing the conversion as part of the validation process (or more
likely, as part of populating the database) is not an egregious expense.

> Second, why are the 8-bit versions escaped?  Is there an issue with 
> supporting UTF-8 or Unicode?  Just curious.

DNS uses raw octet values, but the UCS repertoire has no "characters" that
stand for raw octet values. If an octet value is read as a UCS code point,
it refers to some other character, not to the octet value itself.

For example, a valid DNS domain name (not a hostname) can contain the
octet value of 0xC4, and this value must be preserved across all instances
of that domain name. In UCS, however, the character code C4 refers to
uppercase "A" with diaeresis. In many instances, the UCS character would
get normalized to lowercase "a" with diaeresis, which has the character
code value of E4. If this value is mapped back to DNS, the original domain
name would be destroyed.
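The hazard can be reproduced in a few lines of Python. Latin-1 is used here only because it maps octet values 0x00-0xFF one-to-one onto code points U+0000-U+00FF, which mimics treating a raw octet as a UCS character; simple lowercasing stands in for normalization.

```python
raw_label = b"\xc4bc"  # a legal DNS label: octets 0xC4 0x62 0x63

as_chars = raw_label.decode("latin-1")  # octet 0xC4 read as U+00C4 (A with diaeresis)
normalized = as_chars.lower()           # folded to U+00E4 (a with diaeresis)
back_to_wire = normalized.encode("latin-1")

# The first octet is now 0xE4, so the original domain name has been destroyed.
assert back_to_wire == b"\xe4bc"
assert back_to_wire != raw_label
```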

The escape syntax is provided in order to support octet values in domain
names while preventing them from being interpreted as characters (so that
they aren't normalized, for example).
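A minimal sketch of such an escape, in Python. It borrows the "\DDD" decimal convention from RFC 1035 master files as an assumption; the actual inetDnsDomainSyntax escape rules may differ, and `escape_label`/`unescape_label` are hypothetical names. It also assumes the label is already known to hold raw octets rather than IDN characters. Because the escaped text is plain ASCII digits, case folding or other normalization cannot change the octet value it names.

```python
def escape_label(label: bytes) -> str:
    """Render a raw DNS label as text, escaping octets that are not plain ASCII."""
    out = []
    for octet in label:
        ch = chr(octet)
        if 0x21 <= octet <= 0x7E and ch not in ".\\":
            out.append(ch)                # printable ASCII is kept as-is
        else:
            out.append(f"\\{octet:03d}")  # everything else becomes \DDD
    return "".join(out)


def unescape_label(text: str) -> bytes:
    """Inverse of escape_label: turn \\DDD sequences back into raw octets."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or not all(c in "0123456789" for c in digits):
                raise ValueError(f"malformed escape at position {i}")
            value = int(digits)
            if value > 255:
                raise ValueError(f"escape out of range at position {i}")
            out.append(value)
            i += 4
        else:
            if ord(text[i]) > 0x7E:
                raise ValueError(f"unescaped non-ASCII character at position {i}")
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


# escape_label(b"\xc4BC") gives "\196BC"; lowercasing that text still
# unescapes to a first octet of 0xC4, unlike the character-based mapping.
```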

In theory, this could be avoided by limiting the domain name syntax to
hostnames, except for a couple of non-trivial issues. First and foremost,
there is not a strict definition of what constitutes a valid IDN, so all
values have to be allowed anyway. But if all values are allowed then
there's nothing to stop the users from entering octet values either, so an
escape mechanism of some kind is mandatory in any event. Secondarily, if
we want this service to be useful for domain names in the general case
(rather than being limited to the narrow purpose of "delegation
management") then we need to design for legitimate domain names rather
than attempt to enforce an arbitrary restriction to the hostname subset.
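For reference, the hostname subset can be checked with a short Python function. This is a sketch assuming the RFC 1123 label rules (letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen); the overall name length limit is omitted. Perfectly legitimate DNS names, such as the underscore-prefixed owner names used by SRV records, fail it.

```python
import re

# RFC 1123 host name label: letter-digit-hyphen, 1-63 characters,
# beginning and ending with a letter or digit.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hostname(name: str) -> bool:
    """True if every label of `name` fits the hostname (LDH) syntax."""
    labels = name.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) for label in labels)


# is_hostname("www.example.com")        -> True
# is_hostname("_sip._tcp.example.com")  -> False (a valid DNS name, not a hostname)
# is_hostname("exämple.com")            -> False (only its ACE form is LDH)
```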

> Third, point (a) in that section states how the escaped values must be 
> stored.  I don't know if this is valid or not.  It could be right, but 
> I'm a little concerned that this is making an assumption about how 
> information is stored in a registry/registrar that just doesn't exist. 
> As in, it MUST be this version of the ASCII escaped UTF-8, when what is 
> stored might actually be the ASCII version and the Unicode version 
> (non-escaped).

As to the first part, domain registrations are limited to the hostname
subset by definition (the owner and data domain names of an NS RR have to
fit in the hosts.txt database). As such, these rules are really provided
for usages beyond "delegation management".

EG, the inetDnsRR specs implicitly allow the contents of a zone to be
stored in the directory (this isn't the stated goal, but a possibility
here is out-of-band zone replication, benefiting from ACLs and the other
buttons and knobs), and some of the domain names in a zone are not
likely to conform to the hostname rules and will need to be escaped.
Whether or not a registry or registrar chooses to offer an ancillary
service will probably determine the extent to which they need to be
concerned with this kind of stuff. In the usual case, it won't be
necessary since input masking filters should keep the extended syntaxes
out of the delegation-specific data.
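A hypothetical input mask for delegation-specific data might look like this in Python. The function name and the exact policy (allow IDNs through their ACE form, reject escaped octets and other non-hostname syntax) are assumptions for illustration, not anything the drafts specify.

```python
import re

# RFC 1123 host name label: letter-digit-hyphen, 1-63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def accept_for_delegation(name: str) -> bool:
    """Hypothetical input mask for NS owner and data names.

    IDNs are allowed because their ACE form is LDH; escaped octets and
    other extended (non-hostname) syntax are rejected.
    """
    if "\\" in name:  # escaped raw octets never belong in delegation data
        return False
    try:
        ace = name.rstrip(".").encode("idna").decode("ascii")
    except UnicodeError:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in ace.split("."))
```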

I'm not sure I understand the second part. If you're concerned that I'm
dictating underlying database formats, that's not the goal, and I can be
clearer in the text if you want. The real need here is for the protocol's
view of the database to be consistent. However that happens is an
operational concern.

> #2 - The implementation of 3.1.8.
> 
> There was mention of 3.1.8 in the meeting by Peter, but I do not see it 
> addressed in the drafts.  Looking through the jabber logs, Peter 
> proposed various methods to do this.  These were defined as: using the 
> bind operation, special policy entries, and server-side extended operations.

Peter and I talked about this some in a private exchange. My suggestion
was to leave it as described in section 5.3.3 of firs-core-02 (with
"unwillingToPerform" as the generic response) for a later point where we
could talk about it in sufficient detail. Since we can add "explicit"
restrictions at a later point without breaking the implicit default
restrictions, there's no immediate rush.

However, if folks want to go ahead and start working on this, we certainly
can. I'd like to get a better analysis on the objective first though. Are
we wanting to provide something like a counter ("50 out of 50 possible
queries already performed, try back tomorrow"), a simple tooManyQueries
response, or an array of different responses?

Note that there are also some possible synchronization issues here, such
as the timezone the server is using.
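If the counter approach is chosen, one way to sidestep the timezone issue is to count queries per UTC day. The Python sketch below is hypothetical: the class, the limit, and the use of unwillingToPerform (53) as the result code follow the generic-response suggestion above rather than any agreed design, and the diagnostic text is only an example.

```python
from datetime import date, datetime, timezone
from typing import Dict, Optional, Tuple

SUCCESS = 0
UNWILLING_TO_PERFORM = 53  # LDAPv3 resultCode from RFC 2251


class DailyQuota:
    """Hypothetical per-client query counter that resets at 00:00 UTC.

    Counting by UTC date means the server and client never have to agree
    on a local timezone for where "today" ends. Old entries are never
    pruned in this sketch.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: Dict[Tuple[str, date], int] = {}

    def check(self, client_id: str, now: Optional[datetime] = None) -> Tuple[int, str]:
        """Record one query if the quota allows it; return (resultCode, diagnostic)."""
        if now is None:
            now = datetime.now(timezone.utc)
        day = now.astimezone(timezone.utc).date()
        key = (client_id, day)
        used = self.counts.get(key, 0)
        if used >= self.limit:
            return (UNWILLING_TO_PERFORM,
                    f"{used} out of {self.limit} possible queries already "
                    "performed, try back tomorrow")
        self.counts[key] = used + 1
        return (SUCCESS, "")
```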

> Because LDAP supports a generic query syntax and because service 
> providers are likely to deny all queries not explicitly allowed to 
> prevent data mining, this requirement seems more important for the FIRS 
> proposal.

Yeah, I agree.

> #3 - The implementation of 3.2.8.
> 
> Looking at the jabber logs, there is also discussion of this item with 
> theorizing about how it might be done but no mention of it in the 
> drafts.  Nor can I find it.  I think the conversation in the jabber logs 
> is correct with regard to this only needing to be done on the LDAP 
> attribute level and not the LDAP value level, which Peter seemed to 
> think would be harder to do.  However, either the control or the 
> reserved values (or whatever) do need to be specified.

Section 5.3.4 of firs-core-02 tries to lay this off onto the LDAP specs:

 |  Clients MUST NOT equate the absence of any attributes with the
 |  absence of data, and SHOULD assume that the user is not authorized
 |  to view any data which has not been provided.
 |
 |  If a client specifically requests an entry or an attribute which
 |  the server is unable or unwilling to provide due to policy
 |  constraints, the server MUST use the appropriate LDAPv3 error
 |  message. For example, if the user is unable to view an entry or a
 |  requested attribute because it has not yet provided sufficient
 |  authentication credentials, the server MUST return the
 |  "invalidCredentials" error. Similarly, if the client has requested
 |  an entry or attribute which the server is unwilling to provide due
 |  to policy reasons, the server MUST return the unwillingToPerform
 |  error to the client.
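A small Python sketch of the behavior that text describes, on both sides of the connection. The `Policy` enum and function names are hypothetical; the numeric values are the standard LDAPv3 resultCodes from RFC 2251 for the two errors the draft names.

```python
from enum import Enum
from typing import Dict, List


class Policy(Enum):
    """Hypothetical server-side policy decision for a requested entry or attribute."""
    ALLOWED = "allowed"
    NEEDS_AUTH = "needs-auth"  # user has not provided sufficient credentials yet
    DENIED = "denied"          # operator policy forbids releasing the data


# LDAPv3 resultCode values (RFC 2251)
RESULT_CODES: Dict[Policy, int] = {
    Policy.ALLOWED: 0,      # success
    Policy.NEEDS_AUTH: 49,  # invalidCredentials
    Policy.DENIED: 53,      # unwillingToPerform
}


def result_for(decision: Policy) -> int:
    """Server side: map a policy decision to the LDAPv3 result code to return."""
    return RESULT_CODES[decision]


def describe_attribute(entry: Dict[str, List[str]], attr: str) -> str:
    """Client side: an absent attribute means "not provided", never "no data"."""
    if attr in entry:
        return ", ".join(entry[attr])
    return "(not provided; you may not be authorized to view it)"
```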

Those examples don't enumerate all of the possible reasons on purpose,
although I could do so. See the response to your next question below.

> #4 - Enumeration of error codes.
> 
> Given the recent discussion on error codes and the mention of these in 
> the jabber logs, I think it is necessary to fully enumerate them so that 
> we can better understand what needs further clarification.  Some of the 
> error codes would naturally map to existing LDAP error codes, but as we 
> discussed with the bags, some will not.  This will help with listing 
> which new codes needed to be defined.

I recognize and appreciate the desire to be explicit. However there is
another equally valid consideration, which is over-specifying to the point
where changes to LDAP are not accommodated, or cannot be accommodated,
resulting in FIRS becoming version-locked. I'd really like to just point
people to the LDAP specs for the proper response, and only give examples
where ambiguities exist or where illustration is needed. I'll accommodate
the consensus of course, but let's not go overboard.

I don't mind detailing new codes where necessary either.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/


