[Ietf-not43] Domain Names, IDNs and CRISP

Leslie Daigle leslie at thinkingcat.com
Thu Sep 4 11:42:53 EDT 2003


Feeling that I had not been clear in my points on domain names
and IDNs last week, I paused, reread the IDN-related RFCs,
and am now attempting to write a short letter.

As I said last week, I do not believe it is appropriate to conflate
the handling of domain names and IDNs.

Syntax:

As Eric has noted, valid domain (not host) names, even pre-IDN-era,
can include binary characters (bytes with the 8th bit set).
It is inappropriate to apply the IDN algorithms to those names --
not because they may contain bytes that are invalid in IDNs, but
rather because they may contain only valid ones, in which case the
algorithms will process them without complaint.  The IDN algorithms
include normalization and mapping rules that will *change* the byte
values.  If the original domain label was never intended as an IDN,
it is quite possible that two different domain names would yield
the same IDN representation in UTF-8.  For example, if they differed
in only one byte that happened to map to nothing in the NAMEPREP
process, the distinction would be lost.
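
To make the collision concrete, here is a minimal sketch using the
nameprep() function from Python's standard encodings.idna module.
The two labels and the choice to read the raw bytes as Latin-1 are
assumptions made purely for illustration; a pre-IDN label carries no
charset of its own.  Byte 0xAD read that way is U+00AD (soft hyphen),
which nameprep maps to nothing.

```python
from encodings.idna import nameprep

# Two distinct pre-IDN labels (hypothetical). They differ by one byte, 0xAD.
raw_a = b"example"
raw_b = b"ex\xadample"

def as_idn(raw: bytes) -> str:
    """Treat a raw label as if it were an IDN and nameprep it.

    Assumption: the bytes are read as Latin-1, so 0xAD becomes U+00AD
    (soft hyphen), a code point that nameprep maps to nothing.
    """
    return nameprep(raw.decode("latin-1"))

# As domain names, the two labels are different.
labels_differ = raw_a != raw_b

# After IDN processing, the distinction is gone.
idn_forms_collide = as_idn(raw_a) == as_idn(raw_b)

print(labels_differ, idn_forms_collide)
```

Both values print as True: the labels are distinct as domain names,
yet indistinguishable once run through the IDN mapping step.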

Furthermore, although the IDN spec says that nothing other than
an IDNA-encoded IDN SHOULD be registered with the "xn--" prefix,
it does not say (and cannot enforce) that other names MUST NOT be.
This means that there could be valid domain names starting with
"xn--" that do not "un-IDNA" successfully.
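
As a rough illustration, the sketch below feeds two labels to
ToUnicode() from Python's standard encodings.idna module.  The helper
name try_un_idna and the label "xn--abc-" are hypothetical, chosen
only to show a label that carries the prefix but does not round-trip;
"xn--bcher-kva" is the usual encoding of "bücher".

```python
from encodings.idna import ToUnicode

def try_un_idna(label: str):
    """Return the Unicode form of an ACE label, or None if it is not a real IDN.

    ToUnicode() strips the "xn--" prefix, decodes the remainder as Punycode,
    re-encodes the result, and raises UnicodeError if the round trip does not
    reproduce the original label.
    """
    try:
        return ToUnicode(label)
    except UnicodeError:
        return None

# A genuine IDNA-encoded label decodes cleanly.
genuine = try_un_idna("xn--bcher-kva")

# A label that merely happens to start with "xn--" (hypothetical example).
# It is still a perfectly valid sequence of bytes for a domain label.
lookalike = try_un_idna("xn--abc-")

print(repr(genuine), repr(lookalike))
```

A protocol that always "un-IDNAs" such labels has no way to carry the
second one, even though it is a legitimate domain name.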

In short -- (some representations of) IDNs share the same
space of bytes as domain names, but they are still not the same
thing. 


Therefore:  for the purposes of a protocol that is capable
of handling queries for *all* valid domain names, it MUST be 
possible to express domain names in a fashion that will be
unmolested by the IDN processing algorithms.


Semantics:

I believe a "match" for an IDN has different semantics than
a match for a domain name.  Whether it is "just" applying the
IDN normalization and mapping rules, or allowing
other approximate matches based on locale knowledge, an IDN
match differs from the byte-by-byte matching of unadulterated
input that applies to domain names.
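
A small sketch of the two kinds of match, again leaning on nameprep()
from Python's encodings.idna.  The function names domain_match and
idn_match are hypothetical, and the IDN side shows only the
normalization and mapping case, not any locale-based approximation.

```python
from encodings.idna import nameprep

def domain_match(query: bytes, stored: bytes) -> bool:
    """Domain-name match as described above: compare the bytes as given."""
    return query == stored

def idn_match(query: str, stored: str) -> bool:
    """IDN match: compare after nameprep mapping and normalization."""
    return nameprep(query) == nameprep(stored)

upper = "B\u00fccher"   # "Bücher"
lower = "b\u00fccher"   # "bücher"

# Byte-for-byte, the UTF-8 forms are different names.
bytes_equal = domain_match(upper.encode("utf-8"), lower.encode("utf-8"))

# As IDNs, nameprep folds the case and the two match.
idns_equal = idn_match(upper, lower)

print(bytes_equal, idns_equal)   # False True
```

The same pair of inputs gives opposite answers depending on which
kind of match the client meant, which is why the query has to say.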


Overhead:

As Andy has already noted, distinguishing how IDNs are expressed
(in queries and results) from how domain names are expressed does
NOT necessarily imply doubling the data store.  It is a mechanism
for the protocol to express intention.
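
One way to picture this is the sketch below: a single list of stored
labels, searched by two differently typed queries.  Everything here
(DomainQuery, IdnQuery, STORE, search, and the convention that IDN
records are held as UTF-8) is a hypothetical illustration, not a
proposal for the actual protocol elements.

```python
from dataclasses import dataclass
from encodings.idna import nameprep

@dataclass(frozen=True)
class DomainQuery:
    name: bytes   # matched byte-for-byte, never IDN-processed

@dataclass(frozen=True)
class IdnQuery:
    name: str     # matched after nameprep

# One data store, holding each label exactly once (hypothetical contents).
STORE = [b"example", b"ex\xadample", "b\u00fccher".encode("utf-8")]

def search(query):
    """Dispatch on the kind of query; the store itself is not duplicated."""
    if isinstance(query, DomainQuery):
        return [rec for rec in STORE if rec == query.name]
    if isinstance(query, IdnQuery):
        wanted = nameprep(query.name)
        hits = []
        for rec in STORE:
            try:
                # Assumption: records that are IDNs are stored as UTF-8.
                candidate = nameprep(rec.decode("utf-8"))
            except UnicodeError:
                continue  # not readable as an IDN; only a DomainQuery finds it
            if candidate == wanted:
                hits.append(rec)
        return hits
    raise TypeError("unknown query type")

print(search(DomainQuery(b"ex\xadample")))
print(search(IdnQuery("B\u00fccher")))
```

The binary label is reachable only through DomainQuery, and the
case-folded lookup only through IdnQuery, yet both run against the
same records.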



Therefore, I reiterate that I believe it is important that the protocol
have separate query and response elements for domain-name-based
queries and IDN-based ones.  To do otherwise is not, in my opinion,
preparing for the future, or allowing flexibility, so much as it is
creating a system that doesn't handle its full range of inputs properly,
or give users the ability to properly express what they mean.


Leslie.


-- 

-------------------------------------------------------------------
"Reality:
      Yours to discover."
                                 -- ThinkingCat 

Leslie Daigle
leslie at thinkingcat.com
-------------------------------------------------------------------




