[Ietf-not43] issues and questions on FIRS
Leslie Daigle
leslie at thinkingcat.com
Sun Aug 17 23:56:32 EDT 2003
Eric,
You and I have already disagreed about this off-list, but:
Eric A. Hall wrote:
> on 8/16/2003 11:17 AM Andrew Newton wrote:
>
>
>>#1 - Naming Syntax of inetDnsDomainSyntax (Section 3 in
>>draft-ietf-crisp-firs-dns).
>>
>>First, I do not understand why the domain name is being specified as
>>UTF-8. I understand that this is being done in consideration of IDN's,
>>but this is a service about the actually registered domain names. While
>>the UTF-8 equivalences are important, the key for the domain names (the
>>rdn in this case) should be the wire equivalence since this is about the
>>registration of these domain names. In other words, I think the key
>>should be the 7-bit ASCII versions that show up in zone files and via
>>dig, etc... and not the 8-bit versions.
>
>
> This is a complex issue but the simple answer is that the UTF-8 form will
> be the preferred form eventually, and it will be simpler in the long run
> to do UTF-8 now rather than try to accommodate two systems later. Now for
> the complex answer:
I disagree that it is appropriate to be architecting the CRISP
system such that UTF-8 is sent in a query that is supposed to be
about domain names.
Today, domain names are still defined as ASCII, and the
requirements talk about the ability to look up domain names, not
"internationalized domain names".
For the very reasons you cite below, the client has to do the
normalization, and for DNS that means drilling down to
the ASCII encoding.
If this working group believes it is important to be able to
do queries on the internationalized form of domain names, then
that should be added to the requirements documents.
Otherwise, IMO, we wind up in a world of hurt.
Leslie.
>
> 1) I had to choose one. It is likely that clients will exist that only
> accept seven-bit ASCII for input/output, but it is also likely that there
> will be clients which will use UTF-8, and it is also likely that some
> clients will require some other local encoding (eg, UCS-2 on Win32).
> However, protocol efficiency requires that a single canonical form be used
> for the message data. The "majority" is going to have to do some kind of
> conversion no matter what. None of them are significantly cheaper at this
> exact moment in time.
>
> All systems are required to do validation anyway. The cost of local
> conversion is likely to be relatively minor if it is done during the
> validation process, so there's no significant incentive for any format
> here either.
>
> The wire format for LDAP protocol is UTF-8. Since all LDAP libraries
> already have to support UTF-8 conversion in order to render LDAP data for
> local input and output (EG, a company name, or a street address), this
> means that reusing the existing UTF-8 conversion logic is going to be the
> cheapest for everybody, since the code would have to exist already.
>
> Furthermore, non-trivial comparisons are easier using UTF-8 than ACE.
> Sub-string searches, soundex searches, etc., any of which may already be
> implemented in LDAP servers for UTF-8, can be reused pretty easily (or
> extended partially to allow for things like dot-separators between
> labels). With something like IDNA, however, the servers would need
> separate searches for all of these, with the first step being to produce a
> normalized (UCS) instance. Using UTF-8 as the default makes this stuff
> much much easier, and this is a very compelling argument.
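The sub-string point can be illustrated with a small Python sketch. It assumes the standard library's "idna" codec (IDNA 2003); the sample name, the search fragment, and the helper names are illustrative and do not come from the drafts.

```python
# Illustrative only: a substring match against the UTF-8 form of a name
# versus the same match attempted against its ACE (IDNA) form.

name = "ex\u00e4mple.com"            # sample IDN; \u00e4 is "a" with diaeresis
utf8_form = name.encode("utf-8")     # what a UTF-8 wire format would carry
ace_form = name.encode("idna")       # ASCII-compatible encoding, label by label

needle = "\u00e4mpl"                 # fragment a user might search for


def match_utf8(haystack: bytes, fragment: str) -> bool:
    """A byte-wise substring match works directly on UTF-8 data."""
    return fragment.encode("utf-8") in haystack


def match_ace_naive(haystack: bytes, fragment: str) -> bool:
    """Naive attempt: ACE-encode the fragment and look for it in the ACE name."""
    return fragment.encode("idna") in haystack


def match_ace_decoded(haystack: bytes, fragment: str) -> bool:
    """For ACE data, decode back to Unicode first, then search."""
    return fragment in haystack.decode("idna")


print(match_utf8(utf8_form, needle))         # direct match on UTF-8
print(match_ace_naive(ace_form, needle))     # fails: ACE is not substring-preserving
print(match_ace_decoded(ace_form, needle))   # works, at the cost of a decode step
```

The naive ACE match fails because the ACE form of a fragment is not a substring of the ACE form of the whole label, so a server holding ACE data has to decode before it can search.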
>
> 2) i18n domain names have a canonical representation as UCS characters.
> The IDNA (ASCII-compatible) representation is just one of many possible
> UCS-to-<target> encodings for IDNs. The RFCs don't hammer this point
> strongly enough for my tastes, but the references to different "slots" make
> this clear: applications can use whatever representation of a canonical
> IDN they wish, including IDNA but also including any other encoding that
> the application and/or protocol supports.
>
> At the CURRENT TIME, delegation entities are seeing IDNA as the dominant
> encoding, but this is symptomatic rather than deliberate. For example, at
> this particular moment DNS delegations and nameserver entries are limited
> to the hostname syntax (letter-digit-hyphen), but that doesn't mean that
> alternative encodings cannot be used in the future (I've done a fair bit
> of work in this space, and my research shows that it is possible, and that
> the problems are political not technological). Similarly, tools like dig
> are currently constrained to ASCII output, but there is no reason to
> believe that folks are not going to be developing IDN-aware tools that
> perform IDNA-to-local conversions for input and output (this would
> actually be pretty simple). I mean, there's nothing about dig that
> prevents this; if we expect stuff like web browsers and email clients to
> perform conversion, then certainly dig and hostname and the like can too.
>
> Cumulatively this means that the use of ASCII-compatible sequences for
> domain name ~management is an artifact of the CURRENT generation of
> services and tools, and not a restriction of the technology. In all
> likelihood, the tools and services will adapt, and then sooner or later
> we'll have local encodings of the canonical IDNs being the default view,
> rather than IDNA being the default.
>
> 3) If FIRS is chosen, and if the directory expands into the edges of the
> user-to-user space, the use of LDAP tools to manipulate raw data will
> become more common. This means that more and more people will eventually
> be exposed to the protocol representation directly (modulo any wholesale
> conversion that is needed for their platform).
>
> People will want to work with the IDN they bought and/or see, not an
> illegible encoding of that domain name. I mean, we certainly don't expect
> that people will *want* to work with IDNA encodings in web pages or email
> addresses, and I think that the CRISP user community is going to have the
> same kinds of desires for the ~whois service too. With this in mind, the
> smart play in terms of adoption is to give them what we already know
> they're going to want, which is internationalized views. Ask your
> marketing department if they are selling i18n domain names or encoded
> representations of i18n domain names, and let that answer apply to all of
> the other domain-based usages that might be plugged into the directory in
> the long haul.
>
> This is true from the user side too. When I get my first IDN spam, I'm
> going to want to find details about the "exämple.com" domain, not
> "xn--exmple-cua.com".
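Whichever form ends up on the wire, converting between the two views of this example is a short step on the client. A minimal sketch, assuming Python's built-in "idna" codec (IDNA 2003); the function names are made up for illustration:

```python
def to_ace(name: str) -> str:
    """Unicode domain name -> ASCII-compatible (IDNA) form."""
    return name.encode("idna").decode("ascii")


def to_unicode(ace: str) -> str:
    """ASCII-compatible (IDNA) form -> Unicode domain name."""
    return ace.encode("ascii").decode("idna")


readable = "ex\u00e4mple.com"        # \u00e4 is "a" with diaeresis
wire = to_ace(readable)              # the form that shows up in zone files and dig
print(wire)
print(to_unicode(wire) == readable)  # the conversion round-trips
```

The disagreement in this thread is about which side of the protocol performs this step, not about whether it can be done.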
>
> This is also where points 1 and 2 come together. The default UTF-8
> encoding is mandatory in all cases and is therefore guaranteed to be
> supported, while the tools which are currently used to manage domain names
> suffer from version-specific restrictions and are likely to be eventually
> upgraded so that they provide i18n representations by default. In other
> words, the tools are likely to catch up to the user's desires and the
> capabilities of the technology, rather than being a permanent restriction
> on the entire system.
>
> There is another minor point here, which is that the official IAB policy
> calls for UTF-8 as the preferred encoding. Since we are implicitly hoping
> for reuse of common data, this is a strong argument in favor of using
> UTF-8 for (most) directory data regardless of its underlying definition.
>
> So given all of that, using the UTF-8 representation makes the most sense
> both in the short-term and the long-term.
>
> On the other side of the coin is the operator convenience issue. Working
> with a representation of the domain names which is different from the
> representation used by the current generation of tools and services is
> admittedly inconvenient. I don't think this is compelling enough to
> justify forcing all users and layered applications into working with an
> illegible encoding format by default. The tools will change eventually.
> Furthermore, as was already stated, all names have to be validated anyway
> and doing the conversion as part of the validation process (or more
> likely, as part of populating the database) is not an egregious expense.
>
>
>>Second, why are the 8-bit versions escaped? Is there an issue with
>>supporting UTF-8 or Unicode? Just curious.
>
>
> DNS uses raw octet values, but there are no "characters" in the UCS
> repertoire for referencing octet values. Instead, the octet values would
> refer to other characters which do not represent the values themselves.
>
> For example, a valid DNS domain name (not a hostname) can contain the
> octet value of 0xC4, and this value must be preserved across all instances
> of that domain name. In UCS, however, the character code C4 refers to
> uppercase "A" with diaeresis. In many instances, the UCS character would
> get normalized to lowercase "a" with diaeresis, which has the character
> code value of E4. If this value is mapped back to DNS, the original domain
> name would be destroyed.
>
> The escape syntax is provided in order to support octet values in domain
> names while preventing them from being interpreted as characters (so that
> they aren't normalized, for example).
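The 0xC4 example can be reproduced in a few lines of Python. This is only a sketch: the "\DDD" decimal escape used here is borrowed from the RFC 1035 zone-file convention as a stand-in and is not necessarily the syntax the draft defines, and the sample label and function names are hypothetical.

```python
# A DNS label containing the raw octet 0xC4 (not a hostname, but a legal domain name).
raw_label = bytes([0x66, 0x6F, 0x6F, 0xC4])      # "foo" followed by octet 0xC4

# Treating each octet as the UCS character with the same code (Latin-1 mapping),
# then case-folding as a normalizer might, silently changes the octet.
as_text = raw_label.decode("latin-1")            # 0xC4 becomes U+00C4
round_tripped = as_text.lower().encode("latin-1")
print(round_tripped == raw_label)                # False: 0xC4 has become 0xE4


def escape_octets(label: bytes) -> str:
    """Stand-in escape: printable ASCII passes through, everything else becomes \\DDD."""
    out = []
    for b in label:
        if 0x21 <= b <= 0x7E and b != 0x5C:      # printable, and not the backslash itself
            out.append(chr(b))
        else:
            out.append("\\%03d" % b)
    return "".join(out)


def unescape_octets(text: str) -> bytes:
    """Reverse of escape_octets: turn each \\DDD back into a single octet."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            out.append(int(text[i + 1:i + 4]))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


escaped = escape_octets(raw_label)               # the octet is now plain ASCII digits
print(unescape_octets(escaped.lower()) == raw_label)   # True: folding cannot touch it
```

Because the escaped octet is carried as ASCII digits, character-level normalization has nothing to act on, and the original octet value survives the trip back to DNS.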
>
> In theory, this could be avoided by limiting the domain name syntax to
> hostnames, except for a couple of non-trivial issues. First and foremost,
> there is not a strict definition of what constitutes a valid IDN, so all
> values have to be allowed anyway. But if all values are allowed then
> there's nothing to stop the users from entering octet values either, so an
> escape mechanism of some kind is mandatory in any event. Secondarily, if
> we want this service to be useful for domain names in the general case
> (rather than being limited to the narrow purpose of "delegation
> management") then we need to design for legitimate domain names rather
> than attempt to enforce an arbitrary restriction to the hostname subset.
>
>
>>Third, point (a) in that section states how the escaped values must be
>>stored. I don't know if this is valid or not. It could be right, but
>>I'm a little concerned that this is making an assumption about how
>>information is stored in a registry/registrar that just doesn't exist.
>>As in, it MUST be this version of the ASCII escaped UTF-8, when what is
>>stored might actually be the ASCII version and the Unicode version
>>(non-escaped).
>
>
> As to the first part, domain registrations are limited to the hostname
> subset by definition (the owner and data domain names of an NS RR have to
> fit in the hosts.txt database). As such, these rules are really provided
> for usages beyond "delegation management".
>
> EG, the inetDnsRR specs implicitly allow the contents of a zone to be
> stored in the directory (this isn't the stated goal, but a possibility
> here is out-of-band zone replication, benefiting from ACLs and the other
> buttons and knobs), and some of the domain names in a zone are not
> likely to conform to the hostname rules and will need to be escaped.
> Whether or not a registry or registrar chooses to offer an ancillary
> service will probably determine the extent to which they need to be
> concerned with this kind of stuff. In the usual case, it won't be
> necessary since input masking filters should keep the extended syntaxes
> out of the delegation-specific data.
>
> I'm not sure I understand the second part. If you're concerned that I'm
> dictating underlying database formats, that's not the goal, and I can be
> clearer in the text if you want. The real need here is for the protocol's
> view of the database to be consistent. However that happens is an
> operational concern.
>
>
>>#2 - The implementation of 3.1.8.
>>
>>There was mention of 3.1.8 in the meeting by Peter, but I do not see it
>>addressed in the drafts. Looking through the jabber logs, Peter
>>proposed various methods to do this. These were defined as: using the
>>bind operation, special policy entries, and server-side extended operations.
>
>
> Peter and I talked about this some in a private exchange. My suggestion
> was to leave it as described in section 5.3.3 of firs-core-02 (with
> "unwillingToPerform" as the generic response) for a later point where we
> could talk about it in sufficient detail. Since we can add "explicit"
> restrictions at a later point without breaking the implicit default
> restrictions, there's no immediate rush.
>
> However, if folks want to go ahead and start working on this, we certainly
> can. I'd like to get a better analysis on the objective first though. Are
> we wanting to provide something like a counter ("50 out of 50 possible
> queries already performed, try back tomorrow"), a simple tooManyQueries
> response, or an array of different responses?
>
> Note that there are also some possible synchronization issues here, such
> as the timezone the server is using.
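To make the counter option and the timezone concern concrete, here is a rough Python sketch of a per-client daily quota keyed on the UTC date, so that the reset boundary does not depend on the server's local timezone. The class, its method, and the message text are hypothetical; only the result codes (0 for success, 53 for unwillingToPerform) are taken from LDAPv3.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

SUCCESS = 0
UNWILLING_TO_PERFORM = 53     # LDAPv3 result code


class DailyQuota:
    """Hypothetical per-client query counter that resets at midnight UTC."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: Dict[Tuple[str, object], int] = {}

    def check(self, client_id: str, now: datetime) -> Tuple[int, str]:
        # 'now' must be timezone-aware; it is normalized to UTC before counting.
        day = now.astimezone(timezone.utc).date()
        key = (client_id, day)
        used = self.counts.get(key, 0)
        if used >= self.limit:
            message = "%d out of %d possible queries already performed" % (used, self.limit)
            return UNWILLING_TO_PERFORM, message
        self.counts[key] = used + 1
        return SUCCESS, ""


quota = DailyQuota(limit=2)
edt = timezone(timedelta(hours=-4))
evening = datetime(2003, 8, 17, 19, 0, tzinfo=edt)    # 23:00 UTC on 17 August
for _ in range(3):
    print(quota.check("client-a", evening))           # the third call is refused
later = evening + timedelta(hours=2)                  # 01:00 UTC on 18 August
print(quota.check("client-a", later))                 # a new UTC day, so the count restarts
```

A plain tooManyQueries-style response would be the same check without the message text; an array of responses would need more result codes than LDAP defines today.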
>
>
>>Because LDAP supports a generic query syntax and because service
>>providers are likely to deny all queries not explicitly allowed to
>>prevent data mining, this requirement seems more important for the FIRS
>>proposal.
>
>
> Yeah, I agree.
>
>
>>#3 - The implementation of 3.2.8.
>>
>>Looking at the jabber logs, there is also discussion of this item with
>>theorizing about how it might be done but no mention of it in the
>>drafts. Nor can I find it. I think the conversation in the jabber logs
>>is correct with regard to this only needing to be done on the LDAP
>>attribute level and not the LDAP value level, which Peter seemed to
>>think would be harder to do. However, either the control or the
>>reserved values (or whatever) do need to be specified.
>
>
> Section 5.3.4 of firs-core-02 tries to lay this off onto the LDAP specs:
>
> | Clients MUST NOT equate the absence of any attributes with the
> | absence of data, and SHOULD assume that the user is not authorized
> | to view any data which has not been provided.
> |
> | If a client specifically requests an entry or an attribute which
> | the server is unable or unwilling to provide due to policy
> | constraints, the server MUST use the appropriate LDAPv3 error
> | message. For example, if the user is unable to view an entry or a
> | requested attribute because it has not yet provided sufficient
> | authentication credentials, the server MUST return the
> | "invalidCredentials" error. Similarly, if the client has request
> | an entry or attribute which the server is unwilling to provide due
> | to policy reasons, the server MUST return the unwillingToPerform
> | error to the client.
>
> Those examples don't enumerate all of the possible reasons on purpose,
> although I could do so. See the response to your next question below.
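On the client side, the quoted rules might be applied along these lines. This is a sketch under assumptions: the entry is modeled as a plain dictionary rather than any real LDAP library's result type, and the function name, attribute names, and message strings are invented for illustration. The result codes 49 (invalidCredentials) and 53 (unwillingToPerform) are the LDAPv3 values.

```python
from typing import Dict, List

SUCCESS = 0
INVALID_CREDENTIALS = 49      # LDAPv3 result code
UNWILLING_TO_PERFORM = 53     # LDAPv3 result code


def describe_attribute(entry: Dict[str, List[str]], attr: str,
                       result_code: int = SUCCESS) -> str:
    """Explain to the user why an attribute is, or is not, being shown."""
    if result_code == INVALID_CREDENTIALS:
        return "withheld: insufficient authentication credentials"
    if result_code == UNWILLING_TO_PERFORM:
        return "withheld: server policy"
    if attr not in entry:
        # Absence of an attribute must not be read as absence of data.
        return "not provided (may exist but not be visible to this user)"
    return ", ".join(entry[attr])


entry = {"cn": ["example.com"]}                   # hypothetical search result
print(describe_attribute(entry, "cn"))
print(describe_attribute(entry, "mail"))          # missing, so treated as withheld
print(describe_attribute({}, "cn", UNWILLING_TO_PERFORM))
```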
>
>
>>#4 - Enumeration of error codes.
>>
>>Given the recent discussion on error codes and the mention of these in
>>the jabber logs, I think it is necessary to fully enumerate them so that
>>we can better understand what needs further clarification. Some of the
>>error codes would naturally map to existing LDAP error codes, but as we
>>discussed with the bags, some will not. This will help with listing
>>which new codes needed to be defined.
>
>
> I recognize and appreciate the desire to be explicit. However there is
> another equally valid consideration, which is over-specifying to the point
> where changes to LDAP are not accommodated, or cannot be accommodated,
> resulting in FIRS becoming version-locked. I'd really like to just point
> people to the LDAP specs for the proper response, and only give examples
> where ambiguities exist or where illustration is needed. I'll accommodate
> the consensus of course, but let's not go overboard.
>
> I don't mind detailing new codes where necessary either.
>