OpenURLs, Citations, and Two-Level SRV-Record-Based Resolution

Richard L. Goerwitz III

This article retraces (from the standpoint of academic researchers) the fundamental issues behind the development of open linking strategies, particularly OpenURLs, and shows that, despite many advances afforded by the emerging OpenURL specification, OpenURLs lack not only the overall robustness but also the portability necessary for use in a broad academic context. Left unremedied, the portability problem alone would be enough to doom OpenURLs to oblivion. By establishing a two-level SRV-record-based OpenURL resolution system, however, this problem can be overcome - and a pathway can be opened up towards more robust support of real-world academic usage scenarios.



Citation and URLs

As scholarly resources have made their way over the last thirty years into electronic form, particularly onto the web, scholars have found themselves dealing, increasingly, with the problem of citing these resources. Why do electronic resources pose a problem for citation? Because a citation refers to a resource outside the document it occurs in and, ideally, incorporates enough metadata to allow the reader to consult a local or otherwise easily accessible copy of the resource being cited. A typical citation might, for example, specify an author, title, publisher, and date; or an author, article title, serial title, year, volume, and page number. These metadata are easily scribbled down on a notecard, taken to one's local card catalog or interlibrary loan office, and then used to obtain a copy of the cited document - which may then be read locally.

One might think that electronic resource citations would be even easier than traditional print ones to resolve locally (just a matter of pointing and clicking, that is). In reality, though, this is not always the case. Why? Because the resources referred to in electronic resource citations are usually accessed via URLs. Although it is possible, often mandated, to add extra metadata to electronic resource citations (see, e.g., the APA guide), there is currently no standard way to incorporate these metadata into a universal, clickable citation, i.e., one that has a consistent basic form and semantics, and that doesn't have to be massaged to fit either the particular web server housing the resource or the environment in which it is read. As a result, the only practical way to construct a clickable citation is to use a URL - and to pray that the URL stays live for more than the measly average of 44 days.

So why is it that URLs typically can't be counted on to stay live? That is, why are URLs so fickle?

URLs, Opacity, and Ephemerality

The fundamental reason for URLs' fickleness is that they are 1) opaque and 2) ephemeral. By opaque I mean that the core information-carrying portion of a URL (that is, the part after the hostname and optional port) has no meaning outside the context of a single Internet host - which is free to interpret that portion of the URL largely as it pleases. By ephemeral I mean that the URL may or may not continue to resolve as such, or retain its previous semantics, depending on the availability and current configuration of the host it points to. Internet hosts, after all, come and go. They get upgraded and reorganized. Their names or domains change. Network connections to them also fail. They are, by nature, ephemeral, and so, therefore, are the URLs that use them.

Because of their ephemerality, and opacity, URLs lack the kind of generality and persistence necessary for use as a medium of scholarly citation.

This raises an obvious question: Are there alternatives?

URNs

Since the dark ages of the World-Wide Web (the early 1990s), there has existed at least a rhetorical distinction between uniform resource locators (URLs) and uniform resource names (URNs). The thought was that URNs would provide permanent names, i.e., "persistent, location-independent, resource identifiers" (see, e.g., RFC 2141), for objects that would end up being actually accessed via URLs.

Unfortunately, few Internet denizens have even heard of URNs, still less seen them at work, because the distributed resolution mechanisms they require did not begin to crystallize until relatively recently, with the introduction of naming authority pointer (NAPTR) records. NAPTR records form part of what is perhaps the most successful distributed database ever constructed, the domain name system (DNS). To make a long story short, NAPTR records leverage DNS's delegation, replication, and fail-over mechanisms to offer a reasonably robust, efficient means of determining, for any given URN, what Internet host to talk to when it comes time to convert that URN to a URL (i.e., to something that can be resolved and actually fetched). See RFCs 2168 and 2915 for more information on NAPTR records.

At least five years will have passed before NAPTR records (or their successors) become widely maintained and stock web browsers take full advantage of them. As of the fall of 2001 the IETF has not yet even set up a urn.arpa zone - the starting point for fetching the NAPTR records necessary for resolving all URNs. And even in cases where URN resolution is handled by a proxy autoconfiguration (PAC) file and/or a proxy server that knows how to use NAPTR records (which requires not only special-purpose servers, but also a lot of user support), there is still the "bookmark" problem; i.e., the problem that when the user goes to bookmark a resource, it is the destination URL that will be recorded, and not the original, un-dereferenced URN.

URNs, therefore, although theoretically more serviceable than URLs as a medium of citation, are of limited practical use.

URNs as URLs

Because URNs were, until recently, more of a theoretical goal than a real emerging standard - and still remain unsupported by stock web browsers - a number of efforts have been made to retrofit URLs for URN-like persistence, distributed resolution, and fail-over capabilities. (The goal being to find a way to stick with URLs, which can be used by stock web browsers.)

One such effort, undertaken by the Association of American Publishers (AAP) (joined later by the International Publishers Association and the International Scientific, Technical and Medical Publishers Association), resulted in the development of the digital object identifier (DOI) specification. DOIs are two-part identifiers consisting of 1) a Publisher ID registered with the International DOI Foundation, and 2) an Item ID assigned internally by the registrant denoted by (1). These two components together constitute a globally unique, persistent identifier that a central DOI resolver (or its delegate) can map to a non-persistent URL, which can, in turn, be resolved and actually fetched by, e.g., a stock web browser.

Here are a few sample DOI -> URL mappings (with the DOI server's base URL prepended). For those who have never seen or used a DOI before, I would suggest simply clicking on the links below and watching where the browser ends up going:

Base URL + DOI -> Resolves To (as of 13 Oct 2001)
http://dx.doi.org/10.1000/7 -> http://www.doi.org/about_the_doi.html
http://dx.doi.org/10.1000/182 -> http://www.doi.org/handbook_2000/index.html
http://dx.doi.org/10.1007/s102110100050 -> http://link.springer.de/link/service/journals/10211/contents/01/00050/
http://dx.doi.org/10.1045/june98-goerwitz -> http://www.dlib.org/dlib/june98/stg/06goerwitz.html
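
The two-part structure of a DOI, and the way a clickable link is formed by prepending the resolver's base URL, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names split_doi and doi_to_url are invented here, and the split at the first slash reflects the Publisher ID / Item ID description above rather than any official parsing rule.

```python
DOI_RESOLVER_BASE = "http://dx.doi.org/"  # base URL used in the table above


def split_doi(doi: str) -> tuple:
    """Split a DOI into (publisher ID, item ID) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a two-part DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str, base: str = DOI_RESOLVER_BASE) -> str:
    """Prepend the resolver's base URL to a DOI to get a clickable link."""
    split_doi(doi)  # raises ValueError if the DOI is not two-part
    return base + doi


print(split_doi("10.1045/june98-goerwitz"))
print(doi_to_url("10.1045/june98-goerwitz"))
```

Where the browser ends up after following the resulting link is decided entirely by the resolver, which is why the right-hand column of the table carries a date.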

To enable publishers to manage how their own DOIs resolve, DOIs are typically processed via the Handle System - an efficient, extensible, persistent, global name-registration and resolution specification developed by the Corporation for National Research Initiatives (CNRI). The Handle System defines, among other things, a method by which central handle server(s) can offload responsibility for resolving particular handles onto servers controlled by individual publishers or their delegates. In this case, central DOI servers that implement the Handle System are configured to hand users off to individual publishers' DOI servers, which can then do the job of getting the users to the right place.

Other DOI-like frameworks include OCLC's Persistent URL (PURL) specification. The overall PURL framework offers persistence and delegated, hierarchical resolution. Sporadic work has also been done to integrate PURLs with the Handle System and with the URN framework.

Because both DOIs and PURLs are URL based, and URLs must point at a specific host, DOIs and PURLs must therefore also point at a specific host - and thus, by implication, have a single point of failure (i.e., the root DOI or PURL resolver). This is not as big a drawback as might at first appear. Root resolvers can be clustered and/or tied to a special content-delivery network that makes them accessible at various key points around the Net. Although it might be argued that dependence on a root resolver renders DOIs and PURLs, in a sense, location-dependent, this situation will likely change as URNs and DNS NAPTR records begin to break into common use (DOI/handles are basically ready for URN integration right now).

One major problem with DOIs is that it is not always obvious, when looking at a particular resource on a publisher's website, what its corresponding DOI should be. To help alleviate this difficulty many publishers provide utility forms that convert URLs to DOIs (see, for example, the one provided by IDEAL). Many also provide DOI links on their web pages - at least for recent publications. For older articles, etc., it's usually possible to figure out what the DOI should be from the format of DOIs used in more recent material. And mechanisms are currently being tested to convert metadata into DOIs (e.g., the CrossRef metadata database [MDDB]). Even if the DOI for a resource can be determined, however, there is still the "bookmark" problem, i.e., the problem that browsers (in order to use the DOI) must dereference (URL-ize) it. Bookmarks will refer to the dereferenced form of the DOI.

Probably the biggest disadvantage to DOIs and PURLs is that they tacitly assume a 1:1 mapping from resource names to the resources themselves. The reason for this assumption lies in the original goals of the projects that produced them. PURLs were designed simply to increase the probability that a reference URL would resolve correctly, thereby reducing the burden of maintaining it. DOIs were designed for similar purposes, although with a focus on the literature provided by publisher-members of the DOI Foundation itself. In both cases the fundamental motivation was link-maintenance. Neither PURLs nor DOIs had as an original design goal to support general scholarly citations.

Scholarly citations are admittedly difficult to support because they are typically nondeterminate; that is, citations may resolve to zero, one, or more local resources (in traditional print terms, "copies", "editions", "printings", etc.). With e-resources, the potential for multiple resolution looms even larger.

Although attempts have been made to extend DOIs, for example, to include traditional citation metadata (see, e.g., the demo doi-eb site), and there has been much discussion of how to integrate metadata generally into the DOI framework, the fact remains that the DOI infrastructure was designed originally as a deterministic, permanent link-maintenance strategy. Although the DOI specification itself does not exclude the possibility of multiple resolutions, in practice the DOI resolver currently does. Time alone will tell whether DOIs can, or should, be reworked to serve as a general medium of citation (and whether the necessary metadata extensions will end up in the DOI-URLs or in associated databases, as with the CrossRef MDDB). The likelihood is that DOIs will become an important piece of the puzzle - but that in working libraries they will end up being leveraged and/or subsumed by a broader standard such as the OpenURL. For more information on this topic, see, e.g., the International DOI Foundation's Technical Note on DOI-OpenURL integration.

OpenURLs

Yet another method of retrofitting URLs to perform URN-like functions - one that does not rely on any tacit 1:1 name -> resource mapping - is the OpenURL. The OpenURL specification provides, among other things, a set of guidelines for injecting traditional citation metadata like author, title, and date into the query-string portion of URLs. OpenURLs are not so much persistent object identifiers as they are electronic equivalents of traditional citations, which makes their correspondence to local library holdings potentially a bit fuzzier and more flexibly defined. An OpenURL designating an article in an e-journal may, e.g., resolve at once to several aggregators' sites (who all carry the journal in question), to corresponding print holdings in a local OPAC, and to one or more online bookstores (where the relevant journal back-issue may be purchased). OpenURLs have multiple resolution support built in as part of their basic design parameters, as also support for context-dependent behavior (e.g., the ability to vary behavior depending on the source and destination context, i.e., the resource and/or vendor furnishing a given OpenURL and the environment in which the user resolves it).

OpenURLs consist of a base URL (e.g., http://sfx.anywhere.edu/library), an origin description somewhat akin to a DOI (but having a fixed value both for the publisher and the resource), a global identifier (usually a DOI), object-metadata, and a local identifier.
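
To make this anatomy concrete, here is a minimal Python sketch that assembles an OpenURL from its parts. Everything beyond the base URL is an assumption made for illustration: the key names (sid, id, genre, aulast, date, pid) follow the draft OpenURL key/value syntax as I understand it, and the vendor string, author, and year are placeholders rather than data about any real article.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(base_url, sid=None, doi=None, metadata=None, pid=None):
    """Assemble an OpenURL: base URL + origin, global id, metadata, local id."""
    params = []
    if sid:
        params.append(("sid", sid))            # origin description
    if doi:
        params.append(("id", "doi:" + doi))    # global identifier
    for key, value in (metadata or {}).items():
        params.append((key, value))            # object-metadata
    if pid:
        params.append(("pid", pid))            # local identifier
    return base_url + "?" + urlencode(params)


# Placeholder values; only the base URL comes from the text above.
openurl = build_openurl(
    "http://sfx.anywhere.edu/library",
    sid="vendor:database",
    doi="10.1000/182",
    metadata={"genre": "article", "aulast": "Smith", "date": "2001"},
)
print(openurl)
print(parse_qs(urlsplit(openurl).query))
```

Note that the host in the base URL is the only part tied to a particular institution; the query string carries the citation itself.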

OpenURLs have gathered considerable momentum in the marketplace, especially over the last year. Ex Libris, the library automation vendor that is leading the OpenURL charge, provides a public web form that anyone can use to convert traditional citation metadata into an OpenURL (to see a sample of its output, follow this link; here is the original article). As a side benefit, this form also provides a nice, interactive facility for learning the names of the various OpenURL components and how they function. Those unfamiliar with OpenURLs are encouraged to take a look.

The OpenURL specification was fast-tracked by the National Information Standards Organization (NISO) in March, 2001, and, in augmented and revised form, should become a NISO standard some time during the year 2002.

OpenURLs, Ephemerality, and Opacity

Although OpenURLs solve the general problem of translating metadata into URL components, they do not overcome the basic problem of ephemerality. As with all linking strategies that leverage URLs, OpenURLs must point at an internet host, which, for various reasons, may or may not be available. Instead of pointing at a unique, top-level resolver host (as with, e.g., DOIs and PURLs), though, OpenURLs are typically coded so as to point at a local resolver host maintained by the local library (on the personal link page concept, see also below). Although coding OpenURLs to point at a local resolver offers greater local control over the resolution process, it also necessitates that every institution maintain a resolver and that publishers and authors who utilize OpenURLs tailor them to the institution of the user who is reading their material.

Either way, OpenURLs are as failure-prone as any other URL-based system. Worse still, they overload URLs with so much information that their sheer length can exceed the 255-character recommended URL path limit (RFC 2616). Worst of all, OpenURLs' host component is specific to one resolver - which makes general use of them in academic citations problematic, at least for current OpenURL infrastructures.

Also, as with DOIs and PURLs, there is no easy way to know, given a set of citation metadata, how exactly to format an OpenURL that will resolve to the desired resource. The problems with OpenURLs, however, are more complex than with DOIs and PURLs. Why? Because, despite (or perhaps because of) the "openness" of the OpenURL spec, publishers, if they provide OpenURLs at all, often populate the object-metadata section of their OpenURLs only partially. And worse yet, many publishers require additional out-of-band queries to their servers to flesh the metadata out (in an ironic twist, this sometimes requires queries to CrossRef - which must be purchased, if it isn't available already, adding yet another task and expense to the whole process).

Often it is possible to fall back to a DOI embedded in the global-identifier section when attempting to resolve an OpenURL. But the OpenURL specification does not guarantee the DOI's presence. And even if vendors succumb to pressure to regularize and fully populate their URLs with this sort of information, libraries must still face the more general problem of maintaining the complex, unwieldy tables that map the source OpenURLs emanating from the vendors' sites to a set of locally available targets. (It is not currently known just how much work maintaining these tables will require over the long haul, but it seems safe to say that the amount will be greater than the already considerable effort currently required to maintain online A-Z journal listings and populate proxy forwarding tables for remote patron access.)

Of course if the local resolver's translation tables are out of date, or if the server itself goes offline, then patrons are out of luck. There is no formal fail-over mechanism the same way there is for, e.g., DNS. Link translation servers must therefore be maintained as extreme high-availability items - at least in institutions making systematic use of OpenURLs. They become yet another link in a fragile chain of services needed in order to support OpenURLs.

To sum the situation up: Although environments can be constructed in which OpenURLs appear to work extremely well, those environments, at present, are intra-institutional, and involve a lot of (often expensive and time-consuming) integration work with vendors, with local tables and systems, and with proxy servers as well. They also run up against inherent browser limitations (e.g., URL-path lengths) and basically fall subject to all the ills of URLs. If any part of this fragile infrastructure needed to support OpenURLs fails, patrons will find themselves potentially unable to resolve a given OpenURL.

OpenURLs and Institution Specificity

Although the overall fragility of the OpenURL infrastructure leaves it open to a wide range of criticisms, the factor really blocking the success of OpenURLs as a scholarly linking strategy is that they rely on a local or "preferred" OpenURL resolver. One major goal of OpenURLs has been to facilitate local control over the resolution process. And so there is currently no central system for doing DOI-like centralized resolution (although a few test services exist). Because current OpenURL infrastructures assume a local or otherwise "preferred" resolver, OpenURLs differ from institution to institution (or person to person), making them difficult to use in cross-institutional collaboration or scholarly articles.

Although one might be tempted to view OpenURLs, disparagingly, as just another specious attempt at stretching URLs to discharge functions more properly reserved for URNs, the fact is that OpenURLs still have one advantage that URNs do not: They work, more or less, with current web browsers. They also define facilities that go beyond those of a classic reference link, allowing extended services to be folded in, such as enhanced resolution options leading, e.g., to abstracts as well as full text, or to various relevant repositories and book or journal vendors. This all occurs under local (or user) direction. Publishers have no direct say in what options are presented to users and how.

This last point (independence from publishers) can hardly be overestimated. Contrast the situation here, for example, with the publisher-dependent situation we see with DOIs. DOIs were designed to point to specific resources on specific vendors' systems, and to ride atop a handle-based delegation system from which libraries are excluded. The International DOI Foundation has recently made moves to help remove these constraints, and may yet achieve some success, e.g., through CrossRef and through cookie pushing mechanisms (on which, see my criticisms below). OpenURLs, however, have always lacked these constraints. And they can incorporate DOIs, as well as other persistent identifier systems besides.

Despite their strong points and advantages, however, OpenURLs, unlike DOIs, are difficult to use in a scholarly context for the reason emphasized above: The way the OpenURL infrastructure is currently implemented, it is largely institution (or link-page) specific. Despite the huge vistas opened up by the way they incorporate metadata, there is no escaping the tragic fact that a scholar can't just cut and paste an OpenURL he or she finds in a library e-resource page into an article and expect people in other institutions to be able to follow it the way they would, at least theoretically, a DOI. If people can follow an OpenURL in a published work it is because the OpenURL was added by a vendor whose systems have been specially altered to perform this function (normally all you get in this situation is a button; the actual author citations are untouched). There is no denying, then, the fact that today's OpenURL infrastructure is fundamentally antithetical to one of the basic mechanisms of scholarly publication: citation.

The only way that OpenURLs can be generalized, and integrated into their broader academic context, is to move towards an infrastructure that supports institution-independent use in scholarly citations.

Institution Specificity and Two-Level Resolution

Short of full URN integration, the most obvious solution to the problem of institution-specificity in the current OpenURL infrastructure is a centralized resolution system, much like what is already used to resolve DOIs, in which all requests are passed through a central resolution service, then handed off, as needed, to delegates (the central DNS A record is not as much of a problem here as it might seem; see my comments above). In such a system one OpenURL fits all. No institutional tailoring is needed.

Unlike the publisher-centric DOI resolution system, however, the OpenURL resolution system must tie its delegation mechanism to the patron who is attempting to resolve the OpenURL and to his or her institutional affiliations or preferred resolver (rather than to a persistent ID associated with a publisher and/or a specific instantiation of an electronic resource). Without this functionality patrons and individual libraries will lose their ability to intervene directly in the resolution process and customize it to their liking, and we will be right back where we were with CrossRef-less DOIs.

Two-Level Resolution, Client Domains, and SRV Records

So how is the delegation system supposed to determine the user's institutional affiliations and/or preferred resolver? There are actually several ways this could be done. One way is to set up a central personal link page server or thin portal that allows users not only to resolve OpenURLs directly (e.g., via embedded DOIs), but also to specify one or more base-URL strings that allow it to pass OpenURLs on to other resolvers. The problem with such systems is that they are too heavyweight to act as central resolvers. The more verbiage users are confronted with by the central resolver, the slower and more intrusive it will seem. And the more likely people are to prefer something else. Similar criticisms apply to uses of the cookie pusher mechanism, now being tested for both DOIs and OpenURLs, which works by arranging for patrons who download certain pages or click on specific images to receive a third-party cookie containing their local OpenURL resolver's base URL. Unfortunately, newer browsers by default don't permit third-party cookies.

There is also a problem, in general, with systems that utilize HTTP cookies (as all these systems do), in part simply because some people prefer not to use cookies - and just turn them off. More importantly, though, forcing library patrons (most of whom haven't ever even heard of an OpenURL resolver or a base URL) to go to certain pages, click on certain images, or input base URLs and set cookies (and this for every machine they use!) is going to lead to support problems - especially at cluster machines and kiosks. I personally also find it annoying to have to destroy cookies or keep going back to cookie-pusher pages to change or un-do previous settings. And I find it a matter of concern that the NISO OpenURL standards committee, as of the summer and fall of 2001, leans in the direction of a cookie-based resolution system.

In my view, the central OpenURL resolver should be unobtrusive. It should accept a base URL, if the user cares to supply one. But by default the central resolver should be transparent. That is, it should, where possible, auto-discover the OpenURL resolver appropriate for the user, and pass the user on to it. If I take my laptop from my home institution to another place and I want to resolve an OpenURL, the resolution process should take my new location, invisibly, into account. It should not start with a cookie set to point at my "home" institution's OpenURL resolver. If it did that, I'd end up getting referred to resources I couldn't reach from my current IP address. The resolution process should, rather, auto-discover the appropriate local resolver without my having to know or care about cookies or base URLs.

One way to handle auto-discovery would be to do it as part of a simple two-level system that leveraged reverse lookup (RFC 1034, 2317), SRV records (RFC 2782), and HTTP redirects (RFC 2068, 2616) to route clients transparently to a local OpenURL resolver service. This is not quite the same as plain client-DNS-name-based resolver discovery (which has seen considerable discussion in the OpenURL community already). A resolver that uses SRV records affords finer/more flexible control than a pure client-DNS-name-based system, allowing rotation, fail-over, and breaking ties to any particular resolver DNS name. Such a system works by starting the patron, who has just clicked on an OpenURL, at a central resolver - which would figure out the DNS name of the patron's machine, then use that DNS name to infer (using DNS SRV records) whether there are any OpenURL resolvers in his or her DNS domain(s). If so, the system would simply redirect the patron to the highest-priority resolver matching his or her DNS name the most closely for final processing. If no such resolver was found, this system would present a simple greeting page where the user could manually set his or her base URL (which could point either at an institutional resolver or at a personal link page). The full algorithm used by the central resolver would look like this:

  1. Determine the client's IP address
  2. Convert the IP address to a DNS name (via reverse lookup)
  3. If reverse lookup fails, abort with an informative message (e.g., suggest that the user contact his or her local ISP or DNS administrator)
  4. Create a base OpenURL resolver domain by removing the leftmost "hostname" element from the client's DNS name (not including the period); e.g., client.my.university.edu -> .my.university.edu (we can't start with the fully qualified domain name [FQDN] as the domain because the FQDN may use a wildcard DNS record)
  5. If the base OpenURL resolver domain contains one period, fail (e.g., indicate that no OpenURL resolver is defined for the user's current DNS domain; suggest perhaps connecting through an institutional proxy server; offer the user the option of entering an explicit base URL)
  6. Create an SRV record name string of the form _openresolver._tcp + the base OpenURL resolver domain (created in step 4 above; e.g., _openresolver._tcp.my.university.edu)
  7. Use DNS to try to resolve the SRV record name string
  8. If the string does not resolve, then set the client DNS name equal to the OpenURL resolver domain (see step 4), remove the leading period, then go to step 4
  9. If the SRV record name resolves, extract and store the resulting host value(s) in a list (the "resolver name list")
  10. Sort the resolver name list by priority and weight (the precise algorithm is specified in RFC 2782)
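  11. For each resolver in the resolver name list: check whether the resolver is available and, if it is, set the "user resolver" to that resolver and stop iterating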
  12. If the "user resolver" is not set, fail (e.g., notify the user that none of the resolvers in the resolver name list were available and suggest some corrective action)
  13. Otherwise, rewrite the requested OpenURL (i.e., the OpenURL the user originally submitted to the server) so that it points at the user's OpenURL resolver instead of the central resolver and redirect the user to this newly rewritten URL
  14. (The user's OpenURL resolver will then work as it always has, throwing up a menu of choices from which the user can select a locally accessible resolution target).
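
The steps above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the code behind any running resolver: the DNS SRV query is passed in as a function (lookup_srv) so that the example runs without a DNS library, the availability test inside the loop of step 11 is my reading of what that loop is meant to do, the ordering in step 10 is a simplified weighted pick rather than a line-by-line rendering of RFC 2782, and all host names in the usage example are hypothetical.

```python
import random
import socket
from typing import Callable, Iterator, List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def reverse_lookup(ip: str) -> Optional[str]:
    """Steps 1-3: map the client's IP address to a DNS name, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def srv_names_for(client_fqdn: str) -> Iterator[str]:
    """Steps 4-8: yield SRV names from the most to the least specific domain."""
    labels = client_fqdn.rstrip(".").split(".")
    # Drop the leftmost label, then drop one more label per retry.
    # Stop before a bare top-level domain such as "edu" (step 5).
    for i in range(1, len(labels) - 1):
        yield "_openresolver._tcp." + ".".join(labels[i:])


def order_records(records: List[SrvRecord], rng: random.Random) -> List[SrvRecord]:
    """Step 10, simplified: lowest priority number first, weighted pick within it."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                pick = rng.choice(group)
            else:
                pick = rng.choices(group, weights=weights, k=1)[0]
            group.remove(pick)
            ordered.append(pick)
    return ordered


def discover_resolver(
    client_fqdn: str,
    lookup_srv: Callable[[str], List[SrvRecord]],
    is_available: Callable[[SrvRecord], bool] = lambda rec: True,
    rng: Optional[random.Random] = None,
) -> Optional[SrvRecord]:
    """Steps 4-12: return the chosen resolver record, or None on failure."""
    rng = rng or random.Random()
    for name in srv_names_for(client_fqdn):
        records = lookup_srv(name)        # step 7; [] means "did not resolve"
        if not records:
            continue                      # step 8: retry with the parent domain
        for rec in order_records(records, rng):
            if is_available(rec):         # assumed body of step 11
                return rec
        return None                       # step 12: no listed resolver answered
    return None                           # step 5: ran out of domains


# Usage with a hypothetical in-memory stand-in for DNS.
fake_zone = {
    "_openresolver._tcp.university.edu": [
        SrvRecord(10, 0, 80, "backup.university.edu"),
        SrvRecord(0, 0, 80, "resolver.university.edu"),
    ]
}
chosen = discover_resolver(
    "client.my.university.edu", lambda name: fake_zone.get(name, [])
)
print(chosen.target)
```

In the usage example the lookup for my.university.edu finds nothing, so the search falls back to university.edu and picks the record with the lowest priority number. Step 13 (rewriting and redirecting) is left out here; it amounts to swapping the base of the incoming OpenURL for the chosen resolver's address.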

Although it may be possible to infer institutional affiliation and resolver preferences in the future via client certificates, so-called "junk certs," or distributed directory services tied to distributed authentication/directory/object-instantiation services like Microsoft's Passport (+ Hailstorm), the SRV-based system outlined here does a very good interim job. It is very easy to implement. It can accommodate new strategies for tying patrons to resolvers as these emerge. And it supplies all the machinery needed for a basic, functional, two-level, SRV-based, delegated OpenURL resolution system that requires little (or no) user intervention.

Note that although the central resolver should operate with as little mandatory user intervention as possible, it should also provide a way of overriding the user's auto-detected OpenURL resolver - in effect falling back to the cookie-based resolution systems being argued against here. Note also that many users accessing e-resources from outside their institutional LANs will be using a proxy server, which will present an IP address that can be mapped to an institutional OpenURL resolver (hence obviating the need for overriding anything). So there is no need to worry about what will happen if ISPs decide to commandeer the resolution process and put up their own resolvers!

In a similar vein, there is no need for concern over individual departments commandeering the resolution process. The vast, vast majority of institutions will have one main OpenURL resolver. The expense and time is too great for it to make sense to set up more than that, and so it's extremely unlikely that departmental resolvers will pop up everywhere. If any do pop up, they will be simple personal link page services. And those will almost always offer people the option of selecting the main resolver as the default resolver. If they don't, that's a bizarre choice that a department may make. Remember that it's always possible to override that choice with a cookie (or, conversely, to make that choice the default from every location - again with a cookie).

Rather than strain over unlikely situations affecting at most a fraction of one percent of users, it is more valuable to concentrate on setting up a system that will work for the much more common scenario in which patrons use a recent browser that, by default, disables third-party cookies, or one in which patrons use privacy-enhancing software - or one in which they just turn off cookies altogether because they don't like them, or are leery of services like Microsoft's Passport (this last scenario alone applies to one half of one percent of users - more than will ever be affected by, say, a rogue departmental resolver!).

The key point here is that by outfitting OpenURLs with a largely transparent two-level delegated resolution system, and by tying this resolution system to client DNS names and SRV records, OpenURLs can be rendered fit for completely transparent use by scholars whose institutions have set up an OpenURL resolver - and at the same time leave the existing infrastructure of local OpenURL resolvers and personal link pages largely unchanged. All that will be required is the addition of a few new DNS SRV records (a two or three minute job in the hands of a typical skilled network administrator; see the next section for instructions) - and possibly also an alias or redirect link on the local resolver(s) (on which, see below).
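
For a sense of what such a record might look like, the short Python sketch below prints one SRV line in the usual zone-file layout (owner, TTL, class, type, priority, weight, port, target, per RFC 2782). The service label _openresolver._tcp is the one used in the algorithm above; the domain, target host, port, and TTL are hypothetical values chosen only for illustration, and a real deployment should follow the local DNS administrator's conventions.

```python
def srv_record_line(domain, target, port=80, priority=0, weight=0, ttl=3600):
    """Render one zone-file SRV line for an OpenURL resolver (illustrative)."""
    owner = f"_openresolver._tcp.{domain.strip('.')}."
    # Fully qualified names in zone files end with a trailing dot.
    return f"{owner} {ttl} IN SRV {priority} {weight} {port} {target.strip('.')}."


# Hypothetical institution and resolver host.
print(srv_record_line("my.university.edu", "sfx.my.university.edu"))
```

Adding a second line with a higher priority number would give the fail-over behavior described earlier, since clients try lower-numbered priorities first.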

An experimental central OpenURL resolver service is currently available at:

http://www.openresolver.net:8888/Default

To use this service, all a user must do is construct an OpenURL (or alter an existing OpenURL) so that it uses the above URL, http://www.openresolver.net:8888/Default, as its base, then follow the resulting OpenURL. If the user's institution has set up the required SRV records, and has utilized a /Default path prefix (more on which below), then the central OpenURL resolver service should take the user, via his or her local OpenURL resolver, to an OpenURL resource menu, from which point a target may be selected. If the user's institution has not set up the required SRV records, he or she will be shown 1) a form allowing him or her to enter a base URL manually and 2) instructions on setting up SRV record(s) that can be taken to the local network administrator and acted upon.
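Altering an existing OpenURL so that it uses the central base can be sketched in a few lines of Python. This is only an illustration: the function name rebase_openurl, the local resolver host, and the query keys and values in the sample OpenURL are made up for the example and are not taken from any real resolver.

```python
from urllib.parse import urlsplit, urlunsplit

# Base URL of the experimental central resolver named in the text.
CENTRAL_BASE = "http://www.openresolver.net:8888/Default"


def rebase_openurl(openurl: str, base: str = CENTRAL_BASE) -> str:
    """Swap the base URL of an OpenURL, keeping its query string untouched."""
    query = urlsplit(openurl).query
    scheme, netloc, path, _, _ = urlsplit(base)
    return urlunsplit((scheme, netloc, path, query, ""))


# Hypothetical institution-specific OpenURL (host and metadata are invented).
local = "http://sfx.mycampus.edu:8888/Default?genre=article&issn=1234-5678&volume=7&issue=9"
print(rebase_openurl(local))
# prints http://www.openresolver.net:8888/Default?genre=article&issn=1234-5678&volume=7&issue=9
```

Only the scheme, host, port, and path change; the citation metadata carried in the query string passes through as-is, which is what allows the same link to be handed to a colleague at another institution.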

Once enough institutions have done the requisite setup, it will be possible to pass http://www.openresolver.net:8888/Default prefixed OpenURLs to colleagues, or use them in citations, and they will generally work as-is, with no institution-specific tailoring.

If a user types in just the URL above, http://www.openresolver.net:8888/, then he or she will be taken to a simple introductory page that offers him or her the option of overriding the default OpenURL resolver detected for his or her domain.

Although the above DNS name, www.openresolver.net, resolves to a single host, I will happily add additional hosts, or work out a more elaborate rotation scheme, if other institutions would like to experiment with central resolver services of their own. It is as yet unclear what new portability issues we will face, and we can only flesh this issue out by subjecting the system to real-world use.

New Portability Issues

One clear portability issue that can be foreseen at this point is the use of the OpenURL local-identifier field. This field is typically used together with the origin-description to hardwire site-specific behaviors. Site-specific behaviors are naturally incompatible with global resolution and site-independence. If the local-identifier field must be used, individual sites are encouraged to use it in a way that does not hinder resolution in other locations.

Another clear barrier to portability is use of site-specific URL path components in base URLs. Paths can't be stored as part of SRV records because SRV records' purpose is to store host names and ports, not full URLs. Administrators of OpenURL resolvers should therefore make their default services available via a known URL path, /Default (i.e., via the base URL, http://<resolver>:<port>/Default), even if this means simply setting up an alias or redirection server. More will be said about this issue below.

Technical Details: SRV Record Examples

To help DNS and network administrators outfit their zone tables with the SRV records needed to make the system outlined here work, here are some sample Berkeley Internet Name Domain (BIND) SRV records for the (as yet nonexistent) domain .mycampus.edu which, in this made-up scenario, runs three Ex Libris SFX servers as local OpenURL resolvers. The gist of the setup below is that there are three OpenURL resolvers, two with priority zero and one with priority one. The first priority-zero resolver, i.e., sfx.library.mycampus.edu, has a higher weight than the second, sfx2.library.mycampus.edu. The resolution process will therefore favor the first. Both machines will be favored over the third, backup.library.mycampus.edu, which will only be used if the other servers are down. The point of this example is not to introduce anyone to the innards of DNS or BIND, but rather to illustrate how flexible and powerful DNS SRV records can be, and to offer DNS/network administrators examples to work with. See RFC 2782 for more details on the exact semantics of SRV records:

service.proto        TTL   class   type   priority   weight   port   target
_openresolver._tcp   360   IN      SRV    0          2        8888   sfx.library.mycampus.edu.
_openresolver._tcp   360   IN      SRV    0          1        8888   sfx2.library.mycampus.edu.
_openresolver._tcp   360   IN      SRV    1          0        80     backup.library.mycampus.edu.

The above scenario is more complex than needed for most institutions. Most institutions will, unfortunately, be running just a single OpenURL resolver, with no fail-over. In such a situation, it will be possible to get away with a single SRV record (the TTL value isn't really needed, and so is omitted):

_openresolver._tcp   IN   SRV   0   0   8888   sfx.mycampus.edu.

In some institutions, all SRV records of the form _service._tcp will have been delegated to a Microsoft Active Directory server. So the updates will need to be made there. Alternatively, it is possible to keep _tcp records on the main institutional nameserver but allow dynamic updates from one or more Active Directory servers. Either way, network administrators should make sure to insert the above SRV record at the correct point. Otherwise it will not propagate correctly.

Even with the potential added level of indirection introduced by delegation of _tcp SRV records to a separate server, adding a record like this to an institution's top-level DNS zone file is a trivial operation - maybe a three-minute job for a skilled network administrator.
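To make the priority and weight semantics of the three-record example concrete, here is a minimal Python sketch of how a client might order SRV targets and turn one into a base URL. It is a simplification, not the central resolver's actual code: it draws within each priority group in proportion to weight rather than following the exact running-sum procedure of RFC 2782, and the names SrvRecord, try_order, and base_url are invented for illustration. The record values are the ones from the made-up mycampus.edu example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def try_order(records, rng=random):
    """Return the records in the order a client might try them.

    Lower priority values come first. Within one priority group, targets
    are drawn without replacement, with probability proportional to weight
    (a simplified stand-in for the RFC 2782 selection procedure).
    """
    ordered = []
    for priority in sorted({r.priority for r in records}):
        pool = [r for r in records if r.priority == priority]
        while pool:
            weights = [r.weight for r in pool]
            if sum(weights) == 0:
                pick = rng.choice(pool)  # all weights zero: pick uniformly
            else:
                pick = rng.choices(pool, weights=weights, k=1)[0]
            pool.remove(pick)
            ordered.append(pick)
    return ordered


def base_url(record: SrvRecord) -> str:
    """Combine an SRV target and port with the agreed-upon /Default path."""
    return f"http://{record.target.rstrip('.')}:{record.port}/Default"


RECORDS = [
    SrvRecord(0, 2, 8888, "sfx.library.mycampus.edu."),
    SrvRecord(0, 1, 8888, "sfx2.library.mycampus.edu."),
    SrvRecord(1, 0, 80, "backup.library.mycampus.edu."),
]

for record in try_order(RECORDS):
    print(base_url(record))
```

Over many runs the first server should head the list roughly twice as often as the second, while the backup server always comes last and is reached only if both priority-zero servers fail.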

More Technical Details

As noted above, the system outlined here requires a known base URL prefix, http://<resolver>:<port>/Default, which must either be, or act as an alias for, the local resolver's default base URL. This is necessary for two reasons: 1) because there is no easy way to include URL path information in an SRV record, and 2) because the central resolver requires a known valid URL that it can use to test specific resolvers' availability (and route users to alternates where necessary).

To this end, systems administrators for the local resolver(s) should ensure 1) that http://<resolver>:<port>/Default is (or acts as an alias for) their default OpenURL base URL, and 2) that sending an HTTP "GET /Default HTTP/1.0" request to the resolver(s) will trigger an HTTP status 200, 301, 302, or 307 response.
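Administrators who want to confirm that their resolver passes this test could use something like the following Python sketch. It is an assumption-laden illustration rather than the central resolver's own check: the function names status_ok and probe are invented, and Python's http.client sends an HTTP/1.1 request rather than the HTTP/1.0 request mentioned above, although the status code is interpreted the same way.

```python
import http.client

# Status codes the text lists as acceptable responses to "GET /Default".
ACCEPTABLE_STATUSES = frozenset({200, 301, 302, 307})


def status_ok(status: int) -> bool:
    """True if the status code counts as 'resolver available'."""
    return status in ACCEPTABLE_STATUSES


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Send GET /Default to host:port and report whether the reply is acceptable.

    http.client does not follow redirects, so a 301, 302, or 307 is seen
    directly. Any network or protocol error counts as 'unavailable'.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/Default")
        return status_ok(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


# Example call (hypothetical host from the SRV examples; needs network access):
# probe("sfx.mycampus.edu", 8888)
```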

One final technical issue that systems administrators may wish to attend to is placing the resolver, www.openresolver.net, behind their institutional proxy, so that the resolver will detect the domain of the institutional proxy rather than the domain of the patron's ISP, for patrons who are off-site and are using the proxy to connect to campus e-resources.

Benefits to Publishers

Even if the idea of a general two-level SRV-based OpenURL resolver service falls flat, and institutions continue to utilize institution-specific OpenURLs, there is still some benefit to be derived from having set up the SRV records and /Default OpenURL path as outlined above: Publishers (who currently must maintain records tying specific customers to specific IP ranges, and these, in turn, to specific OpenURL prefixes) may now simply auto-detect the correct OpenURL prefix for a given user by applying the same algorithm proposed for the OpenURL resolver above. The DOI proxy could implement a similar system in place of cookie pushing as well.
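As a rough illustration of what such auto-detection might involve, the Python sketch below derives candidate SRV query names from a client's DNS name. It rests on assumptions that go beyond this section: that the client's host name has already been obtained (for example by a reverse DNS lookup such as socket.gethostbyaddr), and that the lookup tries the most specific domain first and then walks upward. The function name srv_query_names and the sample host name are hypothetical.

```python
SRV_PREFIX = "_openresolver._tcp"


def srv_query_names(client_host: str, min_labels: int = 2) -> list[str]:
    """List SRV names to query for a client host, most specific domain first.

    The leftmost label (the machine name) is dropped, and the remaining
    domain is shortened one label at a time until only min_labels remain.
    """
    labels = client_host.rstrip(".").lower().split(".")
    names = []
    for start in range(1, len(labels) - min_labels + 1):
        names.append(f"{SRV_PREFIX}." + ".".join(labels[start:]))
    return names


print(srv_query_names("pc12.chem.mycampus.edu"))
# prints ['_openresolver._tcp.chem.mycampus.edu', '_openresolver._tcp.mycampus.edu']
```

A publisher could query each name in turn, stop at the first one that returns SRV records, and build the OpenURL prefix from the target, the port, and the /Default path.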

Summary and Conclusion

In summary, by leveraging DNS SRV records to locate appropriate OpenURL resolvers, we can take advantage of robust, proven caching and fail-over capabilities that DNS provides and, at the same time, lay the groundwork for things like registration as a URN namespace, and for population of a new openurl.urn.arpa DNS zone with NAPTR records pointing to a central OpenURL resolver service. This will provide for a smooth upgrade path when, and if, stock web browsers support URNs and resource discovery via NAPTR records.

Most importantly, however, by instituting a two-level universal OpenURL resolution system and thereby solving the nagging OpenURL citation problem, we can break down the biggest remaining barrier to OpenURLs' broader acceptance in the academic community, and, at the same time, reinforce the dominance of flexible, locally controlled, open-standards-based URL-resolution systems over resolution systems controlled by library automation vendors, aggregators, and publisher consortiums. Although eventually it will become possible to tie OpenURL resolution to institutional affiliations and resolver preferences stored in client certificates (e.g., DLF Certs), this is still several years in the future. At the very least we will have to wait until browsers provide decent certificate interfaces. It may also be possible to extract this information from distributed directory/object-instantiation services such as Microsoft's Passport. In this case, though, we will have to wait until services like this have proven themselves from a security standpoint and have shed the glittery ephemerality and proprietary flavor that characterize them now. Until then, SRV-based OpenURL resolution (with cookie-based backup) can provide a solid interim strategy for resolving OpenURLs and for solving the problem of using OpenURLs in scholarly citations.

Solving the nagging problem of citation does not, I might emphasize, resolve or answer the ongoing question of whether the burden imposed by OpenURLs on publishers and on the individual libraries themselves will prove too heavy to bear.

Ultimately, such issues will have to be resolved during the course of actual, practical use. My contention is that, to get to the point where OpenURLs are fit for practical use, they must support scholarly citation. I often hear librarians - already defensive about having spent a great deal of time and/or money on an OpenURL resolver - argue that citation really isn't their responsibility, and that they are really more interested in link maintenance and overall integration. Though understandable in context, this view fails to reckon with the primary reason libraries exist: to facilitate research. If we are setting up environments in which faculty and students can't cut and paste OpenURLs into their papers and email (or in which they must hand-edit them, fool with cookies, or, worse yet, resort to URLs - which we already know are inadequate) then we simply are not attending to their research needs. Any usable linking strategy must, in order to be of practical use in a research environment, offer general support for scholarly citation.

In order for a linking strategy to support scholarly citation, it must be portable. And in order for it to be portable, it must offer a DOI/handle-like hierarchical resolution system - and yet do so without impinging on local libraries' ability to control the resolution process. Although DOIs are rapidly gaining ground through systems like CrossRef, neither they nor OpenURLs fully meet both of these requirements. My goal here has therefore been to take one of these linking specifications, OpenURLs, and to suggest a way of extending the current OpenURL infrastructure (consisting largely of personal link pages and SFX or SFX-like servers) so as to meet these requirements. OpenURLs already allow for local control of the resolution process. So all they really need, in order to work in general scholarly citations, is to be coupled with a resolution system like the one offered here.

Although more elaborate systems may be required in the future, the system described here is simple and easy to implement. It leverages well-known, robust mechanisms like reverse DNS lookup, SRV records, and HTTP redirects. And it will work with only trivial additions to our existing infrastructures. In so doing, it offers an immediate and very practical means of rendering OpenURLs usable in a broader academic context.

Bibliography

Beit-Arie, Oren; Caplan, Priscilla; et al., "Linking to the Appropriate Copy," D-Lib Magazine, September, 2001 (DOI)

Brand, Amy, "CrossRef Turns One," D-Lib Magazine, May, 2001 (DOI)

Corporation for National Research Initiatives, Handle System Overview (handle)

Distributed Systems OpenURL Demonstrator (URL)

Goerwitz III, Richard L, posting to Web4Lib mailing list, 6 Nov 2001 (URL)

NISO Committee AX, Development of an OpenURL Standard (URL)

OCLC PURL site (URL)

International DOI Foundation, DOI Handbook (DOI)

International DOI Foundation, Technical Note: DOI and OpenURL Integration (DOI)

National Library of Australia, Persistent Identification Systems, appendix 5 (URL)

Powell, Andy and Lyon, Liz, The DNER Technical Architecture: scoping the information environment (URL)

Shafer, Keith; Weibel, Stuart; et al., Introduction to Persistent Uniform Resource Locators (URL)

Van de Sompel, Herbert; Hochstenbach, Patrick; and Beit-Arie, Oren; OpenURL Syntax Description (draft; URL)


Richard L. Goerwitz III
Goerwitz IT Consulting, LLC