This paper describes a simple Apache-based intranet gateway - one that is faster and easier to set up and support than a VPN, more flexible and secure than a normal pass-through proxy, costs nothing, and discharges many of the duties normally reserved for high-end pass-through proxies and portal servers.
One of the central problems facing web-savvy organizations today is that of providing convenient, but secure, access to the intranet from off-site, i.e., from outside the institutional LAN or WAN. The primary reason for this is that demand for this sort of access is increasing; yet it is becoming increasingly difficult to offer it securely.
Over the last three years, both corporate and nonprofit web servers have fallen victim increasingly to partially or wholly automated break-ins and denial-of-service attacks, and the Web generally has witnessed something of a security meltdown. CERT, the Internet security information clearinghouse run by the Software Engineering Institute at Carnegie Mellon University, reports that the number of reported security "incidents" rose abruptly from their 1994-1998 levels of two to four thousand a year to 9,859 in 1999, then 21,756 in 2000, and 34,754 as of Q3 2001.*
These, of course, are only the reported incidents. The actual number of security breaches is much higher. The number, for example, of web servers compromised just in the first month of the Nimda virus's life has been estimated at well over half a million.*
As the quantity of compromises has increased, so also has their quality. The Code Red worm released in July of 2001, for example, leverages bugs across two different operating systems, Solaris and Windows, to gain unauthorized access to web servers. The Nimda worm, released two months later, utilizes multiple invasion pathways, making use of bugs in Microsoft Word, Internet Explorer (IE), Internet Information Server (IIS), and the Windows operating system itself. Worms such as these have demonstrated just how easily and effectively web servers and their associated networked infrastructures can be compromised - and confirm the existence of a growing threat.
Whether out of faith, defiance, or sheer inertia, web-enablement of both corporate and nonprofit IT resources continues virtually unabated despite these disturbing trends. Legacy systems are finding new life as web back-ends; early-mid 90s-style client-server applications are giving way to three-tier systems that leverage the new universal client, the web browser, as their primary user interface. Corporate reliance on web-based systems is not only a fact of life these days, but something we can count on to continue expanding in depth and scope throughout the next decade (especially with the advent of cheap wireless PDAs).
The challenge for network and systems administrators during these times will be to make these "internal" software offerings readily accessible from the outside, and to do so in a way that is easy to administer and secure enough to meet the increasingly stringent demands of a hostile networked world.
One frequently encountered method of providing secure off-site intranet access is the virtual private network (VPN). Although VPNs are terrific for scenarios requiring full access to the institutional local-area network (LAN) or wide-area network (WAN), they are overkill for just web services. VPNs require that people install special, usually proprietary, software that is often quite difficult to set up correctly and support. More importantly, VPNs add enough overhead to the connection to seriously impact users' ability to get to the material they need when connecting via modems, ISDN, slow DSL lines, or basically any sub-optimal link.
A higher-performance alternative to the VPN is simply to protect web services and systems meant for internal consumption with passwords, and to expose those systems to the Internet (thereby making them accessible from off-site). Unfortunately, exposing services and systems to the Internet is a dangerous thing, and the more services and systems so exposed the greater the risk. Passwords themselves are also difficult to accept securely and platform-independently without secure socket layer (SSL [TLS]) encryption. And SSL itself adds to the transmission overhead in much the same way as a VPN. Although SSL opens up the possibility of alternate authentication methods such as client X.509 certificates with asymmetric cryptographic signatures ("client certs"), client certs presume a public key infrastructure (PKI) that is difficult to implement and maintain and that leaves us again with a familiar performance penalty. Ultimately, therefore, direct web service or server connections to the Internet lose us ground in the area of security without gaining us a great deal in the way of speed over VPNs.
Yet another solution to the problem of providing remote access to the institutional intranet - one that lacks the overhead and setup of a VPN, but that doesn't require any services to be directly exposed to the Internet - is the reverse or pass-through proxy (which, when outfitted with an HTTP cache, is sometimes called an accelerator). Pass-through proxy servers are essentially real-time mirrors. To the web clients they look like origin web servers. In fact, though, they simply replicate the content of one or more back-end servers and then re-present that content as if it originated on the pass-through proxy. When outfitted with a cache, a pass-through proxy can often heighten performance, especially in situations where back-end servers house large, memory-hungry web applications. And, because they are simpler in overall design than back-end web servers, pass-through proxies are easier to secure than the back-end web servers they mirror. They can even be configured to filter out worms and otherwise suspicious or malformed HTTP requests.
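The request filtering mentioned above can be sketched in a few lines. The following Python fragment is an illustration only, not taken from any particular proxy: the function name, the length limit, and the pattern list are assumptions, and the patterns are limited to the well-known probe URLs used by Code Red and Nimda plus simple directory traversal.

```python
import re
from urllib.parse import unquote

# Illustrative signatures only; a production filter would need a maintained list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"/default\.ida\?", re.IGNORECASE),  # Code Red probe
    re.compile(r"(cmd|root)\.exe", re.IGNORECASE),  # Nimda-style probes
    re.compile(r"\.\./"),                           # directory traversal
]

MAX_URI_LENGTH = 2048  # assumed limit, not taken from any standard


def is_suspicious(request_uri: str) -> bool:
    """Return True if the request URI looks like a worm probe or is malformed."""
    if len(request_uri) > MAX_URI_LENGTH or not request_uri.startswith("/"):
        return True
    # Decode twice so that double-encoded sequences such as %255c are caught,
    # then normalize backslashes so both separator styles match one pattern.
    decoded = unquote(unquote(request_uri)).replace("\\", "/")
    return any(p.search(decoded) for p in SUSPICIOUS_PATTERNS)
```

A proxy would call a check like this before relaying a request and answer with an error instead of forwarding anything that matches.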
Unfortunately, pass-through proxy servers require that URLs on back-end server pages be specially constructed. URLs there must either point at the proxy, or else must consist only of a /path?query-string, (i.e., a path-only URL, without the scheme://hostname:port prefix). Normal pass-through proxies also offer little or no extra help with authentication. Authentication issues with a pass-through proxy are basically the same as they would be for a server directly connected to the Internet - but with the extra complication that user credentials must somehow be accepted by the proxy and forwarded to back-end servers if needed there, or, if not needed there, prevented from being forwarded (or, as in most cases, left for the back-end servers to collect on their own).
Commercial pass-through proxies such as Netegrity's Secure Reverse Proxy Server simplify the problem of authentication by utilizing various sophisticated permission and session management tools - making the reverse proxy into more of a general-purpose gateway to back-end Web-based servers and services than a traditional pass-through proxy. A similar, but in some sense more general, approach to this same basic problem is reflected in the iPlanet Portal Server, which provides permission and session management facilities, and which functions as a general content-aggregation and management system.
At $90,000.00 per CPU for the iPlanet Portal Server, solutions such as this lie beyond the reach of most small to medium-sized businesses and non-profits.
What most businesses need is a way into the corporate intranet that doesn't cost an arm and a leg, but yet provides secure, fast, flexible access to the intranet.
In late 1997, the Brown University Library was faced with a problem: How to get bona fide members of its community access to web resources it was licensing from various publishers and database-vendor ASPs (resources that were IP restricted and, at that time, only accessible from on-campus).
What Brown needed was a way to authenticate off-campus users against its local Kerberos key distribution center and to pass them through some sort of proxy server that would lend users the appearance of an on-campus origin point. What it wanted, in essence, was a secure portal to internal online Library holdings (a specific instance of the more general "secure intranet access" problem).
At the time VPNs were only just beginning to come into common use, and (as they often still do) would have meant substantial setup, support, and connection overhead. The seemingly logical solution to the problem, then, was to go instead with a standard HTTP proxy server. Unfortunately, standard HTTP proxy servers required browser reconfiguration, and would not have worked for users coming in through firewalled ISPs that enforced use of an internal proxy. In addition, there were no authentication mechanisms defined for standard proxy servers that met Brown's security requirements. So there was really no reasonable way Brown could use one.
Standard HTTP proxies being out of the question, Brown did some experimenting with what are called URL rewriters, which turned out to break HTTP cookies and therefore most forms of HTTP session management - and were thus unsuitable for our environment. As a way out, two of us, Anne Nolan (Assistant Head of Reference at the Brown Library) and I, came up with a scheme to use a pass-through proxy server instead. Pass-through proxies could be secured in the same ways as an origin web server. The main challenge we faced with a pass-through proxy was that, in order for it to function properly, URLs on back-end servers had to be constructed carefully, so that they made no reference to the origin servers. The reason this requirement presented a challenge to us is that we could not reasonably ask all our publisher and database-vendor ASPs to rewrite their systems to work with our pass-through proxy.
What we ended up doing was constructing a novel pass-through proxy that actually reformatted URLs found on back-end servers on the fly, in such a way that they met the prerequisites for use with our pass-through proxy server. What we did, that is, was to create a server that sucked content in from back-end servers and then massaged it into a form in which it could be effectively re-presented by our pass-through proxy without any action on the part of our ASPs (who generally didn't even realize we had installed a proxy). The key innovation of this system was that it created a distinct virtual host on the proxy for every back-end server. Doing this enabled us to leave URL paths on back-end server pages untouched. All we had to do was to rewrite the actual host names for the relatively few URLs that had them (most self-referential URLs in web pages being relative, and containing only path and query-string components). This approach to the problem of page-filtering allowed us to keep the rewriting code small, and enabled us to process, easily, even wildly malformed HTML pages.
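The host-only rewriting idea can be illustrated with a short sketch. This is not Libproxy's code (which is written in Perl); the host names, the mapping table, and the function name are hypothetical. The point it shows is that only the scheme://host:port prefix of absolute URLs is replaced, while relative URLs and paths pass through untouched, and that a plain pattern match needs no HTML parser and so tolerates malformed pages.

```python
import re

# Hypothetical mapping from back-end host names to proxy virtual hosts.
HOST_MAP = {
    "www.publisher.example": "publisher.proxy.example.edu",
    "db.vendor.example:8080": "vendor.proxy.example.edu",
}

# Match scheme://host[:port] wherever it appears; paths are never touched.
ABSOLUTE_PREFIX = re.compile(r"(https?)://([A-Za-z0-9.-]+(?::\d+)?)", re.IGNORECASE)


def rewrite_hosts(page: str, host_map: dict) -> str:
    """Replace known back-end host names with their proxy virtual hosts."""
    def swap(match):
        scheme, host = match.group(1), match.group(2).lower()
        if host in host_map:
            return f"{scheme}://{host_map[host]}"
        return match.group(0)  # unknown host: leave the URL alone
    return ABSOLUTE_PREFIX.sub(swap, page)
```

Because each back-end gets its own virtual host, a relative link such as /img/logo.gif resolves against the right proxy host without any rewriting at all.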
The Brown pass-through proxy (called Libproxy) generates all needed virtual hosts (1, 2, and 3 in the above illustration) automatically. By tweaking a Libproxy configuration variable the systems administrator can control whether these virtual hosts are port-based or name-based. Yet another configuration variable allows the systems administrator to specify, if name-based hosts are being used, whether to embed the names of the back-end servers in the virtual host names or whether to generate the virtual host names using unique opaque strings that, in effect, hide the DNS names of the back-end servers.
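To make the three naming options concrete, here is a minimal sketch of how a virtual host name might be derived for a back-end server. The function, its parameters, the proxy domain, and the base port are all assumptions made for illustration; Libproxy's actual naming rules and configuration variable names are not reproduced here.

```python
import hashlib


def virtual_host(backend: str, index: int, *, mode: str = "name",
                 opaque: bool = False, proxy_domain: str = "proxy.example.edu",
                 base_port: int = 9000) -> str:
    """Return the proxy-side host name (with a port in port mode) for one back-end."""
    if mode == "port":
        # Port-based: one proxy host name, one port per back-end.
        return f"{proxy_domain}:{base_port + index}"
    if mode != "name":
        raise ValueError(f"unknown mode: {mode}")
    if opaque:
        # Opaque label: hides the back-end's DNS name from users.
        label = hashlib.sha1(backend.encode()).hexdigest()[:12]
    else:
        # Embedded label: back-end name with dots and colons made DNS-safe.
        label = backend.lower().replace(".", "-").replace(":", "-")
    return f"{label}.{proxy_domain}"
```

Name-based hosts of either kind presuppose a wildcard DNS entry for the proxy domain; port-based hosts avoid that requirement but need one open port per back-end.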
Several white papers have been published on Libproxy, as well as a write-up in D-Lib Magazine. This particular type of proxy is now well known among the Library information-systems community, and, in addition to the Brown version, there are also at least two newer commercial offerings targeted specifically at libraries: EZProxy and an ASP-hosted service, Obvia.
Since 1997, our rewriting pass-through proxy server (Libproxy) has undergone several partial rewrites and a complete rewrite. Originally implemented in C as an extension to the standard Apache proxy module and administered by hand-editing plain text configuration files, Libproxy now runs as a Perl module written to the Apache mod_perl API, and comes with a full web-based administrative interface, documentation, and its own set of APIs.
Up until the spring of 2001 Libproxy was used almost purely as a tool for remote access in libraries and had seen only sporadic use in other contexts (e.g., as a front-end for systems running Microsoft's Internet Information Server).
In May of 2001, however, Brian Payst, Network Technology Manager for the Arlington-based nonprofit, the Nature Conservancy (TNC), contacted me, asking if Libproxy could be extended to function as a single sign-on gateway to their intranet. The TNC intranet consisted of a number of internal web servers and systems, some of them WebDAV-enabled file repositories. They were using Radius (in the form of Microsoft's Internet Authentication Service [IAS]) for authentication, and planned on using Active Directory and its LDAP interface for authorization.
Payst's query, and our subsequent collaboration, triggered a series of revisions to Libproxy, the most significant of which being:
Libproxy went into production as the intranet portal for the Nature Conservancy in August of 2001, and since then has met with great success - mainly because it is faster and easier to use than TNC's existing VPN-based solution. Although the VPN is still required in certain cases, Libproxy has become the remote-access method of choice for nearly all web-based intranet systems at TNC.
TNC's deployment demonstrates, among other things, the viability of this technology, not only as a gateway to ASP-based Library resources, but also as a gateway or portal to an institutional intranet.
The heart of Libproxy's security infrastructure is its authentication scheme.
By default, Libproxy uses a cookie-based authentication scheme that accepts usernames and passwords over an SSL connection and verifies these credentials against a back-end authentication service (e.g., a Radius server, an NT domain controller, or a Kerberos KDC). If the username and password verify correctly, Libproxy issues a time-limited service ticket, supplied to the client as an HTTP cookie.
The service ticket-cookie contains the client's IP address, the IP address of the Libproxy server, a switch indicating whether the client is using a proxy or not, the authentication method used, a timestamp and a lifetime (both in seconds), the ID of the user, and a series of MD5 hashes based on the ticket itself, the user's HTTP agent, and a secret known only to the Libproxy server. If this ticket-cookie is truncated, altered, augmented, or replayed from another machine, Libproxy will force the user to re-authenticate. Libproxy will also, optionally, encode the user's password into the ticket (encrypted using DES3).
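A ticket of this general shape can be sketched as follows. The field order, the separator, and the use of a single HMAC-MD5 digest are assumptions made for illustration; the text above specifies only which items go into the ticket and that MD5 hashes bind it to the user agent and a server-side secret. Fields are assumed not to contain the separator character, and the optional encrypted password is left out.

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # placeholder; known only to the proxy


def _digest(payload: str, user_agent: str) -> str:
    # Keyed MD5 over the ticket fields and the user's HTTP agent string.
    msg = f"{payload}|{user_agent}".encode()
    return hmac.new(SECRET, msg, hashlib.md5).hexdigest()


def issue_ticket(client_ip, server_ip, via_proxy, method, user, user_agent,
                 lifetime=3600, now=None):
    """Build a ticket string: pipe-separated fields plus a keyed digest."""
    issued = int(time.time() if now is None else now)
    payload = "|".join([client_ip, server_ip, "1" if via_proxy else "0",
                        method, str(issued), str(lifetime), user])
    return f"{payload}|{_digest(payload, user_agent)}"


def verify_ticket(ticket, client_ip, user_agent, now=None):
    """Return the user ID if the ticket is intact, unexpired, and from the same client."""
    payload, _, digest = ticket.rpartition("|")
    expected = _digest(payload, user_agent)
    if not hmac.compare_digest(digest.encode(), expected.encode()):
        return None  # truncated, altered, or presented by a different agent
    fields = payload.split("|")
    if len(fields) != 7:
        return None
    ip, _server, _via, _method, issued, lifetime, user = fields
    current = time.time() if now is None else now
    if ip != client_ip or current > int(issued) + int(lifetime):
        return None  # replayed from another machine, or expired
    return user
```

A None result corresponds to the case in which the user is forced to re-authenticate.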
The following diagram illustrates how a typical unauthenticated (i.e., ticketless) request to Libproxy normally proceeds, and how the ticket-cookie is generated. The red arrows indicate portions of the transaction that must, mandatorily, be run over SSL-ized connections. Note that Libproxy and its authentication module, though logically distinct, reside by default on the same host. Although it is possible to decouple them, this involves extra maintenance and introduces an extra failure point. Decoupling them can provide extra security, though, especially if the authentication module resides in a more protected area of the institutional LAN or WAN:
![]()
Key: 1. initial request, 2. redirect to authentication server, 3. authentication request, 4. credential verification request, 5. credential verification response, 6. authentication response, 6. set-cookie + redirect, 7. second request (same as [1], but now with a ticket-cookie), 8. request to back-end host (relayed), 9. response by back-end host, 10. response by Libproxy (contains whatever was returned in step [9], but with URLs and various headers filtered).
The main weakness in this system is one that it shares with all cookie-based authentication schemes: It is susceptible to replay attack if the client happens to be using an HTTP proxy on a shared broadcast LAN - in which case, anyone using the same proxy could theoretically sniff and steal the ticket, then re-use it. See the illustration below (tickets forwarded to Libproxy from inside the shared LAN box can, given the right scripts and network tools, be seen and re-used by any of the three clients there):
Libproxy tries to make replay attacks as difficult as possible by placing time limits on cookies and by frequently testing and re-issuing cookies as the user moves from one remote service to another. In reality, though, the shared-LAN scenario is less of a problem than it used to be, as institutions move to switched networks, and as home users work with modems and DSL lines (broadband "cable" subscribers may still be susceptible to attack).
In cases where security is an important concern, the entire transaction (from client to the back-end server) can be run over SSL, with Libproxy supplying a client certificate to the back-end server if needed (the red arrows below indicate SSL-ized connections, which prevent ticket-cookies from being seen and re-used by other machines on the LAN[s]):
Utilizing SSL-ized connections effectively remedies the problem of replay attacks in the shared-LAN scenario. If the users are CONNECTing through an HTTPS proxy, however, all bets are off if that proxy itself is either compromised or corruptly administered.
Libproxy's grouping facilities ride atop its authentication scheme. In order to function correctly, that is, Libproxy's grouping modules rely on the availability of an authenticated user ID. The two facilities are de-coupled, although they both must share some common pool of user identifier strings (the authenticated user IDs).
Groups themselves are basically just collections of users and can be characterized using one or more of three basic criteria: 1) IP address ranges, 2) ID patterns, or 3) arbitrary LDAP queries.
Of these three, LDAP queries are the preferred criterion. They work simply enough: Users whose IDs appear in the result lists of LDAP queries associated with a given group are counted as members of that group. A number of macros are provided to make it easy to build a series of LDAP queries, each of which builds on the previous one's result list. Organizations that have set up a Microsoft Active Directory server will have ready access to the LDAP interface that server exposes, and will be able to use the grouping facilities provided there out of the box. So also will organizations that utilize recent versions of Novell NDS, or who house their institutional whitepages in an LDAP database such as the iPlanet directory server. So groups should not be very difficult for most organizations to set up. (I'd really recommend iPlanet at this point, at least for institution-wide whitepages and as a group repository.)
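The three membership criteria can be modeled in a small sketch. The class and field names are hypothetical, the LDAP query is represented by a plain callable that returns a list of user IDs rather than by a real directory lookup, and treating a match on any one criterion as sufficient is an assumption.

```python
import fnmatch
import ipaddress
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Group:
    name: str
    networks: List[str] = field(default_factory=list)     # e.g. "10.1.0.0/16"
    id_patterns: List[str] = field(default_factory=list)  # e.g. "staff-*"
    # Stand-in for an LDAP query: returns the user IDs in its result list.
    ldap_lookup: Optional[Callable[[], List[str]]] = None

    def contains(self, user_id: str, client_ip: str) -> bool:
        """True if the user matches any criterion configured for this group."""
        addr = ipaddress.ip_address(client_ip)
        if any(addr in ipaddress.ip_network(n) for n in self.networks):
            return True
        if any(fnmatch.fnmatchcase(user_id, p) for p in self.id_patterns):
            return True
        return self.ldap_lookup is not None and user_id in self.ldap_lookup()
```

In practice the callable would wrap a directory search, and its results would probably be cached rather than fetched on every request.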
Groups are an important element in Libproxy's process of deciding whether it will process a transaction or not. Group membership alone, however, does not generally constitute a sufficient basis for making these (authorization) decisions. Authorization decisions must generally also take into account the web resource that the user wishes to access.
In order to tie the resource that the user wishes to access into the security equation, Libproxy allows back-end hosts to be attached to groups. Attachment occurs, by default, at the back-end host's root document level and applies to every URL on that host. It is also possible, however, to limit attachment to specific locations and paths.
For example, suppose Libproxy is being used to provide access (among other things) to an internal WebDAV-enabled server with three working areas: one for the institution's new top-level page redesign, another that anyone may drop pages into, and a third used for joint work with another organization. As long as these three working areas correspond to distinct URL paths, it is possible simply to attach those directories on that WebDAV-enabled server to three different groups, and then set the membership of those groups to correspond to some appropriate LDAP query. If a user wanting to access a particular directory belongs to one of the groups it is attached to, he or she is allowed access.
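The WebDAV example can be expressed as a small authorization check. The table layout, the host and group names, the longest-prefix rule, and the deny-by-default behavior for unattached paths are all assumptions made for this sketch rather than a description of Libproxy's host and group tables.

```python
from typing import Dict, Set, Tuple

# Hypothetical attachment table: (back-end host, path prefix) -> groups allowed.
ATTACHMENTS: Dict[Tuple[str, str], Set[str]] = {
    ("dav.example.org", "/redesign/"): {"web-team"},
    ("dav.example.org", "/dropbox/"): {"everyone"},
    ("dav.example.org", "/joint/"): {"staff", "partners"},
}


def is_authorized(host: str, path: str, user_groups: Set[str]) -> bool:
    """Allow the request if the user is in a group attached to the longest matching prefix."""
    matches = [(prefix, groups) for (h, prefix), groups in ATTACHMENTS.items()
               if h == host and path.startswith(prefix)]
    if not matches:
        return False  # nothing attached: deny by default (an assumption)
    _, allowed = max(matches, key=lambda m: len(m[0]))
    return bool(allowed & user_groups)
```

Here user_groups would be the set of groups to which the authenticated user was found to belong.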
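Libproxy offers a number of extra authentication-related features that, together, can be used to create a single sign-on intranet environment. These are: 1) forwarding the authenticated user ID to back-end servers in a custom HTTP header; 2) performing HTTP basic or digest authentication against back-end servers on the user's behalf; and 3) supplying an SSL client certificate to back-end servers.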
Various single sign-on environments can be created by combining these three facilities in various ways.
Probably the easiest single sign-on environment to set up is one in which all back-end servers are configured to accept connections only from the Libproxy server's IP address, in effect delegating authentication and authorization to Libproxy. If a higher security level is desired, it is possible to run everything over SSL, with Libproxy supplying a client cert to back-end servers and a server cert to users' web browsers (see [3] above). In this scenario, all users are routed through Libproxy, which allows or denies requests based on information configured into its own host and group tables.
A slightly more complex single sign-on environment (one that is still fairly easy to set up - although it requires back-end servers to be running Apache) has Libproxy passing authenticated user IDs on to back-end servers using a custom header (see [1] above). To facilitate this usage scenario an add-on module, Libproxy::Apache::LibDelAuth, has been included with the base Libproxy distribution (look for it in the ./LibDelAuth directory). This module, which installs trivially on any mod_perl-enabled Apache server, allows the server to accept Libproxy's custom HTTP header in lieu of HTTP basic or digest authentication. Although no one has actually done it yet, writing an equivalent C module would be easy to do; and in environments where mod_perl is seen as insufficiently secure, it would, naturally, be necessary. For higher-security environments, it is also possible to configure everything to run over SSL, as in the previous scenario described above.
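The back-end side of this arrangement can be sketched as follows. This is not the LibDelAuth module itself; the header name, the proxy address, and the function name are hypothetical. What the sketch shows is the essential safeguard: the forwarded user ID is trusted only when the request arrives from the proxy's own address, since any other client could forge the header.

```python
from typing import Mapping, Optional

TRUSTED_PROXY_IPS = {"192.0.2.10"}    # address of the proxy host (placeholder)
USER_HEADER = "X-Authenticated-User"  # hypothetical header name


def delegated_user(remote_addr: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the forwarded user ID, but only for requests arriving from the proxy."""
    if remote_addr not in TRUSTED_PROXY_IPS:
        return None  # anyone else could forge the header
    # HTTP header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == USER_HEADER.lower() and value.strip():
            return value.strip()
    return None
```

When the function returns None, the back-end server would fall back to its normal authentication, or refuse the request outright.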
In cases where there are non-Apache back-end servers, it is also possible to configure Libproxy to perform authentication on the user's behalf, utilizing HTTP basic or digest authentication - whichever the back-end server asks for (see [2] above). This scenario may be mixed with the custom-header scenario ([1] above). Servers that can't use the custom header simply ignore it and ask for basic or digest authentication. Naturally, passing basic authentication tokens over the LAN or WAN mandates use of SSL, partly because of the insecurities inherent in the HTTP basic authentication protocol, but also because the user's password must, in this scenario, be encrypted and stored in his or her ticket-cookie (to be forwarded later to back-end servers as needed).
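The following sketch shows what authenticating on the user's behalf amounts to for HTTP basic authentication. The function name is an assumption; the header format is the standard one (the word Basic followed by the base64 encoding of user-id:password).

```python
import base64


def basic_auth_header(user_id: str, password: str) -> str:
    """Build the value of an HTTP Authorization header for basic authentication."""
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_token(header_value: str) -> str:
    """Recover user-id:password from the header, as any eavesdropper could."""
    return base64.b64decode(header_value.split(" ", 1)[1]).decode("utf-8")
```

The second function is there to make the security point: base64 is an encoding, not encryption, so the credentials are recoverable by anyone who sees the header. That is why this scenario requires SSL on every hop.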
Because it utilizes a proprietary, and still poorly documented, protocol, Windows NTLM authentication should be avoided in all usage scenarios - not only those involving Libproxy. Sadly, NTLM actually assumes a specific threading model and an implicit network connection state. NTLM authentication (as well as its successor, Integrated Windows Authentication, which will use Kerberos in addition to NTLM) is thus difficult to implement for non-Microsoft platforms and proxies. For institutions that value flexibility and cross-platform compatibility, NTLM should be shunned in favor of either HTTP digest or basic-over-SSL authentication.
Because of Libproxy's roots in the library community, and its ability to massage web pages from back-end hosts on the fly to conform to the requirements of a pass-through proxy, integrating remote, ASP-hosted services into a Libproxy-based portal is a simple - in some cases, trivial - task.
In brief, ASP-hosted services can be treated in exactly the same way as locally hosted applications. ASP-hosted systems can be set to accept SSL-based connections only from the server running Libproxy, and to accept user credentials from Libproxy either via a custom HTTP header or via HTTP basic or digest authentication.
Libproxy is designed to run over top of Apache, which has an imperfect, but nevertheless extremely good, security track record (much better than that of IIS, for example). Libproxy also leverages Perl, mod_perl, and Linux (or Solaris), which have extremely good security track records as well. Naturally, the machine on which the Libproxy software runs should be regularly updated with patches issued by the operating-system vendor. It should also have all unnecessary services turned off, leaving Apache (qua Libproxy) as the only potential invasion point.
Libproxy is known to run under Linux (RedHat, Mandrake) and Sun Solaris. Other Unix and Unix-like operating systems are likely to work with some tweaking. Linux is the preferred platform because it comes with excellent log-rotation and authentication facilities that Libproxy can take advantage of. Although a Microsoft Windows version would theoretically be possible, Microsoft's hostility to open-source projects makes it questionable whether such a version would be wise - or worth the effort.*
Doing the initial Libproxy install is, frankly, a headache, and requires knowledge of Perl, Apache, and MySQL, as well as a general familiarity with TCP/IP services, authentication, and firewalls. The most recent Libproxy source archive may be downloaded from http://www.goerwitz.com:31265/libproxy/dist/. To unpack it, cd into the directory where the source code will reside, and then un-tar the archive. Take a look at the INSTALL file that comes as part of that archive. It offers full and detailed instructions on how to install and configure prerequisite software and systems, as well as Libproxy itself:
cd /usr/local/src
tar -zxvf /path/to/libproxy.current.tar
cd "$(ls -dtc libproxy*/ | head -1)"
less INSTALL
Documentation on the various authentication options is included in the INSTALL file. A full list of configuration options, with copious comments and documentation, may be found in the sample Libproxy configuration file included with the base Libproxy distribution, ppf.conf.
Libproxy is written entirely in Perl. It runs under Apache as a series of Perl modules written to the Apache-Perl (mod_perl) API which, in turn, use a series of MySQL database tables containing information about back-end hosts, proxied domains, and groups.
When Libproxy is started, what actually happens is that a copy of Apache is started and initialized off a large, unpleasantly complex configuration file with an extended <Perl> section (conf/ppf.httpd.conf relative to the Libproxy installation directory). This <Perl> section has three main functions: 1) To read in the Libproxy configuration file, ppf.conf, and change Libproxy's default behavior according to the directives contained there; 2) to go through Libproxy's MySQL databases, determine what back-end servers are being proxied, and set up one virtual host for each of them; and, finally, 3) to hand off control to Apache, which is set to trap HTTP requests and pass most of the processing on to Libproxy's mod_perl "handlers."
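The second of these functions, generating one virtual host per back-end server from database rows, can be sketched as follows. This is an illustration and not the contents of ppf.httpd.conf: SQLite stands in for MySQL so that the example is self-contained, and the table name, column names, proxy domain, and the BackendHost variable are all hypothetical.

```python
import sqlite3


def load_backends(conn):
    """Return (host, port) rows for every proxied back-end server."""
    return conn.execute(
        "SELECT host, port FROM backends ORDER BY host, port"
    ).fetchall()


def vhost_stanzas(backends, proxy_domain="proxy.example.edu"):
    """Render one Apache <VirtualHost> block per back-end row."""
    stanzas = []
    for host, port in backends:
        label = host.replace(".", "-")
        stanzas.append(
            f"<VirtualHost *:80>\n"
            f"    ServerName {label}.{proxy_domain}\n"
            f"    PerlSetVar BackendHost {host}:{port}\n"
            f"</VirtualHost>"
        )
    return "\n".join(stanzas)
```

Generating the stanzas at start-up, rather than maintaining them by hand, means that adding a back-end server is a matter of adding a database row and restarting.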
Libproxy offers a number of opportunities for customization, from basic configuration-file directives to full-blown APIs. The simplest and most useful way to customize Libproxy is just to edit its settings from the administrative interface, which installs at https://libproxy-server-name:1080/admin/ by default. A bit less direct, but nevertheless still extremely simple, is to edit the ppf.conf file and then restart Libproxy. A lot can be done with the system through just the administrative interface and the ppf.conf file.
Those comfortable with Apache .htaccess files may, however, want to set the SeparateDocRoots directive in the Libproxy ppf.conf file to true, then restart Libproxy. Upon seeing this directive, Libproxy will create in ./proxy a separate document root directory for each back-end host (relative to the Libproxy installation directory). The naming convention for these directories is host:port. Libproxy adds an htaccess.sample file to each of these directories, which may be renamed as .htaccess and edited to taste.
Those comfortable with Perl, and who are interested in the mod_perl modules used to handle various Apache request transaction phases, should look at the POD sections of the main Perl modules that Libproxy installs:
export PERL5LIB=/usr/local/libproxy/lib/perl:$PERL5LIB
perldoc Apache::BrownTicket
perldoc Apache::ProxyPassFilter
perldoc Apache::ProxyPassFilter::ProxyPassFinder
Those with an in-depth understanding of mod_perl should also take a look at the Apache ppf.httpd.conf file that initializes and configures the Apache/Libproxy server at start-up time.
Libproxy provides a full API for customizing the rewriting process for both domains and individual back-end hosts within those domains. This API (which by default is disabled) may be enabled by setting the AllowFilters directive in the ppf.conf file to true and restarting Libproxy. A sample filter, Sample.pm, is available inside the ./filters directory (relative to the Libproxy installation directory).
My goal here has been to describe a method for constructing a solid, working, Apache-based intranet gateway. As a means to this end I've introduced a piece of free software, Libproxy, which has proven, in real-world usage scenarios, to function well in this capacity - i.e., as a useful, albeit minimal, intranet gateway.
Libproxy has the advantages of being relatively simple in its overall design; faster and easier to set up and support than a VPN; more flexible and secure than a normal pass-through proxy; and, at zero up-front cost, capable of discharging many of the duties normally reserved for high-end pass-through proxies and portal servers.
My hope is that by introducing this software, and by offering a brief overview of its history, principles of operation, and APIs, as well as examples of deployment strategies, I have provided systems and network administrators with a low-cost battle plan for meeting the challenge of making their institution's "internal" web offerings readily, but securely, accessible from outside the institutional LAN or WAN.