The Resource Catalog
Keith Moore
Computer Science Department
University of Tennessee, Knoxville
ABSTRACT
The Resource Catalog is a simple, highly available and very scalable
resolution service, which was designed to support various scalable computing
and metacomputing applications. This memo describes the Resource Catalog,
its goals and characteristics, the protocol, the implementation, and experience
with its use.
1. Introduction
The Resource Catalog (RC for short) is a
simple, highly available, and very scalable resolution service, which was
designed to support various scalable computing and metacomputing applications.
By ``resolution service'', we mean a service for mapping a resource name
onto a set of attributes or characteristics of that resource, which are
sometimes called metadata. A resolution service is different from a directory,
in that a resolution service maps a resource name into its associated attributes,
while a directory is intended to allow for searching of the attributes
themselves to identify matching resources. In general, a directory provides
a superset of the functions provided by a resolution service, since a directory
can generally access a set of attributes by name. Common resolution services
include the Domain Name System [1] used in the Internet, and the Network
Information Services (NIS and NIS+) [2] used on UNIX systems.
The Resource Catalog was originally designed
to support the Resource Cataloging and Distribution System (RCDS) [3].
Since RCDS presented some unique requirements that could not be provided
by off-the-shelf directory or resolution products, and since RCDS did not
require the additional services provided by a directory, we designed a
new resolution service that could meet RCDS's requirements. The Resource
Catalog has also been used in the Scalable Network Information Processing
Environment (SNIPE) [4].
We believe that flexible resolution services
may find use as middleware in a wide variety of applications, and that
some of the techniques adopted for RCDS may be useful when high availability
or scalability is needed for such applications.
2. Design Goals
The RCDS project was largely concerned with
improving the availability of network-accessible resources, including the
World Wide Web; and the Resource Catalog was the centerpiece of RCDS. Given
the large number of resources on the Web and the large number of anticipated
users, the Resource Catalog was designed with the following goals in mind:
- Scalability. The Resource Catalog was designed to be able to handle
hundreds of millions of resources and, more importantly, millions of users
for any particular set of resources.
- Reliability and Fault-Tolerance. The information in the Resource
Catalog is replicated across multiple servers. Clients can send queries
to any of the servers for a particular subset of URL space. Should that server
become unavailable, the client can choose a different server.
- Efficiency. The protocol used to access the Resource Catalog was
designed to make efficient use of network resources and to minimize the
delay in responding to queries.
- Security. The Resource Catalog provides strong authentication for
updates to the catalog; it also provides end-to-end (producer-to-consumer)
authentication of resources and metadata.
- Flexibility. The metadata stored in the Resource Catalog is generally
opaque to the Catalog; RC therefore places few constraints on the format
of resource metadata. Attribute names are simple ASCII strings, and values
are strings of octets. In addition, the end-to-end authentication is
extensible to new schemes without changes to the RC servers or client libraries.
Together, these properties make it easy to accommodate new schemas and new
authentication schemes.
- Ease of deployment. RC was designed to be easy to deploy and to require
as few new components as possible.
- Low Cost. RC was designed to be simple, and to have a low cost of
operation.
3. Service Overview
At the simplest conceptual level, RC provides
two basic services: (1) allow an authorized user to make assertions about
a resource, and (2) allow a user to retrieve the assertions made about
a resource. However, this description hides a lot of the subtlety in the
particular design choices made to allow a high degree of availability:
3.1. Format of resource names and assertions
RC is designed to use Uniform Resource Identifiers
(URIs) as resource names. URIs include the familiar Uniform Resource Locators
(URLs) [5] in addition to Uniform Resource Names (URNs) [6] and other resource
names. The RC client library can parse several different kinds of URI and,
by consulting the domain name system (DNS), determine which RC servers
are likely to be authoritative for that particular resource.
Conceptually, assertions are simply statements
of the form ``name=value'', where a name is a string of printable characters,
and a value can be any string of octets. Each assertion also includes the
``type'' of the value, the identity of the asserter, a time-to-live field,
an expiration date, a revision number, and an ``index'' which is used in
digital signatures.
3.2. Assertions by multiple parties; conflict resolution on client
Assertions about a resource may be made
by multiple parties (``asserters''), as long as each asserter has permission
to do so. Multiple parties may each provide assertions with the same name
about the same resource; each party's assertions are kept separate from
the others. Thus, no party's assertions can override the others. It is
up to the RC client to determine how to resolve apparent conflicts - whether
a particular party's assertions are valid, or which parties' assertions
are believed in preference to others.
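The client-side policy described above can be sketched as follows. This is a minimal illustration in Python, not code from the RC client library; the Assertion record, the resolve function, and the idea of ranking asserters in a fixed trust order are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Hypothetical, simplified view of one assertion returned by a query."""
    name: str
    value: bytes
    asserter: str


def resolve(assertions, name, trusted_order):
    """Return the value of `name` from the most-preferred trusted asserter.

    Assertions from parties that are not listed in `trusted_order` are
    ignored. Returns None when no trusted party has made the assertion.
    """
    rank = {asserter: i for i, asserter in enumerate(trusted_order)}
    candidates = [a for a in assertions
                  if a.name == name and a.asserter in rank]
    if not candidates:
        return None
    return min(candidates, key=lambda a: rank[a.asserter]).value


# Three parties assert the same name; the server keeps all three.
assertions = [
    Assertion("title", b"Draft", "mirror-site"),
    Assertion("title", b"Final Report", "author"),
    Assertion("title", b"Spam", "unknown-party"),
]
chosen = resolve(assertions, "title", ["author", "mirror-site"])
```

Here the server stores every party's assertion unchanged, and the choice of which one to believe is made entirely by the caller's trust list.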
3.3. Replication
The assertions about a resource can be replicated
across several RC servers. There is a mechanism by which a user (RC client)
can determine which servers store the characteristics of a particular resource.
Once the client knows the set of servers for a resource name, it may issue
a query for the resource characteristics to any of those servers which
is reachable. RC also supports ``multiple masters'', so that assertions
about a resource may be posted to any of several replicas. Each RC server
informs its peers of any changes to the characteristics of a resource.
RC does not guarantee that all replicas
of the characteristics of a resource are synchronized. Rather, it ensures
that, at any point in time, each replica contains a consistent snapshot of
each asserter's assertions about a resource, although the snapshots for
different asserters may date from different times. If there are multiple asserters, it is
possible for server A to have a current copy of asserter X's assertions
about resource R and an old copy of asserter Y's assertions about resource
R. At the same time, server B has the current copy of asserter Y's assertions
and an old copy of asserter X's assertions. This might be thought of as
single-copy serializability on a per-asserter basis.
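The per-asserter behaviour can be illustrated with a small sketch. It assumes (this is not stated in the memo) that each server keeps one whole snapshot per asserter tagged with a revision number, and that a peer's snapshot replaces the local one only when its revision is higher. The function name apply_peer_update and the data layout are hypothetical.

```python
def apply_peer_update(replica, asserter, revision, snapshot):
    """Install a peer's snapshot for one asserter only if it is newer.

    `replica` maps asserter -> (revision, {name: value}). A snapshot is
    always replaced as a unit, so a replica never mixes old and new
    assertions from the same asserter.
    """
    current = replica.get(asserter)
    if current is None or revision > current[0]:
        replica[asserter] = (revision, dict(snapshot))
        return True
    return False


# Server A is current for asserter X but stale for Y; server B is the reverse.
server_a = {"X": (2, {"size": "10"}), "Y": (1, {"lang": "en"})}
server_b = {"X": (1, {"size": "9"}), "Y": (2, {"lang": "fr"})}

# Each server informs its peer of the snapshots it holds.
changed_a = apply_peer_update(server_a, "Y", *server_b["Y"])
changed_b = apply_peer_update(server_b, "X", *server_a["X"])

# A late-arriving old snapshot is refused.
stale = apply_peer_update(server_a, "X", 1, {"size": "9"})
```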
3.4. Support for digital signatures
Any subset of assertions about a resource
can be digitally signed. Signing a set of assertions is taken to mean ``this
set of assertions taken together is self-consistent''. There is no requirement
that a set of signed assertions have any lexicographic relationship to
one another. Multiple signatures can appear within a resource description.
An essentially arbitrary number of signature algorithms can be supported
with no change to the RC servers. Multiple subsets of a single resource's
assertions may be signed, even if those subsets overlap.
Signatures are provided for the benefit
of the user who is attempting to determine the validity of one or more
assertions. The RC server does not attempt to verify signatures. Signatures
are opaque to the RC server, except that the RC server can determine which
assertions are used in a particular signature.
3.5. Client-specified selection of assertions and signatures
A client may specify, in each query, the
kinds of assertions and signatures which are desired. This includes a specification
of which assertions the client is interested in, whether signatures are
desired, and the signature algorithms that the client supports.
3.6. Support for proxies and caching
Each assertion contains a time-to-live field
and an expiration-date field to allow effective caching. The RC client
library also has the ability to make queries and updates via a proxy. Such
proxies may provide access through a security firewall, provide resolution
for locally-available resources which might not be available to the general
public, serve as a gateway to other resolution services, and in general,
provide a local interpretation of resource names.
3.7. Separation of location data and metadata
RC version 1.0 maintains the list of locations
for a resource separately from the metadata for that resource. The locations
are provided by the file servers that maintain replicas of the files, while
the metadata are maintained by the content creators. This two-level architecture
allows the content replication system in RCDS to maintain the current
set of resource locations independently of any assertions about the resource
itself (which are assumed to be provided by the content creator). In other
words, the RCDS system separates the functions of authoring and publishing
a resource from the task of distributing the resource, and the RC division
of locations from metadata reflects this separation.
3.8. Support for long-term resolution of resource names
Because the structure of resource names
can be expected to change over time, the RC server treats resource names
as opaque strings. They are used as search keys to index the metadata,
but are not parsed by the RC server.
The RC server's protocol also supports
redirects. This allows the metadata for any single resource name, or for
any portion of URI space matching a pattern, to be moved and maintained
on another server.
3.9. Efficient query and update protocol
The RC server's protocol was designed to
support fast queries in order to minimize the additional latency incurred
in accessing a resource via RCDS. In particular, for small payloads it
uses UDP transport rather than TCP, because the bandwidth cost (and latency)
of TCP connection setup is several times that of a single UDP
packet exchange. For most of RCDS's purposes, a single UDP request packet and
a single response packet are sufficient to carry the necessary data; any effort
spent on TCP connection setup would be wasted.
4. Protocol
This section describes the protocol used
by RC, including the format of request and response messages, the means
of authentication, and the means used to locate a server for a particular
resource name.
4.1. Use of ONC RPC
The RC protocol is based on the Open Network
Computing Remote Procedure Call (ONC RPC) facility [7]. We chose ONC RPC because
we felt that it provided ease of specification, ease of prototyping,
and minimal overhead. Calls are made to port 9272, rather than using RPC's
``portmapper'' facility to determine which port to use. Since the expected
usage pattern for RCDS clients would not have clients making multiple RC
queries to any one server, using a well-known port rather than portmapper
cuts the overhead in half (from two RPC calls to one).
4.2. Authentication
Because the authentication methods available with
most implementations of ONC RPC were not sufficiently secure for our purposes,
we implemented our own authentication mechanism on top of RPC. This mechanism
used shared secrets, timestamps, and keyed MD5 [8]. The last modification
time of each metadata record by each writer was stored in the record, and
was used to thwart replay attacks. The shared secrets were high-quality
random numbers of sufficient length. For ease of transcription, these were
represented as sequences of English words in a manner similar to that used
by S/KEY [9].
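The general shape of such a scheme can be sketched in Python. The memo does not give the exact byte layout that RC hashes, so the construction below (secret, timestamp, user name, arguments, secret) is an assumption chosen only to illustrate keyed MD5 combined with a timestamp check; the function names keyed_md5 and verify are hypothetical.

```python
import hashlib
import hmac


def keyed_md5(secret: bytes, request_time: int, authname: str,
              args: bytes) -> bytes:
    """Compute an MD5 digest keyed with a shared secret (assumed layout)."""
    h = hashlib.md5()
    h.update(secret)
    h.update(request_time.to_bytes(8, "big"))
    h.update(authname.encode("ascii"))
    h.update(args)
    h.update(secret)
    return h.digest()


def verify(secret: bytes, request_time: int, authname: str, args: bytes,
           authbits: bytes, last_update_time: int) -> bool:
    """Accept a request only if it is fresh and carries the right digest.

    `last_update_time` is the time of this writer's last accepted change
    to the record; anything not newer is treated as a replay.
    """
    if request_time <= last_update_time:
        return False
    expected = keyed_md5(secret, request_time, authname, args)
    return hmac.compare_digest(expected, authbits)


secret = b"example shared secret"
bits = keyed_md5(secret, 1000, "alice", b"update-args")
accepted = verify(secret, 1000, "alice", b"update-args", bits, 900)
replayed = verify(secret, 1000, "alice", b"update-args", bits, 1000)
tampered = verify(secret, 1000, "alice", b"other-args", bits, 900)
```

Keyed MD5 is shown because the memo names it; it should not be read as a recommendation for new designs.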
An authenticated request used the following
XDR data structure:
struct rc_authenticated_req {
rc_authtype authtype;
rc_date request_time;
string authname<>;
opaque authbits<>;
opaque real_args<>;
};
enum rc_authtype {
RC_AUTH_NONE = 0,
RC_AUTH_MD5 = 1
/* other types to be defined */
};
/*
* Dates are used both in the RC protocol and as one of the
* data types which can be associated with an Assertion.
*
* 'day' is # days since 1 Jan 1970 midnight GMT
* 'msec' is # milliseconds since midnight GMT
* (a single 32-bit integer isn't large enough)
*/
struct rc_date {
int day;
unsigned int msec;
};
The authtype field indicates which type of authentication is being used.
The request_time field gives the date and time of the request (according
to the client's clock). The authname field identifies the user making the
request. The authbits field is an opaque bit string which is specific to
the authentication type. Finally, the real_args field contains the actual
procedure arguments encoded in XDR.
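To make the rc_date encoding concrete, the following Python sketch converts between a timezone-aware datetime and a (day, msec) pair as described in the comment above. The helper names to_rc_date and from_rc_date are ours, not part of RC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MSEC_PER_DAY = 24 * 60 * 60 * 1000


def to_rc_date(dt: datetime) -> tuple:
    """Split a timezone-aware datetime into (day, msec).

    `day` counts days since 1 Jan 1970 midnight GMT (it may be negative);
    `msec` counts milliseconds since midnight GMT of that day.
    """
    total_ms = (dt.astimezone(timezone.utc) - EPOCH) // timedelta(milliseconds=1)
    return divmod(total_ms, MSEC_PER_DAY)


def from_rc_date(day: int, msec: int) -> datetime:
    """Rebuild the UTC datetime from a (day, msec) pair."""
    return EPOCH + timedelta(days=day, milliseconds=msec)


sample = datetime(1970, 1, 2, 0, 0, 1, tzinfo=timezone.utc)
day, msec = to_rc_date(sample)
```

The largest msec value is 86,399,999, which fits comfortably in the unsigned 32-bit field, whereas a single 32-bit count of milliseconds since 1970 would overflow in under 50 days.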
4.3. URIs, URLs, URNs, and LIFNs
RC can be used with several different kinds
of identifiers. URLs are the familiar Uniform Resource Locators currently
used by web browsers and several other applications. They are typically
of the form protocol://host[:port]/filename.
URNs, or Uniform Resource Names, are designed to be long-term stable names,
and therefore are not bound to any particular host, protocol, port, or
filename. They are of the form URN:namespace:suffix.
The term Uniform Resource Identifier
(URI) has been coined to refer to any of several kinds of identifier, including
URNs and URLs, which share a particular set of characters and delimiters,
and are distinguishable from one another by a prefix such as "http:" or
"URN:".
Location Independent File Names (LIFNs)
are long-term stable URIs designed for use with RCDS. A LIFN refers to
a particular instance of a resource which produces the same result independent
of location or the time it is accessed. Within RC, a LIFN is used as a
link between a resource's metadata and its current locations. A resource
may have one or more LIFNs (say, one LIFN for each version) as one of its
metadata attributes; each LIFN then names the content of a particular instance
of the resource. Given a resource's URN or URL, RC will supply the metadata
attributes; given a resource's LIFN, RC will supply the current locations
of that resource.
4.4. Finding an RC server for a URI
The method used to find an RC server is
derived from [10]. If the resource name is a URN, the string ``urn.net''
is appended to the name space identifier portion of the URN, and a DNS
query is made for the NAPTR resource records corresponding to the resulting
name. These resource records contain a regular expression replacement string,
which when applied to the URN, either produces a DNS name at which service
locations (SRV resource records [11]) can be found, or another DNS name
with an NAPTR record. The process terminates when SRV records are found;
these indicate the host names of various types of servers (including RC
servers) for that portion of URN-space.
A similar method is used to find servers
for URLs, except that the URL protocol is treated as the namespace for
the purpose of the initial DNS query. Also, if the initial query fails
to locate any NAPTR records, and the URL is in a format that includes a
domain name, the DNS is queried for A (IP address) records using the name
rcds.domain.
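The construction of the DNS names used in this lookup can be sketched as below. Only the string handling is shown; no DNS queries are made. The sketch assumes that the same ``urn.net'' suffix is used for URLs as for URNs and that the fallback name is formed by prefixing ``rcds.'' to the URL's host name; both readings are our interpretation of the text above, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def initial_naptr_name(uri: str, suffix: str = "urn.net") -> str:
    """Return the DNS name to query first for NAPTR records.

    For a URN the namespace identifier is used; for a URL the protocol
    (scheme) is treated as the namespace.
    """
    scheme, _, rest = uri.partition(":")
    if scheme.lower() == "urn":
        namespace = rest.split(":", 1)[0]
    else:
        namespace = scheme
    return f"{namespace.lower()}.{suffix}"


def fallback_a_name(url: str):
    """Return the name to query for A records if no NAPTR records exist.

    Returns None when the URL does not contain a domain name.
    """
    host = urlsplit(url).hostname
    return f"rcds.{host}" if host else None


urn_query = initial_naptr_name("URN:lifn:example-suffix")
url_query = initial_naptr_name("http://www.example.org/index.html")
fallback = fallback_a_name("http://www.example.org:8080/index.html")
```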
4.5. Functions
A total of six functions are implemented
in version 1 of RC. update_name is used to provide assertions
about a resource, update_lifn is used to provide locations for
a resource, query_name is used to query a server for assertions
about a resource, and query_lifn is used to query a server for
resource locations. Finally, create_uri is used to generate a
globally-unique URI for later use, and set_debug is used to enable
or disable generation of verbose debugging and/or logging messages.
A common set of integer status codes
is shared by all RC functions. These codes report either successful completion
of the operation (RC_SUCCESS), temporary failure (RC_TEMPFAIL),
or any of various reasons for failure including syntax errors, authentication
errors, permissions errors, etc.
Each of the functions of the RC server
is described below, using XDR notation for the function call and return
parameters. The notation has been re-arranged and in some cases simplified
for readability. This is not meant to serve as a reference for implementation.
4.5.1. update_name
The update_name request consists
of a URI for which the assertions are being updated, a list of assertions
being updated for the URI, and a list of certificates. The certificates
are digitally signed sets of assertions.
const MAX_URI_LEN = 1024;
typedef string rc_uri<MAX_URI_LEN>;
const MAX_NEW_ASSERTIONS = 512;
const MAX_NEW_CERTIFICATES = 512;
struct rc_update_name_req {
rc_uri urn;
rc_assertion_update assertions<MAX_NEW_ASSERTIONS>;
rc_certificate_update certificates<MAX_NEW_CERTIFICATES>;
};
Each assertion consists of a name, type,
assertion date, expiration date, value, and flags. The type is used to
indicate the data type associated with the value. In most cases the type
is opaque to the server, and is only included for the sake of clients that
wish to ``print out'' or dump the contents of the value for debugging purposes.
However, the type can also be specified in queries to allow the client
to request, say, a Unicode string rather than an ASCII string, and the
RC_LIFN_T type has special side effects.
The flags control how this update is
treated. If the RC_AU_DEL_PREV flag is set, all previous assertions
with this name, from this asserter, are deleted at the same time that the
new assertion is added. If the RC_AU_DEL_ALL flag is set, all
previous assertions with this name from this asserter are deleted, and
no new assertions are added. If the RC_AU_WILDCARD flag is set,
the two previous flags apply to all previous assertions from this asserter,
that have this name as a prefix. These flags allow new assertions to effectively
replace old ones, and they remove the need for a separate ``delete assertion''
call.
const MAX_NAME_LEN = 256;
struct rc_assertion_update {
string name<MAX_NAME_LEN>;
unsigned int type;
rc_date assertion_time;
rc_date expiration_time;
opaque value<>;
unsigned int flags;
};
/*
* bit values for 'rc_assertion_update.flags'
*/
#define RC_AU_DEL_PREV 01 /* del prev assertions w/this name */
#define RC_AU_DEL_ALL 02 /* del all assertions w/this name */
#define RC_AU_WILDCARD 04 /* match all assertions that begin
* with this name */
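The effect of these flags on a server's stored assertions can be sketched as follows. This is an illustration of the semantics described above, not the server's code; the in-memory representation (a list of (asserter, name, value) tuples) and the function apply_update are assumptions.

```python
# Same bit values as the C definitions above (which are written in octal).
RC_AU_DEL_PREV = 0o1
RC_AU_DEL_ALL = 0o2
RC_AU_WILDCARD = 0o4


def apply_update(stored, asserter, name, value, flags):
    """Return the new assertion list after one rc_assertion_update.

    Only assertions made by `asserter` are ever deleted; other parties'
    assertions with the same name are left untouched.
    """
    def matches(entry):
        e_asserter, e_name, _ = entry
        if e_asserter != asserter:
            return False
        if flags & RC_AU_WILDCARD:
            return e_name.startswith(name)
        return e_name == name

    result = list(stored)
    if flags & (RC_AU_DEL_PREV | RC_AU_DEL_ALL):
        result = [e for e in result if not matches(e)]
    if not flags & RC_AU_DEL_ALL:
        result.append((asserter, name, value))
    return result


stored = [
    ("alice", "title", b"v1"),
    ("bob", "title", b"b1"),
    ("alice", "title.fr", b"t1"),
]
replaced = apply_update(stored, "alice", "title", b"v2", RC_AU_DEL_PREV)
purged = apply_update(stored, "alice", "title", b"",
                      RC_AU_DEL_ALL | RC_AU_WILDCARD)
appended = apply_update(stored, "alice", "title", b"v2", 0)
```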
Each certificate consists of a list of assertions
(indices into the assertions array above), the date and time of the certificate,
a certificate algorithm identifier, and a signature on the assertions.
The certificate algorithm identifier field identifies the algorithm used
to generate the signature. A certificate algorithm consists of a canonicalization
step (to massage the assertion data into a different format), followed
by a digital signature step, and the resulting signature itself may be
packaged in a particular format, say a PGP signature [12] or an X.509 certificate
[13].
Note that the signature is opaque to
the Resource Catalog. The RC server understands nothing about any particular
signature algorithms; it only knows that a particular set of assertions
is signed. The RC server supports certificates in two ways. First, if a
query requests certificates for a particular assertion, RC will return
all of the assertions necessary to verify that certificate. Second, RC
will not delete any assertion that is still covered by a certificate, until
all assertions covered by that certificate are marked for deletion.
struct rc_certificate_update {
unsigned int assertions<>;
rc_date certificate_time;
unsigned int certificate_algorithm_id;
opaque signature<>;
};
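The deletion rule for assertions covered by certificates can be illustrated with a short sketch. It assumes that assertions are identified by integer indices and that each certificate is represented simply as the set of indices it covers; the function deletable is hypothetical.

```python
def deletable(marked, certificates):
    """Return the marked assertions that may actually be removed.

    `marked` is the set of assertion indices marked for deletion.
    `certificates` is a list of sets, one per certificate, giving the
    indices that certificate covers. An assertion is held back while any
    certificate covering it still covers an unmarked assertion.
    """
    held = set()
    for covered in certificates:
        if not covered <= marked:
            held |= covered & marked
    return marked - held


certificates = [{0, 1}, {2}]

# Assertion 0 shares a certificate with the still-live assertion 1.
partial = deletable({0, 2}, certificates)

# Once every assertion under the first certificate is marked, all can go.
complete = deletable({0, 1, 2}, certificates)
```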
The response to the update_name
call consists of an integer status value which indicates either success
or the reason for failure.
struct rc_update_name_resp {
unsigned int status;
};
4.5.2. update_lifn
An update_lifn call establishes
a binding between a LIFN and a URL, and indicates that a particular file
server is making the file with that LIFN accessible for a time using that
URL. The arguments to the call consist of: the LIFN, the URL, the time
at which the file server believes it will delete the file, and the maximum
amount of time that a client should cache the LIFN-to-URL binding. Both
times are expressed as deltas from the current time. This removes the need
for client and server to have synchronized clocks.
The response to the update_lifn
call is a single status code.
struct rc_update_lifn_req {
rc_uri lifn;
rc_uri url;
u_int file_server_reap_time;
u_int client_cache_ttl;
};
struct rc_update_lifn_resp {
unsigned int status;
};
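One way a server might use the two deltas is sketched below: it converts the reap-time delta to an absolute time on its own clock when the update arrives, and never tells a client to cache a binding past that time. This handling, the use of seconds as the unit, and the function names are assumptions for illustration only.

```python
def record_binding(now, file_server_reap_time, client_cache_ttl):
    """Store a LIFN-to-URL binding using only the server's own clock."""
    return {
        "expires_at": now + file_server_reap_time,
        "cache_ttl": client_cache_ttl,
    }


def answer_query(binding, now):
    """Return the cache lifetime to report, or None if the binding expired."""
    remaining = binding["expires_at"] - now
    if remaining <= 0:
        return None
    return min(binding["cache_ttl"], remaining)


# The file server expects to keep the file for an hour and allows
# clients to cache the binding for five minutes.
binding = record_binding(now=1000, file_server_reap_time=3600,
                         client_cache_ttl=300)
fresh = answer_query(binding, now=1000)
near_expiry = answer_query(binding, now=4500)
expired = answer_query(binding, now=4600)
```

Because only differences in time cross the network, a file server whose clock is wrong by hours still produces a correct expiration on the RC server.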
4.5.3. query_name
A query_name call consists of a
URN, a list of assertion requests, a list of certificate algorithms that
the client is interested in, and a list of URL protocols (e.g. "http",
"ftp", ...) that the client understands.
const MAX_PROTO_LEN = 32;
typedef string rc_url_proto<MAX_PROTO_LEN>;
struct rc_query_name_req {
rc_uri urn;
rc_assertion_request assertion_requests<>;
unsigned int certificate_algorithm_ids<>;
rc_url_proto url_protocol_list<>;
};
Each assertion request consists of a name,
type, and flags. The type may be RC_ANY_T to match any assertion
with that name regardless of its type, or it may be a specific type in
which case the request will only match assertions of the same type.
The AR_WILDCARD flag specifies
that the client wants to match any assertion with the same prefix as the
request (rather than requiring that the entire name be matched). The AR_WANT_CERTIFICATES
flag indicates that the client is interested in any digital signatures
which apply to this assertion. The AR_WANT_OLD_VERSIONS flag indicates
that the client is interested in old versions of this assertion (not just
the most recent version from each asserter), and the AR_WANT_LOCATION_INFO
flag indicates that the client is interested in LIFN-to-URL bindings if
the value of this assertion is a LIFN.
struct rc_assertion_request {
rc_uri name;
unsigned int type;
unsigned int flags;
};
/*
* values for assertion_request flags
*/
#define AR_WILDCARD 01
#define AR_WANT_CERTIFICATES 02
#define AR_WANT_OLD_VERSIONS 04
#define AR_WANT_LOCATION_INFO 010
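The name and type matching rules for an assertion request can be sketched in Python. The flag values mirror the definitions above; the numeric value of RC_ANY_T is not given in this memo, so the value used here is a placeholder, and request_matches is a hypothetical helper.

```python
# Same bit values as the C definitions above (which are written in octal).
AR_WILDCARD = 0o1
AR_WANT_CERTIFICATES = 0o2
AR_WANT_OLD_VERSIONS = 0o4
AR_WANT_LOCATION_INFO = 0o10

RC_ANY_T = 0  # placeholder value; the real constant is not shown in the memo


def request_matches(req_name, req_type, req_flags, a_name, a_type):
    """Decide whether one stored assertion satisfies one assertion request."""
    if req_type != RC_ANY_T and req_type != a_type:
        return False
    if req_flags & AR_WILDCARD:
        return a_name.startswith(req_name)
    return a_name == req_name


# Flags are combined with bitwise OR, as in C.
flags = AR_WILDCARD | AR_WANT_CERTIFICATES

prefix_hit = request_matches("title", RC_ANY_T, flags, "title.fr", 7)
exact_miss = request_matches("title", RC_ANY_T, 0, "title.fr", 7)
type_miss = request_matches("title", 3, 0, "title", 7)
```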
The response to a query_name call
consists of a URN, a status code, a serial number, a list of assertions,
a list of certificates, a list of redirects, a list of locations, and a
list of cached DNS records. The URN is the URN for which the query was
made. The status code indicates success, partial success, or the reason
for failure of the query. Every time a record is updated, its serial number
is incremented by the RC server; this serial number is returned in response
to a query_name call.
struct rc_query_name_resp {
rc_uri urn;
unsigned int status_code;
unsigned int serial;
rc_assertion assertions<>;
rc_certificate certificates<>;
rc_redirects redirects;
rc_lifn_locations locations<>;
};
Each assertion consists of a name, type,
value, assertion time, and expiration time that were supplied by the asserter
in an update_name call. The assertion also contains the identity
of the asserter (expressed as an integer that is only meaningful to the
server), and the serial number of the record from the time that the assertion
was posted.
struct rc_assertion {
string name<MAX_NAME_LEN>;
unsigned int type;
opaque value<>;
rc_date assert_time;
rc_date expiration_time;
unsigned int asserter;
unsigned int serial;
};
Each certificate consists of a list of assertions
(the list represented by integer indices into the assertions list), the
time of the certificate, the algorithm ID, and the signature. It also includes
the identity of the asserter and the record serial number when the certificate
was posted.
struct rc_certificate {
unsigned int assertions<>;
rc_date certificate_time;
unsigned int algorithm_id;
opaque signature<>;
unsigned int asserter;
unsigned int serial;
};
Each redirect consists of a URI prefix, a
time to live, a preference value, and IP addresses of the servers to which
future queries should be referred. The meaning of the redirect is that
future queries beginning with that particular URI prefix should be made
to the servers reported in the redirect, rather than the servers indicated
by DNS records.
const MAX_SERVER_ADDR = 16;
struct rc_redirect {
rc_uri prefix;
unsigned int ttl;
unsigned int preference;
opaque server_addr<MAX_SERVER_ADDR>;
};
const MAX_REDIRECT = 8;
struct rc_redirects {
rc_redirect list<MAX_REDIRECT>;
};
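A client might apply cached redirects along the following lines. The memo does not say how competing redirects are ranked, so this sketch assumes that the longest matching prefix wins and that, among equal prefixes, a lower preference value is preferred (as with DNS MX records). The tuple layout and the function pick_redirect are hypothetical.

```python
def pick_redirect(uri, redirects):
    """Return the server addresses to use for `uri`, or None.

    `redirects` is a list of (prefix, ttl, preference, server_addrs)
    tuples. None means no redirect applies and the servers indicated by
    DNS should be used instead.
    """
    matching = [r for r in redirects if uri.startswith(r[0])]
    if not matching:
        return None
    # Assumed ranking: longest prefix first, then lowest preference value.
    best = min(matching, key=lambda r: (-len(r[0]), r[2]))
    return best[3]


redirects = [
    ("http://www.example.org/", 3600, 10, ["10.0.0.1"]),
    ("http://www.example.org/docs/", 3600, 20, ["10.0.0.2"]),
    ("http://www.example.org/docs/", 3600, 5, ["10.0.0.3"]),
]
docs_servers = pick_redirect("http://www.example.org/docs/a.html", redirects)
root_servers = pick_redirect("http://www.example.org/index.html", redirects)
no_redirect = pick_redirect("ftp://ftp.example.net/pub/", redirects)
```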
Each LIFN-location set consists of a LIFN and a list of locations. Each
location consists of a URL, and the time-to-live and expiration date for the LIFN-URL
binding. It also optionally includes cached DNS A records (in DNS's native
encoding) that give the IP addresses associated with the URL. The native
encoding is used so that DNS SIG records can also be conveyed without loss
of information.
struct rc_lifn_locations {
rc_uri lifn;
rc_location_resp list<>;
};
struct rc_location_resp {
rc_uri url;
unsigned int cache_ttl;
unsigned int expiration_date;
opaque cached_dns<>;
};
4.5.4. query_lifn
A query_lifn call returns the URLs
associated with a LIFN. The arguments to a query_lifn call are
the LIFN, and a list of URL protocols that the client is interested in.
struct rc_query_lifn_req {
rc_uri lifn;
rc_url_proto url_protocol_list<>;
};
The response consists of a status code,
a list of locations, and a list of redirects. The formats of locations and
redirects are the same as for query_name.
struct rc_query_lifn_resp {
unsigned int status;
rc_lifn_locations location;
rc_redirects redirects;
};
4.5.5. create_uri
xxx code fragment goes here
4.5.6. set_debug
5. Implementation
5.1. Server implementation
The server consists of about 7000 lines
of C code, not counting the database back-end, the code which is common
to both client and server, and the code which is generated by rpcgen.
The server is structured as follows:
the main portion of the program reads configuration files, initializes
the databases, opens the network sockets, and then sits in a loop servicing
incoming requests and dispatching them to the appropriate routine. Another
module handles authenticated requests and verifies their credentials. There
is an implementation module for each of the server functions listed above.
A replaceable "dbglue" module provides an interface to the back-end that
indexes and stores records in the database. We have experimented with several
different database back-ends in an attempt to find a good balance of reliability
and performance. There are also modules for converting between the C data
structures used to hold metadata and LIFN-URL bindings, and their on-disk
formats. The on-disk formats are engineered so that the files can be copied
from one machine architecture to another; we have found this extremely
useful in setting up new replicas.
5.1.1. Configuration
The server has two configuration files,
the user database and the authority file. Each is an ordinary ASCII text
file with fields separated by colons, in the style of /etc/passwd.
The user database contains information
about which users are authorized to make changes to the database. Each
line of the file is of the form:
username:user-id:permissions:authtype:authbits
where username is the identity string that the user uses for authentication,
user-id is the integer id for that user that is used by the RC server,
permissions denotes the set of operations that the user is allowed to perform,
authtype specifies the kind of authentication that can be used, and authbits
contains any additional information needed to verify the user's authentication
credentials. Available permissions include urn, which confers
the ability to perform update_name operations, lifn,
which confers the ability to perform update_lifn operations, and
debug, which confers the ability to perform set_debug operations.
The authority file contains information
about redirects. Each line of the file is of the form:
uri-prefix:server-list
where server-list is a comma-separated list of addresses of RC servers
to which future queries should be redirected.
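Parsing the two file formats can be sketched as follows. The memo does not say how multiple permissions are separated within the permissions field, so a comma is assumed here. The authority line is split at its last colon because a URI prefix normally contains colons itself; this in turn assumes that server addresses contain none. The function names and sample values are hypothetical.

```python
def parse_user_line(line):
    """Parse one 'username:user-id:permissions:authtype:authbits' line."""
    username, user_id, permissions, authtype, authbits = (
        line.rstrip("\n").split(":", 4))
    return {
        "username": username,
        "user_id": int(user_id),
        # Assumed: permissions are comma-separated within the field.
        "permissions": set(permissions.split(",")) if permissions else set(),
        "authtype": authtype,
        "authbits": authbits,
    }


def parse_authority_line(line):
    """Parse one 'uri-prefix:server-list' line into (prefix, [servers])."""
    prefix, _, server_list = line.rstrip("\n").rpartition(":")
    return prefix, server_list.split(",")


user = parse_user_line("alice:17:urn,lifn:md5:sample-secret-words\n")
prefix, servers = parse_authority_line(
    "http://www.example.org/:10.0.0.1,10.0.0.2\n")
```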
5.1.2. Dealing with duplicated requests
It is possible for an authenticated request
to be duplicated by the IP network. Similarly, if a response is dropped
by the network, the client will retransmit the request. It is therefore
important to ensure that duplicated update requests will not result in
multiple updates to a record. Each record in the database therefore stores
the serial number of the last request from each writer, along with the
result. Duplicate requests with the same serial number do not change the
record, and the same result is returned as for the original request. Duplicate
requests with older serial numbers are ignored; the presumption being that
the client will not issue a new request with a new serial number until
the previous request has been acknowledged.
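The duplicate-suppression rule can be sketched as below. This is an illustration of the logic just described, not the server's code; the record layout and the function handle_update are assumptions.

```python
def handle_update(record, writer, serial, apply):
    """Apply an update at most once per (writer, serial).

    `record["last"]` maps each writer to (serial, result) for that
    writer's most recent request. `apply` performs the real update and
    returns its result. Returns None when an old request is ignored.
    """
    last = record["last"].get(writer)
    if last is not None:
        last_serial, last_result = last
        if serial == last_serial:
            return last_result  # duplicate: repeat the stored answer
        if serial < last_serial:
            return None  # older than the last request: ignore
    result = apply()
    record["last"][writer] = (serial, result)
    return result


record = {"last": {}}
applied = []


def do_update():
    applied.append("update")
    return "RC_SUCCESS"


first = handle_update(record, "alice", 5, do_update)
duplicate = handle_update(record, "alice", 5, do_update)
older = handle_update(record, "alice", 4, do_update)
newer = handle_update(record, "alice", 6, do_update)
```

In this run the update routine executes only for serial numbers 5 and 6; the retransmitted request receives the stored result without touching the record again.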
5.2. Client library implementation
The client library consists of around 2500
lines of C code, exclusive of code generated by rpcgen, common code shared
between client and server, DNS resolver libraries, and SONAR [14] client
code. Much of the client library is automatically generated by rpcgen.
The additional functionality in our client library provides authentication
which is not included in ONC RPC, determines the host address to contact
given a URI, directs calls to a well-known port (instead of using portmapper),
interfaces to RC server proxies, and interfaces to SONAR for proximity
estimation. There are also a few ``helper functions'' to assist callers
in building the necessary data structures to pass to query_name
and query_lifn.
5.3. Example programs
There are several small C programs which
serve as examples of how to use the client library. They also provide command-line
interfaces to the RC server which can be used from shell scripts.
6. Results
We have used the Resource Catalog in several
projects:
-
RCDS. The Resource Cataloging and Distribution System (RCDS) is
a system for large-scale replication of world wide web content. RCDS uses
the Resource Catalog to keep track of locations of web content, along with
metadata for the web pages. In particular, the metadata is used to allow
content-providers to digitally sign their web pages, by including a cryptographic
hash function such as MD5 [15] of the web page as one of the assertions
describing that web page, and digitally signing that assertion using RCDS.
-
Program Builder. The program builder is a tool which assembles computer
programs out of their source code components. It can download a description
of the program describing the components needed to build a program and
instructions for building it, retrieve the necessary components from various
locations, verify the authenticity and integrity of each, configure them
according to the target environment, compile them, and optionally install
them. The program builder uses RC to catalog the various packages, and
to digitally sign their content to allow for later verification.
-
Netbuild. Similar to the Program Builder, netbuild is a link-editor
which allows seamless access to remote repositories of pre-compiled subroutine
libraries, such as the mathematical libraries in the Netlib repository
[16]. Netbuild uses RC to catalog the subroutine libraries, to identify
which versions of the libraries are appropriate for the target platform,
and to digitally sign the libraries and their metadata to allow netbuild
to verify the authenticity and integrity of the libraries before they are
linked in.
-
SNIPE. The Scalable Networked Information Processing Environment
(SNIPE) is a large-scale, fault-tolerant, distributed computing environment
layered on top of the Resource Catalog. It uses the Resource Catalog to
store metadata about computing resources, processes, resource managers,
multicast groups, programs, data files, and checkpointed programs. The
SNIPE environment utilizes RC's fault tolerance to allow it to survive
multiple host failures. Processes can migrate from one host to another
for load balancing or to avoid host failures; their communications peers
will automatically re-establish communications at the new location. As
in other projects, signed RC metadata is used to verify authenticity and
integrity of programs, data files, and checkpoint files. RC metadata is
also used to list communications ports by which a process can be reached,
thus allowing a peer to choose the best available path when establishing
a connection.
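The signed-hash technique shared by the projects above can be sketched as
follows. This is a minimal illustration, not the RCDS implementation: the
attribute name "content-md5", the demo key, and the helper names are
hypothetical, and an HMAC over a shared key stands in for a real digital
signature (the actual system would use a public-key signature).

```python
import hashlib
import hmac


def hash_assertion(content: bytes) -> dict:
    """Build an assertion carrying the MD5 digest of the content."""
    # "content-md5" is a hypothetical attribute name.
    return {"name": "content-md5", "value": hashlib.md5(content).hexdigest()}


def sign(assertion: dict, key: bytes) -> str:
    """Stand-in for a digital signature: an HMAC over 'name=value'."""
    message = f"{assertion['name']}={assertion['value']}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(content: bytes, assertion: dict, signature: str, key: bytes) -> bool:
    """Check the signature first, then that the content matches the digest."""
    if not hmac.compare_digest(sign(assertion, key), signature):
        return False
    return hashlib.md5(content).hexdigest() == assertion["value"]


page = b"<html>hello</html>"
key = b"demo-key"  # hypothetical signing key
assertion = hash_assertion(page)
signature = sign(assertion, key)
```

A client that retrieves the page from any replica can call verify() with the
signed assertion; a modified page or a modified assertion both cause the
check to fail.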
In general, we have found it easy to interface
to the Resource Catalog. Many of our programs were simple shell or Perl
scripts which used the command-line interfaces; these have been adequate
to establish a testbed consisting of tens of thousands of files replicated
across several servers around the United States. The easy extensibility
of the RC metadata has allowed us to add new functionality on short notice,
and to adapt RC to a wide range of unanticipated purposes.
6.1. RPC problems
There were a number of problems with RPC.
The C language structures generated by rpcgen were unwieldy and made the
code difficult to read. The library itself was inefficient and did not
easily adapt to our authentication scheme or our use of a well-known port. Although
RPC tools are widely available on UNIX platforms, they are not as widely
available on others, making it difficult for us to port our code to those
platforms, and also making it difficult for others to make use of our source
code. Finally, our code was complicated by the need to respect the length
limitations of RPC responses.
6.2. Authentication problems
Our use of keyed-MD5 authentication was
complicated by the (unanticipated) need to support simultaneous requests
from the same user from several different hosts. The original mechanism
for dealing with duplicate requests (and thwarting replay attacks) saved
the last result from any request by any user. But when the same user was
issuing requests from several different hosts, many of those requests would
be ignored because they were ``too old''. This was because the sequence
numbers were derived from the hosts' clocks, which were imperfectly synchronized.
The RC server was changed to save the result of the last request on a per-user,
per-record basis, in the actual record. This solved the problem at the
expense of slowing down processing of duplicated requests. An alternative
would have been to keep track of all responses for requests issued within
the past several seconds; this would have consumed a lot of memory and
slowed down processing of all requests.
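The difference between the two keying schemes can be illustrated with a
small sketch. It is a simplification under stated assumptions: the class
and variable names are hypothetical, and it tracks only the newest sequence
number per key, whereas the real server saved the last result.

```python
class ReplayGuard:
    """Remember the newest sequence number seen for each key."""

    def __init__(self, key_fn):
        self.key_fn = key_fn  # maps (user, record) to a cache key
        self.last_seq = {}

    def accept(self, user, record, seq):
        key = self.key_fn(user, record)
        if seq <= self.last_seq.get(key, -1):
            return False  # treated as a duplicate or "too old"
        self.last_seq[key] = seq
        return True


# Original scheme: one slot per user. Revised scheme: one per (user, record).
per_user = ReplayGuard(lambda user, record: user)
per_user_record = ReplayGuard(lambda user, record: (user, record))

# Host A's clock runs ahead of host B's, so B's clock-derived sequence
# number is lower even though its request arrives later.
requests = [("alice", "urn:a", 1005), ("alice", "urn:b", 1000)]
results_user = [per_user.accept(*r) for r in requests]
results_record = [per_user_record.accept(*r) for r in requests]
```

With per-user keying the second request is rejected; with per-user,
per-record keying both are accepted because they touch different records.
In this sketch, two skewed hosts updating the same record would still
conflict.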
6.3. Limitations of the protocol
The integer asserter field was quickly found
to be inadequate. It should have been a string, which would have allowed
globally-scoped user names.
The data structures used for URN lookup
turned out to be sufficiently flexible that they could also be used for
LIFN lookup. Collapsing the two into a single function would make the code
simpler and more flexible. Similarly, the "look up LIFNs as a side effect"
feature, while it did increase performance tremendously, is both a kludge and a
lot less flexible than we'd like. The next version of the protocol is planned
to have a new option for assertion queries, that would mean "use the value
of this assertion as key for a side-effect lookup". This would allow more
general handling of side-effect lookups than just for LIFNs. For instance,
when the value of an assertion contained a URL, it could be used to look
up the IP addresses corresponding to the URL's domain name.
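A generalized side-effect lookup of this kind might look like the sketch
below. It is only an illustration of the idea: the in-memory catalog, the
attribute names, and the `follow` parameter are hypothetical and do not
reflect the RC wire protocol.

```python
# Hypothetical in-memory catalog: name -> list of (attribute, value) pairs.
CATALOG = {
    "urn:example:doc": [("title", "Example"), ("lifn", "lifn:example:1")],
    "lifn:example:1": [("location", "http://mirror.example/doc")],
}


def lookup(name, follow=()):
    """Return assertions for `name`, plus any side-effect lookups.

    For each assertion whose attribute is listed in `follow`, the value of
    that assertion is used as the key of a further lookup, and the result
    is returned in the same response.
    """
    results = {name: CATALOG.get(name, [])}
    for attr, value in results[name]:
        if attr in follow and value not in results:
            results[value] = CATALOG.get(value, [])
    return results


response = lookup("urn:example:doc", follow=("lifn",))
```

Because the caller names the attributes to follow, the same mechanism could
serve LIFNs, URLs, or any other assertion whose value is itself a key.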
Security based on shared secrets doesn't
scale very well, especially when the secrets are stored in the clear on
the server. We need to add a public key system. Though not essential, it
would be useful if the user authentication information were also stored
as RCDS metadata.
RCDS assumes that the metadata are separated
into individual, named, unordered fields, which can then be selected (or
updated) individually or in groups with a common attribute name prefix.
Other systems require the metadata to be in some particular format such
as XML. RCDS was not designed to preserve the original order of assertions,
nor does it represent hierarchical relationships between assertions. Even
though it would be difficult to define an ``original order'' of multiple
assertions submitted by different parties, at different times, to different
servers, some systems depend on order or hierarchy to be preserved. RCDS
can preserve the order of assertions that are signed by a certificate (and
the certificate can use the ``NULL'' signature algorithm), but does not
have a way to preserve hierarchy except by attribute naming conventions.
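Selection by attribute name prefix, and hierarchy expressed purely through
naming conventions, can be sketched as follows. The attribute names and the
"." separator are assumptions made for this example; RCDS itself does not
prescribe them here.

```python
# Hypothetical unordered assertions as (attribute name, value) pairs.
assertions = [
    ("dc.title", "Example"),
    ("dc.creator", "A. Author"),
    ("location.url", "http://mirror.example/doc"),
]


def select(assertions, prefix):
    """Return the assertions whose attribute name starts with `prefix`."""
    return [(name, value) for name, value in assertions if name.startswith(prefix)]


def nest(assertions, sep="."):
    """Rebuild a hierarchy from names that follow a separator convention."""
    tree = {}
    for name, value in assertions:
        node = tree
        *parents, leaf = name.split(sep)
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value  # a repeated name would overwrite the earlier value
    return tree


dc_fields = select(assertions, "dc.")
tree = nest(assertions)
```

The hierarchy exists only in the client's interpretation of the names; the
catalog stores flat fields and knows nothing about the nesting.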
6.4. Performance
6.5. Database problems
We have had a number of problems with finding
a suitably robust indexed file package for the back-end of the RC server.
GDBM exhibited memory leaks, while Berkeley DB would eventually trash its
files, requiring them to be rebuilt. Other packages imposed unacceptable
limits on the length of a record. In order to test the other functions
of RC and to have a stable base for experimentation with RC, we wrote an
indexed file package which uses the UNIX file system to store each record
in a separate file, with the filename derived from an MD5 hash of the key.
This did work reliably (if slowly) for small numbers of records, though
for large numbers of records it could exhaust the inodes on a file system.
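The one-file-per-record approach can be sketched as below. This is an
illustration of the technique rather than the package we wrote: the class
name and methods are hypothetical, and the write-then-rename step is an
addition for this example.

```python
import hashlib
import os
import tempfile


class FileStore:
    """Store each record in its own file, named by the MD5 hash of its key."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        digest = hashlib.md5(key.encode()).hexdigest()
        return os.path.join(self.directory, digest)

    def put(self, key: str, record: bytes) -> None:
        # Write to a temporary name, then rename, so that readers never
        # see a partially written record (an assumption of this sketch).
        tmp = self._path(key) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(record)
        os.replace(tmp, self._path(key))

    def get(self, key: str):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None


store = FileStore(tempfile.mkdtemp())
store.put("urn:example:doc", b"title=Example")
```

Hashing the key yields a fixed-length filename with no awkward characters,
but every record consumes one inode, which is the scaling limit noted above.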
6.6. Configuration Issues
By far the greatest difficulty we had
was in the initial configuration of new clients and servers. The Resource Catalog
server requires two configuration files: one to list the portions of URI-space
that the server has authority over, and another to list privileged users,
their permissions, and their authentication credentials. These files have
a traditional UNIX format of unlabelled fields separated by ":" characters.
The meanings of the various fields are not clear and users tended to guess
wrong. The various client programs read their authentication credentials
from yet another kind of file, which must have the same shared secret as
the one on the server. The client programs require that certain environment
variables be set, so that they will know which server to talk to when updating,
and which credentials to use. Finally, the shared secrets used by the clients
and servers were random bit strings encoded as sequences of short English
words; these were generated by a special program. All of these together
imposed a significant barrier to use and adoption of the software. We are
working on maintenance tools to make it easier to set up new users, maintain
their permissions, and distribute their credentials to them.
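Encoding a random secret as a sequence of short words can be sketched as
follows. The 16-word list (4 bits per word) and the function names are
hypothetical and chosen only to keep the example small; they are not the
word list or the program used with RC.

```python
import secrets

# Hypothetical 16-word list: each word encodes 4 bits.
WORDS = ["ace", "bag", "cat", "dog", "elf", "fig", "gum", "hat",
         "ink", "jam", "kit", "log", "map", "nut", "owl", "pig"]


def encode(secret: bytes) -> str:
    """Encode each byte as two words (high nibble, then low nibble)."""
    words = []
    for byte in secret:
        words.append(WORDS[byte >> 4])
        words.append(WORDS[byte & 0x0F])
    return " ".join(words)


def decode(phrase: str) -> bytes:
    """Invert encode(): map words back to nibbles and pair them into bytes."""
    nibbles = [WORDS.index(word) for word in phrase.split()]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


secret = secrets.token_bytes(8)  # random 64-bit shared secret
phrase = encode(secret)
```

The client and server must hold the same secret, so the phrase produced by
encode() has to be copied into both configurations exactly.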
A related difficulty was detecting and
diagnosing setup problems. In an attempt to address these problems we have
added a great deal of logging to the server.
6.7. Firewall and NAT Issues
A number of Resource Catalog clients have
needed to communicate with Resource Catalog servers on opposite sides of
a firewall or Network Address Translator (NAT). While RC is designed to
be compatible with proxies, commonly deployed firewalls do not support
RC, and many firewalls are configured to block transit of UDP packets for
unknown ports, and establishment of TCP connections to unknown ports.
NAT boxes cause even more problems. Because
the view of IP address space on one side of a NAT is different from that
of the other side, the IP addresses communicated in the RC protocol (returned
in a LIFN-to-URL lookup) may not be valid when viewed from the client.
Normal DNS lookups are handled by a NAT box which translates remote addresses
into local address space. However, with RC, the DNS lookup may be done
by an RC server on the opposite side of a NAT from the RC client. If RC
did not carry IP addresses, the client would need to look up the IP address
of the URL for each location of a replicated file before passing them to.
One way to solve the problem would be to have the NAT box translate IP
addresses in RC responses also, but this would still invalidate DNS signatures.
These deployment barriers are faced by
any new protocol, especially one which carries IP addresses.
7. References
[1] P. V. Mockapetris. Domain names: Concepts and facilities. RFC 882,
November 1983. ftp://ftp.rfc-editor.org/in-notes/rfc882.txt

[2] Sun Microsystems. NIS+ and FNS Administration Guide. 1995.

[3] Keith Moore, Shirley Browne, Jason Cox, and Jonathan Gettler. Resource
Cataloging and Distribution System. University of Tennessee Computer
Science Department Technical Report UT-CS-97-346, January 1997.
http://www.cs.utk.edu/~library/TechReports/1997/ut-cs-97-346.ps.Z

[4] Graham E. Fagg, Keith Moore, Al Geist, Jack Dongarra. ``Scalable Network
Information Processing Environment (SNIPE)''. In Proceedings, Supercomputing
'97 (San Jose, CA), November 1997.
http://www.supercomp.org/sc97/proceedings/TECH/MOORE/INDEX.HTM

[5] Tim Berners-Lee, Larry Masinter, Mark McCahill. Uniform Resource
Locators (URL). RFC 1738, December 1994.
ftp://ftp.rfc-editor.org/in-notes/rfc1738.txt

[6] Karen Sollins. Architectural Principles of Uniform Resource Name
Resolution. RFC 2276, January 1998.
ftp://ftp.rfc-editor.org/in-notes/rfc2276.txt

[7] Raj Srinivasan. RPC: Remote Procedure Call Protocol Specification
Version 2. RFC 1831, August 1995.
ftp://ftp.rfc-editor.org/in-notes/rfc1831.txt

[8] Perry Metzger and William Simpson. IP Authentication using Keyed
MD5. RFC 1828, August 1995. ftp://ftp.rfc-editor.org/in-notes/rfc1828.txt

[9] Neil Haller. The S/KEY One-Time Password System. RFC 1760, February
1995. ftp://ftp.rfc-editor.org/in-notes/rfc1760.txt

[10] Ron Daniel, Michael Mealling. Resolution of Uniform Resource Identifiers
using the Domain Name System. RFC 2168, June 1997.
ftp://ftp.rfc-editor.org/in-notes/rfc2168.txt

[11] Arnt Gulbrandsen, Paul Vixie. A DNS RR for specifying the location
of services (DNS SRV). RFC 2052, October 1996.
ftp://ftp.rfc-editor.org/in-notes/rfc2052.txt

[12] Derek Atkins, William Stallings, and Philip Zimmermann. PGP Message
Exchange Formats. RFC 1991, August 1996.
ftp://ftp.rfc-editor.org/in-notes/rfc1991.txt

[13] CCITT. The Directory - Authentication Framework. Recommendation
X.509, 1988.

[14] Keith Moore. SONAR - A Network Proximity Service. Internet-Draft
draft-moore-sonar-02.txt (work in progress).

[15] Ronald L. Rivest. The MD5 Message-Digest Algorithm. RFC 1321,
April 1992. ftp://ftp.rfc-editor.org/in-notes/rfc1321.txt

[16] Shirley Browne, Jack Dongarra, Stan Green, Eric Grosse, Keith Moore,
Tom Rowan, and Reed Wade. Netlib Services and Resources (Revised).
University of Tennessee Computer Science Technical Report ut-cs-94-222,
August 1994.
http://www.cs.utk.edu/~library/TechReports/1994/ut-cs-94-222.ps.Z