
Document Ordering and Delivery Systems in Europe: Projects of the European Commission, services, conditions and prices


Bill Tuck

Abstract

Electronic document delivery has been the subject of several projects funded by the CEC under the European Libraries Programme, a number of which have recently been completed. The paper compares the different approaches taken by these projects and outlines some of the findings. File transfer, fax or email, for example, may all be used as the method of delivery, and an interesting debate is taking place as to the relative advantages of each. In addition, there are many other factors that will impinge on future document delivery systems, such as document request procedures, document formats, pricing structures and copyright control procedures. These are discussed in relationship to existing CEC projects and other plans for future work.

1. Introduction

Electronic document delivery in this context is taken to mean the transfer of journal articles in the form of digitised page images, either from a databank of stored articles or by scanning on demand. The more general case, where documents are originated in electronic form and delivered -- for example, as SGML (or HTML) or PDF encoded files -- is not considered here but is taken to be included in the domain of electronic journals and electronic publishing. The point at which one merges into the other is, of course, open to debate.
Within this domain, there are six projects currently funded by the CEC under the Libraries Programme. These are EDIL, EURILIA, DECOMATE, DALI, FASTDOC and AIDA. Detailed descriptions of the individual projects are available in the published literature. The intention here is to summarise the main developments under a number of specific headings, rather than by project, and to try to analyse them in a global fashion. The technical problems have now largely been solved. What remain are the more general problems of economics, service conditions, pricing, reliability and so on. The following paper attempts to analyse what we have learnt from these projects in these areas and what questions still need answers.

2. Delivery mechanisms

There are a variety of possible delivery mechanisms for transferring documents, including fax, email or direct file transfer. In general, the route from document server to end user is likely to require a sequence of different steps, from email to fax or file transfer, say, or from file transfer to email and print. Only in the case of a direct transfer to the user's workstation (via a WWW interface, or suchlike) might the delivery be completed in one step. For a number of reasons this option is not included in the present range of projects, but is known to be under active investigation by several groups.
The EDIL project began with a commitment to use FTAM, the OSI file transfer protocol, as the primary means of delivering documents between a network of national relays. For pragmatic reasons, however, the initial implementation was completed with FTP, the Internet file transfer protocol. FTAM has been tested between some sites, but software availability and the difficulty of configuring it across different hardware have limited this development.
The method chosen for document transfer from the EDIL relay to the national system and then to the client library, however, was something that each participant (including INIST, PICA, BLDSC and UB-TIB) could determine for itself according to its own national policy. Thus PICA, INIST and UB-TIB adopted FTP (to a workstation in the client library), while BLDSC used X.400 (again to the client library).
Whether email or file transfer is the better way to transfer document images (which are inherently large, multi-megabyte files) is still a question of debate. It is hoped that one of the outcomes of EDIL will be a better understanding of this issue, together with some sound statistical evidence on questions such as throughput and reliability. In both cases, the underlying carrier is the Internet, using the so-called RFC1006 protocol interface in the case of FTAM and X.400.
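To give a rough sense of the file sizes at issue, the following sketch estimates the size of a scanned article and the time needed to transfer it. The A4 page dimensions are standard, but the resolution, compression ratio, page count and link speed used in the example run are illustrative assumptions rather than measurements from any of the projects.

```python
# Rough size and transfer-time estimate for a scanned bilevel article
# (1 bit per pixel). All figures in the example run are assumptions.

A4_WIDTH_IN = 8.27    # A4 width in inches
A4_HEIGHT_IN = 11.69  # A4 height in inches


def page_bytes(dpi: int) -> float:
    """Uncompressed size in bytes of one A4 page at 1 bit per pixel."""
    pixels = (A4_WIDTH_IN * dpi) * (A4_HEIGHT_IN * dpi)
    return pixels / 8


def article_bytes(pages: int, dpi: int, compression_ratio: float) -> float:
    """Size in bytes of a whole article after compression."""
    return pages * page_bytes(dpi) / compression_ratio


def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and retries."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # Assumed: 10 pages, 300 dpi, 15:1 compression, 64 kbit/s link.
    size = article_bytes(pages=10, dpi=300, compression_ratio=15.0)
    print(f"raw page:  {page_bytes(300) / 1e6:.2f} MB")
    print(f"article:   {size / 1e6:.2f} MB compressed")
    print(f"transfer:  {transfer_seconds(size, 64_000):.0f} s")
```

An uncompressed A4 page at 300 dpi comes to roughly 1 MB, so an article of ten or more pages reaches the multi-megabyte range unless it is compressed.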
In contrast to EDIL, FASTDOC is based on conventional Group3 fax over the telephone network. This technology uses conventional fax modems directly linked to a standard networked PC to create a fax server. A number of such servers, each with multiple fax output channels, can be linked onto a LAN, along with the document storage unit and system management facility. This is a well-tried and robust technology which has been found to operate very efficiently for document delivery. Print quality on modern plain-paper fax machines is very high. The only apparent disadvantage is the relatively high cost of telecommunications for cross-border deliveries.
Both EURILIA and DECOMATE have their origins in the Mercury project that began life at Carnegie Mellon University. As part of this work, a specially designed file transfer protocol, based on FTP, was developed to optimise the document delivery link between server and client. In the case of EURILIA, the original intention had been to use the international telephone links (including ISDN) with Group3 and Group4 fax. With the rapid emergence of Internet services in Europe, this has been augmented to include a standard TCP/IP connection. Documents may be retrieved for on-screen display using browsing software coupled to the file transfer protocol as document carrier. DECOMATE, in this respect, is similar. Because of the large file sizes and relatively low bandwidths available on many Internet links, however, this can be inconveniently slow for reading large documents from a remote server. For this reason, as well as to avoid the copyright complications associated with full electronic delivery to the user's workstation, EURILIA adopts the strategy of supplying hard copy of the complete document (or of selected parts) by conventional fax.
In contrast with all the above examples, AIDA is not based on electronic delivery, but assumes that either conventional post or fax would be used. Its concern is with the detailed procedures of document ordering and request management. Since it was intended also to cover the interlibrary lending of monographs, its focus is not on delivery itself, but on the controlling procedures.
Finally, the DALI project recognises that ultimately it will be the users who decide how they want to receive their documents, and that many different modes of transport will be required, partly depending on the nature of the material. By handling the control procedures for email (X.400 or MIME/SMTP), fax, file transfer or post, all in the same way, much of the complexity of having to deal with many different carrier technologies can be reduced.
From all these experiments, what can be said about the 'best' way to transfer the relatively large files involved in document delivery? FTP works, but is not totally reliable. X.400 email is very reliable, but the setup costs are high and relatively few client organisations have implemented an X.400 service. Fax is relatively cheap and universally available, but print quality may leave something to be desired and costs for international connection can be high (compared to Internet).
MIME/SMTP email has been suggested as the most favourable compromise for document transfer. Unlike X.400, the setup costs are low and it is already widely implemented. Software is now very reliable and there is little reason to doubt that it could provide the basis for an effective document delivery service. Unlike fax, it is not constrained to relatively low resolution images. A number of experimental services based on MIME are currently at the planning stage (some within the UK FIGIT programme and others elsewhere).

3. Output and end-user access

Delivery to the client library is one issue, but final delivery to the end user is another. A range of strategies have been adopted here as well. Documents (at least of the kind addressed here) generally begin and end on paper. The question of document printing, however, tends to get neglected, partly on the assumption that it is up to the user (or client library) to manage that function. Both EDIL and FASTDOC are essentially paper-to-paper (or at least paper output) systems, and therefore depend on providing a paper copy to the end user. Within EDIL, two approaches are available. The first assumes that the client library will print out the copy, then deliver it by internal mail (or flag it for collection) to the user. Many sites (such as those linked to PICA's RAPDOC service, and the UB-TIB Hannover service) are using the Ariel software developed by RLG in the US. This generally means that the document is printed in the client library and then posted to the end user.
In the BLDSC part of the experiment, the final transfer from library to customer is by FTP to network printers (capable of accepting compressed image files) anywhere on the campus network, giving 'near to desktop' hard copy output. With FASTDOC, any local fax machine could, in principle, be used for the same purpose. In practice, the two participating university libraries (Patras and Barcelona) received the fax output, for onward delivery as hard copy to the customer.
The French user group for EDIL, coordinated by the Ministry of Education (MENESR) and consisting of 5 large universities (one of which also acted as a document supplier, along with INIST) also adopted the policy of electronic transfer by FTP to the library, for local printing and onward delivery as hard copy to the user by internal mail.
At all sites, therefore, the EDIL experiments adopted a policy of 'print and delete', partly for practical reasons, but also because it was less liable to create concern about copyright violations. Similarly, both FASTDOC and EURILIA avoid some of this problem by using fax as the primary delivery channel for full documents. Document delivery to the end user's workstation is, of course, everybody's preferred solution (or at least claimed to be). The problem is that for standard journal articles it is difficult to get publishers to agree to this, even for the purposes of experiment, as they fear the uncontrolled replication of digital copies. Until better safeguards can be provided this may be a difficult constraint to satisfy, at least for conventional document delivery services.
In contrast, however, a number of publishers have begun to offer their publications directly, in the form of SGML, PDF or PostScript files held on a Web server and accessible using standard Web browsers, or specialised SGML browsers (as in the case of OCLC). This is, of course, a form of electronic publication, rather than document delivery as we know it. But clearly such services are going to have a major impact on the latter. Although current policy is to supply serials titles to subscribers on a site or individually licensed basis, it is only a short step to offering a 'document delivery' service in which a single article is supplied, for a fee, on request. The technology is virtually the same in both cases. It is at this point that electronic publishing could have a serious impact on traditional library services.

4. Document ordering

The Inter-Library Loan (ILL) protocol (ISO 10160/1) was originally conceived as a set of formal procedures for the management of interlibrary loans (books, as well as journal articles). Among its first implementations was the ION project (again carried out with CEC funding). In the case of journal articles, which are essentially non-returnable items, the procedures can be simplified. Several of the current document delivery projects have therefore adopted elements of ILL as part of the document request and control procedures.
EDIL (which includes several of the original ION partners) uses the ILL request message format to carry document requests from the customer (or client library) to the supplier. As an EDIFACT message this can be delivered by either SMTP or X.400 email. In the same way, the ILL 'SHIPPED' and 'UNFILLED' messages are used as basic control procedures.
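The basic control procedure described above can be sketched as a small request tracker. Only the message names SHIPPED and UNFILLED are taken from the text; the class, the PENDING state and the request identifiers are hypothetical, and the ILL protocol itself defines many more states and messages than are shown here.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    PENDING = "PENDING"      # hypothetical: request sent, no reply yet
    SHIPPED = "SHIPPED"      # message name taken from the text
    UNFILLED = "UNFILLED"    # message name taken from the text


class RequestTracker:
    """Tracks outstanding requests and applies supplier status messages."""

    def __init__(self) -> None:
        self._status: Dict[str, Status] = {}

    def send_request(self, request_id: str) -> None:
        if request_id in self._status:
            raise ValueError(f"duplicate request id: {request_id}")
        self._status[request_id] = Status.PENDING

    def receive(self, request_id: str, message: str) -> Status:
        # A status message is only meaningful for a request still pending.
        if self._status.get(request_id) is not Status.PENDING:
            raise ValueError(f"no pending request with id: {request_id}")
        new_status = Status(message)  # raises ValueError for unknown names
        if new_status is Status.PENDING:
            raise ValueError("PENDING is not a supplier message")
        self._status[request_id] = new_status
        return new_status

    def status(self, request_id: str) -> Status:
        return self._status[request_id]
```

A client would call `send_request` when the order leaves, then `receive` when the supplier's reply arrives, so that unanswered requests remain visible as pending.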
Linking ILL to national document request systems is not quite so straightforward, as few have themselves adopted ILL for their internal use. In the case of BLDSC, a 'shell' has been created around the native ART system that allows ILL request messages from EDIL partners to be translated into ART format (and vice versa). These are carried via email (either SMTP or X.400). Similar mechanisms enable the status messages of ART to be converted to their ILL equivalents.
Document ordering in FASTDOC is carried out through the STN system run by FIZ. A user will typically carry out a search on the STN host. This now provides a facility for users to request source documents from a number of suppliers. Those requests destined for Beilstein (and FASTDOC) are bundled for delivery (by X.400 email) to the FASTDOC order processing system. An alternative method enables a requestor to create the document order in standard format on a PC and deliver it directly to FASTDOC via remote login access over a modem link. In neither case is the ILL message format at present used.
Document request in EURILIA and DECOMATE is by direct access to the document database, across the client/server link. There is no need, in this case, for a complex ILL messaging structure, as the necessary controls are built into the basic client/server system. To obtain material from other document databases, however, the Item Order procedures of SR/Z39.50 (V3) are in the process of being implemented. This provides a mechanism for directly ordering a document from a Z39.50-based search.
Although Z39.50 Item Order is a powerful new tool for document requesting, it does not by any means cover all cases. Many requests are still based on customer-provided information rather than on an explicit search. In this case the ILL mechanisms are more appropriate. Recent work by the National Library of Canada has helped clarify these essentially different roles of the two protocols [1].
Document ordering through a Web interface is, of course, the logical way to allow end-users to place requests directly, by using the 'Forms' facility. Behind the scenes, the order could then be carried either as an ILL request or through Z39.50. Many such implementations, however, will use a proprietary or local protocol. This is the case with the British Library's 'Discovery' service, for example, in which a Web client provides access to a search and order facility based on the 'Inside Information' database of article records.

5. Storage of articles

A number of studies have shown that the utilisation rate of journal material is, in general, very low. If 80% of requests can be filled from 20% of the serials titles then it may make sense to store electronically just those 20% of high-use titles. But even then only 20% of the actual articles may be used, and of those requested, less than 20% may be re-requested. This high level of redundancy indicates that large-scale electronic storage may not yet be economic for major document supply collections, though with falling technology costs and rising labour costs, this point may be approaching quite rapidly.
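The redundancy argument can be made concrete with a short calculation. The sketch below takes the two 20% figures quoted above as given and treats them as shares of the articles held in the high-use titles; the function name and this reading of the figures are assumptions made for illustration.

```python
from fractions import Fraction


def article_reuse(used: Fraction, rerequested_of_used: Fraction):
    """Return shares of stored articles that are requested at least once,
    requested again after the first request, and never requested."""
    requested = used
    requested_again = used * rerequested_of_used
    never_requested = 1 - used
    return requested, requested_again, never_requested


if __name__ == "__main__":
    once, again, never = article_reuse(Fraction(20, 100), Fraction(20, 100))
    print(f"requested at least once: {float(once):.0%}")
    print(f"requested again:         {float(again):.0%}")
    print(f"never requested:         {float(never):.0%}")
```

With both shares set to 20%, no more than 4% of the stored articles would be requested a second time, which is the only case in which holding the scanned image saves a further scan.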
When considering the feasibility of electronic storage, questions of scale are important. Major supply libraries, such as INIST or BLDSC, operate from collections of 30,000 to 50,000 current titles. Even the core collection represented by BLDSC's Inside Information runs to 10,000 titles. FASTDOC and DECOMATE, in contrast, are based on collections of fewer than 200 titles, while EURILIA uses just 200 individual thesis documents.
Within the EDIL project, therefore, it was assumed that scan-on-demand would generally be used (unless it is for material from a part of the collection that is already stored digitally). No direct link to an electronic archive is assumed and, once scanned, articles are not held online for possible later re-use. The main effect is on time for delivery. EDIL works on the basis of delivery within 12 hours. FASTDOC, where everything is stored electronically, can deliver within 5 minutes.
FASTDOC is based on a very large document store of over 100 serials titles and over 10 years holdings. These are all held (amounting to several Terabytes of data) on a large optical jukebox system. This database has been built up over a number of years and formal procedures for digital scanning of the paper versions of journals have been set up to handle the high volumes involved. Careful quality control is, of course, essential during this stage.
The justification for electronic storage, in the case of FASTDOC, is that the material is very heavily used by the Beilstein Institute as part of its day-to-day work on chemical information services. This means that document delivery can be carried out at marginal cost, making the economics look very much better than it would for major supply libraries such as INIST or BLDSC, where no such intensive use is possible.
A middle way is suggested by EURILIA and DECOMATE. By choosing a very narrow range of material, for which a relatively high demand could be anticipated, it was sensible to consider the 'total storage' model. In EURILIA's case this was a collection of post-graduate theses in aeronautics, while DECOMATE has small collections of journals (each of about 25 titles) in particular subject areas from a major publisher.
The DALI system is intended to operate in a very 'open' way to provide access to a large range of different data types generated from the very diverse holdings of several marine research organisations. Its focus, therefore, is on the problem of managing the complexity that this entails.
In the longer term, however, as publishers (or their agents) move towards electronic supply of subscribed-for journals, the documents will be stored at source. This alters the economics of storage considerably since, in the limit, only one copy of each title need ever be retained -- though titles for which there is heavy demand may need to be mirrored at many different sites (including those of large subscribers), much as Web publishing operates now. By avoiding the costs of scanning and indexing, it should be possible to reduce dramatically the cost of digital storage, although it may well increase total production costs.

6. Standards issues

6.1 Document format standards

Whether stored online or not, standards for the format of documents will be important. In the case of EDIL, documents are transmitted in GEDI format. This is a (non-ISO) standard established by the Group on Electronic Document Interchange [2]. It specifies the format of a 'header' record containing all the essential information about the request (customer codes, delivery address, etc.) along with the bibliographic details of the document (serial title, article title, author, etc.). The document itself is encoded as a multipage TIFF image file using CCITT Group4 compression and a scanning resolution of 300 dpi. The two elements (header plus document) can be combined into one file and transmitted via a file transfer protocol, or else packaged as separate body parts within an email message (X.400, or MIME) for delivery to the customer site.
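The second packaging option, separate body parts within a MIME message, can be sketched with Python's standard email library. The header text used here is a hypothetical plain-text stand-in and does not follow the actual GEDI header syntax; the addresses, subject line and file name are likewise assumptions.

```python
from email.message import EmailMessage


def build_delivery_message(header_text: str, tiff_bytes: bytes,
                           sender: str, recipient: str) -> EmailMessage:
    """Package a request header and a TIFF document as two MIME body parts."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Document delivery"  # assumed subject line

    # First body part: request and bibliographic details as plain text.
    msg.set_content(header_text)

    # Second body part: the scanned document, labelled as image/tiff.
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(tiff_bytes, maintype="image", subtype="tiff",
                       filename="article.tif")
    return msg


if __name__ == "__main__":
    header = "Customer: LIB-0001\nSerial: Example Journal\n"  # hypothetical
    message = build_delivery_message(header, b"II*\x00",
                                     "supplier@example.org",
                                     "library@example.org")
    for part in message.iter_parts():
        print(part.get_content_type())
```

Keeping the header as a separate text part lets the receiving site read the request details without decoding the image.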
FASTDOC uses Group3 fax, for which the resolution is only 200 dpi. But the original scanning and storage of documents is at the much higher resolution of 400 dpi. This means that a conversion must take place before transmission (though in this case scaling down from 400 to 200 dpi is relatively trivial). The storage formats are determined by the FileNet software used to maintain the basic electronic archive. In such systems, the bibliographic data will generally be held separately from the document image (usually on fast access magnetic storage), to which it will provide a pointer (in the form of an address to a location in slower optical store). A similar approach is used in both DECOMATE and EURILIA, where the database technology (known as KWIK) derives from the original Philips Megadoc system, later acquired and adopted by DEC.
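Halving the resolution of a bilevel image is simple because each output pixel corresponds to exactly one 2 x 2 block of input pixels. The sketch below is not the FileNet or FASTDOC conversion routine; it assumes a bitmap held as nested lists and uses the rule that an output pixel is black if any pixel in its block is black, which is one of several possible choices.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels: 1 = black, 0 = white


def halve_resolution(image: Bitmap) -> Bitmap:
    """Reduce a bilevel image by a factor of two in each direction.

    An output pixel is black when any pixel of the matching 2 x 2 input
    block is black (an assumed rule that favours keeping thin strokes).
    A trailing odd row or column is dropped.
    """
    if not image:
        return []
    height = len(image) - len(image) % 2
    width = len(image[0]) - len(image[0]) % 2
    result: Bitmap = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = (image[y][x], image[y][x + 1],
                     image[y + 1][x], image[y + 1][x + 1])
            row.append(1 if any(block) else 0)
        result.append(row)
    return result
```

A majority rule would thin the text instead of thickening it; which is preferable depends on the typeface and on the quality of the original scan.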
For document page images, the de facto standard is TIFF (Tagged Image File Format). This defines the format of an image file by providing a set of well-defined tags whose values indicate essential information such as page size, resolution (dots per inch, dots per line, lines per page), colour encoding information, type of compression (CCITT, JPEG), etc. The inclusion of colour and gray scale information makes it suitable for many types of data other than text images. EDIL, FASTDOC and EURILIA all use TIFF (though not necessarily exclusively) as a file encoding standard for their documents.
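The tag mechanism can be illustrated by reading the first image file directory (IFD) of a TIFF file. The sketch below handles only single-valued SHORT and LONG tags, so resolution tags, which are stored as rationals, are skipped; the sample file it builds contains a header and directory but no image data and is intended purely as a demonstration.

```python
import struct
from typing import Dict

TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 259: "Compression"}
COMPRESSION = {1: "none", 3: "CCITT Group 3", 4: "CCITT Group 4", 7: "JPEG"}


def read_first_ifd(data: bytes) -> Dict[int, int]:
    """Return {tag: value} for single SHORT/LONG tags in the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags: Dict[int, int] = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each entry is 12 bytes long
        tag, field_type, n = struct.unpack(order + "HHI", data[start:start + 8])
        if n != 1:
            continue
        if field_type == 3:    # SHORT, left-justified in the value field
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
        elif field_type == 4:  # LONG
            (value,) = struct.unpack(order + "I", data[start + 8:start + 12])
        else:
            continue           # other types (e.g. RATIONAL) are not handled
        tags[tag] = value
    return tags


def make_sample_tiff() -> bytes:
    """Build a little-endian TIFF header and IFD with three SHORT tags."""
    entries = [(256, 2480), (257, 3508), (259, 4)]
    data = struct.pack("<2sHI", b"II", 42, 8)  # IFD starts at offset 8
    data += struct.pack("<H", len(entries))
    for tag, value in entries:
        data += struct.pack("<HHIHH", tag, 3, 1, value, 0)
    return data + struct.pack("<I", 0)         # no further IFD


if __name__ == "__main__":
    for tag, value in read_first_ifd(make_sample_tiff()).items():
        print(TAG_NAMES.get(tag, tag), value)
```

Because a reader simply skips tags it does not recognise, the same file structure can carry bilevel fax-style pages as well as grey-scale or colour images.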
Some publishers, or consortia of publishers (such as ADONIS) have also adopted Gp4 fax or TIFF as basic document format. The European Patents Office, for example, continues to supply its material in what is basically a fax encoded format. Increasingly, however, they are moving towards the adoption of Adobe's PDF (Portable Document Format) as the most convenient method of document encoding for electronic publishing. PDF is very much more compact than TIFF (often by a factor of 10). Files are fully searchable, and the availability of PDF browsers makes client access very easy. It is most usually generated from the original PostScript files, which of course restricts it to publishers or their agents. One option that has been tested, however, is the ability of Adobe's 'Capture' product to convert page images into a form of PDF. To reduce the size, this can also automatically OCR the file to provide a machine readable text. While there is no technical difficulty in a supply library using this software as part of an electronic document delivery service, it almost certainly contravenes copyright.

6.2 Document storage standards

Standards for document databases are less clearcut. The usual approach is to provide a system for managing the bibliographic records independently of the document images. The record will then contain a pointer to the location of the image file. Most systems adopted by the current projects are based on proprietary schemes (FileNet, KWIK, etc.).
This is unlikely to change with the move towards electronic publishing. Web server software may provide publishers with a ready-made solution to the database management problem, and may lead to a de facto standard. On the other hand, unlike with client software, there is no strong reason for publishers to adopt any particular standard rather than choose the one that appears to offer the best value for money or that best matches the facilities required.

6.3 Document ordering standards

This area has largely been covered in the previous section on document request. The important protocols are SR/Z39.50, for which an Item Order element is included in Version 3, and the more general ILL protocol. Much of the current effort in document delivery systems is concerned with the development of interfaces to these standards. This should provide a mechanism through which a degree of interoperability can be achieved. It is unlikely, on the other hand, that the proprietary systems already in place will ever be entirely replaced by 'native mode' ILL for the simple reason that the rich features of local systems may be difficult to match within the standard framework.
ILL is generally implemented as a set of EDIFACT messages and procedures. This ties it in very conveniently with the rich resources of the EDI domain. The more the commercial world adopts EDI as the standard method for business data interchange, the cheaper related services, such as email and database management software, become. This, in turn, should reduce the cost of implementing EDI (and EDIFACT) within the library services domain, not only for procedures such as book ordering, but also for ILL.

7. Economic factors and copyright

The economic modelling of document delivery services is a basic requirement, for ultimately such services will only make sense if the economics are viable, or if the perceived benefits can justify the costs. Full-scale economic studies have, up until now, been fairly rudimentary and difficult to carry out on what are relatively low-use, experimental systems. Useful quantitative data is beginning to come out of FASTDOC, however, and some interesting general analyses for the particular case of aerospace information have been carried out by EURILIA.
There are two primary issues in the economics of document delivery: what does it cost to run such a service, and what level of royalties will publishers demand? It should not be difficult to establish the former. Nevertheless, few electronic delivery services have been running for sufficiently long (and on a sufficiently realistic scale) for true costs to be measured. The current group of projects should improve our information in this area. As to royalty charges, no consensus has yet emerged as to the balance and scale of subscription and royalty fees. Negotiations with publishers are currently underway within the UK FIGIT programme, and an important national licensing agreement with a group of major academic publishers has been made. The general principle is to achieve a fair return to compensate for loss of revenue on subscriptions for the printed journals. It is not easy, however, to see how document delivery services fit within this model. It is clear, nevertheless, that too high a royalty charge will force users to adopt different methods, resorting once again to the photocopier (with consequent loss of revenue to the publishers).
The question of copyright is, of course, quite fundamental to the viability of this kind of service. Publishers are understandably reluctant to allow totally open access to what they regard as their own material, even if that ownership should rightfully belong to the author or their affiliated organisation (as some universities are beginning to argue). This is not an issue that is likely to be resolved within the current round of document delivery experiments.
In spite of the obvious difficulties, some progress is being made. Beilstein (FASTDOC), after protracted negotiations, has been able to forge agreements with around 50% of the publishers whose serials it takes. The document supply partners of EDIL (BLDSC, INIST, TIB and the associated university libraries) have a tacit understanding that for the purposes of experiment a limited degree of electronic delivery will be tolerated. Both groups operate under the rules of 'fair dealing' in supplying to the academic world, which strictly limit the use to which articles may be put (e.g. multiple copying is forbidden). For this reason, the policy of delivering only print-on-paper output goes some way towards allaying publishers' fears that delivery of the electronic version to the end user could result in abuse of the system.
In the longer term, some form of licensing and payment must be implemented. To this end, all projects within the current scheme are involved in the designing and building of suitable accounting and billing machinery. FASTDOC already has a sophisticated charging mechanism in place, while EURILIA and DECOMATE have this as one of their central design features. At the moment, EURILIA has restricted itself to non-copyright material, while DECOMATE has a major publisher (Elsevier) as a partner, in order to test out appropriate security and control mechanisms in this area. AIDA does not involve electronic copying directly, so has had no need to consider these factors.
One general problem in this area is that of European harmonisation of the rules of copyright. This has been the subject of considerable debate within the EU and a number of studies have been undertaken. Despite current concern, there is every reason to believe that solutions to this difficult problem will emerge. The move towards extensive site licensing agreements between publishers and the academic world is perhaps the most logical way forward as it has the potential, unlike the pay-per-view approach, of leaving the revenue base of academic publishing substantially unimpaired.

8. Summary

The projects outlined here are at varying stages of completion. EDIL, FASTDOC and EURILIA have been running the longest and are rapidly coming to a close. Their preliminary results appear promising. The systems all work, in the technical sense, and have served well as 'proof of concept' demonstrators. DALI and AIDA have completed the design phase and are beginning implementation. Deployment to user sites will, over the next year or so, produce interesting results in their areas of handling complex multimedia material and full ILL services, respectively. DECOMATE is the most recent of the experiments and is still very much in its early stages of development. Its potential to elucidate some of the difficult publisher/library/user relationships will be followed with interest.
For publishers, however, the big issue is not so much how to deal with supply libraries and electronic document delivery, but whether and when to begin electronic distribution of the primary material, the serials themselves. Many are planning such services and it is believed that 1996 will see a considerable number of titles available via such media as the World Wide Web. In the short term this is unlikely to have any great impact on conventional document delivery services. In the longer term, however, it could alter the balance between libraries, publishers and authors in unpredictable ways.

References

[1] Fay Turner, "Document Ordering Standards: The ILL Protocol and Z39.50 Item Order", National Library of Canada, January 1995.
[2] GEDI (Group on Electronic Document Interchange), "Specification of the GEDI standard for document interchange", available from GEDI Secretariat, Pica, Leiden.
