4. Europäisches Bielefeld Kolloquium - Bibliotheken und Verlage als Träger der Informationsgesellschaft

THE EUROPEAN BIBLINK PROJECT


Ross Bourne

BIBLINK Project Co-ordinator

The British Library

There can be few European research libraries that have neither participated in DG XIII projects nor been at least tempted by the funding that has been available through the European Commission's Telematics Applications Programme. This is not the time or place to describe the impact which the TAP has made on the ways in which our libraries function, and perhaps that impact will only become apparent to the library historian; but whatever history has to say on the subject I can easily believe that our modi operandi have been irretrievably changed by the discipline of having to observe deadlines for deliverables, workpackages, progress reports, management reports, cost statements, peer reviews, and all the other requirements we did not realise we were letting ourselves in for. Again, history will have to judge whether, regardless of any technological innovations, those bureaucratic disciplines have actually improved the services we provide for our users. I should like to think that they have.

BIBLINK stands for "Linking Publishers and national Bibliographic Services" and the BIBLINK project commenced formally in May 1996. Its origins lie in a DG XIII sponsored "concerted action", which is Commission language for an activity that encourages particular sectors to discuss issues that might result in an activity such as a project or a feasibility study. CoBRA, or Computerised Bibliographic Record Actions, was - and indeed is - a concerted action bringing together European national libraries for just that purpose; amongst other things, it has initiated projects on record file labelling, character set standardisation and the networking of national name authority files. But the real impetus for BIBLINK is twofold:

· the belief that economies can be made by developing better relationships between publishers and libraries and re-using publishers' bibliographic information in library catalogues and databases;

· the growth of electronic publishing, requiring libraries to exercise bibliographic control over documents that, while sharing many of the characteristics of conventional print publications, have some features which are not at present covered in traditional cataloguing codes.

These two factors have resulted in a project whose aim is to produce a prototype system for the exchange of metadata between publishers of electronic documents and national bibliographic agencies. In short, the system should enable a publisher to transmit bibliographic data to a national bibliographic agency or NBA, the NBA to adapt the data to its own requirements and then to return it to the publisher, duly enhanced. The advantage for the NBA will be that it is receiving timely and authoritative bibliographic information in advance of publication; the advantage for the publisher will be that its publications will be recorded in the national bibliography as soon as possible after publication - if not before - thus improving their accessibility, and that any metadata contained in a publication will have been verified by the NBA. In order to reach the stage at which such a system can be designed, however, a great amount of research into different aspects of the area had to be carried out. I will describe that research in a little more detail later on - or rather, I will mention the decisions that were made as a result of that research - but firstly I should say something about the people and the institutions that form the BIBLINK consortium and the parameters of the project itself.

The BIBLINK consortium includes the national libraries of France, the Netherlands, Norway, Spain and the United Kingdom and two academic institutions, the Universitat Oberta de Catalunya in Barcelona and the University of Bath, in the form of UKOLN, the UK Office for Library and Information Networking. The consortium is led by the British Library and the consortium agreement is a legally binding document which empowers the British Library to act on behalf of the partners in dealing with the Commission, especially on financial matters. The experience which the partners bring to the project is considerable, especially those partners which - unlike ourselves at the British Library - actually handle electronic publications. The National Library of Norway, for example, has been collecting electronic documents for some years now, and a visitor to Mo-i-Rana, its northern outpost only just south of the Arctic Circle, cannot but be impressed by the efficiency of an operation which ensures a near-comprehensive coverage of all forms of Norwegian material. The Royal Library in the Netherlands, on the other hand, operates voluntary rather than legal deposit, an outcome of which is that they enjoy much closer relationships with their publishers than any of the rest of us. UKOLN at the University of Bath is unusual within the consortium in that they are not a library per se: they are effectively a research centre which under various names has been at the forefront of many bibliographic developments for over 20 years. It is UKOLN which hosts the BIBLINK Web page, which can be visited at the following address, http://www.ukoln.ac.uk/metadata/BIBLINK/.

BIBLINK is a three year project, divided into two stages of equal length. Stage 1, the research stage, has just concluded; a revised technical annex for Stage 2, which will see the development of the prototype system, has just been agreed. The project now has just under eighteen months to run. As in all Commission-funded projects, partners receive 50% funding, which means that they must find the other 50% from within their own resources; but during the actual demonstration stage the Commission will reimburse only a third rather than a half. As project leader, the British Library has heavy obligations, but we are fortunate in having yielded day-to-day project management to an outside company, Level-7, and this has enabled us to concentrate better on what we at the British Library want from the project as well as our overall responsibility for ensuring that the project remains on track.

The first stage of the project has been about researching the field and obtaining consensus for its conclusions. I will say something about that research in a moment, but the consensus part of the project has been an important component, because in Stage 2 publishers have been enlisted to enable us to demonstrate that a system of creating, converting, enhancing and retransmitting metadata between the two sectors can work. Each of the partners has discussed the results of our research with publishers in each of our countries, and we are confident that those publishers support our findings.

I want now to summarise just what arose from that research. Stage 1 consisted of eight workpackages, two of which were the obtaining of publisher consensus and general consolidation of the results; the six which comprised the bulk of the research were as follows:

· scoping: what were the limits of the task?

· metadata: what metadata formats existed and which were most appropriate to the project?

· identification: as above, which numbering and other systems existed and which were most appropriate?

· format conversion: which tools were available to handle the metadata?

· transmission: how should the metadata be moved between publishers and NBAs?

· authentication: how could the relationship between the metadata and the item itself be assured?

In the Scoping workpackage the range of electronic publishers and electronic publications to be considered by the project was defined. The scope was narrowed to those publications that would traditionally be included in a national bibliography in whatever electronic medium they are published, although to all intents and purposes those media are CD-ROM and the World Wide Web. A taxonomy of publisher types by country, covering the established and newly emerging sectors, was also produced from which publishing partners would be recruited at a later point of the project; those publisher types are, in BIBLINK parlance, "traditional" or publishers with a background in conventional print publications; "new" or publishers without such a background; and "grey" or publishers such as university departments whose primary concern is not publishing and who are increasingly using the Web to disseminate scholarship.

The study of metadata formats identified formats in use or under development in the library and publishing sectors. A comparative analysis was performed on these and the content of the non-library formats compared with MARC formats. At the same time, a separate report examining three SGML metadata formats (TEI, EAD and CIMI) was commissioned and incorporated in the deliverable. National libraries' requirements in this area were also identified. It was concluded that it might be necessary to consider more than one format to accommodate the diverse body of publishers covered by the scope of the project. Dublin Core, the Book Industry Communication/Pira Simplified SGML for Serials Headers (SSSH) and BIC's forthcoming non-serial Document Type Definition (DTD) were recommended for consideration.

After initial comparison of fourteen identification schemes, seven were identified as possibly being appropriate for BIBLINK, including the emerging Digital Object Identifier (DOI) as well as the established ISBN and ISSN systems. These were analysed according to a set of criteria identified by the partners and refined by the workpackage leader. It was not felt to be appropriate to recommend a single identification scheme as none was used by all, or even a majority, of the possible publisher participants. The report recommended five schemes that would be acceptable to the project: ISSN, ISBN, SICI, DOI and URN.
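
Of the five recommended schemes, the two established ones, ISBN and ISSN, carry a check digit that a receiving system can verify before accepting a record. The following is a minimal sketch in Python, assuming the ten-digit ISBN in use at the time of writing and the eight-character ISSN; the function names are illustrative and are not taken from any BIBLINK deliverable.

```python
def _weighted_mod11_ok(identifier: str, length: int) -> bool:
    """Check a modulus-11 identifier whose weights run from `length` down to 1."""
    # Hyphens and spaces are presentation only and are ignored.
    chars = [c for c in identifier.upper() if c not in "- "]
    if len(chars) != length:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "0123456789":
            value = int(char)
        elif char == "X" and position == length - 1:
            value = 10  # "X" stands for ten and may only be the check character
        else:
            return False
        total += (length - position) * value
    return total % 11 == 0


def isbn10_is_valid(isbn: str) -> bool:
    """Validate a ten-digit ISBN (weights 10 down to 1, modulus 11)."""
    return _weighted_mod11_ok(isbn, 10)


def issn_is_valid(issn: str) -> bool:
    """Validate an ISSN (weights 8 down to 1, modulus 11)."""
    return _weighted_mod11_ok(issn, 8)
```

A check of this kind catches transcription errors only; it says nothing about whether the number was actually assigned to the publication in question.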

The report from the format conversion workpackage examined the feasibility of producing MARC records from formats previously identified for use by BIBLINK. Working in parallel with the Consensus Building workpackage to define a minimum set of data elements, the report mapped the agreed minimum data set to Dublin Core and suggested modifications that would be required for the creation of a usable MARC record. It also mapped SSSH elements to MARC records at serial issue level only. The BIC non-serial DTD cannot, however, be mapped until its definition is complete. The report concluded that Dublin Core could be used to produce a reasonably comprehensive descriptive MARC record for so-called monographs - of which more later - but that SSSH is not a suitable format for descriptions of serials at issue level; for that purpose, either Dublin Core or the format used in the ISSN database could be used.
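
The kind of mapping examined in this workpackage can be sketched as a simple lookup from Dublin Core elements to MARC tags and subfields. The Python sketch below is an assumption-laden illustration: the tag choices are loosely modelled on commonly published Dublin Core to MARC crosswalks and are not the mapping agreed by BIBLINK, nor specific to any national MARC format, and the names `DC_TO_MARC` and `dublin_core_to_marc` are hypothetical.

```python
# Illustrative element-to-(tag, subfield) table; an assumption, not the BIBLINK mapping.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("024", "a"),
}


def dublin_core_to_marc(record):
    """Map a dict of Dublin Core element -> list of values to MARC-like fields.

    Returns (fields, unmapped): fields is a list of (tag, subfield, value)
    tuples ordered by tag; unmapped lists the elements with no table entry,
    so that a cataloguer can see what was left behind.
    """
    fields = []
    unmapped = []
    for element, values in record.items():
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            unmapped.append(element)
            continue
        tag, subfield = target
        for value in values:
            fields.append((tag, subfield, value))
    fields.sort(key=lambda field: field[0])  # stable sort keeps value order within a tag
    return fields, unmapped
```

Reporting the unmapped elements separately, rather than discarding them silently, reflects the point made above that a converted record may need modification before it is usable.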

The transmission workpackage examined the technical and operational characteristics of the options for transferring metadata from publishers to national bibliographic agencies at two levels: firstly, the low-level transport layer for which four possible options were identified and fully analysed according to criteria specified in the technical annex and others that were identified as work proceeded; and secondly, a higher packaging level comprising such formats as EDI and the Warwick Framework (which grew out of the Dublin Core), which could allow data of interest to a variety of players to be included in the transmission at different points in time. It was concluded that BIBLINK should use Internet protocols for the transmission of data in conjunction with one of three high level formatting options. The final recommendation for the latter was dependent on the outcome of discussions with publishers.

In the BIBLINK context, authentication has been defined as ensuring a one-to-one relationship between a bibliographic record and the publication to which it refers. Two models were considered, one in which the publication is kept in a controlled environment and one in which it is not. Authentication can be seen to encompass other matters which are live issues in the world of electronic publishing, namely version control, rights management and protection of intellectual property. The report studied other projects that are addressing these issues and examined the techniques in use or under development, but came eventually to the conclusion that the provision of a data element containing a checksum figure that had been generated by the publisher or the NBA would meet BIBLINK's requirements.
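
The checksum approach can be illustrated with a short Python sketch. The report does not name an algorithm here, so the use of SHA-256 is an assumption made purely for illustration, as are the `checksum` key and the function names.

```python
import hashlib
import hmac


def publication_checksum(data: bytes) -> str:
    """Return a hexadecimal digest of the publication's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def record_matches_publication(record: dict, data: bytes) -> bool:
    """Return True if the checksum stored in the record matches the publication as held."""
    stored = record.get("checksum", "")  # hypothetical name for the checksum data element
    return hmac.compare_digest(stored, publication_checksum(data))
```

For example, a record created with `{"checksum": publication_checksum(content)}` will match only while `content` is byte-for-byte unchanged; any later alteration to the publication, however small, breaks the match, which is exactly the one-to-one relationship the definition asks for.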

I would like to make two general comments on progress to date. Firstly, the question of publisher consensus: BIBLINK partners have been greatly encouraged not only by the interest shown by publishers in the project but also by their promise of active support, to the extent that 23 have agreed to be involved in testing the prototype. Those 23 publishers range from major international publishers to relatively small organisations; they come from all five countries of the consortium, and each of the three publisher types referred to earlier is well represented.

Secondly, what has become apparent to me during the course of the project so far is the fact that a librarian's conventional distinction between monographs and serials is breaking down. What do you call a Web document which is not a serial in the classical sense but which is forever changing? Is there an optimal point in the life of such a document when it is appropriate for a library to acquire or download it, or can one do no more than treat the act of downloading as taking a snapshot? I wonder whether by attempting to use what are basically traditional means to handle electronic documents we are not falling into the same trap as early motor car designers, who did no more than adopt the design of the horse and carriage. Some of these questions will be answered, no doubt, by another Commission sponsored project, NEDLIB, which will be looking into the whole life cycle of electronic publications. I look forward to NEDLIB's conclusions.

But back to the present. Very shortly a software house will be selected to create what the consortium is calling the "BIBLINK workspace" or BW. The BW will be a shared, distributed database containing records on which a variety of processes may be carried out within the confines of the workspace, such as creation, modification, conversion, and so forth. BW users may include not only publishers and NBAs but also the various standard numbering agencies to whom one or other of the two other user types may need to apply for the appropriate identification number. The records themselves must contain at least the data agreed to be mandatory within the BIBLINK Minimum Data Set (in other words, some of those elements are optional only because they are not applicable to all publications, such as frequency in the case of serials). That minimum data set, by the way, includes only eleven of the fifteen Dublin Core elements, but adds a further seven, making eighteen in total. What will essentially happen in the BW is that publisher records will enter the system (perhaps in Dublin Core format), be converted through a MARC conversion package such as USEMARCON to one of the national MARC formats, be reviewed by the NBA for authority control and subject indexing terms, and then be retransmitted to the publisher via a reverse conversion process, if that is what the publisher requires. Behind this brief scenario are various options and some questions still waiting to be answered: for example, will the BW interface be Web based or e-mail based? What national MARC formats can be handled, bearing in mind that the full conversion tables available at present as part of USEMARCON are limited to UKMARC and USMARC? Will NBAs themselves have a role in the allocation of Digital Object Identifiers if the DOI system prevails, as it may well do? Likewise, is the Dublin Core likely to become the standard for Web metadata?
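
The passage of a record through the workspace, as outlined above, can be pictured as a fixed sequence of stages. The Python sketch below is only an illustration of that sequence: the functional specification was still being drawn up at the time of writing, so the stage names and the `WorkspaceRecord` class are hypothetical, and the optional nature of the final return step is not modelled.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted by publisher"
    CONVERTED = "converted to a national MARC format"
    REVIEWED = "reviewed by the NBA"
    RETURNED = "returned to the publisher"


# Each stage has exactly one successor; RETURNED is the end of the chain.
NEXT_STAGE = {
    Stage.SUBMITTED: Stage.CONVERTED,
    Stage.CONVERTED: Stage.REVIEWED,
    Stage.REVIEWED: Stage.RETURNED,
}


class WorkspaceRecord:
    """A record in the workspace, tracked by the stage it has reached."""

    def __init__(self, identifier: str):
        self.identifier = identifier
        self.stage = Stage.SUBMITTED
        self.history = []  # stages already completed, oldest first

    def advance(self) -> Stage:
        """Move the record to the next stage, refusing to go past the last one."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"record {self.identifier} has already been returned")
        self.history.append(self.stage)
        self.stage = NEXT_STAGE[self.stage]
        return self.stage
```

Keeping a history of completed stages is a design choice made for the sketch; it allows publisher and NBA alike to see where in the chain a given record stands.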

The stage where BIBLINK is at present - or rather at the time of writing, which was just before Christmas - is that user requirements and functional specification documents are being drawn up. After a Call to Tender, the latter will be presented to the software house which has bid successfully to create the BIBLINK Workspace; that software house will have just a few months to produce and test the system, which should be ready for the demonstrator phase due to commence in the Autumn. In the meantime, the individual NBAs will need to ensure that their internal systems, which are of course outside the scope of the BW, are able to interact with the BW, since the demonstrator phase will be demonstrating the full chain of events, from publisher to NBA and back. The delivery of a final report will conclude the project in May 1999, and as part of that report NBAs will be required to indicate their individual plans to put the results of BIBLINK into action.

I want to round off this presentation by making one or two observations on the overall thrust of BIBLINK. Firstly, it has been an unwritten guideline that BIBLINK exists in the world that we have, rather than the world where we might like to live. While the Telematics Applications Programme is about innovation, that innovation is not intended to be at the expense of proven systems or technologies. BIBLINK has tried to make use of what is available, but to be innovative in the way in which existing systems and technologies are brought together. Secondly, it has been fortuitous to say the least that two systems designed to improve the accessibility and availability of Web publications have been under development throughout the course of the project, the Dublin Core and the DOI systems. BIBLINK partners have contributed to their development. Thirdly and finally, and this goes back to what I mentioned at the outset of this paper, the two factors which led to the BIBLINK project being proposed to and accepted by the European Commission, there is the matter of the relationship between libraries and publishers. BIBLINK's experience of that relationship is that while publishers - quite justifiably - will only help libraries if they can see advantage for themselves, the complexity and unknown future of electronic publications are such that they are more than willing to exchange experience with us. If therefore we can demonstrate through BIBLINK that solutions to those problems are best achieved through co-operation, then I believe we can be optimistic about our future relationships in other areas of joint concern as well.




ROSS BOURNE

December 1997


The author would like to thank his colleague, Robina Clayphan, for commenting on this paper during its preparation, thereby helping to make it more comprehensible.

List of abbreviations. The following is an almost complete list, omitting only those acronyms and initialisms too well known to spell out, such as CD-ROM.
BIC         Book Industry Communication (joint UK book trade/library organisation)
BW          BIBLINK Workspace
CIMI        Computer Interchange of Museum Information (specialist exchange format)
CoBRA       Computerised Bibliographic Record Actions (EC sponsored Concerted Action)
DOI         Digital Object Identifier (identification system for electronic documents)
DTD         Document Type Definition (SGML application)
EAD         Encoded Archival Description (specialist exchange format)
EDI         Electronic Data Interchange (generic name for electronic trading standards)
ISBN        International Standard Book Number
ISSN        International Standard Serial Number
MARC        Machine Readable Cataloguing (library exchange format)
NBA         National Bibliographic Agency
NEDLIB      Networked European Deposit Library (EC sponsored project)
SGML        Standard Generalised Mark-up Language
SICI        Serial Item and Contribution Identifier (identification system for serial parts and contents)
SSSH        Simplified SGML for Serials Headers (SGML application)
TAP         Telematics Applications Programme
TEI         Text Encoding Initiative (SGML application)
UKOLN       UK Office for Library and Information Networking (research office housed at the University of Bath)
URN         Uniform Resource Name
USEMARCON   User-controlled Generic MARC Converter (EC sponsored project)