There can be few European research libraries that have neither
participated in DG XIII projects nor been at least tempted by
the funding that has been available through the European Commission's
Telematics Applications Programme. This is not the time or place
to describe the impact which the TAP has made on the ways in which
our libraries function, and perhaps that impact will only become
apparent to the library historian; but whatever history has to
say on the subject I can easily believe that our modi operandi
have been irretrievably changed by the discipline of having
to observe deadlines for deliverables, workpackages, progress
reports, management reports, cost statements, peer reviews, and
all the other requirements we did not realise we were letting
ourselves in for. Again, history will have to judge whether,
regardless of any technological innovations, those bureaucratic
disciplines have actually improved the services we provide for
our users. I should like to think that they have.
BIBLINK stands for "Linking Publishers and National Bibliographic Services", and the BIBLINK project commenced formally in May 1996. Its origins lie in a DG XIII sponsored "concerted action", which is Commission language for an activity that encourages particular sectors to discuss issues that might result in an activity such as a project or a feasibility study. CoBRA, or Computerised Bibliographic Record Actions, was - and indeed is - a concerted action bringing together European national libraries for just that purpose; amongst other things, it has initiated projects on record file labelling, character set standardisation and the networking of national name authority files. But the real impetus for BIBLINK is twofold:
· the belief that economies can be made by developing better relationships between publishers and libraries and re-using publishers' bibliographic information in library catalogues and databases;
· the growth of electronic publishing, requiring libraries to exercise bibliographic control over documents that, while sharing many of the characteristics of conventional print publications, have some features which are not at present covered in traditional cataloguing codes.
These two factors have resulted in a project whose aim is to produce
a prototype system for the exchange of metadata between publishers
of electronic documents and national bibliographic agencies.
In short, the system should enable a publisher to transmit bibliographic
data to a national bibliographic agency or NBA, the NBA to adapt
the data to its own requirements and then to return it to the
publisher, duly enhanced. The advantage for the NBA will be that
it is receiving timely and authoritative bibliographic information
in advance of publication; the advantage for the publisher will
be that its publications will be recorded in the national bibliography
as soon as possible after publication - if not before - thus improving
their accessibility, and that any metadata contained in the publication
will have been verified by the NBA. In order to reach the stage
at which such a system can be designed, however, a great amount
of research into different aspects of the area had to be carried
out. I will describe that research in a little more detail later
on - or rather, I will mention the decisions that were made as
a result of that research - but firstly I should say something
about the people and the institutions that form the BIBLINK consortium
and the parameters of the project itself.
The BIBLINK consortium includes the national libraries of France,
the Netherlands, Norway, Spain and the United Kingdom and two
academic institutions, the Universitat Oberta de Catalunya in
Barcelona and the University of Bath, in the form of UKOLN, the
UK Office for Library and Information Networking. The consortium
is led by the British Library and the consortium agreement is
a legally binding document which empowers the British Library
to act on behalf of the partners in dealing with the Commission,
especially on financial matters. The experience which the partners
bring to the project is considerable, especially those partners
which - unlike ourselves at the British Library - actually handle
electronic publications. The National Library of Norway, for
example, has been collecting electronic documents for some years
now, and a visitor to Mo-i-Rana, its northern outpost only just
south of the Arctic Circle, cannot but be impressed by the efficiency
of an operation which ensures a near-comprehensive coverage of
all forms of Norwegian material. The Royal Library in the Netherlands,
on the other hand, operates voluntary rather than legal deposit,
an outcome of which is that they enjoy much closer relationships
with their publishers than any of the rest of us. UKOLN at the
University of Bath is unusual within the consortium in that they
are not a library per se: they are effectively a research
centre which under various names has been at the forefront of
many bibliographic developments for over 20 years. It is UKOLN
which hosts the BIBLINK Web page, which can be visited at the
following address, http://www.ukoln.ac.uk/metadata/BIBLINK/.
BIBLINK is a three year project, divided into two stages of equal
length. Stage 1, the research stage, has just concluded; a revised
technical annex for Stage 2, which will see the development of
the prototype system, has just been agreed. The project has now
just under eighteen months to run. As in all Commission-funded
projects, partners receive 50% funding, which means that they
must find the other 50% from within their own resources; but during
the actual demonstration stage the Commission will reimburse only
a third rather than a half. As project leader, the British Library
has heavy obligations, but we are fortunate in having yielded
day-to-day project management to an outside company, Level-7,
and this has enabled us to concentrate better on what we at the
British Library want from the project as well as our overall responsibility
for ensuring that the project remains on track.
The first stage of the project has been about researching the
field and obtaining consensus for its conclusions. I will say
something about that research in a moment, but the consensus part
of the project has been an important component, because in Stage
2 publishers have been enlisted to enable us to demonstrate that
a system of creating, converting, enhancing and retransmitting
metadata between the two sectors can work. Each of the partners
has discussed the results of our research with publishers in each
of our countries, and we are confident that those publishers support
our findings.
I want now to summarise just what arose from that research. Stage 1 consisted of eight workpackages, two of which were the obtaining of publisher consensus and general consolidation of the results; the six which comprised the bulk of the research were as follows:
· scoping: what were the limits of the task?
· metadata: what metadata formats existed and which were most appropriate to the project?
· identification: as above, which numbering and other systems existed and which were most appropriate?
· format conversion: which tools were available to handle the metadata?
· transmission: how should the metadata be moved between publishers and NBAs?
· authentication: how could the relationship between the metadata and the item itself be assured?
In the Scoping workpackage the range of electronic publishers
and electronic publications to be considered by the project was
defined. The scope was narrowed to those publications that would
traditionally be included in a national bibliography in whatever
electronic medium they are published, although to all intents
and purposes those media are CD-ROM and the World Wide Web.
A taxonomy of publisher types by country, covering the established
and newly emerging sectors, was also produced, from which publishing
partners would be recruited at a later point of the project; those
publisher types are, in BIBLINK parlance, "traditional",
or publishers with a background in conventional print publications;
"new", or publishers without such a background; and "grey",
or publishers such as university departments whose primary concern
is not publishing and who are increasingly using the Web to disseminate
scholarship.
The study of metadata formats identified formats in use or under
development in the library and publishing sectors. A comparative
analysis was performed on these and the content of the non-library
formats compared with MARC formats. At the same time, a separate
report examining three SGML metadata formats (TEI, EAD and CIMI)
was commissioned and incorporated in the deliverable. National
libraries' requirements in this area were also identified. It
was concluded that it might be necessary to consider more than
one format to accommodate the diverse body of publishers covered
by the scope of the project. Dublin Core, the Book Industry Communication/Pira
Simplified SGML for Serials Headers (SSSH) and BIC's forthcoming
non-serial Document Type Definition (DTD) were recommended for
consideration.
After initial comparison of fourteen identification schemes, seven
were identified as possibly being appropriate for BIBLINK, including
the emerging Digital Object Identifier (DOI) as well as the established
ISBN and ISSN systems. These were analysed according to a set
of criteria identified by the partners and refined by the workpackage
leader. It was not felt to be appropriate to recommend a single
identification scheme as none was used by all, or even a majority,
of the possible publisher participants. The report recommended
five schemes that would be acceptable to the project: ISSN, ISBN,
SICI, DOI and URN.
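Accepting several identification schemes means that a receiving system has to work out which scheme a given identifier belongs to. The following is a minimal sketch of how that might be done, not part of the BIBLINK specification: the function names are hypothetical, the DOI and URN tests look only at the prefix, SICI is left out for brevity, and only the ten-digit form of the ISBN is checked. The ISSN and ISBN tests use the standard modulus 11 check digits.

```python
import re


def _weighted_sum(chars: str, top_weight: int) -> int:
    """Sum each character times a descending weight; 'X' counts as 10."""
    return sum(
        (top_weight - i) * (10 if c == "X" else int(c))
        for i, c in enumerate(chars)
    )


def issn_is_valid(text: str) -> bool:
    """Check an ISSN (eight characters, weights 8 down to 1, modulus 11)."""
    chars = text.replace("-", "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", chars):
        return False
    return _weighted_sum(chars, 8) % 11 == 0


def isbn10_is_valid(text: str) -> bool:
    """Check a ten-digit ISBN (weights 10 down to 1, modulus 11)."""
    chars = text.replace("-", "").replace(" ", "").upper()
    if not re.fullmatch(r"\d{9}[\dX]", chars):
        return False
    return _weighted_sum(chars, 10) % 11 == 0


def classify_identifier(value: str) -> str:
    """Guess the scheme of an identifier; returns 'unknown' if none fits."""
    v = value.strip()
    if v.lower().startswith("urn:"):
        return "URN"   # prefix test only (assumption)
    if v.startswith("10.") and "/" in v:
        return "DOI"   # prefix test only (assumption)
    if issn_is_valid(v):
        return "ISSN"
    if isbn10_is_valid(v):
        return "ISBN"
    return "unknown"
```

For example, `classify_identifier("0378-5955")` returns `"ISSN"`, while an ISBN with a wrong check digit falls through to `"unknown"` rather than being accepted.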
The report from the format conversion workpackage examined the
feasibility of producing MARC records from formats previously
identified for use by BIBLINK. Working in parallel with the Consensus
Building workpackage to define a minimum set of data elements,
the report mapped the agreed minimum data set to Dublin Core and
suggested modifications that would be required for the creation
of a usable MARC record. It also mapped SSSH elements to MARC
records at serial issue level only. The BIC non-serial DTD cannot,
however, be mapped until its definition is complete. The report
concluded that Dublin Core could be used to produce a reasonably
comprehensive descriptive MARC record for so-called monographs
- of which more later - but that SSSH is not a suitable format
for descriptions of serials at issue level; for that purpose,
either Dublin Core or the format used in the ISSN database could
be used.
The transmission workpackage examined the technical and operational
characteristics of the options for transferring metadata from
publishers to national bibliographic agencies at two levels: firstly,
the low-level transport layer for which four possible options
were identified and fully analysed according to criteria specified
in the technical annex and others that were identified as work
proceeded; and secondly, a higher packaging level comprising such
formats as EDI and the Warwick Framework (which grew out of the
Dublin Core), which could allow data of interest to a variety
of players to be included in the transmission at different points
in time. It was concluded that BIBLINK should use Internet protocols
for the transmission of data in conjunction with one of three
high level formatting options. The final recommendation for the
latter was dependent on the outcome of discussions with publishers.
In the BIBLINK context, authentication has been defined as ensuring
a one-to-one relationship between a bibliographic record and the
publication to which it refers. Two models were considered, one
in which the publication is kept in a controlled environment and
one in which it is not. Authentication can be seen to encompass other matters
which are live issues in the world of electronic publishing, namely
version control, rights management and protection of intellectual
property. The report studied other projects that are addressing
these issues and examined the techniques in use or under development,
but came eventually to the conclusion that the provision of a
data element containing a checksum figure that had been generated
by the publisher or the NBA would meet BIBLINK's requirements.
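The checksum idea can be illustrated with a short sketch. This is an assumption-laden example rather than the project's chosen mechanism: the report does not name an algorithm in this paper, so SHA-256 is used here purely for illustration, and the element name `checksum` and the function names are hypothetical.

```python
import hashlib


def checksum(publication: bytes) -> str:
    """Return a hexadecimal digest of the publication's bytes.

    SHA-256 is an illustrative choice, not one named by BIBLINK.
    """
    return hashlib.sha256(publication).hexdigest()


def attach_checksum(record: dict, publication: bytes) -> dict:
    """Return a copy of the record with a checksum data element added."""
    enriched = dict(record)
    enriched["checksum"] = checksum(publication)  # hypothetical element name
    return enriched


def record_matches(record: dict, publication: bytes) -> bool:
    """True if the record's checksum was generated from these exact bytes."""
    return record.get("checksum") == checksum(publication)
```

A record created against one version of a document then fails the check when compared with any altered version, which is the one-to-one relationship the workpackage was looking for.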
I would like to make two general comments on progress to date.
Firstly, the question of publisher consensus: BIBLINK partners
have been greatly encouraged not only by the interest shown by
publishers in the project but also by their promise of active
support, to the extent that 23 have agreed to be involved in testing
the prototype. Those 23 publishers range from major international
publishers to relatively small organisations; they come from all
five countries of the consortium and each of the three publisher
types referred to earlier is well represented.
Secondly, what has become apparent to me during the course of
the project so far is the fact that a librarian's conventional
distinction between monographs and serials is breaking down.
What do you call a Web document which is not a serial in the
classical sense but which is forever changing? Is there an optimal
point in the life of such a document when it is appropriate for
a library to acquire or download it, or can one do no more than
treat the act of downloading as taking a snapshot? I wonder whether
by attempting to use what are basically traditional means to handle
electronic documents we are not falling into the same trap as
early motor car designers, who did no more than adopt the design
of the horse and carriage. Some of these questions will be answered,
no doubt, by another Commission-sponsored project, NEDLIB, which
will be looking into the whole life cycle of electronic publications.
I look forward to NEDLIB's conclusions.
But back to the present. Very shortly a software house will be
selected to create what the consortium is calling the "BIBLINK
workspace", or BW. The BW will be a shared, distributed database
containing records to which a variety of processes may be carried
out within the confines of the workspace, such as creation, modification,
conversion, and so forth. BW users may include not only publishers
and NBAs but also the various standard numbering agencies to whom
one or other of the two other user types may need to apply for
the appropriate identification number. The records themselves
must contain at least the data agreed to be mandatory within the
BIBLINK Minimum Data Set (in other words, some of those elements
are optional only because they are not applicable to all publications,
such as frequency in the case of serials). That minimum data
set, by the way, includes only eleven of the fifteen Dublin Core
elements, but adds a further seven, making eighteen in total.
What will essentially happen in the BW is that publisher records
will enter the system (perhaps in Dublin Core format), be converted
through a MARC conversion package such as USEMARCON to one of
the national MARC formats, be reviewed by the NBA for authority
control and subject indexing terms, and then retransmitted to the
publisher via a reverse conversion process, if that is
what the publisher requires. Behind this brief scenario are various
options and some questions still waiting to be answered: for example,
will the BW interface be Web based or e-mail based? What national
MARC formats can be handled, bearing in mind that the full conversion
tables available at present as part of USEMARCON are limited to
UKMARC and USMARC? Will NBAs themselves have a role in the allocation
of Digital Object Identifiers if the DOI system prevails, as
it may well do? Likewise, is the Dublin Core likely to become
the standard for Web metadata?
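The chain of events in the workspace can be sketched in a few lines of code. What follows is a toy model under stated assumptions, not the BW design and not USEMARCON: the mapping table is a small illustrative subset of Dublin Core elements paired with plausible MARC tags and subfields, the function names are hypothetical, authority control is not modelled, and any element without a mapping is simply dropped.

```python
from typing import Dict, List, Tuple

# Illustrative subset only: Dublin Core element -> (MARC tag, subfield code).
# This is not the BIBLINK mapping and not a USEMARCON rule file.
DC_TO_MARC: Dict[str, Tuple[str, str]] = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}
MARC_TO_DC = {target: element for element, target in DC_TO_MARC.items()}

MarcField = Tuple[str, str, str]  # (tag, subfield code, value)


def dc_to_marc(dc: Dict[str, List[str]]) -> List[MarcField]:
    """Convert a publisher's Dublin Core record to a flat list of MARC fields."""
    fields: List[MarcField] = []
    for element, values in dc.items():
        if element in DC_TO_MARC:  # unmapped elements are dropped
            tag, code = DC_TO_MARC[element]
            fields.extend((tag, code, value) for value in values)
    return fields


def nba_enhance(fields: List[MarcField], subject_terms: List[str]) -> List[MarcField]:
    """Stand-in for NBA review: add subject indexing terms to the record."""
    return fields + [("653", "a", term) for term in subject_terms]


def marc_to_dc(fields: List[MarcField]) -> Dict[str, List[str]]:
    """Reverse conversion, for publishers who want the record back as Dublin Core."""
    dc: Dict[str, List[str]] = {}
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element:
            dc.setdefault(element, []).append(value)
    return dc
```

A publisher record containing a title, creator and publisher would pass through `dc_to_marc`, gain subject terms in `nba_enhance`, and come back from `marc_to_dc` with a `subject` element it did not have when it entered the system.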
The stage where BIBLINK is at present - or rather at the time
of writing, which was just before Christmas - is that user requirements
and functional specification documents are being drawn up. After
a Call to Tender, the latter will be presented to the software
house which has bid successfully to create the BIBLINK Workspace;
that software house will have just a few months to produce and
test the system, which should be ready for the demonstrator phase
due to commence in the Autumn. In the meantime, the individual
NBAs will need to ensure that their internal systems, which are
of course outside the scope of the BW, are able to interact with
the BW, since the demonstrator phase will be demonstrating the
full chain of events, from publisher to NBA and back. The delivery
of a final report will conclude the project in May 1999, and as
part of that report NBAs will be required to indicate their individual
plans to put the results of BIBLINK into action.
I want to round off this presentation by making one or two observations
on the overall thrust of BIBLINK. Firstly, it has been an unwritten
guideline that BIBLINK exists in the world that we have, rather
than the world where we might like to live. While the Telematics
Applications Programme is about innovation, that innovation is
not intended to be at the expense of proven systems or technologies.
BIBLINK has tried to make use of what is available, but to be
innovative in the way in which existing systems and technologies
are brought together. Secondly, it has been fortuitous to say
the least that two systems designed to improve the accessibility
and availability of Web publications have been under development
throughout the course of the project, the Dublin Core and the
DOI systems. BIBLINK partners have contributed to their development.
Thirdly and finally (and this goes back to what I mentioned at
the outset of this paper, namely the two factors which led to the BIBLINK
project being proposed to and accepted by the European Commission),
there is the matter of the relationship between libraries and
publishers. BIBLINK's experience of that relationship is that
while publishers - quite justifiably - will only help libraries
if they can see advantage for themselves, the complexity and unknown
future of electronic publications are such that they are more than
willing to exchange experience with us. If therefore we can demonstrate
through BIBLINK that solutions to those problems are best achieved
through co-operation, then I believe we can be optimistic about
our future relationships in other areas of joint concern as well.
ROSS BOURNE
December 1997
The author would like to thank his colleague, Robina Clayphan, for commenting on this paper during its preparation, thereby helping to make it more comprehensible.
List of abbreviations. The following is an almost complete
list, omitting only those acronyms and initialisms too well known
to spell out, such as CD-ROM.
BIC | Book Industry Communication (joint UK book trade/library organisation) |
BW | BIBLINK Workspace |
CIMI | Computer Interchange of Museum Information (specialist exchange format) |
CoBRA | Computerised Bibliographic Record Actions (EC sponsored Concerted Action) |
DOI | Digital Object Identifier (identification system for electronic documents) |
DTD | Document Type Definition (SGML application) |
EAD | Encoded Archival Description (specialist exchange format) |
EDI | Electronic Data Interchange (generic name for electronic trading standards) |
ISBN | International Standard Book Number |
ISSN | International Standard Serial Number |
MARC | Machine Readable Cataloguing (library exchange format) |
NBA | National Bibliographic Agency |
NEDLIB | Networked European Deposit Library (EC sponsored project) |
SGML | Standard Generalised Mark-up Language |
SICI | Serial Item and Contribution Identifier (identification system for serial parts and contents) |
SSSH | Simplified SGML for Serials Headers (SGML application) |
TAP | Telematics Applications Programme |
TEI | Text Encoding Initiative (SGML application) |
UKOLN | UK Office for Library and Information Networking (research office housed at the University of Bath) |
URN | Uniform Resource Name |
USEMARCON | User-controlled Generic MARC Converter (EC sponsored project) |