Ovid LinkSolver
White Papers

Where’s My Link?
A Perspective on Why Linking is Not Perfect

Diana Bittern
Director, Software Product Management
Ovid Technologies


If an institution has purchased the rights to content, the institutional users should have access to that content from any application, regardless of where that content resides.

Introduction

Institutions continue to invest significant resources to purchase electronic full text for their patrons. Then, additional investments and resources are required to purchase and implement link resolution systems that promise to connect end-user students and researchers from a bibliographic citation to the full text article in the cited electronic journal. In essence, much time and money is spent to fulfill the concept of “one-click access” to full text.

It is hardly surprising to hear the battle cries of librarians when their users innocently ask, “Where’s my link?”, meaning the link that they, as subscribing users, expect to take them to the full text of the article online. As information providers, publishers and aggregators struggle with the task of bringing order out of chaos and giving users reliable resource integration through linking, this paper attempts to illustrate some of the realities of linking and the steps being taken to overcome the myriad anomalies in matching bibliographic citations to the full text.

To help clear the muddy water, this paper looks at the following linking issues and the solutions that have been proposed for them:

  1. Metadata (and their Information Providers)
  2. Proliferation of ISSNs
  3. Embargoed titles
  4. CrossRef, the Silver Bullet

The Metadata [non]Standard

The creation of a link relies on several key ingredients. Linking is about matching metadata, or the data that defines an article. OpenURL attempts to standardize matching by defining required metadata elements and an originating resource, so that the necessary information will be available to make a match when presented to a target destination.

The most common metadata matching information for article-level linking is IVIP: ISSN, Volume, Issue, Page. A URL “template” establishes placeholders for this information, which is extracted from a bibliographic database citation and sent to the ‘target’ where, presuming a match can be made, a full text experience results. Some URL templates include additional elements such as journal title, author(s), and article title.
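To make the template idea concrete, here is a minimal sketch of filling IVIP placeholders from a citation. The host name, the query parameter names, and the ISSN are made up for illustration; real targets each define their own syntax.

```python
from urllib.parse import quote

# Hypothetical target template; host and parameter names are illustrative only.
TEMPLATE = "https://fulltext.example.org/link?issn={issn}&vol={volume}&iss={issue}&page={page}"

REQUIRED = ("issn", "volume", "issue", "page")


def build_link(template, citation):
    """Fill the IVIP placeholders, or return None if any element is missing."""
    if any(not citation.get(key) for key in REQUIRED):
        return None  # the target cannot make a match without all four elements
    values = {key: quote(str(citation[key]), safe="") for key in REQUIRED}
    return template.format(**values)


# Placeholder citation values, not a real journal.
citation = {"issn": "1234-5679", "volume": 6, "issue": 2, "page": 112}
print(build_link(TEMPLATE, citation))
```

Returning None rather than a partial URL is a design choice in this sketch: a link that is not shown is usually less frustrating than one that leads nowhere.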

Using the IVIP model is preferred because it reduces the variations that result in mismatches. Title and Author metadata can be troublesome, because exact matching is marred by the presence (or absence) of punctuation and articles, and by name conventions (author’s last name, first initial, honorific, etc.) that often differ among information providers.
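As a rough illustration of why title matching needs care, the sketch below normalizes a journal title before comparison. The specific rules (lowercasing, dropping punctuation, dropping a leading English article) are assumptions chosen for the example; they do not describe any particular vendor's matching logic, and author names would need rules of their own.

```python
import re

# Assumed list of leading articles to ignore; English only in this sketch.
LEADING_ARTICLES = ("the ", "a ", "an ")


def normalize_title(title):
    """Reduce a title to a comparison key that ignores case, punctuation and a leading article."""
    text = title.lower()
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation becomes a space
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    for article in LEADING_ARTICLES:
        if text.startswith(article):
            text = text[len(article):]
            break
    return text


print(normalize_title("The Journal of Parapsychology."))
print(normalize_title("Journal of  Parapsychology"))
```

Both calls produce the same key, so two providers that differ only in punctuation or a leading article would still match.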

Yet even the most ‘standard’ metadata can produce problems. Take the ISSUE metadata element, for example. If we presume that the starting point for researchers is the bibliographic database (acting as an online card catalog), and consider that there are scores of information providers who deliver bibliographic databases, it is not hard to imagine differences emerging. Out of 10 bibliographic database providers, there could be 10 different ways of representing a supplement, a combined issue, or an issue part.

Some databases combine source information into a single field, such as 2001(6)2;112, which stands for publication year 2001, Volume 6, Issue 2, Page 112. It is possible to extract the required metadata elements from the source field, provided that there are no variations. Enter a twist, such as combined issues 2 and 3, or page S33, and the extraction formula needs to be augmented with filters, Perl expressions, or rules for handling anomalies (assuming that all the anomalies can be identified).

Multiply the number of designations for title and author information by the variations for extracting source information, across all databases used in your institution, and the development and maintenance of linking formulae becomes a daunting task.
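The extraction described above might look something like the following sketch. It assumes the `year(volume)issue;page` layout from the example, and it assumes a combined issue is written as `2-3` or `2/3`, which is only one of the many conventions a provider could use. Keeping the first issue number for matching is likewise an assumption.

```python
import re

# Matches e.g. "2001(6)2;112"; the page may carry a one-letter prefix such as "S33".
SOURCE_PATTERN = re.compile(
    r"^(?P<year>\d{4})\((?P<volume>\d+)\)(?P<issue>[^;]+);(?P<page>[A-Za-z]?\d+)"
)


def parse_source(source):
    """Split a combined source field into year, volume, issue and page, or return None."""
    match = SOURCE_PATTERN.match(source.strip())
    if match is None:
        return None  # an anomaly this rule does not cover
    fields = match.groupdict()
    # Assumed rule: for a combined issue such as "2-3", match on the first number.
    issue_numbers = re.findall(r"\d+", fields["issue"])
    if issue_numbers:
        fields["issue"] = issue_numbers[0]
    else:
        fields["issue"] = fields["issue"].strip()
    return fields


print(parse_source("2001(6)2;112"))
print(parse_source("2001(6)2-3;S33"))
print(parse_source("2001;6:112"))
```

The last call returns None because the field does not follow the expected layout, which is exactly the kind of variation that forces the rule set to grow.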


The International Standard Serial Number (ISSN)

Speaking of standards, the journal identifier, or ISSN, is THE identifier for a serial, right? Not quite. Although imperfect, the ISSN has long defined a journal in the traditional sense and, in theory, a single ISSN identified a single print version. Even in the pre-digital world, ISSNs were somewhat unreliable, since an ISSN could arbitrarily change (or not) with minor title changes, publisher changes, etc.

In the digital world, ISSNs are even less reliable, since they are now also being registered for the electronic version of a journal. The electronic version may contain different content, often breaks the IVIP rule (who needs page numbers in an electronic version?), and can precede the arrival of the print version of the publication, making it a more attractive target for online linking services.

Linking software relies on ISSN as a match element, but it no longer works as reliably as it once did. End users first encountered this problem a few years back when NLM, with no warning, began using the electronic ISSN (EISSN) as the indexing standard for its Medline database. Linking products use a ‘knowledgebase’ to identify journals and coverage information. In other words, for a bibliographic citation, we check against a user profile of ISSNs and years of coverage to determine whether to show a Full Text link. Ovid’s OpenLinks product, for example, successfully used the Print ISSN as the basis of its knowledgebase, to link across its 100 bibliographic databases. That is, until Medline’s change broke this link model.
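The knowledgebase check can be pictured with a small sketch. The profile structure, the function name, and both ISSNs below are hypothetical; the point is only to show why a profile keyed on the print ISSN stops producing links once citations carry the electronic ISSN.

```python
# Hypothetical user profile: print ISSN -> (first covered year, last covered year).
PROFILE = {"1234-5679": (1996, 2003)}


def show_full_text_link(profile, issn, year):
    """Show a link only if the ISSN is in the profile and the year is within coverage."""
    coverage = profile.get(issn)
    if coverage is None:
        return False
    first_year, last_year = coverage
    return first_year <= year <= last_year


# Citation indexed with the print ISSN: the link appears.
print(show_full_text_link(PROFILE, "1234-5679", 2001))
# Same journal, but the citation now carries a (made-up) electronic ISSN: no link.
print(show_full_text_link(PROFILE, "8765-4321", 2001))
```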
The rules for assigning ISSNs to different versions of a journal (print, CD-ROM, electronic), as well as rules for naming journal subsets, are currently under review as part of ISO (International Organization for Standardization) 3297, the official number for the ISSN standard. NISO’s directive is to answer the requirements of publishers who need to delineate different media of their titles, while at the same time answering the technical uses of ISSN to provide links and coverage information in software solutions.

For those who confront A&I database linking, reference linking, and OpenURL, this challenge is already well known. I recently participated in a survey from the ISSN Review Committee, and made the point that, for many, the key use of ISSN is for linking from a bibliographic database or cited reference to the full text of the journal content.

Once the standard-makers have spoken, the resulting mandate must be adopted by the publishers and information providers. Where changes require that bibliographic databases be reloaded, it will take time and pressure from the user community to enforce the revised standard.

Meanwhile, pressure from end users for reliable links has pushed technology leaders to come up with their own answers to the problem. At Ovid, the problem is addressed through a Pub-ID database that stores ISSN and name elements, and provides what we refer to as ISSN-Mapping across resources. All ISSNs are linked in a relational database, by name (fuzzy matching logic at work here) and number. If a journal changes publishers, its name may change, and its ISSN may change with it. The database stores the ISSNs that refer to the same journal name (Print/Electronic) and maps older versions of the journal.
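The idea behind ISSN mapping can be sketched with a simple in-memory structure. This is a stand-in for illustration only: it is not the Pub-ID schema, it leaves out the fuzzy name matching entirely, and the class name, journal key and ISSNs are all invented.

```python
from collections import defaultdict


class IssnMap:
    """Groups print, electronic and former ISSNs that refer to the same journal."""

    def __init__(self):
        self._journal_of = {}              # ISSN -> journal key
        self._issns_of = defaultdict(set)  # journal key -> all known ISSNs

    def add(self, journal_key, *issns):
        for issn in issns:
            self._journal_of[issn] = journal_key
            self._issns_of[journal_key].add(issn)

    def equivalents(self, issn):
        """Return every ISSN known for the same journal, or just the input if unknown."""
        journal_key = self._journal_of.get(issn)
        if journal_key is None:
            return {issn}
        return set(self._issns_of[journal_key])


def has_coverage(issn_map, subscribed_issns, citation_issn):
    """True if any ISSN equivalent to the citation's ISSN is in the subscription list."""
    return not issn_map.equivalents(citation_issn).isdisjoint(subscribed_issns)


issn_map = IssnMap()
# Made-up print, electronic and former ISSNs for one journal.
issn_map.add("journal-1", "1234-5679", "8765-4321", "1111-2222")

subscribed = {"1234-5679"}  # the customer's list holds only the print ISSN
print(has_coverage(issn_map, subscribed, "8765-4321"))  # citation carries the EISSN
```

With the mapping in place, the citation's electronic ISSN resolves to the subscribed print ISSN, so the link can still be shown.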
So, whether the MEDLINE database indexes EISSN, whether the customer’s journal import list includes some ‘old’ ISSNs, or whether a journal changes hands, the A&I database user will be presented with a valid link and will reach the Holy Grail: the full text.

While our efforts are duly noted, relying on individual publishers and aggregators to come up with sophisticated technological workarounds to make linking “perfect” ultimately does not best serve the end user.

Embargoes

The world of free journal linking brings enthusiastic response from customers, until they have to deal with the question of embargoes. What can be done to address rolling coverage or that 6-month subscriber embargo or a journal that offers open access to its archives after 12 months?

Journal embargoes vary in duration, from 6 or 9 months to as long as 24 months. They can be applied by aggregators, where access to the content is delayed by a period of 3+ months, or by publishers, who make access to certain journals free after a period of 12+ months.

In order for a linking application to process a request that accounts for an embargo, it must have reliable information about the MONTH of the publication. An OpenURL can be crafted to include a Month field. However, the majority of desirable resources (bibliographic databases and cited references) do not have reliable Month metadata. Many Information Providers do not use Month at all, but use Publication Year, or Entry Week. Of Ovid’s top 10 bibliographic databases, only MEDLINE has a field called EP (Electronic Date of Publication) from which one could reliably parse Month information. So, the existence of this information benefits a very small number of databases, and virtually no cited references.

There are two ways to work around this issue, both of which have the librarian select a publication year to determine whether to show a link for an embargoed title. However, neither is ideal, since one results in dead-end links, and the other excludes otherwise valid links. To illustrate, let’s use a title called the Journal of Parapsychology that has a 3-month embargo. The options are:

  1. Define coverage by rounding backwards to the previous year: The journal would be missing links for its most recent issues, for up to 9 months. If the coverage is set to 2002 and it is now December 2003, a user would not see links for citations from January through September of 2003, which are valid under the terms of the embargo.
  2. Make coverage reflect the current year: The journal, with its 3-month embargo, will show ‘dead links’ for citations that fall within the embargo period.
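The trade-off between the two options can be worked through in a few lines. The sketch assumes that an issue becomes accessible once at least as many whole months as the embargo length have passed since its publication month; real embargo terms may count differently.

```python
def months_elapsed(pub_year, pub_month, now_year, now_month):
    """Whole months between the publication month and the current month."""
    return (now_year - pub_year) * 12 + (now_month - pub_month)


def is_accessible(pub_year, pub_month, now_year, now_month, embargo_months):
    # Assumed rule: accessible once the embargo length has fully elapsed.
    return months_elapsed(pub_year, pub_month, now_year, now_month) >= embargo_months


NOW_YEAR, NOW_MONTH, EMBARGO = 2003, 12, 3
months_of_2003 = range(1, 13)

# Option 1 (coverage ends at 2002): valid 2003 months that get no link.
missing_links = [m for m in months_of_2003
                 if is_accessible(2003, m, NOW_YEAR, NOW_MONTH, EMBARGO)]

# Option 2 (coverage includes 2003): 2003 months that get a link but are still embargoed.
dead_links = [m for m in months_of_2003
              if not is_accessible(2003, m, NOW_YEAR, NOW_MONTH, EMBARGO)]

print("Option 1 hides valid months:", missing_links)
print("Option 2 shows dead links for months:", dead_links)
```

Under these assumptions, Option 1 hides January through September, and Option 2 shows dead links for October through December.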

Ovid is addressing the embargo issue programmatically by providing two types of embargo coverage: subscriber and free.

In Ovid’s knowledgebase, journal identifiers will now include embargo information, where relevant. For a specific supplier whose terms mandate them, subscriber embargoes can be defined as N months, and links will appear only once the embargo period has elapsed. For journals that become “free-after-N-months,” that information will be stored in the knowledgebase and activated for all users automatically through the linking application.
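One way to picture the two embargo types is the sketch below. The rule structure and function names are hypothetical and say nothing about how the knowledgebase is actually implemented. It also assumes that a citation without Month metadata cannot be evaluated and therefore gets no link, which is a conservative choice rather than a documented behavior.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EmbargoRule:
    kind: str    # "subscriber" or "free" (assumed labels)
    months: int  # length of the embargo, N


def embargo_elapsed(rule, pub_year, pub_month, today):
    """True once at least rule.months whole months have passed since publication."""
    if pub_month is None:
        return False  # assumption: no Month metadata means the embargo cannot be checked
    age = (today.year - pub_year) * 12 + (today.month - pub_month)
    return age >= rule.months


def show_link(rule, pub_year, pub_month, today, is_subscriber):
    elapsed = embargo_elapsed(rule, pub_year, pub_month, today)
    if rule.kind == "free":
        return elapsed                # free-after-N-months: applies to all users
    return is_subscriber and elapsed  # subscriber embargo: subscribers only, after N months


today = date(2003, 12, 1)
subscriber_rule = EmbargoRule("subscriber", 3)
free_rule = EmbargoRule("free", 12)

print(show_link(subscriber_rule, 2003, 9, today, is_subscriber=True))
print(show_link(subscriber_rule, 2003, 11, today, is_subscriber=True))
print(show_link(free_rule, 2002, 12, today, is_subscriber=False))
```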


CrossRef

CrossRef is a technical and marketing phenomenon. In its short lifetime, it has gained tremendous brand awareness with a following of over 300 member publishers, and has expanded from its original theme of reference linking among member publishers to a much broader one of improving overall access to scholarly information on the internet.

CrossRef’s expanded mission now includes forays into areas such as cross-search and forward linking (links to the articles that cite the article being viewed). But as with any large and fast-growing enterprise, there have been many challenges along the way, and there is plenty of work left to solidify the original goal of CrossRef linking: providing persistent [link] identifiers to the full text.

Member publishers deposit DOIs, or Digital Object Identifiers, in CrossRef’s metadata database for the sole purpose of facilitating linking. The original purpose of the CrossRef initiative was to promote cited-reference linking among participating e-publishers. Member publishers agreed to submit ‘persistent identifiers’ (DOI numbers) with relevant metadata (such as ISSN, volume, issue, page, and other information such as author(s), journal title, article title, etc.) and a URL to the article’s location.

A publisher who is preparing to publish an article electronically can query the CrossRef metadata database with reference metadata and, where there is a successful match, embed full text links to references. The reader clicks on a reference link to be transported to the full text of that cited article at the publisher’s web site.

CrossRef members are honor-bound to deposit DOIs for all of their electronically published content, both current and archival. Most have done a thorough job of depositing, but the database is far from complete. CrossRef is self-policing and recognizes the varying technical capabilities and resource constraints among its members. However, a missing CrossRef DOI is a missed full text link, plain and simple.

Metadata requests to the CrossRef database assume the presence of certain elements in order to create and resolve links. We return to the familiar IVIP model where volume, issue and page information, combined with the journal name, author(s) and article title are used, in some combination, to create link queries. The absence of one or more ‘required’ metadata elements causes a link not to appear.

Wolters Kluwer is one of several large publishers that locally host the CrossRef database. Recently, Ovid (part of the Wolters Kluwer Health Division) undertook an exhaustive analysis to compare our respective link repositories, and uncovered data that was not present in Ovid’s local CrossRef database. CrossRef and Ovid have since been working closely to implement validation processes to ensure that locally hosted CrossRef databases deliver the same high quality link information as CrossRef’s own metadata database.

In the course of our investigations, we also identified a significant number of DOIs that were deposited by publishers into CrossRef’s database with insufficient data to allow metadata matching under the most common link rules. In some cases, these DOIs were deposited in advance of print, so that Issue or Page information was missing. In other cases, the DOI identifies an article from an Electronic-Only journal and has no Issue or Page information.

CrossRef responded by notifying the member publishers about missing metadata so that it can be re-deposited to allow successful matching. In addition, vendors are exploring alternative metadata matching schemes, using different combinations of ‘required’ metadata and new algorithms to extend the reach of link reliability.
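An alternative matching scheme of the kind mentioned above could be sketched as a series of fallbacks. The key combinations, their order, the field names, and the sample record (including its DOI) are assumptions made for the example; they do not describe CrossRef’s query rules or any vendor’s algorithm.

```python
# Assumed key combinations, tried from most specific to least specific.
MATCH_SCHEMES = [
    ("issn", "volume", "issue", "page"),
    ("issn", "volume", "page"),
    ("issn", "year", "first_author"),
]


def make_key(scheme, metadata):
    """Build a lookup key for a scheme, or None if any element is missing."""
    if not all(metadata.get(field) for field in scheme):
        return None
    return (scheme, tuple(str(metadata[field]).lower() for field in scheme))


def build_index(records):
    """Index each deposited record under every scheme it has enough metadata for."""
    index = {}
    for record in records:
        for scheme in MATCH_SCHEMES:
            key = make_key(scheme, record)
            if key is not None:
                index.setdefault(key, set()).add(record["doi"])
    return index


def resolve(index, citation):
    """Return a DOI from the first scheme that yields exactly one candidate."""
    for scheme in MATCH_SCHEMES:
        key = make_key(scheme, citation)
        if key is None:
            continue
        candidates = index.get(key, set())
        if len(candidates) == 1:  # accept only unambiguous matches
            return next(iter(candidates))
    return None


# Made-up record deposited ahead of print: no volume, issue or page yet.
records = [{"doi": "10.9999/example.1", "issn": "1234-5679",
            "year": 2003, "first_author": "Smith"}]
index = build_index(records)

citation = {"issn": "1234-5679", "volume": 6, "issue": 2, "page": 112,
            "year": 2003, "first_author": "smith"}
print(resolve(index, citation))
```

Here the IVIP lookup fails because the deposit lacks issue and page, but the looser ISSN, year and first author combination still finds the record. Looser keys raise the risk of ambiguous matches, which is why this sketch refuses any key that maps to more than one DOI.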
CrossRef has been working overtime to beef up its infrastructure, to reduce the time it takes to process DOI deposits, and to expand its own capabilities to provide error reporting and statistical information to its membership. While CrossRef is not perfect in its implementation, it is still a valuable resource and boon to the information industry.

Wrap Up

So what have we learned? Today’s linking systems continue to deliver increasingly reliable link performance. Over time, the painstakingly slow work of creating order out of linking chaos will pay off, as the parties responsible for creating and maintaining the primary resources and delivery systems become aware of all the details.

The issues covered in this review are not unique to a single application vendor; they are common to all, and require constant vigilance, analysis and innovation in order to deliver reliable resource integration. Success will be achieved when end-users proclaim, “Here’s my link!”


©2008 Ovid Technologies, Inc. All Rights Reserved.
Ovid® is a registered trademark of Ovid Technologies, Inc. and cannot be reproduced without permission.