Linking Across the Digital Divide
Diana Bittern
Director, Software Product Management
Ovid Technologies
If an institution has purchased the rights to content, the institutional
users should have access to that content from any application,
regardless of where that content resides.
Introduction
Open linking has become the accepted standard in the information
industry today. Publishers are providing full text of their
electronic journal and book content, and content aggregators
are linking from their bibliographic databases to full text
content both within and outside their hosted collections.
In short, accessing full text electronically is expected as
baseline functionality by today’s web-savvy users. Most
primary publishers either have their own websites where they
offer electronic versions of their journal and book content,
or contract with vendors like Lippincott Williams & Wilkins
or Highwire to host their society’s website. Others,
like Ovid, led the way to full text linking from bibliographic
databases by introducing aggregated, fully searchable full
text collection databases (Journals@Ovid), then supplemented
this service by linking to remote full text with OpenLinks.
Then, along came the vendors who offered content-agnostic,
centralized linking servers and opened the doors to bi-directional
resource linking through OpenURL. New players and solutions
are arriving on the scene daily. The product offerings are
complex, and sweeping new claims are being circulated that
require careful scrutiny regarding initial setup and ongoing
maintenance. This paper attempts to put the state of linking
today into perspective.
Quantity Is King
Librarians, students, and researchers all expect to access
all sources of full text and to navigate freely among their
subscribed-to resources. For the most part, they don’t
know or care how linking works or where the full text resides.
End-users often do not know the content source or the linking
format. Librarians do know that if one product does not provide
the quantity and quality of full text links desired, they
can find one that does.
Linking Models – Dynamic vs. Data
So, how do information providers get their users to the full
text of journal articles and books? There are two basic models
for linking: dynamic and data.
The dynamic or live linking model is widely
used and represents the basis for OpenURL linking, a standard
that, in its most simplistic definition, establishes the rules
for matching bibliographic information (“metadata”).
Earlier dynamic linking products, like Ovid’s OpenLinks,
were developed from joint linking agreements with content
providers. The result of partnering with publishers was a
metadata matching algorithm that sent the A&I database
searcher directly to the cited article at the publisher’s
web site. Coverage tables in the Ovid product further refined
the matching model to ensure that the searcher saw a full
text link button only for the period covered by the institutional
subscription. If an institution’s electronic subscription
started in 1998, any citation prior to that date would not
tempt the user with a full text link that resulted in a dead
end. This extra refinement requires technology on the original
site and is not a part of the OpenURL standard.
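
The coverage-table refinement described above can be sketched
roughly as follows. This is only an illustration, not Ovid's
implementation: the table layout, the placeholder ISSN, and the
function name show_full_text_link are assumptions made for the
example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical coverage table: ISSN -> (first subscribed year, last subscribed year).
# None as the end year means the subscription is still current.
# The ISSN below is a placeholder, not a real journal.
COVERAGE: Dict[str, Tuple[int, Optional[int]]] = {
    "1234-5678": (1998, None),
}


def show_full_text_link(issn: str, year: int) -> bool:
    """Return True only if the citation falls inside the subscribed period."""
    period = COVERAGE.get(issn)
    if period is None:
        return False  # no electronic subscription recorded for this journal
    start, end = period
    return year >= start and (end is None or year <= end)


print(show_full_text_link("1234-5678", 2001))  # inside the subscription
print(show_full_text_link("1234-5678", 1995))  # before the subscription began
```

With a check like this on the originating site, a citation
published before the subscription start year never shows a link
button, so the user is not led to a dead end.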

The linking data model, used by PubMed’s LinkOut and
SilverPlatter’s SilverLinker, employs a database of
links – one record for each full text article link.
The database model is arguably more reliable, and leads to
fewer failed matches because a link is recorded in the database
only if there is a target full text article to match the bibliographic
record. However, it is also a more work-intensive model, requiring
participating vendors to create linking databases, send them
to the recipient system, and then to process and distribute
them to all users. This takes time, so the database model gives
up some currency in exchange for greater reliability than the
live linking model.
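
A link database of this kind can be pictured as a simple lookup
table. The sketch below is a simplification under stated
assumptions: the record keys, the example URL, and the function
name lookup_link are hypothetical and do not describe LinkOut or
SilverLinker internals.

```python
from typing import Dict, Optional

# Hypothetical link database: one record per full text article,
# keyed by the accession number of the bibliographic record.
LINK_DB: Dict[str, str] = {
    "REC-0001": "https://publisher.example.org/fulltext/REC-0001",
}


def lookup_link(accession_number: str) -> Optional[str]:
    """Return the stored full text link, or None if no record was loaded."""
    return LINK_DB.get(accession_number)


print(lookup_link("REC-0001"))  # a link exists because a target was recorded
print(lookup_link("REC-0002"))  # no record yet, so no link is offered
```

A record that has not yet been distributed simply yields no link,
which is how the model avoids failed matches at the cost of
currency.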

Universal Linking Solutions
Library automation vendors have recently added centralized
linking products to their product offerings. These solutions,
based on the emerging OpenURL standard of linking, offer the
advantage of setting up and maintaining links to all resources
through a single, centralized link repository. Linking from
A&I databases to a wide variety of full text and other
resources (book ordering through Amazon.com, genome databases,
etc.) can be set up through a single, centralized source. Serials
lists from the institution’s OPAC can be imported to
simplify setup and maintenance.
OpenURL is based on a dynamic linking model
and adds information in the URL query that identifies the
source requester. That way, the central link server knows
the origin of the link request and is able to parse it according
to the rules for that vendor’s product.
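
As a rough sketch of the idea, the snippet below builds a query in
the style of the early OpenURL key/value syntax, with a source
identifier (sid) followed by citation metadata. The resolver
address and the sid value are hypothetical; the citation values
are taken from the sample record shown later in this paper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of an institution's central link server.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(sid: str, **citation: str) -> str:
    """Build a query string: source identifier first, then citation metadata."""
    params = {"sid": sid, "genre": "article"}
    params.update(citation)
    return f"{RESOLVER_BASE}?{urlencode(params)}"


url = build_openurl(
    "ExampleVendor:exampledb",  # hypothetical source identifier
    issn="0301-0171", volume="95", issue="3-4", spage="202", date="2001",
)
print(url)

# The link server can read the origin back out of the query string
# and choose the parsing rules for that vendor's product.
query = parse_qs(urlsplit(url).query)
print(query["sid"][0])
```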
However, some of these products carry a hefty price tag, and
require Perl programming to build the filters to get around
the syntax required by database and target resource formats.
And all require that the link matching syntax keeps up with
the ever-changing world of resource movement across the great
digital divide.
CrossRef
The CrossRef organization began in 2000, as a collaboration
of publishers with a mission to provide a reference linking
“backbone” for electronic resources to promote
scholarly research. Three years later, CrossRef has over 180
primary publisher members. In addition, CrossRef is used by
a growing number of library ‘affiliates’ who pay
for the privilege of accessing CrossRef’s metadata database
to retrieve Digital Object Identifiers (DOIs) that
resolve to the article’s full text. CrossRef represents
the belief that the value of a publication is tied to the
number of links to and from it, and that ease of navigating
from one article to another is key. That’s why publishers
pay to deposit article metadata to the CrossRef database.
To be a player in the electronic market is to be linked.
However, until recently, there were no methods
for policing the quality of data deposited by the publishers,
and there is no guarantee that a publisher has deposited all
of its archival data, or is completely up to date in depositing
DOIs for current journal content. CrossRef has a worthy
mission, but it is still being proven as a long-term
solution and is not recommended for dynamic linking solutions.
In other words, if a researcher expects that a bibliographic
citation will link successfully to a full text journal article
based on ISSN and publication year, then the assumption must
be that the publisher has deposited DOIs for each article
in that volume and issue. When gaps exist, dead ends are the
result.
Linking is Not a Perfect Science
Examples of problematic links can be found even in the best
of circumstances.
The difficulty can come from any of the
participants in linking -- indexers of bibliographic data,
the publishers, or the service providers.
Supplements and parts can be represented and organized differently
when published online; and syntax varies from publisher to
publisher, and from service provider to service provider.
Supplements may be indexed as "part", "pt",
"supp", or "s" in volume, issue or page
information.
Indexing by content providers is not standardized, so that
each database uses a different set of criteria for indexing
bibliographic information. Just as a publisher may organize
supplements and parts differently, so do indexers. They can
use any of the terms listed above, or none at all. For example,
MEDLINE may index a journal issue as issue “4 part 1,”
while the journal issue number in the target link syntax may
be “4”. A link will not work in this example because
there is not an exact match on the issue information. However,
the link may work correctly in another bibliographic database
where the issue number for that journal is indexed as “4”.
Similar problems exist with articles contained in combined
issues, special issues, and supplements, all of which are
indexed differently in various bibliographic databases.
It is possible to develop fuzzy matching logic, or rules to
fetch additional data (e.g. article title) when the IVIP (ISSN,
volume, issue and page) information cannot provide an exact
match. The tradeoff is performance; the more processing required,
the slower the link generation.
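
One possible form of such fuzzy matching is sketched below: try an
exact match on the issue first, then fall back to a normalized
form with supplement and part qualifiers removed. The
normalization rules and function names are assumptions for
illustration, not any vendor's actual logic.

```python
import re

# Qualifier labels of the kind listed above ("part", "pt", "supp", "s"),
# optionally followed by a number. The exact rule set is an assumption.
_QUALIFIER = re.compile(r"\b(?:part|pt|supp|suppl|s)\b\.?\s*\d*", re.IGNORECASE)


def normalize_issue(issue: str) -> str:
    """Strip supplement/part qualifiers, keeping the bare issue designation."""
    return _QUALIFIER.sub("", issue).strip()


def issues_match(source_issue: str, target_issue: str) -> bool:
    """Exact match first; otherwise compare the normalized forms."""
    if source_issue == target_issue:
        return True
    a, b = normalize_issue(source_issue), normalize_issue(target_issue)
    # Refuse to match when nothing is left after stripping qualifiers.
    return bool(a) and a == b


print(issues_match("4 part 1", "4"))
print(issues_match("4", "5"))
```

The fallback is deliberately loose: "4 part 1" and "4 part 2"
would both match issue "4", so a production rule set would need
further checks, such as the page number or article title.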
The changeover from print ISSN to electronic
ISSN as the journal identifier has created conflicts, as there
is no standard use across information providers; where there
is no standard, the quality of linking is affected. High quality
linking solutions require significant resource expenditure
and ongoing attention to changes. The DOI has gone a long
way to establish the value of a persistent identifier in the
journal linking world, but as anyone who maintains an institutional
link repository can attest, it’s not a one-time investment.
The work is ongoing, so the tools need to be robust and flexible.
Linking to Related Information
Linking to full text does present a challenge; however, it
can be modeled simply in theoretical terms, as the linking
relationship is presumed to be one-to-one. A successful link
to full text should, for the given metadata, always present
a single, accessible document, initially described by originating
metadata.
It is important to mention the ability to
resolve link queries to the institution’s subscribed
content or appropriate copy. While CrossRef provides a persistent
identifier in the form of the DOI, the associated URL takes
the user to a web site identified by the publisher or agent.
To allow an institution to link to its locally held full text
collections requires that the DOI be redirected to the local
site. This capability, called “reverse metadata lookup,”
is starting to be advertised by vendors who offer universal
link resolver products.
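
A heavily simplified sketch of appropriate-copy redirection
follows. It keys a local resolution table on the DOI prefix rather
than performing a real metadata lookup, and the prefix, the local
base URL, and the function name resolve_doi are all hypothetical.

```python
from typing import Dict

DOI_PROXY = "https://doi.org/"  # public DOI resolver

# Hypothetical local resolution table: DOI prefix -> base URL of locally
# held content. A real resolver would consult article-level metadata.
LOCAL_COPIES: Dict[str, str] = {
    "10.9999": "https://fulltext.library.example.edu/doi/",
}


def resolve_doi(doi: str) -> str:
    """Prefer the institution's local copy; otherwise use the public resolver."""
    prefix = doi.split("/", 1)[0]
    local_base = LOCAL_COPIES.get(prefix)
    if local_base is not None:
        return local_base + doi
    return DOI_PROXY + doi


print(resolve_doi("10.9999/abc123"))  # redirected to the local collection
print(resolve_doi("10.1234/xyz"))     # falls through to the publisher's site
```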
In today’s world, because most of
the interaction between the end user and information is direct,
without the assistance of an information specialist, it is
extremely useful to be able to identify other documents which
can be, in one or many ways, related to the document that
the end user is viewing. Article metadata might contain sophisticated
indexing that allows us to guide the researcher to related
and probably relevant information in one or more documents
in other document repositories.
To illustrate, consider the bibliographic
record from Biological Abstracts database:
Title: Analysis of chromosomal aberrations involving
chromosome 1q31→q53 in a DMBA-induced rat fibrosarcoma
cell line: Amplification and overexpression of Jak2.
Author, Editor, Inventor: Sjoling-A {a}; Lindholm-H;
Samuelson-E; Yamasaki-Y; Watanabe-T-K; Tanigami-A; Levan-G
Author Address: {a} Department of Cell and Molecular
Biology-Genetics, Goteborg University, SE-40530, Box 462,
Goteborg; E-Mail: Asa.Sjoling@gen.gu.se, Sweden
Source: Cytogenetics-and-Cell-Genetics. [print] 2001(2002);
95 (3-4): 202-209.
Journal URL: http://www.karger.com/journals/ccg/ccg_jh.htm
Publication Year: 2001
Document Type: Article-
ISSN (International Standard Serial Number): 0301-0171
Language: English
Abstract: In a study of DMBA-induced rat fibrosarcomas
we repeatedly found deletions and/or amplifications in the
long arm of rat chromosome 1 (RNO1). Comparative genome hybridization
showed that there was amplification involving RNO1q31→q53
in one of the DMBA-induced rat fibrosarcoma tumors (LB31)
and a cell culture derived from it. To identify the amplified
genes we physically mapped rat genes implicated in cancer
and analyzed them for signs of amplification. The genes were
selected based on their locations in comparative maps between
rat and man. The rat proto-oncogenes Ccnd1, Fgf4, and Fgf3
(HSA11q13.3), were mapped to RNO1q43 by fluorescence in situ
hybridization (FISH). The Ems1 gene was mapped by radiation
hybrid (RH) mapping to the same rat chromosome region and
shown to be situated centromeric to Ccnd1 and Fgf4. In addition,
the proto-oncogenes Hras (HSA11p15.5) and Igf1r (HSA15q25→q26)
were mapped to RNO1q43 and RNO1q32 by FISH and Omp (HSA11q13.5)
was assigned to RNO1q34. PCR probes for the above genes together
with PCR probes for the previously mapped rat genes Bax (RNO1q31)
and Jak2 (RNO1q51→q53) were analyzed for signs of amplification
by Southern blot hybridization. Low copy number increases
of the Omp and Jak2 genes were detected in the LB31 cell culture.
Dual color FISH analysis of tumor cells confirmed that chromosome
regions containing Omp and Jak2 were amplified and were situated
in long marker chromosomes showing an aberrant banding pattern.
The configuration of the signals in the marker chromosomes
suggested that they had arisen by a break-fusion-bridge (BFB)
mechanism.
Abstract Indicator: Y
Major Concepts: Methods-and-Techniques; Molecular-Genetics
(Biochemistry-and-Molecular-Biophysics); Tumor-Biology
Super Taxa: Muridae-: Rodentia-, Mammalia-, Vertebrata-,
Chordata-, Animalia-
Organisms: LB31-cell-line (Muridae-): rat-fibrosarcoma-cells
Taxa Notes: Animals-; Chordates-; Mammals-; Nonhuman-Mammals;
Nonhuman-Vertebrates; Rodents-; Vertebrates-
Parts, Structures and Systems of Organisms: rat-chromosome-1:
q31-q53-region
Chemicals and Biochemicals: 7-12-dimethylbenz[a]anthracene-:
carcinogen-; PCR-probes [polymerase-chain-reaction-probes]:
probe-
Sequence Data: AF054619-: GenBank-, nucleotide-sequence; D14014-:
GenBank-, nucleotide-sequence; L29232-: GenBank-, nucleotide-sequence;
M13011-: GenBank-, nucleotide-sequence; M26926-: GenBank-,
nucleotide-sequence; S78355-: GenBank-, nucleotide-sequence;
U03184-: GenBank-, nucleotide-sequence; U13396-: GenBank-,
nucleotide-sequence; U49729-: GenBank-, nucleotide-sequence;
X14849-: GenBank-, nucleotide-sequence; Y00848-: GenBank-,
nucleotide-sequence
Diseases: fibrosarcoma-: chemically-induced, genetics-, neoplastic-disease
CAS Registry Number (R): 391544-71-1: GENBANK-U03184; 160102-94-3:
GENBANK-U13396
Methods and Equipment: ABI-377-DNA-Sequencer: ABI-, medical-equipment;
BigDye-cycle-sequencing-reaction: PE-Biosystems, medical-equipment;
PCR- [polymerase-chain-reaction]: DNA-amplification, DNA-amplification-method,
analytical-method, genetic-method; Southern-blot-hybridization
[Southern-blot]: analytical-method, detection-method, gene-mapping,
genetic-method, labeling-, recombinant-DNA-technology; comparative-genome-hybridization:
Molecular-Biology-Techniques-and-Chemical-Characterization,
analytical-method, genetic-method; dual-color-fluorescence-in-situ-hybridization-analysis
[dual-color-FISH-analysis]: Molecular-Biology-Techniques-and-Chemical-Characterization,
analytical-method, gene-mapping-method; fluorescence-in-situ-hybridization
[FISH-]: Molecular-Biology-Techniques-and-Chemical-Characterization,
analytical-method, gene-mapping-method; radiation-hybrid-mapping:
Molecular-Biology-Techniques-and-Chemical-Characterization,
analytical-method, gene-mapping-method
Miscellaneous Descriptors: break-fusion-bridge-mechanism;
chromosomal-alterations; comparative-gene-maps; marker-chromosomes
Alternate Indexing: Fibrosarcoma-(MeSH)
Accession Number: 200200260598
Update Code: 20020909
In the above record, several fields can drive links to related
information. The Organisms field can link to more detailed
taxonomy information in a taxonomy database. The Sequence Data
field can display more information on genes mentioned in the
article from the Genome databank. The CAS Registry Number can
lead to the physical and chemical properties of substances
mentioned in the article. The Methods and Equipment field can
link to related information in the MethodsFinder database. The
Diseases field can link automatically to clinical management of
the mentioned disease, and the Alternate Indexing field can link
to related information in any other database that uses MeSH
indexing.
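
To make the idea concrete, the sketch below pulls GenBank
accession numbers out of a shortened copy of the Sequence Data
field and turns each into a link. The regular expression, the
target URL pattern, and the function name genbank_links are
assumptions for illustration, not a description of any existing
product.

```python
import re
from typing import List

# Shortened copy of the Sequence Data field from the record above.
SEQUENCE_DATA = (
    "AF054619-: GenBank-, nucleotide-sequence; "
    "D14014-: GenBank-, nucleotide-sequence; "
    "L29232-: GenBank-, nucleotide-sequence"
)

# Assumed target pattern for a nucleotide record page.
GENBANK_URL = "https://www.ncbi.nlm.nih.gov/nuccore/{accession}"

# One or two letters followed by five or six digits, as in the sample field.
_ENTRY = re.compile(r"([A-Z]{1,2}\d{5,6})-:\s*GenBank-")


def genbank_links(field_value: str) -> List[str]:
    """Return one link per GenBank accession number found in the field."""
    return [GENBANK_URL.format(accession=acc) for acc in _ENTRY.findall(field_value)]


for link in genbank_links(SEQUENCE_DATA):
    print(link)
```

The same pattern of field-specific parsing plus a target URL
template could be repeated for the other fields, each with its own
rules.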
It has to be said that there are very few
systems which currently support this type of linking. Even
OpenURL does not yet provide sufficient granularity of the
metadata to standardize this type of linking.
Linking Futures
Bibliographic database providers may eventually incorporate
the DOI as a field identifier to link bibliographic database
citations to full text at the publisher’s web site.
Several information providers are doing this now and others
are sure to follow. Then, the mandate will be to ensure that
there are solutions that account for appropriate copy management,
or redirecting the DOI identifier to an alternate URL that
points to an institution’s locally held or aggregator-supplied
content.
Conclusion
Using the linking methods outlined above, most A&I vendors
can offer certain basic linking functions from their own bibliographic
content, and the expectation is that they will be available
at no extra cost. For those entertaining the prospect of centralizing
link administration, here are some things to look for:
An optimal centralized link solution should:
- Support the OpenURL standard for bi-directional
linking that includes linking to any full text, document
delivery, catalog holding and other OpenURL compliant internet
resource.
- Address linking to the appropriate copy
of a resource, redirecting CrossRef and DOI links to locally
held resources by means of maintaining local resolution
tables.
- Be priced competitively. Compare before
you buy. There are a number of universal linking products
on the market. Ease of administrative setup and maintenance
is important.