isbnlib is a (pure) Python library that provides several useful methods and functions to validate, clean, transform, hyphenate and get metadata for ISBN strings.
From the command line, enter (in some cases you have to precede the command with sudo):
$ pip install isbnlib
If you use Linux, you can also install it with your distribution's package manager (all major distributions have the packages python-isbnlib and python3-isbnlib); however, these are (usually) very old and don't work well anymore!
The official form of an ISBN is something like ISBN 979-10-90636-07-1. However, for most applications only the numbers are important, and you can always 'mask' them if you need to (see below). This library works mainly with 'stripped' ISBNs (only digits and X) like '0826497527'. You can strip an ISBN-like string by using canonical(isbnlike). You can 'mask' the ISBN by using mask(isbn). So in the examples below, when you see 'isbn' in the argument, it is a 'stripped' ISBN, whereas when the argument is an 'isbnlike', it is a string like ISBN 979-10-90636-07-1 or even something dirty like asdf 979-10-90636-07-1 bla bla.
Two important concepts: a valid ISBN is one that was built according to the rules, which is distinct from an issued ISBN, one that has already been issued to a publisher (this is the usage of the libraries and most of the web services). However, isbn.org, probably for legal reasons, merges the two! So, according to isbn-international.org, '9786610326266' is not valid (because the block 978-66... has not been issued yet); however, if you use is_isbn13('9786610326266') you will get True (because '9786610326266' follows the rules of an ISBN). But the situation is even murkier: try meta('9786610326266') and you will see that this ISBN was already used!
If possible, work with ISBNs in the ISBN-13 format (since 2007, only ISBNs in the ISBN-13 format are issued). You can always convert ISBN-10 to ISBN-13, but not the reverse (read this). Read more about ISBNs at isbn-international.org or Wikipedia.
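The gap between a valid and an issued ISBN comes down to the check digit: validity is pure arithmetic, while issuance needs registry data. A minimal pure-Python sketch of that arithmetic follows. The helper names (isbn13_check_digit, isbn10_check_digit, follows_isbn13_rules, isbn10_to_13, isbn13_to_10) are hypothetical and are not isbnlib's implementation, which you would normally reach through is_isbn13, to_isbn13 and to_isbn10. The sketch also shows why conversion only works one way: every ISBN-10 maps into the 978 prefix, so a 979 ISBN-13 has no ISBN-10 form.

```python
def isbn13_check_digit(first12: str) -> str:
    """Check digit for the first 12 digits of an ISBN-13 (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9: str) -> str:
    """Check digit for the first 9 digits of an ISBN-10 (weights 10 down to 2, mod 11)."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def follows_isbn13_rules(isbn13: str) -> bool:
    """Structural check only: says nothing about whether the ISBN was issued."""
    return (
        len(isbn13) == 13
        and isbn13.isdigit()
        and isbn13[-1] == isbn13_check_digit(isbn13[:12])
    )


def isbn10_to_13(isbn10: str) -> str:
    """Every ISBN-10 maps to the 978 prefix; only the check digit is recomputed."""
    body = "978" + isbn10[:9]
    return body + isbn13_check_digit(body)


def isbn13_to_10(isbn13: str) -> str:
    """Only 978-prefixed ISBN-13s have an ISBN-10 form; returns '' otherwise."""
    if not isbn13.startswith("978"):
        return ""
    body = isbn13[3:12]
    return body + isbn10_check_digit(body)


print(follows_isbn13_rules("9786610326266"))  # True: the checksum is fine
print(isbn10_to_13("0826497527"))              # 9780826497529
print(isbn13_to_10("9780446310789"))           # 0446310786
print(repr(isbn13_to_10("9791090636071")))     # '' because of the 979 prefix
```

Note that '9786610326266' passes this check even though its block has not been issued, which is exactly the distinction described above.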
is_isbn10(isbn10like)
- Validates as ISBN-10.
is_isbn13(isbn13like)
- Validates as ISBN-13.
to_isbn10(isbn13)
- Transforms ISBN-13 to ISBN-10.
to_isbn13(isbn10)
- Transforms ISBN-10 to ISBN-13.
canonical(isbnlike)
- Keeps only digits and X. You will get strings like 9780321534965 and 954430603X.
clean(isbnlike)
- Cleans ISBN (only legal characters).
notisbn(isbnlike, level='strict')
- Checks with the goal of invalidating ISBN-like.
get_isbnlike(text, level='normal')
- Extracts all substrings that seem like ISBNs (very useful for scraping).
get_canonical_isbn(isbnlike, output='bouth')
- Extracts ISBNs and transforms them to the canonical form.
ean13(isbnlike)
- Transforms an isbnlike string into an EAN13 number (validated canonical ISBN-13).
doi(isbn)
- Returns the ISBN-A (the DOI form of an ISBN) for an ISBN-13.
mask(isbn, separator='-')
- Masks (hyphenates) a canonical ISBN.
info(isbn)
- Gets the language or country assigned to this ISBN.
meta(isbn, service='default')
- Gives you the main metadata associated with the ISBN. As the service parameter you can use 'goob', which uses the Google Books service (no key is needed) and is the default option; 'wiki', which uses the wikipedia.org API (no key is needed); or 'openl', which uses the OpenLibrary.org API (no key is needed). You can enter API keys with config.add_apikey(service, apikey) (see example below). With registry.bibformatters, the output can be formatted in the bibtex, csl (CSL-JSON), msword, endnote, refworks, opf or json (BibJSON) bibliographic formats. You can also extend the functionality of this function by adding plugins that provide more metadata providers or new bibliographic formatters (check for available plugins).
editions(isbn, service='merge')
- Returns the list of ISBNs of editions related to this ISBN. By default it uses 'merge' (merges 'openl', 'thingl' and 'wiki'), but other providers are available: 'openl' (uses the search API from Open Library), 'thingl' (uses the ThingISBN service from LibraryThing), 'wiki' (uses the Citation service from Wikipedia) and 'any' (first tries 'wiki'; if there is no data, then 'openl').
isbn_from_words(words)
- Returns the most probable ISBN from a list of words (for your geographic area).
goom(words)
- Returns a list of references (multiple records) from a Google Books search for these words.
classify(isbn)
- Returns a dictionary of classifiers for a canonical ISBN. For the meaning of these classifiers see OCLC. Most of the data in the underlying service are for books in English. (See issue 138).
desc(isbn)
- Returns a small description of the book. Almost all data available are for US books!
cover(isbn)
- Returns a dictionary with the URLs for the book's cover. Almost all data available are for US books!
doi2tex(DOI)
- Returns metadata formatted as BibTeX for a given DOI.
ren(filename)
- Renames a file using metadata for an ISBN in the filename.
See files test_core and test_ext for a lot of examples.
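For scraping, get_isbnlike and get_canonical_isbn do this work for you. The idea behind them can be sketched in a few lines of plain Python: find ISBN-like runs of characters, then strip them to digits and X. The CANDIDATE pattern and the extract_canonical name are assumptions for illustration, not the library's own regular expressions, and this version only accepts hyphens as separators. It also does no checksum validation, so a real pipeline would still pass each result through is_isbn10 or is_isbn13.

```python
import re

# Hypothetical pattern: a run of 10 to 17 digits/hyphens that starts with a digit,
# may end with X, and is not glued to other letters, digits or hyphens.
CANDIDATE = re.compile(r"(?<![\w-])\d[\d-]{8,15}[\dXx](?![\w-])")


def extract_canonical(text: str) -> list[str]:
    """Find ISBN-like substrings and reduce them to digits and X (sketch only)."""
    found = []
    for match in CANDIDATE.finditer(text):
        stripped = match.group().replace("-", "").upper()
        # Only 10 characters (ISBN-10, may end in X) or 13 digits (ISBN-13) qualify
        if len(stripped) == 10 or (len(stripped) == 13 and stripped.isdigit()):
            found.append(stripped)
    return found


text = "Cited as ISBN 979-10-90636-07-1 and, in the 2007 reprint, 0-8264-9752-7."
print(extract_canonical(text))  # ['9791090636071', '0826497527']
```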
You can extend the functionality of the library by adding plugins (for now, just new metadata providers or new bibliographic formatters).
For available plugins check here.
After installing, your plugin will blend transparently into isbnlib (you will have more options in meta and bibformatters).
In the namespace isbnlib you have access to the core functions: is_isbn10, is_isbn13, to_isbn10, to_isbn13, canonical, clean, notisbn, get_isbnlike, get_canonical_isbn, mask, info, check_digit10, check_digit13, doi and ean13.
In addition, you have access to metadata functions, namely: meta, editions, ren, desc, cover, goom, classify, doi2tex and isbn_from_words.
The exceptions raised by these methods can all be caught using ISBNLibException.
You can extend the lib by using the classes and functions exposed in the namespace isbnlib.dev, namely:
- WEBService, a class that handles access to web services (just by passing a URL) and supports gzip. You can subclass it to extend the functionality... but you probably don't need to use it! It is used in the next class.
- WEBQuery, a class that uses WEBService to retrieve and parse data from a web service. You can build a new provider of metadata by subclassing this class. Its main methods allow passing custom functions (handlers) that specialize them to specific needs (data_checker and parser). It implements a throttling mechanism with a default rate of one call per second per service.
- Metadata, a class that structures, cleans and 'validates' records of metadata. Its merge method allows implementing a simple merging procedure for records from different sources. The main features of this class can be implemented by calling the stdmeta function instead!
- vias, which exposes several functions to make calls to services simply by passing the name and a pointer to the service's query function. vias.parallel allows making threaded calls. You can use vias.serial to make serial calls and vias.multi to use several cores. The default is vias.serial.
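WEBQuery is described above as throttling calls to one per second per service. A rough sketch of such a per-service throttle in plain Python follows; the PerServiceThrottle class and its wait method are hypothetical names, not part of isbnlib.dev, and the library's actual mechanism may differ. Each caller reserves its time slot while holding a lock and sleeps outside it, so concurrent threads calling the same service are still spaced out while other services are unaffected.

```python
import threading
import time


class PerServiceThrottle:
    """Space out calls so each service sees at most one call per `interval` seconds."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self._next_free: dict[str, float] = {}  # service name -> earliest next call time
        self._lock = threading.Lock()

    def wait(self, service: str) -> None:
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_free.get(service, now))
            # Reserve this slot so the next caller for this service queues behind it
            self._next_free[service] = start + self.interval
        delay = start - now
        if delay > 0:
            time.sleep(delay)


throttle = PerServiceThrottle(interval=0.2)
t0 = time.monotonic()
for _ in range(3):
    throttle.wait("openl")  # first call is immediate, then about 0.2 s apart
elapsed = time.monotonic() - t0
throttle.wait("wiki")  # a different service is not delayed by 'openl'
```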
The exceptions raised by these methods can all be caught using ISBNLibDevException (or, more generally, ISBNLibException). You shouldn't raise this exception in your code; only raise the specific exceptions exposed in isbnlib.dev whose names end in Error.
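The hierarchy described above means that plugins raise narrow errors while application code can catch the whole family with one except clause on the base class. To show the pattern without a network call, the sketch below uses local stand-in classes: LibException stands in for ISBNLibException, LibDevException for ISBNLibDevException, and ServiceDownError, fetch_record and safe_fetch are made-up names. In real code you would import the library's exceptions instead of defining them.

```python
class LibException(Exception):
    """Stand-in for isbnlib's ISBNLibException (the base of the family)."""


class LibDevException(LibException):
    """Stand-in for isbnlib.dev's ISBNLibDevException."""


class ServiceDownError(LibDevException):
    """Hypothetical specific error of the kind a plugin would raise."""


def fetch_record(isbn: str) -> dict:
    # A provider raises the *specific* error that describes the problem...
    raise ServiceDownError(f"service unavailable while looking up {isbn}")


def safe_fetch(isbn: str) -> dict:
    # ...and application code catches the whole family through the base class
    try:
        return fetch_record(isbn)
    except LibException as exc:
        print(f"lookup failed: {exc}")
        return {}


print(safe_fetch("9780446310789"))  # prints the failure message, then {}
```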
In isbnlib.dev.helpers you can find several methods that we found very useful, some of which are only used in isbntools (an app and framework that uses isbnlib).
With isbnlib.config you can read and set configuration options: change timeouts with seturlopentimeout and setthreadstimeout, access API keys with apikeys and add new ones with add_apikey, and access and set generic and user-defined options with options.get('OPTION1') and set_option.
Finally, from isbnlib.registry you can change the metadata service to be used by default (setdefaultservice), add a new service (add_service), access bibliographic formatters for metadata (bibformatters), set the default formatter (setdefaultbibformatter), add new formatters (add_bibformatter) and set a new cache (set_cache) (e.g. to switch off the cache, use set_cache(None)). The cache only works for calls through metadata functions. These changes only apply to the 'current session', so they should always be made before calling other methods.
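set_cache takes a cache object. Assuming the library treats it like a dictionary (membership test, item lookup and assignment; this is an assumption, so check the library's own cache classes before relying on it), a size-bounded cache could be as small as the sketch below. LRUCache is a hypothetical name; an object like this could play the role of MyCache in the special-config example further down, e.g. registry.set_cache(LRUCache(maxlen=500)).

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class LRUCache(MutableMapping):
    """Size-bounded, dict-like cache that evicts the least recently used entry."""

    def __init__(self, maxlen: int = 100) -> None:
        self.maxlen = maxlen
        self._data: OrderedDict = OrderedDict()

    def __getitem__(self, key):
        value = self._data[key]      # raises KeyError on a miss, like a dict
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxlen:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __delitem__(self, key) -> None:
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


cache = LRUCache(maxlen=2)
cache["a"] = 1
cache["b"] = 2
_ = cache["a"]      # touching "a" makes "b" the least recently used entry
cache["c"] = 3      # exceeds maxlen, so "b" is evicted
print(list(cache))  # ['a', 'c']
```

MutableMapping supplies get, __contains__ and the other dict methods from the five methods defined here, which keeps the class short.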
Let us concretize these points with a small example.
Suppose you want a small script to get metadata using Open Library
formatted in BibTeX.
A minimal script would be:
from isbnlib import meta
from isbnlib.registry import bibformatters
SERVICE = "openl"
# now you can use the service
isbn = "9780446310789"
bibtex = bibformatters["bibtex"]
print(bibtex(meta(isbn, SERVICE)))
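As the script shows, an entry of bibformatters is a function that takes the record returned by meta and returns a string, so a custom formatter can be an ordinary function. The sketch below emits a minimal RIS entry; to_ris is a hypothetical name, RIS is not one of the built-in formats listed above, and the record keys (ISBN-13, Title, Authors, Publisher, Year) are an assumption about the shape of meta's output, so inspect a real record first. Registering it would go through registry.add_bibformatter; check that function's exact signature before use.

```python
def to_ris(record: dict) -> str:
    """Render a metadata record as a minimal RIS entry (hypothetical formatter)."""
    # Assumed record keys: "ISBN-13", "Title", "Authors" (a list), "Publisher", "Year"
    lines = ["TY  - BOOK", f"TI  - {record.get('Title', '')}"]
    lines += [f"AU  - {author}" for author in record.get("Authors", [])]
    lines += [
        f"PB  - {record.get('Publisher', '')}",
        f"PY  - {record.get('Year', '')}",
        f"SN  - {record.get('ISBN-13', '')}",
        "ER  - ",
    ]
    return "\n".join(lines)


# Placeholder record, not real bibliographic data
sample = {
    "ISBN-13": "9781234567897",
    "Title": "Example Title",
    "Authors": ["Jane Doe", "John Roe"],
    "Publisher": "Example Press",
    "Year": "2020",
}
print(to_ris(sample))
```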
The library implements a very simple API with sensible defaults, but there are cases that need your attention (see case 3 below).
- You only need core functions:
# import the core functions you need
from isbnlib import canonical, is_isbn10, is_isbn13
isbn = canonical("978-0446310789")
if is_isbn13(isbn):
    ...
...
- You also need metadata functions with the default config:
from isbnlib import canonical, meta, desc
isbn = canonical("978-0446310789")
data = meta(isbn)
...
- You also need metadata functions with a special config:
Let's suppose you need to add an API key for a metadata plugin and change the cache too.
from myapp.utils import MyCache
# import the functions you need, plus 'config' and 'registry'
from isbnlib import canonical, config, meta, registry
# you should use 'config' first
config.add_apikey("isbndb", "kjshdfkjahsdflkjh")
# then 'registry'
registry.set_cache(MyCache())
# Only now should you use metadata functions
# (config and registry changes don't affect the core functions,
# so those can be used at any time)
isbn = canonical("978-0446310789")
data = meta(isbn, service="isbndb")
...
- You want to build a plugin or use isbnlib.dev in your code:
You should study the public methods in dir(isbnlib.dev) very carefully, starting with this template and following the instructions there. For inspiration, take a look at goob.
Most of the public bibliographic catalog services return data in SRU or Unimarc format. It is very easy to write a custom plugin for these services: just use porbase (SRU) or sbn (Unimarc) as templates and consult this project.
- These classes are optimized for single calls to services and not for batch calls.
- If you inspect the library, you will see that there are a lot of private modules (their names start with '_'). These modules should not be accessed directly since there's a high probability your program will break with a future version of the library!
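WEBQuery lets a provider plug in its own data_checker and parser handlers. The sketch below shows, in plain Python, the kind of work such handlers do for a JSON service: reject empty or malformed answers, then map the service's fields onto a standard record. The JSON field names (items, isbn13, title, authors, publisher, year, lang), the function names looks_valid and to_record, and the record keys are all assumptions; the exact contract WEBQuery expects from its handlers is defined by the library, so follow the plugin template for the real signatures.

```python
import json


def looks_valid(raw: str) -> bool:
    """Reject empty or malformed answers (the role of a data_checker)."""
    try:
        data = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(data, dict) and bool(data.get("items"))


def to_record(raw: str) -> dict:
    """Map the hypothetical service fields onto a standard record (the parser role)."""
    item = json.loads(raw)["items"][0]
    return {
        "ISBN-13": item.get("isbn13", ""),
        "Title": item.get("title", ""),
        "Authors": item.get("authors", []),
        "Publisher": item.get("publisher", ""),
        "Year": str(item.get("year", "")),
        "Language": item.get("lang", ""),
    }


# Placeholder response body, not data from a real service
raw = json.dumps({"items": [{"isbn13": "9781234567897", "title": "Example Title",
                             "authors": ["Jane Doe"], "year": 2020, "lang": "en"}]})
if looks_valid(raw):
    print(to_record(raw))
```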
Some projects that use isbnlib:
Open Library https://github.com/internetarchive/openlibrary
NYPL Library Simplified https://github.com/NYPL-Simplified
RERO ILS https://github.com/rero/rero-ils
CERN CDS RDM https://github.com/CERNDocumentServer/cds-rdm
ResearchHub https://github.com/ResearchHub/researchhub-backend
Manubot https://github.com/manubot
isbntools https://github.com/xlcnd/isbntools
isbnsrv https://github.com/xlcnd/isbnsrv
See the full list here.
If you need help, please take a look at github or post a question on stackoverflow.