planet ecorrado.us

September 15, 2009

Dan Scott

Two podcasts of potential interest to Evergreen fans

Most recently, the latest Software Freedom Law Show focuses on the subject of how to choose a license for your software project's documentation. The episode was a direct response to a dent I had sent to one of the hosts, Bradley Kuhn, suggesting the subject. I thought the Evergreen Documentation Interest Group might find it a useful treatment from two of the most knowledgeable folks in the free software licensing world. As a bonus, when I started listening to the episode today, I was pleased to hear Bradley lead in with a very positive mention of Evergreen. Many thanks, Bradley, both for the show and for the shout-out to Evergreen!

Also, back in July, I had the opportunity to travel to Algoma University in Sault Ste. Marie to spend a few days locked in a room with my fellow Conifer propeller-heads (Art, Kevin, and Robin) to dump the Evergreen-related content of my brain out onto the table in preparation for my parental leave. As part of the visit, we joined the Tangential Convergence crew to put together a podcast about Conifer and Evergreen in the standard Tangential Convergence style: having a few beers while sitting around a table in Dave Brodbeck's backyard. We ended up veering off onto other subjects rather quickly, but such is the nature of the show!

by dan@coffeecode.net (Dan Scott) at September 15, 2009 08:59 PM

Roy Tennant: Digital Libraries

LibLime To the Koha Community: Fork You!

A shot across the bow of the open source software community has come from an unexpected source -- a company that has championed the open source eth...

September 15, 2009 08:25 PM

Jonathan Rochkind

article


So, if you try to architect your applications solidly and flexibly, and build in features for integration, and it all works out okay, one of the benefits you get is it’s pretty easy to combine them.

I’ve added a feature to the Xerxes federated search tool that brings the sophisticated item-level information and services already being compiled by our Umlaut installation onto Xerxes record-detail pages.

I think this is pretty neat from a sort of ’single business’ perspective of providing consistent services regardless of what tool the user happens to be using.

So now, when you look at an item detail page in Xerxes, you can, right on that page,  see:

  • call numbers and availability
  • Full text links from SFX, right on the page
  • Links to “similar items” content from Web of Knowledge and Scopus.
  • links to pre-filled ILL forms, as appropriate.
  • For monographic content, full text, preview, and ’search inside’ functionality from Amazon, Google, and others.
  • Other stuff: whatever happens to be configured in Umlaut. When new stuff is added to Umlaut, it’ll automatically show up in Xerxes too. (Well, new services of the existing types; if a whole new type/section is added to Umlaut, it will take a couple of lines of code in Xerxes to add it.)

This is live in production here now, but you can’t really see it without a local login. So here are some screenshots of Xerxes item detail pages, with content from Umlaut circled in red.

book

article

It’s worth noting that this content is inserted on the page by JavaScript after page load. It can take 1-3 seconds or so to come in (depending on how quickly Umlaut can do its thing), which you can’t see in the screenshots. While waiting, you get a spinner and status message. If a user doesn’t have JavaScript enabled, this feature won’t affect their page view at all.

Posted in General

by jrochkind at September 15, 2009 08:13 PM

Edward M. Corrado: Comments

Comment on “Digital Preservation: The Next Library Frontier” IGeLU presentation slides by ecorrado

If you tried to read these using Adobe earlier and received a color space error, try again. Originally I saved the files as PDF/A, which didn’t seem to please Adobe. I re-saved the PDF as a regular PDF and, at least on my MacBook Pro, things seem to be working fine.

by ecorrado at September 15, 2009 05:13 PM

D-Lib Magazine

It's Time for Wider Acceptance of e-Textbooks

"Because the printed book is such a perfect technology, print books will surely continue to be produced and purchased, especially for leisure reading. But the time has come for greater use of e-textbooks for educational purposes. Efforts should be made to overcome the remaining obstacles to their broad acceptance." Editorial by Bonita Wilson, CNRI.

September 15, 2009 04:04 PM

Establishing Trust in a Chain of Preservation: The TRAC Checklist Applied to a Data Staging Repository (DataStaR)

"DataStaR is a data staging repository in development at Cornell University. A data staging repository offers unique opportunities to recruit data into domain and institutional data repositories, but as a transitory curation environment, it demands careful consideration of the role of such a repository in the full life cycle of research data. We describe our experience applying the Trustworthy Repositories Audit and Certification Criteria and Checklist as a framework for specifying system, policy, and documentation requirements to ensure that DataStaR is an effective partner in the entire chain of preservation activities." Article by Gail Steinhart and Dianne Dietrich, Cornell University; and Ann Green, Yale University.

September 15, 2009 04:04 PM

Subject-based Information Retrieval within Digital Libraries Employing LCSHs

"In this article, an effort is made to take advantage of the inherent semantic wealth existing within Library of Congress Subject Headings (LCSHs) in order to provide more efficient subject-based information retrieval in digital libraries. To formally express such wealth, an ontology schema is presented that is capable of modeling the semantics of LCSHs into adequate ontologies. Finally, in order to show the way towards exploiting such ontologies in favor of the development of more effective subject-based information retrieval in digital libraries, a prototype system is presented." Article by Ioannis Papadakis and Michalis Stefanidakis, University of Ionio; and Konstantinos Kyprianos and Rosa Mavropodi, University of Piraeus.

September 15, 2009 04:04 PM

Analysing Selection for Digitisation: Current Practices and Common Incentives

"This article presents the results of a close reading of current practices and guidelines for digitisation, in an attempt to further the movement towards greater consensus on this issue. From the existing myriad approaches found in the field, the article formulates a set of common criteria for selection by way of a sector-independent longlist. In this way the article illustrates the complex nature of selection, which may be seen to depend upon significantly greater number of criteria than have so far been put forward in any single guiding document, but it also proposes a base-terminology that can be used in any institutional setting." Article by Bart Ooghe, Heritage Cell Waasland; and Dries Moreels, Flemish Theatre Institute (BE).

September 15, 2009 04:04 PM

OA Network: An Integrative Open Access Infrastructure for Germany

"This article describes concepts, development, and implementation of an overall Open Access infrastructure for Germany. Currently, the joint project Open Access Network is facilitating comprehensive and value-added services built on top of distributed Institutional Repositories. Using the OAI-PMH as the harvesting mechanism, Open Access (OA) Network furnishes an open and extensible architecture to form the technological base for manifold enhanced services. It not only provides a personalized end user platform but also serves as an aggregator node for passing data to other service providers (e.g., DRIVER). Moreover, OA Network provides a testbed for the development of software to implement value-added services. OA Network is funded by the German Research Foundation (DFG)." Article by Uwe Müller, Robin Malitz, and Peter Schirmbacher, Humboldt-Universität zu Berlin; and Thomas Severiens, Universität Osnabrück.
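For readers unfamiliar with OAI-PMH harvesting, the following is a minimal Python sketch of how a harvester pages through a repository's records. It is not OA Network's code: the function names, the `example.org` base URL, and the sample response are all hypothetical. Only the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(f"{OAI_NS}identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response, trimmed to the elements the sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, next_token = parse_page(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=next_token)
```

A real harvester would fetch each URL over HTTP and repeat until no resumption token is returned; the network step is left out here so the sketch stays self-contained.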

September 15, 2009 04:04 PM

Curriculum for Digital Libraries: An Analytical Study of Indian LIS Curricula

"The information professionals of today must discharge their duties in a hybrid environment, one that deals with traditional print documents but digital documents as well. To make digital information management a reality, Library and Information Science (LIS) education programmes in India now include digital library courses in their syllabi. An attempt has been made in this article to analyze the digital library course content of LIS Master Degree programmes of selected University Departments/Institutions in India." Article by R.S.R.Varalakshmi, Andhra University.

September 15, 2009 04:04 PM

Talis: Panlibus

Talking with Talis Podcast: Hazel Hall, Strategic Leader at LIS Research Coalition

In this Talking with Talis Podcast I speak to Hazel Hall, the newly appointed Strategic Leader at the recently established Library and Information Science Research Coalition. We discuss how the six-month-old Coalition aims to address leadership and advocacy challenges by working with five bodies representing each corner of the LIS Research world (currently, the British Library, CILIP, JISC, MLA and the Research Information Network). As the first to be appointed to drive the aims of the Coalition, Hazel Hall’s plans moving forward and her vision of a successful first year are also discussed.

by richard.wallis@talis.com at September 15, 2009 01:48 PM

Edward M. Corrado

“Digital Preservation: The Next Library Frontier” IGeLU presentation slides

I have uploaded the PDF of my IGeLU presentation, “Digital Preservation: The Next Library Frontier,” to my personal repository, as well as to the Binghamton University Libraries digital repository. Enjoy!

by ecorrado at September 15, 2009 12:31 PM

Michael Giarlo

I2: Survey results

I wrote in June that the I2 subgroup surveyed "repository managers to determine the current practices and needs of the repository community regarding institutional identifiers. Results from the survey will inform a set of use cases that will be shared with the community, and that are expected to drive the development of a new standard for institutional identifiers."

The survey closed in July, and the subgroup spent August writing a report on the survey results. That report is now final and it's available to the public. Feedback may be sent to our (woefully underutilized) public i2info mailing list, left as a comment on this post, or e-mailed to me privately, in which case I can forward it to our internal list.

The next step is to build upon the report to draw yet more conclusions from the data — there's an awful lot there — and flesh out some repository use cases for institutional identifiers. The I2 core group is moving quickly towards finalizing identifier metadata elements so that a standard may be drafted, and I think having some use cases documented will help drive the standard in a direction the community can get behind.

Onward and upward.

by Michael Giarlo at September 15, 2009 12:29 PM

Roy Tennant: Digital Libraries

Prince Edward Island Is Where It's At

That's right, Prince Edward Island is the host of the Access Conference (September 30-October 3), long one of my favorite technology conferences of...

September 15, 2009 07:46 AM

Edward M. Corrado: Comments

Comment on IGeLU 2009 thoughts by ecorrado

@LG: Hopefully there’ll be a RUG meeting near you that you can attend. I know EMA is having just as many customer-presented Voyager sessions as IGeLU. Not being on the IGeLU steering committee, I don’t want to speak for them (although I suspect they would say the same thing), but I can tell you that the ELUNA SC takes the customer concerns we hear about seriously. While not directly related to incident reporting, ELUNA and IGeLU have been working together on a new enhancement system that should lead to a better process next year.

by ecorrado at September 15, 2009 12:17 AM

September 14, 2009

Edward M. Corrado: Comments

Comment on IGeLU 2009 thoughts by lg

thanks for this update, especially for those of us who have no money to attend either ELUNA or IGELU. hats off to YOU for doing so.

i have some concerns that there are serious customer-related issues that were not mentioned in your synopsis. these include lapses in documentation and in the incident reporting system. i hope the organizations that we are paying membership money to are paying attention to customers’ concerns.

welcome home.

by lg at September 14, 2009 09:07 PM

Comment on IGeLU 2009 thoughts by Gary Johnson

Edward, you wrote a good overview of a really great IGeLU conference in Helsinki. All involved did a wonderful job making it a productive time for everyone there!

by Gary Johnson at September 14, 2009 08:57 PM

Edward M. Corrado

IGeLU 2009 thoughts

From September 5 until September 9 I attended the International Group of Ex Libris Users (IGeLU) 2009 annual conference and associated Ex Libris Users of North America (ELUNA) Steering Committee meetings. I had to attend the ELUNA SC meetings because of my role on the ELUNA Steering Committee. There were many topics discussed, including how ELUNA can improve communication with IGeLU and with ELUNA members. Communication with IGeLU actually is going really well, and we just need to continue to build on the level of cooperation we have established in the last couple of years.

We have come up with a number of action items regarding communicating to ELUNA member institutions (and potential member institutions). One of the ways we hope to do this is with a new, and hopefully vastly improved, Web site. We are planning to replace Drupal with a combination of three tools: for news and our main Web site we will be using WordPress Multi-User, for sharing conference proceedings and other finalized documents we will use E-Prints, and for collaborating on documents we will use Google Apps (mostly the Google Docs portion). Being that I am the ELUNA Web master, I think I have a busy Fall ahead of me.

The IGeLU conference itself was very nice. The local organizers did an excellent job and there were hardly any local arrangement issues. Kudos to the folks in Helsinki for the great job they did. They raised the bar to almost unattainable heights for those who will follow after them.

The conference itself had two general types of sessions: ones that Ex Libris put on, and ones the customers put on. The Ex Libris sessions could further be divided into sessions about the URM and sessions about the current products. The URM sessions did not have a lot of new information in them since ELUNA. It is obvious that Ex Libris has a clear vision and is working towards it, but it is going to take time before they can talk about specifics. Now that the development partners are in place, I expect a lot of work to be done between now and ELUNA 2010, and that there’ll be a lot more meat on the bones at that conference. The sessions on the existing products that Binghamton University has, Aleph and Metalib, also did not have a lot of new developments since ELUNA. Aleph enhancement voting for ELUNA just took place and IGeLU is in the process of voting, so they didn’t have any announcements about new enhancements.

The news about Metalib was that Ex Libris is still planning on rolling Metalib functionality into Primo (and providing the Metalib functionality to existing Metalib-only customers at no additional cost). This should happen in Primo 4.0, due out in 2011. Primo and Metalib customers will get some improvements in Primo version 3.1. Since Ex Libris says that they are not going to release a new version of the existing Metalib, in order to concentrate on Metalib, Next Generation (yes, that is what they are currently referring to it as), non-Primo Metalib customers will not have to worry about upgrading Metalib for a while [1].

There was also some talk about whether or not Primo Central (a product similar to Serials Solutions’ Summon) would be available to Metalib-only customers. The IGeLU SC discussed this with Ex Libris, and Ex Libris later announced in the company Q&A session that they are going to consider making Primo Central available to Metalib-only customers. My hat’s off to Ex Libris for listening to the customers about this.

The customer sessions I went to were very good. Probably the best and most relevant one for Binghamton was “Easy OPAC enhancements” by Matthew Phillips, Systems Librarian, University of Dundee. Matthew showed a large number of enhancements to the stock Aleph OPAC that will greatly improve the look and feel of an Aleph OPAC, as well as add some desired functionality. I plan on looking over his slides when the UISC begins to look at making some changes to our Aleph OPAC.

As with any conference, sometimes the personal connections you make can be important. I had interesting conversations about Aleph and other library software with many people. Among them was one of the people responsible for the new Library of the University of Amsterdam Aleph catalog (http://opc.uva.nl/F), who offered me some pointers and ideas about how they designed their Aleph interface. Between the presentations and contacts I made at the conference, I think it will be much easier to get more out of our Aleph OPAC than it otherwise would be.

All in all, a productive, educational, and enjoyable conference. I hope I am able to go to IGeLU 2010 in Belgium.

[1] This does not mean at all that Ex Libris will not be doing knowledge base updates. They are continuing to come, and from my perspective, Ex Libris has been doing a very admirable job with the knowledge base updates. With new procedures being put in place by Ex Libris, IGeLU, and ELUNA for determining which new resources to create connectors for, I expect that this will improve even more.

by ecorrado at September 14, 2009 08:30 PM

Talis: Nodalities

Linked Data Meetup

On Wednesday, I had the privilege to attend the first Linked Data Meetup down in Hammersmith. The day was a storming success, with talks and presentations from all over the Linked Data community: from academia to startups. I think the organisers were slightly overwhelmed, because in the end there were nearly 200 people there, making use of the Talis-sponsored bar well into the evening. Apart from being a good opportunity to catch up with people, this meetup had the feeling of a guild-meet of Linked Data professionals—with lots of different perspectives over similar problems.

The two panel discussions gave the opportunity for quite a range of different views and topics to be covered, and seemed to go well. The first was about Government Data and was chaired by Carol Tullo from the Office of Public Sector Information (OPSI) and included Sir Tim Berners-Lee on a panel of five. The topics covered a swathe of issues with public data, licensing, rights and infrastructure. This panel had a certain gravitas I wasn’t expecting from a semi-formal “meetup”, probably because it was representing the UK’s actual public sector data workers. After much discussion about what it means to “link data” and what counts as “Linked Data”, I was left with the important point from the discussion: there are important and well-placed people currently working to make public data public, and I look forward to the potential benefits this will have.

The second panel covered a topic which has become very important to me, and which is strongly tied up with the first: the Future of Journalism. Although I was unable to hear much of this discussion (there were a fair few of us in that hall!), I certainly found the questions asked of the panel particularly acute. There was a particular emphasis on advertising and the future of revenue for news media in an online world. From this panel, I took the view that Journalists report on the public happenings of their nations and worlds, and often what they’re working with is made available by the very institutions “making the news”. So, the work on public data has a strong bearing on journalism and on citizens’ collective knowledge of what’s going on in their worlds. Paul Bradshaw, who chaired this panel, published his notes from the session, which will give a good overview of the topics there!

I won’t report on every talk that happened here, though the programme is still available on the Meetup site, and if anyone has any links to slides or photos they’d like to share, just ping them in the comments. I had a great time, and I left feeling hugely excited by many of the projects and trends discussed there.

by Zach Beauvais at September 14, 2009 04:14 PM

Lorcan Dempsey

The library website: a unified service?

By: dempsey

Categories: Libraries - systems and technologies• Libraries - organization and services• Marketing• User experience• Websites: design and role

I mentioned the reworked University of Michigan Libraries website a while ago.

Although it is still a layer over various other resources, I liked the way that the site aimed to project the library on the web as a unified service not as a set of unrelated opportunities. This extended to the way that links to specialist library advice was included alongside the other resources. [Lorcan Dempsey's weblog - A web-siting at the University of Michigan]

For reasons discussed in these pages many times this layering is not straightforward:

This is largely because they are providing a thin layer over two sets of heterogeneous resources. One is the set of legacy and emerging systems, developed independently rather than as part of an overall library experience, with different fulfillment options, different metadata models, and so on (integrated library system, resolver, knowledge base, repositories, ...). Another is the set of legacy database and repository boundaries that map more to historically evolved publisher configurations and business decisions than to user needs or behaviors (for example, metadata, e-Journals, eBooks, and other types of content, which may be difficult to slice and dice in useful ways). [Lorcan Dempsey's Weblog - Stitching costs]

There is a blog entry by Ken Varnum providing a high level overview of design decisions:

Implemented in Drupal with Solr to power our search results, the site integrates what was a varied and highly decentralized network of library web pages. We pruned and weeded the roughly 100,000 pages on the old site - many of which were leftovers from past redesigns - into about 7,000 pages and 13,000 resource entries. We also introduced powerful new discovery tools for our patrons. [[BLT] Blog for library technology - The New University of Michigan Library Web Site]

I particularly like the way a search in the home-page search box gives back a range of results, including a match to a subject expert if this is possible. They try to put, literally, a 'human face' on the results page.

The catalog search itself is a VuFind deployment, and article search talks to Metalib's x-server, dropping you into Metalib if you request it (I think - I am not an authorized user so can only go so far).

I think the Michigan site does a very nice job in projecting the library service and trying to abstract the user experience away from the underlying database and service boundaries it is managing. Of course, as a 'tourist' I cannot follow trails into resources I am not authorised to use.

Recently, I was directed to the new Libraries website at Miami University. This is on a smaller scale and there is a 'thinner' integration layer over the site as a whole. But, again, I liked the fact that the design goal seems to be to project the library as a unified service offering, carrying navigation and appearance as consistently as it could between parts.

In terms of presentation, I was interested to see some things in common between the sites. They each use Libguides to assemble web-based resources in particular subject areas (including links to pictured library staff). This enhances the sense of the pages as a professionally curated resource, and seems to support better overall UI consistency. Another was the use of Drupal and Solr. And another was the deployment of a separate interface to the native ILS catalog, VuFind in Michigan's case and (I think) MULtifacet in Miami's.

MULtifacet is a Drupal 5 module that wraps a UI around an arbitrary Solr index. It shares many features with other Solr-based OPAC projects, including faceted search results, SMS/email output, RefWorks integration, user tagging, saving/exporting of records, RSS feeds, Zotero support, Google Books images/linking, "more like this", formatted citation output (via the WorldCat API), COinS, unAPI, etc. While developed for, and initially targeted at, library collections, there's no reason it couldn't put a UI on other Solr indexes that need similar features. [MULtifacet]
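As a rough illustration of the faceted search that such Solr-based interfaces rely on, here is a minimal Python sketch of building a facet request and reading back facet counts. This is not MULtifacet's code (which is a Drupal module); the Solr URL, the function names, and the `format` and `subject` field names are assumptions made for the example. The request parameters `q`, `rows`, `wt`, `facet`, `facet.field`, and `facet.mincount` are standard Solr parameters.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, user_query, facet_fields, rows=10):
    """Build a Solr select URL asking for results plus facet counts.

    base_url and the field names passed in are hypothetical.
    """
    params = [
        ("q", user_query),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values that match nothing
    ]
    # facet.field is repeated once per field to facet on.
    params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url}?{urlencode(params)}"


def pair_facet_counts(flat):
    """Turn Solr's default flat [value, count, value, count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


url = build_facet_query(
    "http://localhost:8983/solr/select",  # assumed local Solr endpoint
    "digital libraries",
    ["format", "subject"],
)
# Hand-written stand-in for response["facet_counts"]["facet_fields"]["format"].
format_counts = pair_facet_counts(["Book", 12, "DVD", 3])
```

The paired counts are what a results page would render as clickable filters; clicking one would typically add a filter query to the next request.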

I had begun this entry when I was interested to see a related discussion emerge on the WEB4LIB list. Tim Spalding asked if anybody was aware of a library website where the various solutions looked as if they were part of the same organic whole. David Walker responded by pointing to the library website at CalPoly.

And as library websites go, it is indeed very nice. I noticed again that Libguides play a part, and am interested to see that you can browse by both subject area and by librarian ('people are entry points').

Even though the website is increasingly only a part of how the library projects its network services, it is important to improve how it does so.


September 14, 2009 12:58 AM

September 13, 2009

Roy Tennant

Your Ideas for "Top Ten Things"

When I was writing my "Top Ten Things Library Administrators Should Know About Technology" I put a message out on Twitter asking for the thoughts of my followers on that topic. And boy, did you ever respond! I decided that it would be worth following up my post with many of the things that my followers said since, as it turned out, I was pretty set on my list and I couldn't by any means include them all. Here you go, and thanks to all who participated!

Vendor solutions still require knowledgeable staff to make them work. A vendor who claims otherwise is lying.

No platform is forever. Ask not only how you'll move onto it, but how you'll move off of it.

It won't solve any of your problems without proper staffing and management policies, which you should allow techies to shape.

Administrators need to know that just because a staff member can support certain tech doesn't mean they can support all tech.

Allow your staff time and resources to experiment - even if nothing comes of it. Innovation comes with risks.

Understand where's a possible single point of failure i.e. rely on only one staff for a critical function.

Believe a staff member's opinion over a vendor's. Always. ALWAYS.

It's OK to fail.

Never depend on technology alone to save your library. Ask "why" and "how" instead of, and/or in addition to, "when" and "how much." Be prepared to answer those questions. Major technology decisions that are made in accordance with unclear policy can find any holes that may exist in that policy, and come back to burn you later. Don't be selectively blind or oblivious. You don't have to hop inside the black box, but make sure your policy can be met by how you choose to implement technology.

Don't say no. If you don't accept technology ideas from your techies, where do you think they'll come from?

more tech != less money; whether vendor or open src, requires approp expertise on staff.

Good project management practices are key.

#1 should be to TRUST those working with the technology to do the right thing with it and to listen.

if they do not know how to manage IT (most legacy lib admin do not), they should hire people who do and get out of their way

Technology is cheap. Having knowledgeable people costs money.

It's not scary; The youngest people on staff aren't automatically techno-geeks.

If you don't know about something, trust the people who do.

Delegate the discovery phase to those who can dedicate more resources to coming up with concise answers to "how," and justify "why."

by Roy Tennant at September 13, 2009 07:58 PM

John LeMasney

Twitter Weekly Updates for 2009-09-13

Powered by Twitter Tools

by John LeMasney at September 13, 2009 02:22 PM

September 11, 2009

Talis: Panlibus

Google Book Settlement will help stimulate eBook availability in libraries

So says former Google Book Search product manager Frances Haugen in her contribution to the debate on the September Library 2.0 Gang.

This month’s Gang was kicked off by Orion Pozo from NCSU, where they have rolled out dozens of Kindles and a couple of Sony Readers.  The comparative success of their Kindles ahead of the Sony Reader appears to be because of the simpler process of distributing purchased books across sets of readers and a broader selection of titles at a lower cost.  Currently users request books for the Kindle via an online selection form, then they are purchased and downloaded on to the devices which are then loaned out.  There were no restrictions on titles purchased and they have an approximate 50% split between fiction and non-fiction.

The Gang discussed the drivers that will eventually lead to the wide adoption of eBooks. This included things like the emergence of open eBook standards, and the evolution of devices, other than dedicated readers, that can provide an acceptable reading experience. Carl Grant shared his experience of starting a read on his Kindle and then picking it up from where he left off on his iPhone (as he joined his wife whilst shopping).

An obvious issue influencing the availability of eBooks is licensing and author and publisher rights. This is where the Google Book Settlement comes into play. If it works out as she hopes, Frances predicts that over time this will facilitate broader availability of currently unavailable titles. I paraphrase:

[From approx 26:50] Institutional subscriptions will become available on the 10M books that Google has scanned so far. Imagine in the future a user with a reader that accepts open formats will be able to get access to the books this institutional license would provide. Imagine school children having access to 10M books that their library subscribes to, instead of having to formally request one-off books to be added to their device.

[From approx 44:50] There are a huge number of books that are no longer commercially available in the US, for several reasons. If the rights holders of those books do not opt out, they will become available for people to purchase access to. One of the interesting things about the way the settlement is set up is that you will be able to purchase access either directly or through an institutional subscription. What is neat is that this cycle will put a check on prices, as prices for individual books are based upon the demand for the books. So less popular books will cost less… So if the price of the institutional subscription ever gets too high, libraries can decide to buy one-offs of these books. I think that whole economic mechanism will substantially increase access to books.

The Gang were in agreement that eBooks will soon overtake paper ones as the de facto delivery format.  It is just a question of how soon.  Some believe that this will be much more rapid than many librarians expect.  The challenge for librarians is to take their services into this eReading world.

by Richard Wallis at September 11, 2009 04:13 PM

Ed Summers

Documents




I’ve struggled in the past with what constitutes an Information Resource in the context of Web Architecture, Linked Data and practical digital library applications such as the National Digital Newspaper Project I work on at the Library of Congress. So it was reassuring to see the issue come up a few months ago during a review of the effort to revise the HTTP specification (RFC 2616). It would be a major effort to summarize the entire conversation here. However an interesting sub-discussion circled around the idea of normalizing the language in the Architecture of the World Wide Web and RFC 2616 with respect to Resources.

Well into the multi-month thread Tim Berners-Lee offered up a very helpful, historical recap of the “what is a resource” issue, in which he said:

I would like to see what the documents [AWWW and RFC 2616] all look like if edited to use the words Document and Thing, and eliminate Resource.

A Short History of “Resource”

Which, somewhat predictably, started a discussion of what a Document is. However this conversation seemed more tangible and earthy, and culminated in Larry Masinter recommending David M. Levy’s book Scrolling Forward:

… since much of the thought behind it informs a lot of my own thinking about the nature of “Document”, “representation”, “Resource” and the like.

www-tag email message

Now Larry is a scientist at Adobe, a company that knows a thing or two about electronic documents. He also works closely with the W3C and IETF on web architectural issues. So when he suggested reading a book to learn what he means by Document my ears perked up. The interjection of a book reference into this rapid-fire email exchange was like a magic spell, that made me pause, and consider that a working definition of Document was nuanced enough to be the subject matter of an entire book.


I’ve come to expect references to Michael Buckland’s classic What is a Document? in discussions of documents. I hadn’t run across David Levy’s name before so Larry’s recommendation was enough for me to request it from the stacks, and give it a read. I wasn’t disappointed. Scrolling Forward is an ode to documents of all shapes and sizes, from all time periods. It’s a joyful, mind expanding work, that explores the entire landscape of our documents: from cash register receipts, the multi-editioned Leaves of Grass, email messages, letters, books, photographs, papyrus scrolls, greeting cards and web pages. Since this takes place in 212 pages, it is not surprising that the analysis synthesizes rather than being exhaustive. Having received a doctorate in computer science from Stanford, obtained a diploma in calligraphy and bookbinding from the Roehampton Institute, and then worked at Xerox PARC studying the nature of documents for 15 years, Levy’s own professional career is marked by a bringing together of scientific and humanistic disciplines.

One of the key messages of the book is a working definition of the Document. Levy draws out his definition largely in contrast to a statement made by David Weinberger in his 1996 Wired piece What’s a Document? where he says:

The fact that we can’t even say what a document is anymore indicates the profundity of the change we are undergoing in how we interact with information and, ultimately, our world.

What is a Document?

To which Levy responds:

We can say what a document is. Doing this, however, requires a somewhat different approach from that which dictionaries take. It requires going beyond word usage. It does require looking at the relevant technologies, but in such a way that we aren’t fixated on them, that we don’t fetishize them. Most of all, it requires immersing ourselves in the social roles these technologies play.

Scrolling Forward p. 23

So Scrolling Forward is a survey of sorts; a survey of document types that are inextricably linked to the social contexts in which they were created. This approach to describing rather than positing a theory of documents dove-tailed nicely with some reading of Wittgenstein I’ve been doing recently. In Wittgenstein’s later period he eschewed positing philosophical theories, but instead attempted to resolve philosophical problems by exploring the richness of language and its use in social settings, or language games, to lay bare the problem in a therapeutic way. Levy takes a similar approach in simply laying out the complex, sometimes contradictory history of documents before us, instead of carving out a logical argument and selecting facts to support it.

Some parts of the book that were of particular interest to me (as a software developer working in the area of digital preservation) were the sections discussing document fixity:

… paper documents, and indeed all documents are static and changing, fixed and fluid. There is a reason why text and graphics editors have a Save button, after all.

Scrolling Forward p. 36

Also of interest was Levy’s analysis of why the idea of “digital libraries” is such a lightning rod of opinion (which perhaps applies to its sister concept “repositories”).

[The] ambiguity between institution and collection is carried through in the phrase “digital library”. For some groups, most notably librarians, the phrase refers most directly to institutions that oversee digital collections, while for other professionals, primarily computer and information scientists, it refers to digital collections, without regard to the institutional settings (if any) in which they might be managed … Digital library, it seems to me, draws much of its power from this ambiguity: it provides a name for collections of digital materials that invokes the aura of the modern library and its social mission (library as social institution). But it does so without actually making any commitments to the public good (library as collection).

Scrolling Forward p. 135

And finally, Levy doesn’t shy away from the big questions of how our psychological and religious impulses influence our notions of what documents are.

The human search for and construction of order [...] is our response to the profound mystery, and accompanying anxiety, of existence. Emerging into an unfathomable universe and fearing we are nothing within it, we strive to create a meaningful and ultimately immortal place for ourselves [...] Culture creates the conditions for a meaningful existence, for us to play out our games of physical and symbolic survival. But it is an ongoing performance, a play we can never stop performing, lest we see the back-stage gears and levers and be reminded of the mysterious and terrifying backdrop against which we are performing it. [Documents] are death-transcending, lack-filling artifacts of major proportions. Perhaps they can’t literally prevent our physical demise or fill our deepest sense of lack. But they are the central participants in our attempts to do so. Every one of them — each cash register receipt, each greeting card, each Post-it note — makes a contribution to the collaborative edifice we call human culture. Although few carry the weight of the Bible or the Constitution, all of them inform us of “what is and what we should do”. And in concert they help us create and sustain an orderly, and meaningful human lifeworld.

Scrolling Forward pp. 187-188

Heady stuff to be sure. And now I feel like I’ve traveled far from the beginning of this blog post, and the definition of information resources and the semantic web. Scrolling Forward has given me a very personal perspective on what documents are, and have been–and as a result I’m a bit more hopeful about the future of electronic documents. Working in digital preservation, it’s sometimes pretty easy to give in to despair. I’m not sure what the application of this perspective is towards the normalization of language in the Architecture of the World Wide Web and RFC 2616. But it seems certain that part of the answer lies in not taking our information technologies too seriously, and trying to stay focused on the roles that they play in our individual and collective lives:

We make a mistake, I believe, when we fixate on particular forms and technologies, taking them, in and of themselves, to be the carriers of what we want either to embrace or resist. Not only do we fail to see the forms and technologies in their full complexity, but we use them, in their symbolic simplicity, as blunt instruments with which to beat one another over the head.

Scrolling Forward p. 198

PS. The bibliography is a great source of new material to read too.
PPS. This blog post was also a not-so-secret experiment in using RDFa and the Bibliographic Ontology to mark up quotations. Check out the RDF assertions you can extract from it using the RDFa Distiller.

by ed at September 11, 2009 04:02 PM

Andrew Pace

What is our ROI?

Welcome back guest-blogger, Matt Goldner.  

ROI (Return on Investment) is not commonly used among librarians, while it is a key issue for commercial business. This is somewhat natural given our different missions: libraries typically serve the public and businesses serve their shareholders. I have been wondering about a possible application of this same measure to libraries and where it might be applied.

Commercial businesses have over the last half a decade moved more and more of their business processes up to the cloud, e.g. as Web-delivered applications. Areas that might have been unthinkable to have stored and maintained offsite are now routinely placed on external platforms. Examples include a business's customer relationship management system, HR systems, and payroll systems. Why are they doing this? Because it allows them to reduce their costs for these systems and re-allocate time and effort to moving their business forward, thus enhancing the opportunity for a better ROI.

Why should this matter to libraries? On what would a library ROI be based? To answer the second question, I would say it is how well we have served our constituency. Are they delighted and amazed with our service? Or are we the last-ditch resort when they can't find what they need elsewhere?

As to why it should matter: if we are spending too much of our financial resources on maintaining infrastructure, we are reducing our opportunity to increase the value of our services to our constituency. So perhaps we have something to learn here from commercial businesses that have found value in reducing local infrastructure and moving it to the cloud. It could give libraries the opportunity to increase a good measure of return on investment: delighting and amazing our constituencies.

September 11, 2009 12:55 AM

September 10, 2009

Jonathan Rochkind

jrochkind


So, I have a servlet (based on initial work from Tod Olson at uchicago, expanded by me) to provide holdings information from Horizon in the DLF ils-di “dlfexpanded” format. The servlet code and some documentation are available.

That’s the short statement. It turns out that you can’t really just say that without providing some more specifics, caveats, exceptions, limitations etc. Also it’s worth adding some interesting observations.

Motivation

As we’re moving ahead with Blacklight, we’re going to need to have some way to get item holdings information out of Horizon. By “item holdings information” I mean “copy” information: what items do we have, what are their call numbers, what are their statuses (checked in or out among many others), what are their locations, etc. etc. Everything you’d need to provide an actual OPAC display telling the users what they need to know about our holdings.

A sidenote on terminology: In Horizon there are ‘items’, and sometimes a bib just has ‘items’. But sometimes a bib has different sets of items in groups — this is usually used for serials, or occasionally for multi-volume series.  Horizon confusingly calls this set of items a ‘copy’.   The DLF ils-di report calls it a ‘holdingset’.  I have no idea what your ILS calls it. It’s a two-level hierarchy: a bib can contain one or more copies/holdingsets, which each contain items.  OR a bib can contain one or more items directly, without the intervening copy/holdingset.
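To make that two-level hierarchy concrete, here is a minimal Python sketch. The class and field names are invented for illustration; they are not Horizon's table names or anything defined by the DLF ils-di report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    item_id: str


@dataclass
class HoldingSet:
    """What Horizon calls a 'copy' and the DLF ils-di report calls a 'holdingset'."""
    holdingset_id: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Bib:
    bib_id: str
    # A bib may group its items into holdingsets (typical for serials)...
    holdingsets: List[HoldingSet] = field(default_factory=list)
    # ...or carry items directly, with no intervening holdingset.
    items: List[Item] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Every item on the bib, whether attached directly or via a holdingset."""
        nested = [item for hs in self.holdingsets for item in hs.items]
        return self.items + nested


# A serial with one holdingset, and a monograph with a directly attached item.
serial = Bib("b1", holdingsets=[HoldingSet("c1", [Item("i1"), Item("i2")])])
monograph = Bib("b2", items=[Item("i3")])
```

Either shape answers the same question ("which items belong to this bib?") through `all_items()`, which is roughly what a consumer of the holdings data has to be prepared for.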

And, the way most people are doing this at present (for a variety of reasons) is checking in realtime at point of demand for this info, not trying to index it. So, okay, go with the conventional wisdom. So I need a realtime service to provide this info from Horizon.

But I figure, as long as I’m doing this, MUCH better to provide the info in some standard format, instead of a custom one. Then, theoretically, the consuming code on the Blacklight end can be written to that standard format, instead of being custom for Horizon.  And my understanding is that the Blacklight team has indeed been thinking/wishing for some standard stuff on the Blacklight end to consume stuff in DLF ils-di format, and/or jangle (which also typically, at the moment, uses the DLF ‘dlfexpanded’ format to actually return data in).

So, okay, that makes sense.

But DLF ils-di format is not a complete spec

So it turns out once you decide to return data in the DLF ils-di “dlfexpanded” format, you’re actually not done deciding what your data is actually going to look like.

The dlfexpanded format is just kind of a coat tree to hang your actual metadata ‘coats’ on.  dlfexpanded lets you give a list of itemIDs and say they belong to a bib; it lets you give a list of holdingsets and say which itemIDs belong to them. Good so far. But to actually describe anything else about those items and holdingsets (location, call number, item status, any user-displayable notes, etc), you’ve got to include additional metadata of your own choosing — dlfexpanded gives you some hooks that it allows you to hang basically whatever other namespaced (and hopefully specified and standardized) XML you want on.
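The "coat tree" idea can be sketched in a few lines of Python. Everything named here is hypothetical: the namespaces, the `record`, `item` and `payload` element names, and the `location` payload are made up for illustration and are not copied from the actual DLF ils-di schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces, for illustration only (NOT the real DLF ones).
TREE_NS = "http://example.org/coat-tree"
PAYLOAD_NS = "http://example.org/my-payload"


def build_record(bib_id, item_ids, payload_for_item):
    """Build a skeleton record: the tree only says which items belong to the bib.

    Anything descriptive about an item is supplied by payload_for_item,
    which returns an element in whatever namespace the producer chose.
    """
    record = ET.Element(f"{{{TREE_NS}}}record", {"bibId": bib_id})
    for item_id in item_ids:
        item = ET.SubElement(record, f"{{{TREE_NS}}}item", {"id": item_id})
        # The "hook": a wrapper that may hold any namespaced XML.
        hook = ET.SubElement(item, f"{{{TREE_NS}}}payload")
        hook.append(payload_for_item(item_id))
    return record


def location_payload(item_id):
    """One possible 'coat': a made-up location element."""
    el = ET.Element(f"{{{PAYLOAD_NS}}}location")
    el.text = "Main Stacks"
    return el


record = build_record("b1", ["i1", "i2"], location_payload)
xml_text = ET.tostring(record, encoding="unicode")
```

The point of the sketch is that `build_record` knows nothing about locations or call numbers; two producers could pass in entirely different payload functions and both would still emit a structurally valid tree.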

So figuring out what metadata to actually use to describe everything I wanted about my Items and Copies (aka ‘holdingsets’) took a bit of investigating and thinking.

simpleavailability

Sure, I used the dlf:simpleavailability format that dlfexpanded gives you just to say whether something is “available” or not (and provide a custom user-displayable string conveying that).

Although I ended up only providing that at the item level. The DLF ils-di report seems to assume the client could ask for ‘availability’ at the bib or holdingset level too. But I wasn’t even sure what the semantics of this should be, and figuring out the code to do this without impacting performance (more on performance later) was tricky. So, okay, the client can look at the availability on all items and figure out how to sum them up at the bib or copy level itself, if needed (I’m not sure I’ll even need to, for my use cases).
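A client-side roll-up of item availability might look something like the sketch below. The rule it applies (a bib or copy counts as available if any one of its items is) is only an assumption chosen for illustration; as noted above, the right semantics are not obvious.

```python
from typing import Dict


def summarize_availability(item_available: Dict[str, bool]) -> str:
    """Roll item-level availability up to a single bib- or copy-level value.

    Assumed rule: "available" if at least one item is available,
    "unavailable" if none are, "unknown" if no items were reported.
    """
    if not item_available:
        return "unknown"
    if any(item_available.values()):
        return "available"
    return "unavailable"


mixed = summarize_availability({"i1": False, "i2": True})
all_out = summarize_availability({"i1": False, "i2": False})
empty = summarize_availability({})
```

Keeping "unknown" separate from "unavailable" matters when a response may have had its items left out on purpose.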

But I want to say a lot more about my Items and Copies than simpleavailability. I want to include enough data that my complete OPAC screen could be replicated by third party software.

mfhd

So after hunting around for available ‘standard’ options, I settled on good old MFHD — expressed in marc-xml.   I considered the newfangled “ISO Holdings”, but limited public documentation is available, and from looking at the schema that is available, it didn’t look like ISO Holdings would let me express anything that MFHD didn’t. Sure, MFHD is kind of a bear for the developer to work with, with all those opaque numeric codes, but oh well, went with the known evil, MFHD.

Except I’m not really using mfhd as is typical. I use just enough of it to express what I want.  I include kind of a dummy ‘leader’ just for the sake of appearances, since there’s nothing in the leader I actually need. In standard MFHD usage, you would rarely (never?) have an individual MFHD record just for an item, but the dlfexpanded “coat tree” gives me hooks to hang MFHDs for individual items, and that makes it a lot more convenient to express and retrieve things unambiguously, so why not. So anyway, it’s MFHD, but I’m not necessarily saying any existing MFHD-processing tools will be able to do much with it, since I’m using it so unusually (although not illegally in any way as far as I can tell). Oh well, at least it’s a standard format.

Interestingly, while MFHD theoretically lets you express serial run statements in a machine readable form…  A) I don’t have that info in my ILS anyway, and B) that machine readability in the way mfhd has you express it is a lot more theoretical than practical.  So I’m not doing that.  If my ILS had the data, I’d probably express it in the more straightforward ONIX Serial Coverage Statement instead of MFHD.  (Note to ONIX people — why oh why do you only provide the actual schema in a zip file online? You used to provide it individually. Very inconvenient.)

But wait, there’s more

But to completely express all the data I’d need to duplicate my OPAC display in external software, mfhd still didn’t quite do it for me. Mostly, I wanted more internal ILS codes.  mfhd lets me express ‘location’ and ‘collection’ as user-presentable strings, but I want to reveal my internal non-mutable codes for these too. mfhd doesn’t let me express the concept of ‘item type’ that’s in my catalog at all!

So after looking around some more for something to do that, I gave up and just created my own very simple XML schema to do it, which I’m calling “ILS holdings schema” for expressing internal codes and such, in case you want to.

And one more plug for DAIA

And as I alluded to in my last post, I’m using DAIA too — at this point solely to expose the URL that can be accessed to issue a ‘request’ for the item through HIP.  This is a bit against the spirit of DAIA, since exactly what a ‘request’ will do is unclear [recall a checked out item, or only add you to a hold list?  Let you check it out, or only request it to be provided in the special collections reading room?  Deliver it to a circ desk, or actually to your office (as we provide to some people)?  Who knows!]

And worse, I’m not able to actually pre-check if ‘request’ really is available or not, for reasons discussed in the last post.  Which is really against the spirit of DAIA.

But oh well, it was such a nice little schema for simply revealing a URL for a service, and my OPAC ‘request’ feature is a service… so I used it.

At some later point I hope to go back and make a real nice DAIA response, but it’ll be a buncha work, which isn’t required by the specs of the project I’m working on presently.

Oh, and I only provide DAIA at the item-level too, not at the Copy or Bib level. (I think some people’s Horizon setups actually do allow Requests at the Copy or Bib level, but not ours, so I couldn’t quite figure out how it should/would work and didn’t have time for it).

Performance Issues

So I think the servlet is reasonably fast, but the trick when you’re developing an API that’s going to be used by other software is… “reasonable” gets a lot less forgiving. I mean, let’s say there’s a search result ‘hit list’ with 20 hits on it — my software might want to call this API 20 times for one web page!  A 0.2 second response time might be pretty good for a user-facing web app, but not for an API that needs to be called 20 times to deliver one page to the user.
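The arithmetic behind that worry, as a small sketch. It assumes the 20 calls are made one after another rather than in parallel, and the one-second page budget is an invented figure for illustration only.

```python
def page_latency_seconds(hits: int, seconds_per_call: float) -> float:
    """Total API time for one page if the API is called once per hit, sequentially."""
    return hits * seconds_per_call


def per_call_budget_seconds(page_budget: float, hits: int) -> float:
    """How fast each call must be to fit all calls inside a page budget."""
    return page_budget / hits


# 20 hits at 0.2 s each: four seconds spent on holdings lookups alone.
total = page_latency_seconds(20, 0.2)

# Assumed budget of 1 second for the whole page: 0.05 s per call.
budget = per_call_budget_seconds(1.0, 20)
```

So a response time that would feel fine for a single user-facing page is about four times too slow under that assumed budget once it is multiplied across a hit list.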

So I might have some speed issues, that theoretically I can optimize to some extent. (Although I’m not looking forward to it. Java is not my specialty. If I had to do it over again, not sure I would have done this in Java, although it made sense at the time for several reasons. And if I were going to do it in Java, I think I’d want to use a framework of some kind, not do it with the pretty low-level stuff that JDBC and Servlet APIs alone give you. But that would result in its own trade-offs.)

But perhaps worse than the speed issues are some response size issues. I took a look at the response for a bib I knew would have a lot of items — JAMA, with dozens of holdingsets and hundreds or more items. The dlfexpanded response was 1.2 megs!  That might be an issue for sending across the network, loading into memory, and parsing the XML on the client side.

It’s so large in part because there’s some redundancy in the multiple metadata formats we use to express everything.  A basic schema-less ad hoc uchicago-created XML response for the same data is only 220k. Which is still pretty big.

So, I provided some extra query parameters (not specified in dlf ils-di of course) to allow the client to limit the data returned, if it doesn’t really need all of it. The client can choose which metadata payloads it wants for items or copies, instead of taking all of them. And the client can choose NOT to have items included in a response that includes copies, just to include the copy information, and let the client ask for the item info later if it needs it.
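From the client's side, using such limiting parameters might look like the sketch below. The base URL and the parameter names (`bibId`, `payload`, `includeItems`) are hypothetical stand-ins; the servlet's real parameter names are not given here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "http://example.org/holdings"


def holdings_url(bib_id, payloads=None, include_items=True):
    """Build a request URL that asks only for the data the client needs.

    payloads: names of the metadata payloads wanted (None means "send them all").
    include_items: False asks for copy-level information only.
    """
    params = [("bibId", bib_id)]
    for name in payloads or []:
        params.append(("payload", name))
    if not include_items:
        params.append(("includeItems", "false"))
    return BASE_URL + "?" + urlencode(params)


# Copy-level information only, with just the MFHD payload attached.
url = holdings_url("b1", payloads=["mfhd"], include_items=False)
```

A hit-list page could make this lighter request for each result and only ask for full item detail when the user opens a single record.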

We will see how it goes.

Standard or not? Workable or not?

So, okay, I’m providing my info in the DLF ils-di ‘dlfexpanded’ format, but how standard is it?  If someone says “Oh yeah, I have code that can consume dlfexpanded”, does that mean it will automatically work with my (or anyone else’s!) dlfexpanded info?

Doubtful.  You’ve got your choice of metadata payloads to hang on that ‘coat tree’, and everyone can choose different things. Even once you’ve chosen, two people providing the same ones may be using them slightly differently (as evidenced by a few choices I had to make here and there with how to use mfhd).

On top of that, for performance related reasons, or to fit ‘dlfexpanded’ into the actual use cases I have (which go beyond simple DLF “getAvailability”), my dlfexpanded responses sometimes don’t include everything — just because there are no ‘items’ listed in the response doesn’t necessarily mean there are no items, they might have been suppressed based on the request parameters for performance. And, those request parameters are non-standard, but I think (at least for my use cases), the client is really going to need to use them to avoid a performance nightmare.

Or, if you asked my API for info on a certain item, you get a dlfexpanded response that only has that item in it, not all the other items belonging to the same bib, which may or may not be misleading or confusing to the consumer.

Meanwhile, I’ve only written the producer end of things so far, I haven’t even written the consumer. When I get around to writing the consumer, I’m probably going to run into even more tricks and problems requiring me to go back and revise, including but not limited to performance stuff.

So we’ll see. I don’t blame the DLF ils-di task force for this; they did a great job. But we make the map as we tread the path, there’s no way to map out everything without actually trying it in practice first, and trying it in a bunch of different use cases and scenarios to abstract out the commonalities.  So, we’re figuring it out as we go, that’s the only way to do it, and the ils-di task force wisely recognized that and didn’t try to map everything out in advance.

Still, it means this stuff is trickier than it might originally seem. The specs, standards, and best practices are not “done”, not even close.  We’ve got to figure out a bunch of stuff.

Posted in General

by jrochkind at September 10, 2009 09:00 PM

September 09, 2009

Roy Tennant: Digital Libraries

How To Release Open Source Software

Recently an institution that I respect released some open source software. Being interested, I immediately downloaded it and started checking it ou...

September 09, 2009 10:55 PM

The Best Live Presentations Make the Worst Archived Ones

Many of the best PowerPoint (or Keynote, or whatever) presentations are virtually devoid of words. Slides usually consist of an image or a graphic, an...

September 09, 2009 12:32 AM

September 08, 2009

Talis: Nodalities

William Tunstall-Pedoe talks about True Knowledge

In my latest podcast I talk with William Tunstall-Pedoe, CEO of Cambridge-based True Knowledge.

We discuss the company’s ‘Answer Engine’ and consider the ways in which its technology can be seen as complementing rather than competing with more traditional search engines.

During the conversation, we refer to the following resources:

  Bing
  Cambridge University
  David Letterman page on Wikipedia
  Powerset
  Siri, and my podcast with CTO Tom Gruber
  True Knowledge
  True Knowledge API
  True Knowledge Browser plugins
  True Knowledge Local Search
  Wolfram Alpha

This conversation was recorded on Monday 7 September, 2009.

For other Talis podcasts in this Nodalities series, see here

by richard.wallis@talis.com at September 08, 2009 11:23 AM

September 06, 2009

John LeMasney

Twitter Weekly Updates for 2009-09-06

Powered by Twitter Tools

by John LeMasney at September 06, 2009 02:22 PM

First Monday

September 04, 2009

Lorcan Dempsey

Libraries and publishing: a couple of examples

By: dempsey

Categories: Books, movies and reading ... • Research, learning and scholarly communication • ebooks and other e-resources

As interaction with the book literature, publishing, the role of large print collections, and research and learning behaviors are all changing in a network environment, academic libraries have been looking at their role in the scholarly communication and publishing process. I came across two examples of library activity which prompt this note ....

The Economist has a short article on the Google Book Search settlement (who doesn't?). A colleague and I had a look at the two works it mentions as examples of orphan works. One was The Appalachian frontier America's first surge westward by John Anthony Caruso.

Worldcat tells me about three versions of The Appalachian Frontier.

  1. Caruso, John Anthony. 1959. The Appalachian frontier; America's first surge westward. Indianapolis: Bobbs-Merrill.
  2. Caruso, John Anthony. 2003. The Appalachian frontier: America's first surge westward. Knoxville, TN: University of Tennessee Press.
  3. Caruso, John Anthony. 2003. The Appalachian frontier America's first surge westward. Appalachian echoes. Knoxville: University of Tennessee Press. http://www.newfoundpress.utk.edu/pubs/caruso/.

The third is a digital version provided by the Library at the University of Tennessee Knoxville through its imprint Newfound Press. The Press is described as follows:

The University of Tennessee Libraries is developing a framework to make scholarly and specialized works available worldwide. Newfound Press, the University Libraries digital imprint, advances the community of learning by experimenting with effective and open systems of scholarly communication. Drawing on the resources that the university has invested in digital library development, Newfound Press collaborates with authors and researchers to bring new forms of publication to an expanding scholarly universe. We consider manuscripts in all disciplines, encompassing scientific research, humanistic scholarship, and artistic creation. [Welcome to Newfound Press]
In partnership with the library's Digital Library Initiatives and the UT Digital Repository, Newfound Press applies technology to promote emerging forms of scholarly inquiry and discovery of the results. The University Libraries and the University of Tennessee Press enjoy a longstanding, friendly relationship. Newfound Press draws on these advantages to pursue its mission of expanding access to scholarly and specialized work in the global environment. We aspire to prominence as a publisher of choice. [About us]

The second example is a joint initiative between Cambridge University Press and Cambridge University Library called Books of Enduring Scholarly Value. From the press release:

Cambridge University Press has launched the Cambridge Library Collection, a new project reissuing books of enduring scholarly value. It uses state-of-the-art scanning and print-on-demand technology to make accessible in new ways works that until now would have been available only in libraries.
Already a pioneer in the re-publishing of titles from its own backlist, Cambridge University Press is extending its reach to include other books which are still of interest to researchers, students and the general reader. The Press's unique relationship with Cambridge University Library allows access to a vast range of out-of-print and out-of-copyright titles.
With subjects ranging from travel to maths and from life sciences to music, the Cambridge Library Collection allows readers access to books they would otherwise struggle to find. Writings by Charles Darwin and his circle, and studies on Shakespeare, Jane Austen, Handel and Wagner all appear, alongside the works of the greatest nineteenth-century mathematicians, and a range of books on Cambridge from the serious to the scandalous.
The Press takes advice from experts worldwide on what books in their subject areas should be included in the programme. Each book is carefully scanned, and the resulting files undergo a rigorous process of cleaning, in which any blemishes are removed to obtain a crisp and legible text. Each book has a new cover design and a specially written blurb which highlights the relevance of the book to today's readers.
The Cambridge Library Collection launched on 20 July with 475 titles, to celebrate the 475 years since the Press was granted permission to print 'all manner of books' by Letters Patent of Henry VIII. By the end of 2009, this number will have grown to over 1,000. [Cambridge University Press launches a new venture to reissue rare and out-of print books]

And here is a video, which shows among other things, the actual scanning in action ...



The Cambridge example is discussed in an article in Inside Higher Ed which also refers to the recent announcement by the University of Michigan about reprint on demand services for thousands of its rare books in collaboration with Booksurge. It also points to Cornell's arrangement with Amazon for reprint services on 80,000 titles through the Cornell University Library Bookstore.

These examples show an area of growing interest. They also underline how books will appear through multiple channels with consequences for discovery and fulfillment options.


September 04, 2009 09:44 PM

Metadata redux

By: dempsey

Categories: Metadata

I was asked in a meeting recently to define metadata. This prompts me to adapt some text recycled from All that is solid melts into flows* ...

Like most people ;-), I tend to think about metadata as 'schematized assertions about resources': schematized because patterned and machine understandable; assertions because they involve a claim about the resource by a particular agent; resource because any identifiable object may have metadata associated with it. Metadata is useful because it relieves a potential user (a person or a program) of having to have full advance knowledge of the characteristics or existence of the resource. In other words, metadata provides 'intelligence' which supports more efficient operations on resources. Examples of operations are discovery, preservation, purchase, reformatting, embedding, analysis, extraction of components, and so on.
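One way to picture 'schematized assertions about resources' is as a small data structure. This is only an illustrative Python sketch: the field names, the example vocabulary term `creator`, and the identifier `urn:example:book1` are all invented here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Assertion:
    agent: str     # who is making the claim
    resource: str  # identifier of the thing the claim is about
    prop: str      # schematized: a term from an agreed vocabulary
    value: str     # what is claimed


def discover(assertions: Iterable[Assertion], prop: str, value: str) -> List[str]:
    """A 'discovery' operation: find resources without inspecting the resources themselves."""
    return sorted({a.resource for a in assertions if a.prop == prop and a.value == value})


assertions = [
    Assertion("Library A", "urn:example:book1", "creator", "Caruso, John Anthony"),
    Assertion("Library A", "urn:example:book1", "date", "1959"),
    Assertion("Library B", "urn:example:book2", "creator", "Someone Else"),
]

found = discover(assertions, "creator", "Caruso, John Anthony")
```

The program running `discover` never opens the book; the assertions supply the 'intelligence' it needs, which is the sense of the definition above.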


* And sorry, I couldn't resist keeping in the smiley ....


September 04, 2009 03:33 PM

September 03, 2009

Lorcan Dempsey

Muldoon and Colbert

By: dempsey

Categories: Books, movies and reading ...

I have mentioned Paul Muldoon in these pages several times. I could not resist linking to this appearance of the Northern Irish poet, Princeton faculty member, and poetry editor of the New Yorker, on The Colbert Report. The best bit is at the end.

Here is Muldoon's Worldcat Identity.


September 03, 2009 09:01 PM

Talis: Panlibus

Google Book Scanning Project – Issues and Updates

Last night I listened to another Educause webinar – something that is developing into a (good) habit. This week’s was entitled The Google Book Scanning Project – Issues and Updates, and featured presentations and discussion between Dan Clancy, Engineering Director of Google Book Search, and Jonathan Band from the Library Copyright Alliance.

Even though the current negotiations are US-specific, it’s still a good idea for librarians everywhere to keep themselves up-to-date on progress in this area. This webinar provides a useful overview of the project, but if you haven’t got a full hour to spare, a recent article written by William Skidelsky in The Observer – Google’s plan for world’s biggest online library: philanthropy or act of piracy? – should also do the job.

So I’ll leave it to those two sources to cover the basics. However, there are a number of concepts that are important to understand in order to follow the debate between the two sides, which is what this blog posting is really about.

First of all, Google is categorising all the books it scans into one of the following:
a. Public domain – defined as having been published before 1923.
b. Books published after 1923, but which are either out of print or orphaned works (around 75% of all books scanned).
c. Books still in print.

Secondly, Google is planning to offer a number of different access models, the most noteworthy being:
a. Preview uses
b. Online consumer access – enabling users to buy online access to individual works under a pricing regime set by either the rightsholder or Google.
c. Institutional subscription – on an FTE basis, for HEIs and corporations
d. Public Access Terminal – one free terminal per US public or university library.

Thirdly, an independent Book Rights Registry (no website as yet) will be set up to represent rightsholders and to collect and distribute revenues as well as resolve disputes.

Well that covers a lot of Dan Clancy’s presentation, although it’s worth mentioning in passing that Clancy does come across as being genuinely philanthropic, as the Observer article also noted.

So now let’s move onto Jonathan Band, who was there really to cover the pros and cons of the project as it currently stands.

Band had many good things to say about the Google Book Settlement, painting a rosy picture of where we’ll be if the Settlement is approved. Firstly, of course, Google will be able to continue scanning books into its search index. Notable benefits for users include free access to full text through public access terminals, and the ability to purchase access to out-of-print books at relatively low cost. Meanwhile, institutions will be able to purchase access to the full text of millions of books, and those that are participating in the project will receive digital copies of their collections.

As Band said, all in all there’s a lot to like.

And yet the project has generated considerable controversy. Why is this?

One frequently made argument is the absence of competition for what is bound to become an essential facility. Google has already scanned 10 million books in 5 years so it has a huge competitive advantage. So here is a situation in which there is enormous demand, yet there is no other supplier, so there is a risk of a cost-prohibitive subscription which might undermine equity of access, privacy and intellectual freedom.

The business model is also contentious. Together, Google and the Book Rights Registry (with arbitration if necessary) will set the price of the institutional subscription. Google’s objectives in pricing are the realisation of revenues at market rates and of broad access to books. The parameters for pricing include pricing of “similar products and services”, and Band is concerned that if eJournal subscriptions are used as a benchmark, then the subscription could be cost-prohibitive for many institutions.

Only Google’s library partners have the right to a separate price negotiation route. And even then, refund is limited to Google’s share (37% of price).

For Band, the solution is that rather than ask the court to reject the Settlement, we should ask the court to closely supervise the interpretation and implementation of the Settlement, given that this is a natural monopoly needing regulation. Band is also anxious to ensure diverse composition of the Book Rights Registry, encompassing author representation in particular.

Clancy countered this by emphasising that Google cares deeply about the pricing, and is making this investment because it believes in broad access; a limited access project will be inconsistent with their vision. Clancy compared the planned price of a typical book under the terms of the Settlement with the price of a journal article, which can cost around $30. To me this seemed like a fudge. The original argument that Band made was around the cost of the institutional subscription, so why didn’t Clancy use the price of an eJournal subscription as a comparator? He also argued, though, that the vast majority of books will be cheaper than ILLs.

Clancy didn’t touch the issue of competition, emphasising customer choice instead i.e. libraries can decide that the subscription is too expensive and instead opt for free services. Again, this lacked conviction. No library worth its salt would build its collection on such a restrictive basis. He did mention the lack of competition and choice in the eJournal marketplace though.

He also dismissed the suggestion that people will get rid of their physical books as seeming stupid. Actually this seemed strange, as Band hadn’t mentioned that argument.

The killer argument for me was made by Band towards the end of the webinar. He argued that we all want to trust Google. The Settlement is fundamentally desirable. And the people who are at Google right now seem eminently trustworthy. However, ownership can change, and that is why some degree of quasi-regulation is necessary. Clancy could only reply by saying that Google’s library partners (i.e. only the partners and not libraries as a whole) would have the right to arbitrate with Google if they felt the pricing was unfair.

by Sarah Bartlett at September 03, 2009 04:07 PM

Niche Print on Demand services on the rise

Today in the Times Higher Education (THE), Matthew Reisz reports on the growth of niche print on demand (POD) services offered by academic libraries and university presses in both the UK and the US. While the Google Book Settlement moves through its long and laborious negotiation process, a small handful of libraries have taken the initiative and are making an increasing number of books available via print on demand.

Probably the best known of these is University of Michigan’s growing Michigan Historical Reprint Series, which recently announced the availability of 400,000 additional titles. But the THE article also highlights similar developments at Cornell University Library and Cambridge University Press.

These developments surely deserve our congratulations. They have succeeded in generating a new revenue stream, which, as all librarians know, is easier said than done, especially in what is now a very risk-averse climate. Furthermore, any initiative that broadens availability of long tail publications has got to be a good thing. And lastly, whatever we think of the Google Book Settlement, a bit of healthy competition can only be a good thing for all parties.

by Sarah Bartlett at September 03, 2009 01:16 PM

September 02, 2009

Jonathan Rochkind

jrochkind


So DAIA is a nice little response format-slash-API specification from Jakob Voss.

It’s focused on a very specific goal: describing what services are available for a given item, possibly providing URLs to access that service for a given item, telling the user how long they’ll have to wait to get that service, etc.

Some more specific scenarios mapped to my library might make things more clear. For a given item and user, that user might be able to:

  • Look at the item in the library. Which they might be able to do immediately (upon finding it in the stacks), or there might be a 1 or 2 business day delay because it’s in some kind of closed stacks or offsite storage, and they’re going to have to request it.
    • OR, there might be a longer delay, because the item is currently checked out, and they’re going to have to wait until it comes back — or maybe they have ‘recall’ privileges, and there’s still a delay, but shorter!
  • Check the book out?  Again, maybe they can, or maybe they can’t at all. If they can, maybe they’re going to have to first ‘recall’ it (if they’re allowed to), with a longer delay.
  • Request the book for delivery to a circ desk? Related to recall/checkout, but in rare cases they might be able to request delivery to a circ desk, but only view it in library! And there are cases where they might be able to check it out, but NOT request delivery. Or where they can request delivery, but they won’t get it until the book comes back on its own, because they have no ‘recall’ privileges.

Now, the answers to these questions, once determined, are easily expressible in DAIA, no problem.
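As a rough illustration of that point, here is a minimal sketch of how answers that have already been worked out might be laid out in a DAIA-like structure. The field names (available, unavailable, service, delay, expected) and the service names (presentation, loan) are modelled on DAIA but have not been checked against the specification here; the item identifier, the two-day delay, the return date, and the summarize helper are all hypothetical.

```python
import json

# Hypothetical, already-determined answers for one item, in a DAIA-like shape.
# Field and service names are modelled on DAIA, not verified against the spec.
item = {
    "id": "example:item:1",  # made-up identifier
    "available": [
        # Can be viewed in the library after a closed-stacks retrieval.
        {"service": "presentation", "delay": "P2D"},  # ISO 8601 duration: 2 days
    ],
    "unavailable": [
        # Cannot be borrowed right now; the copy is checked out until this date.
        {"service": "loan", "expected": "2009-09-16"},
    ],
}


def summarize(item):
    """Return one human-readable line per service entry."""
    lines = []
    for entry in item.get("available", []):
        delay = entry.get("delay", "none")
        lines.append(f"{entry['service']}: available (delay: {delay})")
    for entry in item.get("unavailable", []):
        expected = entry.get("expected", "unknown")
        lines.append(f"{entry['service']}: unavailable (expected: {expected})")
    return lines


if __name__ == "__main__":
    print(json.dumps(item, indent=2))
    for line in summarize(item):
        print(line)
```

Note that the sketch starts from the answers; it says nothing about how to compute them from the ILS, which is the hard part.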

The problem is, as the complicated foregoing discussion may have hinted, that determining the answers to these questions from our ILS is enormously complex. All the info is in the ILS somehow. In the end, either the ILS is going to allow a ‘request’ or a ‘loan’ or a ‘recall’, or it’s not.  And there’s info in the ILS to let us predict what’s going to happen, and estimate how long it’ll take until the user gets access (as DAIA allows us to express once we’ve figured it out).  It’s all there somehow — but trying to figure out how to actually predict it, oh boy, I get confused really quick. There are dozens of different tables I need to consult in the ILS, and figure out how they interact and which takes priority or overrides which other.  Privileges can be set on item statuses, locations, groups, etc. Borrower statuses, groups, types, etc. And they are not set, in my ILS Horizon, in only one place, but in dozens of different places with different semantics that all interact in ill-defined ways.

Phew.

It seems like something a user would expect, in this day and age, that when they look up a book the listing could actually TELL them if they can check the book out (and how long they’ll have to wait to get it, if there’s a recall involved, etc), if they can view it in the library, if they can request it for delivery, etc.  Our ILS is currently incapable of doing that — to the extent that it even always displays a ‘request’ button, and the user has to actually click on it to find out if they actually can make a request or not.  Which is generally the only way a user can find out what services are available, by trying them.  Which depending on the service may or may not be able to be done over the web (can you look at it in the library? Who knows unless you go there and try. Or call a librarian and hope they aren’t as confused as I am!).  You want to know how long you’re probably going to have to wait to get it?  Too bad.

At first I optimistically thought I could calculate all this stuff from the ILS, deliver it in DAIA, and then use it in new interfaces to actually tell the users what they’re going to want to know. DAIA is quite up to it.  But writing code to actually calculate these things — very non-trivial.  Not so happy with Horizon right now.

Anyone reading this know about the open source ILS’s?  Would this be easier in any of them?

Posted in General

by jrochkind at September 02, 2009 06:21 PM

First Monday

Latinas cross the IT border: Understanding gender as a boundary object between information worlds

How do successful Latina IT professionals construct and reconstruct their conceptions of gender before, during, and after contact with the information world of the IT industry? We conducted semi-structured phone interviews with five Latinas who held senior management positions in IT firms in 2008 to explore their reasons for choosing and persisting in the workforce. Using the theory of information worlds as a framework for analysis of gender as a boundary object, this article reports four trends in the perceptions of these women dealing with the relationship between gender and success in IT.

by Amelia Gibson (and04g@fsu.edu) at September 02, 2009 02:44 PM

Roy Tennant: Digital Libraries

Massive Storage For a Song

What if I told you that for under $8,000 you could have 67 Terabytes of storage? Would you call me crazy? If you were to price today's storage solu...

September 02, 2009 01:25 AM

September 01, 2009

Lorcan Dempsey

QOTD: public libraries and social engineering

By: dempsey

Categories: Libraries - organization and services

From the Preface to Books, buildings and social engineering:

We seek to rescue the historic public library building from a perspective that in the past has all too easily denigrated it. The mass construction - for that is what it eventually became - of early public library buildings is to be celebrated as a successful exercise in social engineering, an unprecedented effort by a public-sphere movement to persuade citizens, in a consensual way, of the importance of knowledge in an increasingly open and modernising society (we have deliberately used the term 'social engineering' as opposed to the narrower and misleading term 'social control'). More precisely, the provision of what amounted to over one thousand library buildings in the period 1850-1939 was an exercise in what Karl Popper called 'piecemeal social engineering' - social engineering that was incremental and continually amended in the light of experience and new knowledge, a scientific approach to planning social change, in this instance via the free provision of books, information, education and knowledge, that reflected the essence of the public library as an institution of progress and scientific inquiry. [Black, A., Pepper, S., & Bagshaw, K. (2009). Books, buildings and social engineering: Early public libraries in Britain from past to present. Farnham, England: Ashgate Pub.]

September 01, 2009 10:27 PM

Edward M. Corrado: Comments

Comment on Grokker gone :-( by ranti

Bummer. I happen to like Grokker as well; I still have their desktop application (too bad it no longer works.)

by ranti at September 01, 2009 05:15 PM

Edward M. Corrado

Grokker gone :-(

We received an e-mail from Binghamton University Libraries has been one of the few Academic Libraries using Grokker for visual search of scholarly materials. Binghamton began collaborating with Groxis on the project in Fall 2004 and went live in January 2005 (more than 3 years before I arrived here). Since starting at Binghamton, I have really liked the visual search interface it provides and used it rather often. That is not to say I don’t like our other federated search product, Metalib; I do, and in many ways it was better. However, they both had their strengths and weaknesses, and I will miss the visual result sets from Grokker.

For more information, see Groxis CEO Randy Marcinfo’s comment about the financial situation of Groxis on Steve Arnold’s Beyond Search Web log.

by ecorrado at September 01, 2009 03:15 PM

Jonathan Rochkind

jrochkind


I have returned from my leave of absence, and am back at work.

Posted in General

by jrochkind at September 01, 2009 01:32 PM

Roy Tennant: Digital Libraries

Twitter Is the New RSS

After going a full six weeks without reading my blog aggregator, I have to wonder what's up. What's up, I can tell you, is that if it isn't hot wit...

September 01, 2009 02:55 AM

August 31, 2009

Lorcan Dempsey

Strategic reading and bouncing

By: dempsey

Categories: Books, movies and reading ... • Metadata • Research, learning and scholarly communication • User experience

I was very taken by Timothy Burke's presentation at the first open meeting of the Library of Congress Working Group on Bibliographic Control. What was especially interesting was how he outlined a variety of ways in which he needed to interact with the literature. In his own specialties, he expected to have comprehensive knowledge of what was published. Occasionally he might prospect a new area, wanting to get a sense of its 'shape'. He might need to understand something of the research area of a student. And so on. This variety of approaches led him to outline a set of capacities that would be useful to him, summarized in the meeting report as follows:

  • the ability to recognize clusters of knowledge production (persons and subjects),
  • the lineage of publications (i.e., how they exist in chronological relationship to each other),
  • the ability to make previously unknown connections among resources,
  • the ability to make serendipitous or unforeseen connections among topics,
  • identification of the authoritativeness of sources,
  • the popularity/amount of use of a resource, and
  • the sociology of knowledge, for example the "pedigree" of authors and publishers.
[Users and Uses of Bibliographic Data Meeting - Meetings - (Library of Congress)]

Now, he does not talk about 'discovery' or 'search'; he talks about patterns, relationships, judgements. That said, I was also interested that for his specialist areas he was able to name appropriate Library of Congress Subject Headings.

It seems to me that the phrase "strategic reading" might cover what he is doing quite nicely: his pattern of reading is determined by a particular goal, and will be different depending on the goal. His goals are probably variably well supported by our bibliographic apparatus (and remember he is also the author of the modestly notorious Burn the catalog.)

"Strategic reading" is a phrase used by Alan Renear and Carole Palmer in a recent contribution to Science Magazine: Strategic Reading, Ontologies, and the Future of Scientific Publishing (Science 14 August 2009: 828-832) (behind a paywall), and in the associated news story. Their focus is more on the techniques adopted to do some of what is discussed above, than the patterns themselves.

Scientists have always strived to avoid unnecessary reading. Like all researchers, they use indexing and citations as indicators of relevance, abstracts and literature reviews as surrogates for full papers, and social networks of colleagues and graduate students as personal alerting services. The aim is to move rapidly through the literature to assess and exploit content with as little actual reading as possible. As indexing, recommending, and navigation has become more sophisticated in the online environment, these strategic reading practices have intensified. ...
... They sweep through resources, changing search strings, chaining references backward and citations forward, dodging integrator and publisher sites to find open-access copies, continually working to reduce the number of clicks required for access. By note-taking or cutting and pasting, scientists often extract and accumulate bits of specific information, such as findings, equations, protocols, and data. In this process, rapid judgments are made--such as assessments of relevance, impact, and quality--while search queries are being formulated and refined. (Fig. 3). The goal often seems to be undifferentiated assimilation of information about a domain or a problem at hand, and the online experience may be highly valuable, even though no clear aim is met and no articles to read are located. [Allen H. Renear and Carole L. Palmer. Strategic Reading, Ontologies, and the Future of Scientific Publishing. Science 325, 828 (2009) - behind a paywall]

They go on to refer to the analogy with the remote control proposed by David Nicholas et al in their work on information behaviors. We have become bouncers and flickers, moving horizontally through the literature.

Just as the aim of channel surfing is not to find a program to watch, the goal of literature surfing, is not to find an article to read, but rather to find, assess, and exploit a range of information by scanning portions of many articles. This behavior is common among scientists (9). [Allen H. Renear and Carole L. Palmer. Strategic Reading, Ontologies, and the Future of Scientific Publishing. Science 325, 828 (2009) - behind a paywall]

What types of services are strategic readers most likely to use?


(9) D. Nicholas, P. Huntington, H. R. Jamali, T. Dobrowolski, Inf. Process. Manage. 43, 1085 (2007).


August 31, 2009 01:26 AM

August 30, 2009

John LeMasney

Twitter Weekly Updates for 2009-08-30

Powered by Twitter Tools

by John LeMasney at August 30, 2009 02:22 PM

August 28, 2009

First Monday

Reinventing academic publishing online. Part II: A socio-technical vision

Part I of this paper outlined the limitations of feudal academic knowledge exchange and predicted its decline as cross-disciplinary research expands. Part II now suggests the next evolutionary step is democratic online knowledge exchange, run by the academic many rather than the few. Using socio-technical tools it is possible to accept all, evaluate all and publish all academic documents. Editors and reviewers will remain, but their role will change, from gatekeepers to guides. However, the increase in knowledge throughput can only be supported by activating the academic community as a whole. Yet that is what socio-technical systems do: activate people to increase common gains. Part I argued that scholars must do this or be left behind in the dust of progress. The design proposed here is neither wiki, nor e-journal, nor electronic repository, nor reputation system, but a hybrid of these and other socio-technical functions. It supports print publishing as a permanent archive byproduct useful to a living, online knowledge exchange community. It could also track academic submissions, provide performance transcripts to promotion committees, enable hyperlinks, support attribution, allow data-source sharing, retain anonymous reviewing and support relevance and rigor in evaluation. Rather than a single "super" KES, a network of online systems united by a common vision of democratic knowledge exchange is proposed.

by Rob Friedman (friedman@njit.edu) at August 28, 2009 09:10 PM

Presidential Web sites and the Georgian-Russian War, 8-16 August 2008

During the war between Georgia and Russia, 8-16 August 2008, the Web sites of Presidents Mikheil Saakashvili and Dmitry Medvedev were used actively to promote their countries' conflicting views on the war. This article considers the structure of the two Web sites, their use during the war and their place in the media systems of Georgia and Russia.

by Robert W. Vaagan (robertwvaagan@gmail.com) at August 28, 2009 09:10 PM

The voice from the base(ment): Stridency, referential structure, and partisan conformity in the political blogosphere

The Internet has become a critical medium for American politics: in 2008, almost half of American adults looked for political information online, and 30 percent of Internet users contributed to online political discussions. Using the candidacy of Sarah Palin as a case study of a provocative political event, this paper examines the tone, partisan leanings, and referential structure of six elite blogs. First by randomly sampling overall trends of Palin coverage and then by performing a quantitative content analysis of a sub-sample of posts, this paper finds that the valence and stridency of blog posts vary by partisan identification, and that stridency dramatically affects the referential structure of posts. Although the referential structure of blog posts varies significantly by blog, it does not vary along partisan lines. Nonetheless, the relationship between stridency and partisan conformity exposed by this paper illustrates a trend amongst conservative blogs to repeat the allegations of “liberal media bias” often voiced by traditional conservative media outlets, contributing to an “echo chamber” effect in the blogosphere.

by Elizabeth Anne Roodhouse (eroodhouse@asc.upenn.edu) at August 28, 2009 09:10 PM

The relationship between public libraries and Google: Too much information

This article explores the implications of a shift from public to private provision of information through focusing on the relationship between Google and public libraries. This relationship has sparked controversy, with concerns expressed about the integrity of search results, the Google Book project, and Google the company. In this paper, these concerns are treated as symptoms of a deeper divide, the fundamentally different conceptions of information that underpin the stated aim of Google and libraries to provide access to information. The paper concludes with some principles necessary for the survival of public libraries and their contribution to a robust democracy in a rapidly expanding Googleverse.

by Vivienne Waller (vwaller@swin.edu.au) at August 28, 2009 09:10 PM

What value do users derive from social networking applications?

Why do users of social networking sites recommend some applications to their friends while rejecting others? What value do they obtain from applications? This exploratory study investigates the value users derive from ‘cool’ Facebook applications, and explores the features that either encourage or discourage users to recommend applications to their friends. Our qualitative data reveal consumers derive a combination of functional value along with either social or emotional value from the applications. Female Facebook users indicate self-expression as an important motivator, while males tend to use Facebook applications to socially compete. Three broad categories emerged for application features: symmetrical features, which can both encourage or discourage recommendation; polar features, where different levels of the same feature encourage or discourage; and uni-directional features, which only encourage or discourage but not both. Recommending or not recommending an application tends to be the result of a combination of features and context, rather than one feature in isolation.

by Rebekah Russell-Bennett (rebekah.bennett@qut.edu.au) at August 28, 2009 09:10 PM

From PDF to MP3: Motivations for creating derivatives

With increasing frequency, authors are licensing their works in such a way so as to permit others to create derivative works. In some cases, these derivatives extend the impact of a work by providing a translation into another language or modifying the file format to make it more accessible. The Internet is increasing people’s ability to create and distribute these derivatives. Seventeen creators of derivatives were surveyed on their motivations for doing so. They indicated that they were willing to create derivatives that extend the original content of a book because they want to help others access the work. Nearly all the people surveyed indicated they were glad they had created derivative works, often feeling like they were part of a community effort to share the work with others. These creators of derivatives believe that as awareness of open licenses increases others will be encouraged to create derivative works.

by John Hilton III (johnhiltoniii@byu.edu) at August 28, 2009 09:10 PM

Roy Tennant: Digital Libraries

The Great Web Site Die-Off: Why It Matters

You may not think that a web site needs something akin to a living will, but many do -- or more accurately, will. These are sites that are basicall...

August 28, 2009 02:18 PM

Talis: Panlibus

e-Readers and e-Textbooks: current reality and future possibilities

e-Readers and e-Textbooks: Current reality and future possibilities turned out to be easily the most interesting webinar I’ve ever attended. This Educause webinar featured Dr Jon T Rickman and Dr Roger Von Holzen from North West Missouri State University in the States, describing an initiative there around the evaluation of e-Readers and e-Textbooks over the past year.

Like other universities, North West Missouri State University had found itself under considerable pressure to deliver electronically, and the introduction of new devices in the marketplace has acted as a catalyst for an explosion in sales. There is focus on textbooks specifically in the relentless pursuit of cost reductions. NW Missouri State University is, in terms of its computing provision, unique as it has had a computer rental scheme in place for over two decades – the university charges $360 to its students for a wireless notebook computer.

They set about evaluating the e-Readers out in the marketplace and chose the Sony Reader. The Kindle people at Amazon weren’t really interested in participating in the project. The Sony Reader looked attractive for a number of reasons. It was going to cost $250 per unit with bulk purchase (Kindle would have been $299 plus shipping). Sony will be transitioning to the EPUB format. The device has a 6 inch (15cm) display. Text is available in three sizes. It also uses electronic ink technology, which is almost like paper and retains good levels of readability even in strong sunlight, as well as having low power consumption and thus offering great battery life.

They had discounted the idea of offering a paper textbook rental service as the notional cost savings would have been cancelled out by the difficulties in running such a service.

The difficulties they ran into with e-Readers turned out to be considerable. For example, formatting content for e-Readers can take weeks. For campus-wide deployments there are currently not enough e-Reader-compatible e-Textbooks. Keyword searching and annotating are very important features for both students and academics, so despite the strong affinity that students have for hand-held devices, enthusiasm waned without those functions.

They also encountered a number of issues intrinsic to the e-Textbook format rather than the device. For example, the multiple components of a textbook, including graphs and images, all have separate copyright. PDF format textbooks provide very restrictive options. And it turned out that what students really want from e-Textbooks is interactivity, animation and the ability to integrate content into other online tools.

They accepted that the whole area of e-Readers and e-Textbooks is subject to rapid change. It’s already the case, for instance, that keyword searching is now offered by e-Reader suppliers even though it wasn’t at the time of evaluation. Nevertheless, they were happy with their decision to move away from e-Reader provision, and instead set about making e-Textbooks available on the notebook computers that they were already renting out to students. They perceived that e-Reading devices and notebook computers are merging into each other. They also felt uneasy that e-Readers aren’t the platform that authors are creating on – they’re actually creating the content on notebooks. With issues such as these in mind, it was hard to justify an additional $2 million in costs to add e-Readers to their raft of student services.

A notebook approach to e-Textbook provision would also integrate with other software and services, including email and web access, thus meeting a key student requirement. And user support was already in place.

The delivery of a range of e-Textbooks provided by five publishers to students via notebooks turned out to be simple and efficient. Students were able to complete the download of e-Textbooks with little support.

Rickman and Von Holzen don’t expect e-Textbooks to replace the traditional textbook any time soon. They foresee a transition, but expect academics to continue to select resources on the basis of content. In the meantime, they will continue their search for a new delivery platform, seeing the tablet PC with integrated e-Reader as an option. Overall, then, they’ve found that e-Readers simply don’t have the functionality to support the richness of e-Textbooks right now, and are more suited to a leisure-type read.

by Sarah Bartlett at August 28, 2009 10:22 AM

Lorcan Dempsey

Serendipity

By: dempsey

Categories: Search • User experience

Digital environments provide many more opportunities for serendipity than print ones. Sure, there may be happy discoveries on the shelf or in a random group of items bound together. But the chances of happy discovery are multiplied in the web environment. Now, when I hear an argument based on serendipity, I usually assume that it is really an assertion of a preference for one set of behaviors over another.

In this context, I was interested to read Steven Johnson's discussion of serendipity from a few years back ...

I find these arguments completely infuriating. Do these people actually use the web? I find vastly more weird, unplanned stuff online than I ever did browsing the stacks as a grad student. Browsing the stacks is one of the most overrated and abused examples in the canon of things-we-used-to-do-that-were-so-much-better. (I love the whole idea of pulling down a book because you like the "binding.") Thanks to the connective nature of hypertext, and the blogosphere's exploratory hunger for finding new stuff, the web is the greatest serendipity engine in the history of culture. It is far, far easier to sit down in front of your browser and stumble across something completely brilliant but surprising than it is walking through a library looking at the spines of books. With music blogs and iTunes, I've discovered more interesting new bands and albums in the past year than I did in all of my college years. I know radio has gotten a lot worse, but really -- does anyone actually believe that radio was ever more diverse and surprising in its recommendations than surfing through the iTunes catalog or the music sites? [stevenberlinjohnson.com: Can We Please Kill This Meme Now]

Now, if you narrow the discussion to the library 'before' and 'after' the digital turn, we come back to the issue that library systems do not make their data work hard enough in service of discovery, routine or happy ....


August 28, 2009 03:54 AM

August 27, 2009

Andrew Pace

Meet you at the Forum

One of my favorite conferences is coming up.  LITA Forum will be in Salt Lake City this year and the theme is "Open & Mobile."  Save the dates--October 2-4, 2009.  There are some great concurrent sessions planned--mobile technologies, open data, open source software, lightning talks, and some really great looking poster sessions.  Lest I forget three fantastic keynoters:
If you're in the mood for an extra day in Salt Lake, two LITA members are sure to dazzle with their pre-conference offerings.  Jason Griffey will be talking about the future of Mobile, a great pre-conference that extends the excellent article Jason wrote for netConnect last year.  And Nina McHale will help libraries navigate the tricky issues surrounding Accessibility with updates, a live tour of several assistive technologies, and industry updates.

I love the programming at the Forum, because it is literally the only conference I go to where I am torn between which session to attend.  This headache for conference organizers (just too much good stuff!) should be joy for conference attendees with shrinking professional development budgets or concerns about getting the most for their money.  I should mention that I have never witnessed such a hard-working and dedicated group of volunteers as the ones who put this Forum together.  They are to be congratulated.

But I have thus far neglected the best part about LITA Forum--the people.  If you want to socialize, network, chat with vendors and sponsors in a relaxed atmosphere, or make that connection that will spark an idea or solidify a fleeting thought, then LITA Forum is the place to be.  LITA is fun and inclusive.  LITA is valuable.  These are tough financial times for travel and professional development.  Librarians, IT professionals, and support staff need to choose wisely from an array of conference offerings.  If I had to narrow my choices down to one conference per year or pay for extra professional development out of my own pocket, you'd find me at the Forum.

Did that sound too much like a commercial?  If not, then let me add that you should ACT NOW! and save $50 off of registration.  I'm already registered, are you?  

Seriously, this year's Forum is shaping up to be fantastic.  I'll be in Salt Lake City the first weekend in October and I hope to see lots of people there.

August 27, 2009 02:09 PM

August 26, 2009

Roy Tennant: Digital Libraries

Wikipedia Crosses the Rubicon

As the New York Times has reported, Wikipedia will now require that edits on pages of living persons be first approved by an experienced volunteer ...

August 26, 2009 02:04 PM

August 25, 2009

Roy Tennant: Digital Libraries

Sony Edges Kindle on Library Integration

As seen in Engadget today, and reported by Macworld, Sony is moving to integrate access to library purchased ebook licenses. From Macworld: On Tue...

August 25, 2009 06:55 PM

Talis: Nodalities

Thanassis Tiropanis talks about the Semantic Web and Higher Education

This podcast also appears on Talis Education Blog.

In my latest podcast I talk with Thanassis Tiropanis of the Learning Societies Lab at the University of Southampton.

We discuss the findings of the recent JISC-funded project, SemTech, which set out to

“identify and quantify the benefits of semantic technologies and outline a roadmap for their adoption in the context of [Further and Higher] education, informal learning and exploratory learning.”

We conclude by looking ahead to the SemHE workshop in Nice on 30 September.

During the conversation, we refer to the following resources:

  • Mark Birbeck podcast, discussing RDFa in Government
  • CETIS
  • CETIS Semantic Technologies Working Group
  • DSpace
  • EC-TEL conference
  • Ensemble
  • ePrints
  • EPSRC
  • ESRC
  • JISC
  • Learning Societies Lab, University of Southampton
  • SemHE workshop
  • SemTech project
  • SemTech survey

This conversation was recorded on Friday 21 August 2009.

For other podcasts in the series, see here. To subscribe to updates from all of Talis’ podcast series, see here.

by richard.wallis@talis.com at August 25, 2009 12:50 PM

August 24, 2009

Michael Giarlo

Roy Tennant: Digital Libraries

Making Up Book Covers

It came to my attention lately (from my colleague Lorcan Dempsey) that Google is creating book covers from images found within public domain books ...

August 24, 2009 05:39 PM

Lorcan Dempsey

On books again

By: dempsey

Categories: Analytics and measurement • Books, movies and reading ... • OCLC

'Book' is a big word. It has a lot of power as it is intimately bound up with our intellectual and imaginative histories. More parochially, the book is also strongly bound up with the professional practice and identity of the library and librarians.

At a more prosaic (sic) level, the book is also interesting as we manage data about multiple formats. We use book to mean a format (a set of bound pages, etc, say), a type of creative work (continuous textual/pictorial narrative, say), a work (Don Quixote, say), and maybe other things. Because of its centrality when our professional practices were being formed, perhaps more was taken for granted about the book than has been about other formats (the phrase 'non-book formats' is telling here).

This is a bigger topic than a late-night blog entry will tackle. I am prompted to write the post by a conversation I had recently with my colleague Brian Lavoie about what a book is. This was in the context of data mining activity looking at counts of books in particular contexts (here and here for example). There are two questions here: 'what is a book?', and 'how do you operationalise that definition in relation to a particular data set?'. It may be that it is not possible to operationalise aspects of your definition, in which case you will not be able to count as you wish. For example, one sometimes sees this Unesco definition of a 'book':

A book is a non-periodic publication of at least 49 pages exclusive of the cover pages, published in the country and made available to the public. [Revised Recommendation concerning the International Standardization of Statistics on the Production and Distribution of Books, Newspapers and Periodicals, 1 November 1985]

The document provides some other qualifications as to what is and what is not a book. So, is this unambiguous enough to be operationalizable in a database like, say, Worldcat? Well, the short answer is 'probably not completely' ;-) I would have to consult more knowledgeable colleagues who would have to do some work to find out how much could be done. However, it is unlikely that one would be able to consistently identify all the included categories of materials and apply a limit of 49 pages.

Anyway, this is a prelude to these comments that Brian sent me about the issue:

"As a non-librarian who works with library data on a regular basis, I was surprised to learn that the commonplace object 'book' is not well-defined in traditional cataloging practice. This is all the more surprising when one considers that historically, libraries were built around aggregations of books. The difficulty is that there are no explicit bibliographic criteria for identifying something most people would recognize as a 'book'. So for example, consider a simple question like 'How many books are in WorldCat?' In the bibliographic universe, there is nothing explicitly defined as a 'book': there are monographs, or more narrowly, language-based monographs, but the items falling into these categories are not necessarily books as we might commonly perceive them. Is a government document a book? A dissertation? A technical report? A pamphlet of only a dozen pages? These kinds of materials, and more, get included when we use a construct like 'language-based monographs' as a proxy for 'books'.
"Why is this important? The concept of "books" is appearing in a variety of current discussions, most notably in the context of digitization issues like the Google book settlement. So we are often asked questions like, 'how many print books in WorldCat have been published after 1923?' We can provide answers to these questions, but only with a degree of approximation built in: i.e., we can calculate a number that reflects something along the lines of 'all language-based monographs in WorldCat, excluding dissertations and government documents'; we can even throw in a minimum page requirement (at least 49 pages, according to the UNESCO definition of a book). But we can't say exactly how many books are in WorldCat, because from a cataloging standpoint, we don't know what a book is. Libraries are grappling with difficult new questions these days, as collections and services transition from print to digital, from local to the network. But an old question still remains: what is a book?" [Personal communication from Brian Lavoie]

August 24, 2009 01:41 AM

August 23, 2009

John LeMasney

Twitter Weekly Updates for 2009-08-23

Powered by Twitter Tools

by John LeMasney at August 23, 2009 02:22 PM

August 22, 2009

Roy Tennant: Digital Libraries

The Power of Visualization

Large amounts of data require appropriate visualization for comprehension. Edward Tufte in his books and presentations has tried to make that point...

August 22, 2009 03:32 AM

William Bilancio

The Allman Brothers & Widespread Panic in Camden for a night of great music

So last night The Allman Brothers and Widespread Panic played at the Susquehanna Bank Center (formerly the Tweeter Center) in Camden, NJ. Widespread Panic came on stage promptly at 7:00pm. I had forgotten that Jimmy Herring was playing guitar for them. He used to play with The Allman Brothers, Jazz is Dead, Phil and Friends and The Dead. The whole band sounded great and I was glad to finally see them. Jimmy was working it and his playing was outstanding. They played 2 hours and it was packed full of great music. I was very impressed with them, and although I had only heard a few things of theirs, I think this show has made me a fan. I will be downloading some other shows to get a better feel for them.

Here is the Widespread Panic playlist from last night:

Lil Kin >
Radio Child
Up All Night
Can’t Get High >
Fishwater >
Christmas Katie >
Travelin’ Light
Diner >
Space Wrangler
Blackout Blues
Junior >
You Should Be Glad
Tall Boy
Climb To Safety

The Allman Brothers took the stage around 9:50pm and hit the ground running with Statesboro Blues. Even though the weather was hot and humid, the band sounded great. I haven’t seen the Brothers in about 2 years and I was very impressed at how well Derek Trucks has grown into the band. Gregg sounded great and he hasn’t lost his groove; he pounded on the keys and never sounded better. Of course Warren Haynes sounded great, so not much to say there. A few of the members of Widespread Panic joined the band for a few songs. I thought Jimmy playing with the band was good; these were some of the best versions of Dreams and One Way Out I have heard in a long time.

Here is The Allman Brothers playlist from last night:

Statesboro Blues
Les Brers In A Minor
Come and Go Blues
The Sky Is Crying
Don’t Keep Me Wonderin’
I Walk On Gilded Splinters
with John Bell, guitar & vocals; Sonny Ortiz, percussion
Trouble No More
Good Morning Little School Girl
with Jo Jo Hermann, piano; James van de Bogert, drums
Revival
The Weight
with Jo Jo Hermann, piano
Dreams
with Jimmy Herring, guitar; James van de Bogert, drums
One Way Out
with Jimmy Herring, guitar

Encore
Whipping Post

Now to the downside of the night. At this show there were 2 different types of fans: the "we are going to get so drunk and wasted that we will look and act stupid all night" type, and the "we come to a show to actually see the show, have a good time, and actually remember the show the next night" type.

I see these types of fans at most shows I go to. The first type get so drunk and stoned that they weave all over the place and act like fools, then they usually pass out and miss the show. We had a whole row of them in front of us last night. They enjoyed the whole two hours of Widespread Panic and the whole break before the Brothers; then, when the Brothers hit the stage, they dropped like flies and were asleep by the end of the second song.

It just doesn’t make sense to me why you would pay good money to see a show and then get so wasted you miss most of it and then feel sick the next day.

I also don’t understand those who leave the show in the middle of it to go home. But that’s a whole different post.

by wbilancio at August 22, 2009 12:11 AM

August 21, 2009

Lorcan Dempsey

Ebooks and/or digital books

By: dempsey

Categories: Featured • Libraries - organization and services • RLG Partnership • The cultural and scholarly record • ebooks and other e-resources

I was in a meeting with a group of folks from research libraries the other week. I was interested in a particular terminological issue: 'ebooks' and 'digital books' were each being used in conversation. I asked was there a pattern of consistent use here. 'Not complete consistency' was the answer, but there was certainly a tendency to use 'ebooks' for materials available for license from external providers, and a tendency to use 'digital books' for materials digitized from library collections.

So, in this context, it is easy to see how each expression has a different - if overlapping - set of associations. Ebooks may evoke an environment currently fragmented by provider platforms, with restrictions on use, and managed in a licensed e-resource workflow. They are for reference, information, reading. Digital books may evoke a digital library environment, an aspiration to provide higher level research services based on text mining, entity identification, and so on, and various funding and cooperative initiatives which aim to increase the corpus. The Monk Project or the international Digging into Data Challenge are examples of a direction here.

Over the next few years, it will be interesting to see how these environments evolve as ebooks/digital books grow in number and usage. Ebooks and digital books - to continue to use these ambiguous terms - will become more important in the practice of research and learning. There are at least three big drivers in the environment the group above was discussing. The first is around moving physical collections to the cloud as libraries balance service between local collections, shared offsite collections and digital collections. There are early discussions about policy and service frameworks within which libraries can reduce their print inventory and the opportunity costs associated with it (see here for example). The second is around the demand environment, as books in digital form offer a better fit with research and learning workflows which are increasingly network based. The increasing availability of books in digital form supports patterns of discovery, analysis and use now common with other resources. Think for example of the practice of 'strategic reading' (or 'reading avoidance') where researchers are found to prospect the literature broadly in a digital environment, searching, consulting abstracts, scanning for terminology, diagrams and so on (interestingly described by Renear and Palmer here). For many purposes, people will prefer the digital versions and will shift use. This is not to say that people will not continue to read physical books, but it is interesting to consider the pattern of adoption (and continued development) of the journal literature. The third is around the environment of supply, where there is major current activity. The post settlement Google Books institutional product offering, Amazon's attempt to 'iPodify' books, the rise of the iPhone, and a range of other developments point to rapidly changing opportunities.

So the relationship with the book literature is going to change in significant ways, which may make the ebook/digital book distinction advanced above less relevant. In fact, Google Book Search already moves beyond it in important ways. And libraries are exploring various syndication models (with Amazon, for example, or Kirtas) or collaborations with publishers, such as the Cambridge Library Collection. Fragmentation, of technical platform, of format, of business model, and so on, will complicate service provision.

This poses major questions for libraries at all levels. From a (current) workflow point of view, we will see a shift of more activity out of the 'bought' materials workflow into the 'licensed' materials workflow. From a collections point of view we will see a rebalancing between local, shared and third party print and digital provision in ways now being worked through. There are bigger issues, already with us with the journal literature, about the curation of the scholarly record, about sharing of materials, and about assuring the type of access that is compatible with use and re-use in research and learning.

I was very interested to read the following remarks by David Nicholas in Update (behind a member wall) recently ...

[Image: nicholas.png]

I think that libraries may be underestimating the impact and pace of change in the book world ...


August 21, 2009 03:15 AM

August 19, 2009

Ed Summers

New York Times Topics as SKOS

Serves 23,376 SKOS Concepts

INGREDIENTS

DIRECTIONS

  1. Open a new file using your favorite text editor.
  2. Instantiate an RDF graph with a dash of rdflib.
  3. Use python’s urllib to extract the HTML for each of the Times Topics Index Pages, e.g. for A.
  4. Parse HTML into a fine, queryable data structure using BeautifulSoup.
  5. Locate topic names and their associated URLs, and gently add them to the graph with a pinch of SKOS.
  6. Go back to step 3 to fetch the next batch of topics, until you’ve finished Z.
  7. Bake the RDF graph as an rdf/xml file.
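
The steps above can be sketched in a few lines. This is only an illustration, not the 68 line implementation mentioned below: to stay dependency-free it swaps in the standard library's html.parser for BeautifulSoup and writes Turtle by hand instead of building an rdflib graph. It also skips the fetching loop and works on an inline HTML sample. The TOPIC_PREFIX test for recognising topic links and the sample markup are assumptions, not the real structure of the Times Topics index pages.

```python
from html.parser import HTMLParser

SKOS = "http://www.w3.org/2004/02/skos/core#"
# Assumption: topic links are recognised by this URL prefix.
TOPIC_PREFIX = "http://topics.nytimes.com/top/reference/timestopics/"


class TopicLinkParser(HTMLParser):
    """Collect (url, label) pairs for anchors whose href starts with TOPIC_PREFIX."""

    def __init__(self):
        super().__init__()
        self.topics = []
        self._href = None   # href of the topic anchor currently open, if any
        self._text = []     # text chunks seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(TOPIC_PREFIX):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip()
            if label:
                self.topics.append((self._href, label))
            self._href = None


def topics_to_turtle(html):
    """Turn the topic links found in an HTML string into SKOS concepts in Turtle."""
    parser = TopicLinkParser()
    parser.feed(html)
    lines = ["@prefix skos: <%s> ." % SKOS, ""]
    for url, label in parser.topics:
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append("<%s#concept> a skos:Concept ;" % url)
        lines.append('    skos:prefLabel "%s" .' % escaped)
    return "\n".join(lines)


# Made-up sample standing in for one index page.
sample = (
    '<ul>'
    '<li><a href="http://topics.nytimes.com/top/reference/timestopics/people/b/ray_bradbury">Bradbury, Ray</a></li>'
    '<li><a href="/about">About</a></li>'
    '</ul>'
)
turtle = topics_to_turtle(sample)
print(turtle)
```

In a real run, the sample string would be replaced by the HTML fetched for each index page from A to Z, and the hand-written Turtle by a proper RDF library's serializer.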

NOTES

If you don’t feel like cooking up the rdf/xml yourself you can download it from here (might want to right-click to download, some browsers might have trouble rendering the xml), or download the 68 line implementation and run it yourself.

The point of this exercise was mainly to show how thinking of the New York Times Topics as a controlled vocabulary that can be serialized as a file, and still be present on the Web, could be useful. Perhaps to someone writing an application that needs to integrate with the New York Times and who wants to be able to tag content using the same controlled vocabulary. Or perhaps someone wants to be able to link their own content with similar content at the New York Times. These are all use cases for expressing the Topics as SKOS, and being able to ship it around with resolvable identifiers for the concepts.

Of course there is one slight wrinkle. Take a look at this Turtle snippet for the concept of Ray Bradbury:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://topics.nytimes.com/top/reference/timestopics/people/b/ray_bradbury#concept> a skos:Concept;
    skos:prefLabel "Bradbury, Ray";
    skos:broader <http://topics.nytimes.com/top/reference/timestopics/people#concept>;
    skos:inScheme <http://topics.nytimes.com/top/reference/timestopics#conceptScheme>
    .

Notice the URI being used for the concept?

http://topics.nytimes.com/top/reference/timestopics/people/b/ray_bradbury#concept

The wrinkle is that there’s no way to get RDF back from this URI currently. But since NYT is already using XHTML, it wouldn’t be hard to sprinkle in some RDFa such that:

<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
...
<h1 about="http://topics.nytimes.com/top/reference/timestopics/people/b/ray_bradbury#concept" property="skos:prefLabel">Ray Bradbury</h1>
...
</html>

And voila you’ve got Linked Data. I took the 5 minutes to mark up the HTML myself and put it here which you can run through the RDFa Distiller to get some Turtle. Of course if the NYT ever decided to alter their HTML to provide this markup this recipe would be simplified greatly: no more error prone scraping, the assertions could be pulled directly out of the HTML.
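
To make "pulled directly out of the HTML" concrete, here is a minimal sketch of reading that one assertion back out of the marked-up page. It is not a real RDFa processor (the RDFa Distiller does that job properly): it only looks for elements carrying both an about and a property attribute, takes the element text as the literal, does not expand the skos: prefix, and ignores nesting. The class name is made up for illustration.

```python
from html.parser import HTMLParser


class SimpleRDFaParser(HTMLParser):
    """Collect (subject, property, literal) for elements with both about= and property=."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._pending = None  # (tag, subject, property) while inside a matching element
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._pending is None and "about" in attrs and "property" in attrs:
            self._pending = (tag, attrs["about"], attrs["property"])
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None and tag == self._pending[0]:
            _, subject, prop = self._pending
            self.triples.append((subject, prop, "".join(self._text).strip()))
            self._pending = None


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:skos="http://www.w3.org/2004/02/skos/core#"><body>'
    '<h1 about="http://topics.nytimes.com/top/reference/timestopics/people/b/ray_bradbury#concept" '
    'property="skos:prefLabel">Ray Bradbury</h1>'
    '</body></html>'
)
rdfa = SimpleRDFaParser()
rdfa.feed(page)
for triple in rdfa.triples:
    print(triple)
```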

by ed at August 19, 2009 04:50 AM

Roy Tennant: Digital Libraries

Sony Charts an Open E-Book Path

In the news recently was an announcement that Sony was dropping a proprietary e-book format for its reader, and adopting the EPub format. The Intern...

August 19, 2009 01:32 AM

August 18, 2009

Andrew Pace

Web-scale: Portfolio Director's Cut

As you might imagine, I've been doing a lot of presentations about Web-scale lately...both the general concept and how it applies to the web-scale management services that my team is building for circulation and delivery, print and licensed acquisitions, and license and rights management.

Lots of people have been asking for copies of the presentation that I gave at ALA.  I used to always have problems sharing slides.  For one thing, the joke slides never work out of context; for another, I try very hard to avoid bulleted lists of things, also making context-less Powerpoint viewing difficult.  Now I can gladly say, "Let's go to the video tape!"

[Image: webscale-presentation.jpg]


Powerpoint, slides, and video all in one.  Unlike movie directors, however, I find video of myself a bit hard to watch, so I hope the rest of you will enjoy it.  There's a good Q&A session at the end that was actually my favorite part of the presentation.  The presentation covers not only the general web-scale strategy and cloud computing platform, but also more specifics about the Web-scale management services Library Advisory Council, pilot participants, component functionality, and development timelines.

There are a lot of other great conference presentations and updates available on the OCLC Presentations site--a little something for everyone.

August 18, 2009 09:21 PM

August 17, 2009

Edward M. Corrado: Comments

Edward M. Corrado

Privacy, Borders, and the Internet

I was reading an article about Facebook and Canada’s Privacy Commissioner today. In a nutshell, the article says that Canada’s Privacy Commissioner has found that Facebook doesn’t meet Canada’s privacy laws. Facebook apparently disputes this, but whether or not it does isn’t the point of my post. This article has reminded me of others before it; only the names have changed.

What happens is a social networking site or other Web site is created in one country and, because citizens of another country use it, its operators seem to be expected to live up to the second country’s privacy (or other) laws. Now I don’t know if Facebook has an operation in Canada or not, but I know that other past stories I’ve read had countries (or states) taking or threatening legal action against Web sites that did not have any operations in their jurisdiction. This is just madness and has to stop. I’m not saying Facebook has the world’s best privacy model. I agree they should do more to protect users’ privacy, but you can’t expect a Web site operator to know and follow the laws of 195 countries, not to mention the laws of the various territories, states, etc. that make up these countries. As long as the privacy terms are disclosed and follow the laws of the country the site is incorporated in, it should be up to the user to determine if the privacy is adequate. With the never-ending news stories of lost laptops with social security numbers and other personal data, the privacy of Facebook is not really a huge concern for me. I just figure that everything I put on there may at some point be seen by friends, enemies, criminals, people who couldn’t care less, employers, and my mom.

Really, what do you need for a Facebook account nowadays, an e-mail address? Sure they say to use a real name, but I know many people who don’t, and you can easily get a pseudo-anonymous e-mail address. In other words, everything you post on Facebook is what you decided to make available to at least a limited public sphere of friends. Whenever you give something to friends, such as a phone number, or tell them a tale of your latest adventure, you can, or should, consider that they will share that with their friends. No policy, even one that lives up to Canada’s privacy laws (or any country’s), will save you from yourself. Users of social networking and other Web sites need to keep this in mind and not post or share anything that they would have a problem with being public.

by ecorrado at August 17, 2009 12:47 PM

Talis: Nodalities

David James talks about Government transparency and the work of Sunlight Labs

In my latest podcast I talk with David James of Sunlight Labs, part of the Sunlight Foundation in Washington, DC.

We discuss the Labs’ work to increase Government transparency by making public sector data such as that disseminated via Data.gov more useful.

During the conversation, we refer to the following resources:

  • Amazon Public Data Sets
  • Apps for America 2
  • Capitol Words
  • Data.gov
  • e-Democracy Camp
  • Government 2.0 Camp
  • Jim Hendler and Li Ding podcast, talking about conversion of Data.gov resources to RDF
  • Vivek Kundra
  • Lexis Nexis
  • Mapumental video
  • MySociety
  • National Data Catalog
  • OpenCongress
  • OpenSecrets.org
  • Pew Charitable Trusts
  • Principles of Open Government Data
  • PublicMarkup.org
  • Read the Bill
  • Recovery.gov
  • resource.org
  • Sunlight API
  • Sunlight Foundation
  • Sunlight Labs
  • Sunlight Labs Google Group
  • Transparency Camp
  • Westlaw

This conversation was recorded on Friday 14 August, 2009.

For other Talis podcasts in this Nodalities series, see here.

by richard.wallis@talis.com at August 17, 2009 10:46 AM

Lorcan Dempsey

Counting titles and authors

By: dempsey

Categories: Analytics and measurement • Books, movies and reading ... • Identity management, IPR and e-commerce

The proposed Google Books settlement has created a strong interest in quantifying publications and authors, to get a better sense of the scale of impact. We have been looking at Worldcat and hope to publish an analysis later this year.

Here is an issue that came up this week: how many print books were published in the US since 1923, and how many authors were associated with those books? Here are some numbers, acknowledging that they provide good indications based on the data we have and what we can do with it, not definitive answers.

  • Print books published in the US in 1923 or later: 12,582,962
  • Unique personal authors: 3,685,778
  • Unique corporate authors: 977,679

Now, 'book' is a pretty vague term. This analysis uses the definition from the Anatomy of Aggregate Collections paper we published a few years ago, which analysed the collections of the original Google 5 libraries. The definition was as follows:

Although there is no unambiguous bibliographic definition of a book, libraries have often used monographic language materials as a proxy for books, and this practice is adopted for this study. More specifically, in the context of a MARC21 record, a book is defined as a language-based monograph, identified by the codes "a" and "m" in bytes 6 and 7 of the leader, respectively. For the purposes of this study, theses/dissertations and government documents are excluded from the analysis, since these materials are usually acquired and managed as separate segments of the library collection. Records describing books in print format were identified by eliminating all non-print formats, such as digital, microform, Braille, and so on.
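
As a rough illustration of the leader test in that definition, the check can be written as below. This is a sketch under assumptions, not the code used for the analysis: it reads "bytes 6 and 7" as the zero-based leader positions 06 and 07, the sample leaders are made up, and the further exclusions (theses/dissertations, government documents, non-print formats) depend on other parts of the record and are not shown.

```python
def is_book_proxy(leader):
    """Return True when a MARC21 leader marks a language-based monograph."""
    if len(leader) < 8:
        return False
    type_of_record = leader[6]       # "a" = language material
    bibliographic_level = leader[7]  # "m" = monograph
    return type_of_record == "a" and bibliographic_level == "m"


# Made-up leaders for illustration only.
leaders = [
    "00000nam a2200000 a 4500",  # language material, monograph
    "00000nas a2200000 a 4500",  # language material, serial
    "00000ngm a2200000 a 4500",  # projected medium, monograph
]
book_count = sum(is_book_proxy(leader) for leader in leaders)
print(book_count)
```

Only the first sample leader passes the test, which is why a count built this way still pulls in things like standards and pamphlets: the leader says nothing about length or publication type.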

What we are counting are 'manifestations' (in FRBR terms), which might relate to 'title' in common usage. There would be more individual copies. We pull together authors as best we can.

Here is the ranked list of the personal authors by number of manifestations published in the US after 1923.

  1. Shakespeare, William 1564 1616
  2. Marsh, Carole
  3. Twain, Mark 1835 1910
  4. Rudman, Jack
  5. Dickens, Charles 1812 1870
  6. Jackson, Ronald vern
  7. Bloom, Harold
  8. Christie, Agatha 1890 1976
  9. Stevenson, Robert Louis 1850 1894
  10. Cowley, Joy

An interesting list; I have remarked on the Bloom phenomenon before.

Here is the ranked list of corporate authors:

  1. society of automotive engineers
  2. american national standards institute
  3. national business institute
  4. national learning corporation
  5. foreign technology div wright patterson afb ohio
  6. national bureau of economic research
  7. sothebys firm
  8. sotheby parke bernet inc
  9. electric power research institute
  10. naval postgraduate school monterey ca

It will be seen from the list of corporate authors that our working definition pulls in standards and art catalogs. Remember that we are not counting theses and government documents. This is a reminder that although we may have a common-sense notion of a 'book' based on an academic or trade publication, it actually requires some discretionary interpretation to bound the population of books in an operational way for this type of analysis.

And a final reminder: these lists are based on print books published in the US since 1923, not on an analysis of the whole of Worldcat.

The actual analysis was done by my colleagues Jenny Toves and Brian Lavoie.


August 17, 2009 02:10 AM

August 16, 2009

John LeMasney

Twitter Weekly Updates for 2009-08-16

Powered by Twitter Tools

by John LeMasney at August 16, 2009 02:22 PM