Using AI to Accelerate Digitization at Boston Public Library
Today, as part of our mission expansion, we’re announcing a collaboration with the Boston Public Library to develop AI-driven tools capable of accelerating new digitization of large collections at libraries across the world, starting at BPL.
The Boston Public Library (BPL) is embarking on a new initiative to digitize hundreds of thousands of historically significant items that are currently inaccessible to the public. For example, BPL has collected over 3 million government documents since becoming a Federal Depository Library nearly two centuries ago. For collections at this scale, conventional digitization approaches that rely heavily on manual processes face an impossible choice: either sacrifice depth for breadth by creating minimal descriptions, or drastically limit what gets digitized at all. AI tools are beginning to be integrated into this process, but they often enter the workflow in its later stages, limiting their ability to address this fundamental dilemma.
The Institutional Data Initiative and the Boston Public Library are working to change this by collaborating at the outset of a large digitization project, allowing us to explore how AI might complement human expertise and strengthen the process in its earliest stages. This includes researching opportunities to generate machine-readable representations of items, add searchable metadata, and begin the structuring of entire collections—all at the moment each item leaves the imaging station. Our goal is to develop methods and tools at our Boston-area institutions that can support the expert staff of libraries everywhere, enabling them to increase the breadth of materials that can be digitized and the speed at which they’re made available to the public.
Since its launch, the Institutional Data Initiative has worked with knowledge institutions, like the Boston Public Library, to structure, analyze, and publish their existing digital collections as data to facilitate responsible AI training. IDI’s mission is to ensure that as AI advances, it does so in ways that strengthen our vital institutions and the knowledge stewardship they provide. We began by developing a dataset of nearly one million public domain books, scanned at Harvard Library. Now we’re expanding our work to help institutions worldwide. If you’re an institution looking to make your collections more accessible, or a technologist excited about building tools that bridge institutional knowledge and AI, we want to hear from you.
Greg Leppert
Executive Director, Institutional Data Initiative