iSquare selects callas software technology for PDF/A-compliant archival of daily newspapers at the German National Library
Tuesday, 15 February 2011 - As part of a project for the German National Library, the Berlin-based software maker iSquare has selected pdfToolbox technology from callas software. iSquare has been preparing "e-Paper" editions of 300 German dailies in PDF/A format, which the National Library needs for long-term archiving and preservation. The pdfToolbox solution from callas software ensures that the files collected overnight by iSquare's spider program are converted into high-quality PDF/A files.
The German National Library's mandate requires that the content published by German dailies be archived for the long term. A call for tenders was therefore posted at the end of 2009 to find a service provider for the “collection, conversion and preparation of electronic editions of daily newspapers”. Software provider iSquare was awarded the contract. Because the tender stipulated that all electronic newspapers be archived in PDF/A, the format designed for long-term archiving, iSquare carried out comprehensive market research and then integrated software that includes pdfToolbox technology from callas software into its overall solution.
“We were especially impressed by the high quality of the results produced by the callas software technology,” explains Michael Kapst, President of iSquare. “callas pdfToolbox is therefore the ideal addition to our solution for the German National Library.”
Complex process demands numerous individual steps
Providing the German National Library with electronic editions on a continuous basis involves an extensive range of tasks. First, the iSquare ePaper Manager runs its spider program to retrieve the electronic versions from the various websites of the daily newspapers between 8:00 a.m. and 10:00 a.m. “The publishers offer various ways to download the ePaper,” explains Kapst. “The publisher sites often require selecting the pages you want to include in the PDF files. With callas pdfToolbox you can directly select individual pages and/or categories such as politics, sports etc. Many publishers also require you to first select the region-specific edition.”
Newspaper publishers are required to offer their electronic newspapers in a convertible format. If a publisher does not, the failure is documented and a complaint is filed separately. An XML metadata record is also created for each edition that has been collected. The editions are then checked for completeness and consistency and stored in a database, which can be accessed through a web interface. There the data can be edited further and any errors corrected. Finally, the data is converted into a PDF/A file. For this purpose, iSquare decided to integrate callas pdfToolbox into its solution to ensure that all PDF/A specifications are fully met and errors are automatically corrected. The callas pdfToolbox software analyzes the PDF documents and, where necessary, also handles embedded objects such as fonts, metadata and images, as well as compression algorithms. This way, any hidden problems can be identified and resolved in the conversion process.
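As a rough illustration of the per-edition metadata record and the completeness check described above, here is a minimal Python sketch. The element names (edition, title, date, region, pages), the function names and the sample values are all hypothetical. The article does not describe iSquare's actual schema or code.

```python
import xml.etree.ElementTree as ET


def build_metadata_record(title, issue_date, region, page_files):
    """Build a hypothetical XML metadata record for one collected edition."""
    edition = ET.Element("edition")
    ET.SubElement(edition, "title").text = title
    ET.SubElement(edition, "date").text = issue_date
    ET.SubElement(edition, "region").text = region
    # Record how many pages were collected, then list each page file.
    pages = ET.SubElement(edition, "pages", count=str(len(page_files)))
    for number, filename in enumerate(page_files, start=1):
        ET.SubElement(pages, "page", number=str(number)).text = filename
    return edition


def missing_pages(record, expected_count):
    """Return the page numbers that are expected but absent from the record."""
    present = {int(page.get("number")) for page in record.iter("page")}
    return [n for n in range(1, expected_count + 1) if n not in present]


# Illustrative data only: a four-page edition of which three pages arrived.
record = build_metadata_record(
    "Example Daily", "2011-02-15", "Berlin", ["p1.pdf", "p2.pdf", "p3.pdf"]
)
xml_text = ET.tostring(record, encoding="unicode")
gaps = missing_pages(record, expected_count=4)
```

In a setup like this, an edition with a non-empty list of gaps would be flagged for correction in the web interface before the PDF/A conversion step.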
The 8,400 pages processed on average each day, including metadata, are generally available to the German National Library for retrieval by noon via an OAI-compliant (Open Archives Initiative) interface.
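Retrieval over an OAI interface is commonly done with the OAI-PMH protocol, in which a harvester sends HTTP requests carrying a verb parameter. The following is a minimal sketch of building such a request, assuming OAI-PMH 2.0. The base URL is a placeholder, and the article does not say which endpoint or metadata format the library uses; oai_dc is simply the Dublin Core format that every OAI-PMH repository must support.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def list_records_url(base_url, from_date, until_date, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a date range."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": from_date,    # inclusive lower bound, YYYY-MM-DD
        "until": until_date,  # inclusive upper bound, YYYY-MM-DD
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not the real address of any archive.
url = list_records_url("https://archive.example.org/oai", "2011-02-15", "2011-02-15")
query = parse_qs(urlsplit(url).query)
```

Setting from and until to the same day would request only the records added for that day's editions.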
“The institute is concerned not so much with speed as with the completeness and correctness of the files. And this is where our callas pdfToolbox software makes its essential contribution,” callas software CEO Olaf Drümmer concludes.