WebArchiv, digital archive of Czech web resources
"WebArchiv is a digital archive of Czech web resources, which are collected with the aim of long-term preservation. The National Library of the Czech Republic, in cooperation with the Moravian Library and the Institute of Computer Science of Masaryk University, has been organizing the preservation of these documents since 2000. Tools developed by the Internet Archive and the International Internet Preservation Consortium (IIPC) are used for web archiving. WebArchiv has been a member of IIPC since 2007.

What is the purpose of web archiving?
  • The need to preserve for future generations non-print documents of cultural, artistic and historic value
  • Enormous growth of electronic online resources published solely on the Internet
  • The ephemeral nature of electronic resources – valuable documents can be irretrievably lost
Which resources can be found in WebArchiv?
  • Digital documents freely available via the Internet
  • Publications with research and artistic focus, news and current affairs
  • Periodicals, monographs, conference papers, research and other reports, scholarly publications, etc.
  • Textual, and to some extent also visual and sound, documents existing only in digital format

WebArchiv content

The aim is to archive everything that has ever been published on the Internet within the Czech web. However, this goal is not technically achievable, and not all resources published on the web are by nature suitable for archiving (e.g. promotional material). For these reasons, archiving follows three paths:

The main aim of the WebArchiv project is to implement a comprehensive solution for archiving the national web, i.e. bohemical online-born documents. This includes tools and methods for collecting, archiving and preserving web resources, as well as providing long-term access to them. Both large-scale automated harvesting of the entire national web and selective archiving are carried out, including thematic 'event-based' collections. At present these methods are being tested and are the subject of further research. [...]

Collecting online-born documents

From a strictly technical point of view, collecting online documents is an automated process carried out by a set of software tools that harvest, index and save data in the archive according to preassigned parameters. At present, an open-source software tool (Heritrix) is used for web crawling. In addition, a set of criteria is being defined for selecting online-born documents for registration in the Czech National Bibliography. In this context, finding a suitable solution to the legal issues is considered necessary.

Archiving and preservation

Harvested files, including relevant metadata, are saved in standardized archival formats supported by the IIPC consortium. The data is stored on a dedicated redundant disk array (RAID), with migration to the National Library's new data storage facility expected in the near future."
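To illustrate what harvesting "according to preassigned parameters" might look like in practice, the following is a minimal Python sketch of a crawl-scope check. It is not Heritrix's configuration model and not WebArchiv's actual setup: the `CrawlScope` class, the `in_scope` function, the `.cz` suffix rule, the `extra_hosts` set for selectively chosen sites, and the depth limit are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class CrawlScope:
    """Hypothetical harvest parameters (assumed for this sketch)."""
    allowed_suffixes: tuple = (".cz",)    # assumed rule approximating the national web
    extra_hosts: frozenset = frozenset()  # assumed list of selectively archived hosts
    max_depth: int = 3                    # assumed limit on link hops from a seed URL


def in_scope(url: str, depth: int, scope: CrawlScope) -> bool:
    """Return True if the URL would be harvested under the given parameters."""
    parts = urlsplit(url)
    # Only web resources are considered; mailto:, ftp: and similar are skipped.
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if not host or depth > scope.max_depth:
        return False
    # Selectively chosen hosts are accepted regardless of their domain suffix.
    if host in scope.extra_hosts:
        return True
    return any(host.endswith(suffix) for suffix in scope.allowed_suffixes)
```

A crawler built along these lines would call `in_scope` on every discovered link before queuing it, so that the broad domain rule and the selective list are both expressed as data rather than hard-coded logic.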

Last update

Saturday, 17 March 2012 - 10:18pm