Originally published in June 2019. For further reading on this topic, see “Archiving to Connect During a Crisis: How We Can Unite to Document COVID-19.”
The websites, tweets, blog posts, Facebook updates, videos, and online articles that we create today constitute the historical record we’ll depend on tomorrow. Without web archiving efforts, how will people in the future understand the time we are living in now?

The U.S. Department of Justice considers presidential tweets to be official statements.
Despite the fact that there is no centralized responsibility for its preservation, web content is rapidly becoming the official record. For example, the National Archives and Records Administration (NARA) stated in 2017 that it considers the President’s tweets to be official government records. In a court case later that year, the United States Department of Justice affirmed that the government treats the President’s tweets as official statements. As more and more of our public officials migrate to social media to connect with constituents and each other, the need for preservation of these official records grows. However, under current legislation, NARA issues records management recommendations and guidelines only. No mandate for the preservation of these public records exists.

MySpace announced that 12 years of photos, videos, and music, including nearly 50 million songs from 2003 to 2015, have been lost.
Because government records are only a fraction of the content that exists online, we also need to concern ourselves with the numerous companies, non-profit organizations, and individuals that contribute to the Internet landscape but don’t necessarily take responsibility for preservation. The vast majority of contributors to the digital record are unaware of their roles as historical actors, and many simply don’t have the forethought, expertise, or capacity to preserve their online presence. One needs only to look to the substantial loss of content recently reported by MySpace, once the most popular site on the web, to see that this can have dire consequences for the historical record.
This reality certainly contradicts the 21st-century adage that once you post something on the Internet it will be there forever. As a web archivist fighting this uphill battle, I’m in a race against time to preserve online content as the web becomes ever more privatized, complex, and ephemeral. I have often had the experience of selecting a website for preservation only to find that it has disappeared or radically changed before I can capture it. This “moving target” issue is compounded by the widespread use of interactive website components and third-party hosts, like social media platforms and website building templates that do not easily lend themselves to capture. The rise of temporary media like Snapchat stories adds to the challenge.
Since we simply can’t rely on content creators to preserve the digital record, the burden of preserving the web largely falls to the stewards of our collective knowledge – libraries, archives, and museums. The Internet Archive, a non-profit organization that hosts billions of archived web pages, is leading the charge. The group began an ambitious web archiving program in 1996, and the tools it has developed for capturing content and replaying archived pages have been widely adopted. However, it would be folly to assume that the Internet Archive is capable of preserving the Internet on its own.
In fact, there are many dispersed archivists engaged in capturing online content, but they are taxed. In the 2017 National Digital Stewardship Alliance (NDSA) survey on web archiving, these archivists overwhelmingly reported that their institutions are absorbing the additional responsibility of collecting pieces of the web without a corresponding increase in staffing or funding to do so. As a result, web archivists are making tough choices about what is “good enough.”
Because the Internet is so large and the resources of those trying to capture it are so few, many of these organizations use automated tools to capture content on a grand scale. This “set it and forget it” approach was common in the early days of web archiving, as advocates such as the Internet Archive encouraged organizations with limited capabilities to capture something over nothing at all. Now that the field has matured a little, web archivists are realizing the limitations of following that prescription. Much of the content that we thought was captured simply wasn’t. Yet quality assurance is not always possible because of the scale of the project or constraints on time, budget, or staffing.
What can you do to help? Some of the challenges we face in preserving the web could be countered by including web developers in the process. If you manage a website, consider looking at preservation from the beginning. If preservable design were a higher priority, many of the difficulties that web archivists face would be eliminated. Columbia University has published excellent guidelines for making websites more preservable.
Archivists at HAI are actively engaged in the important work of preserving the web. One of our longest-running projects is a collaboration with the National Library of Medicine to capture web content about HIV/AIDS in the early 21st century. The collection of websites and social media provides an important resource to see how research, treatment, and attitudes toward AIDS evolved over time. Other web archiving projects with NIH are also underway.
Web archiving is still relatively new and we’re in the process of developing best practices. We’ll continue to share insights on what we’re learning in this blog. If you’re curious as to how you can make your web content more easily archivable or interested in a web archiving program but not sure where to start, follow us or drop us a line. We’d be happy to help you make history!
Further Resources and Reading
- “Freshman US Lawmakers Setting New Rules for Social Media,” Voice of America
- “The Government Has an Instagram Problem,” Medium
- “How Governments Deal With Social Media,” The Atlantic
- Guidelines for Preservable Websites, Columbia University Libraries
- “Why there’s so little left of the early internet,” BBC
- “Myspace deleted 12 years’ worth of music in a botched server migration,” The Verge
- “Web Archiving in the United States: A 2017 Survey,” An NDSA Report, October 2018, posted on Center for Open Science
- The Wayback Machine: https://archive.org/
- App for saving web pages: Save Page WE, https://chrome.google.com/webstore/detail/save-page-we/dhhpefjklgkmgeafimnjhojgjamoafof?hl=en-US
- From Preservica: Real-world digital preservation blog post about Boston City Archives
- PDF link to U.S. District Court for District of Columbia, Plaintiffs’ post-briefing notices, James Madison Project, et al., v. Department of Justice, et al.: https://assets.documentcloud.org/documents/4200037/Trump-Twitter-20171113.pdf
- PDF link to letter from Archivist of the United States to Senate Committee on Homeland Security and Governmental Affairs regarding Trump Administration compliance with Presidential Records Act: https://www.archives.gov/files/press/press-releases/aotus-to-sens-mccaskill-carper.pdf
*All “Further Resources and Reading” links above retrieved on 4/26/19