Investigating the Recent Historical Past: Using the Internet Archive Wayback Machine for Litigation Research

By Justin Broubalow, Research Historian

Our legal clients often ask us to investigate events in the recent historical past. In this period, roughly the last twenty years, the factual record was likely born-digital; that is, it was created natively in digital formats such as websites, emails, social media posts, images and videos, and PowerPoint presentations. When undertaking litigation research projects focused on recent events, we cannot necessarily rely on traditional archival and library collections. Instead, we draw on our knowledge of federal, state, and local recordkeeping practices and employ some creative online research strategies.

One resource we often use is the Internet Archive Wayback Machine. Launched in 2001 by the Internet Archive, a non-profit organization that has been archiving the web since 1996, the Wayback Machine is a searchable repository of archived web pages and supporting images, videos, scripts, and other web objects. Web pages are “archived” in the Wayback Machine either after the Internet Archive’s software “crawls” a public web page or after content creators voluntarily submit their pages to the repository. The Wayback Machine currently contains more than 406 billion web pages in over 40 languages and can be searched by web address, keyword, or date range, as well as through several APIs.

The timeline at the top of the page shows how many times the website was crawled in a given year. Blue circles on the calendar below indicate specific capture dates.
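For readers who want to try the API access mentioned above, here is a minimal Python sketch built around the Wayback Machine's public availability endpoint, which returns the capture closest to a requested date. The helper names (build_availability_url, closest_capture, fetch) and the example.com address are illustrative assumptions, not part of our research workflow, and the response layout is as we understand it from the Internet Archive's public documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp):
    """Build a query URL asking for the capture closest to `timestamp`.

    `timestamp` uses the Wayback format YYYYMMDD (optionally followed by hhmmss).
    """
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return AVAILABILITY_ENDPOINT + "?" + query


def closest_capture(response_text):
    """Return (timestamp, archived_url) from a JSON response, or None if no capture exists."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def fetch(query_url):
    """Download the JSON response as text (requires network access)."""
    with urlopen(query_url, timeout=30) as response:
        return response.read().decode("utf-8")


# Hypothetical usage, with example.com standing in for a site of interest:
#   capture = closest_capture(fetch(build_availability_url("example.com", "20090201")))
```

If a capture exists, the result is a pair holding the capture's timestamp and the address of the archived page, which can then be opened in a browser and documented in the usual way.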

The Wayback Machine has enormous value for legal clients who seek to understand what was posted online in the recent historical past and how it may have changed over time. To demonstrate this value, we investigated how content related to e-cigarettes, or e-cigs, evolved as the public health risk posed by vaping came to light in 2019. A search of the Wayback Machine provides unique insight into how e-cig products were previously marketed and advertised, practices that will certainly be scrutinized in the flood of litigation expected to hit the industry.

The Wayback Machine currently has over 400 captures of eSmoke, an e-cigarette commerce website, dating back to early 2009. In a February 2009 capture, eSmoke’s website featured an “Advantages” section that listed the benefits of e-cigs over traditional tobacco cigarettes. It asserted that the product was healthier than cigarettes and allowed users to “[a]void dozens of known toxins, tar, carbon monoxide, and carcinogens that are found in smoke.” By contrast, a February 2019 capture shows that eSmoke had added a warning to its homepage, noting that its e-cig products “contain nicotine, a chemical known… to cause birth defects or other reproductive harm.”

In 2009, eSmoke touted the “Advantages” of vaping, calling it healthier, hygienic, and safe.
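The kind of side-by-side comparison described above can also be roughed out in code once the visible text of two captures has been saved. The Python sketch below uses the standard library's difflib module to list lines that appear in only one of two text versions. The function name changed_lines is our own, and the two sample strings are invented placeholders rather than text taken from any archived page.

```python
import difflib


def changed_lines(earlier_text, later_text):
    """Return (removed, added): lines found only in the earlier or only in the later text."""
    # Ignore blank lines and surrounding whitespace so layout changes are not reported.
    earlier = [line.strip() for line in earlier_text.splitlines() if line.strip()]
    later = [line.strip() for line in later_text.splitlines() if line.strip()]

    removed, added = [], []
    for entry in difflib.ndiff(earlier, later):
        if entry.startswith("- "):
            removed.append(entry[2:])   # present only in the earlier capture
        elif entry.startswith("+ "):
            added.append(entry[2:])     # present only in the later capture
    return removed, added


# Invented placeholder text standing in for two saved captures of the same page.
capture_earlier = "Welcome to our store\nAdvantages of our product\n"
capture_later = "Welcome to our store\nWARNING: This product contains nicotine\n"

removed, added = changed_lines(capture_earlier, capture_later)
```

A comparison like this only flags differences in wording. Judging what a change means, and when between two captures it was made, still calls for a researcher's review of the archived pages themselves.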

Like any resource, the Wayback Machine has limitations. It preserves a web page only as it existed on the dates and times it was captured, so its content represents only a small slice of the Internet at any particular moment, and changes made between captures may go unrecorded. Furthermore, any understanding of the e-cig industry would be incomplete without the review and analysis of traditional archival and secondary sources. These include industry trade literature, scientific and medical literature, periodical and newspaper databases, and federal and state regulatory documents. Yet even with its limits, the Wayback Machine is an important tool and a vivid example of how HAI can reconstruct a past as recent as yesterday.