

Author Archive | Martin Schofield


The Basics of Website Archiving


Generally speaking, whatever you post on the Internet stays on the Internet. Very little is ever truly deleted online. Take, for example, all of the GeoCities online communities that were removed by Yahoo! in 2009. While they are no longer live, these archived websites (with their glittering text and late-1990s and early-2000s GIFs) remain alive and kicking on a popular Internet archiving service.

But what is website archiving? In a nutshell, website archiving is similar to the traditional archiving of documents, only digital. The process is the same: archivists select which information to save, then they store it and preserve it in an archive, which is then made accessible to the public. Web archivists typically oversee archived websites.

Since the Internet is so huge, website archiving organizations use automated processes to collect websites. Using specially designed software known as crawlers (much like those Google uses to index pages), web archivists find and harvest websites from the live Internet and preserve them as snapshots of information at a particular point in time. These archived websites can then be navigated as if they were still live. The best-known example of this is undoubtedly The Wayback Machine, which has saved around 357 billion web pages over time.

Types of Web Archiving

There are three main ways of archiving content from the Internet: client-side web archiving, transaction-based web archiving, and server-side web archiving.

Client-Side Web Archiving

Relatively simple and scalable, client-side web archiving is the most popular method of web archiving. This method can archive any website that is freely available on the Internet. The crawlers in client-side web archiving imitate the way that users interact with websites. This usually means starting from a seed page and then following the links it finds to the site's internal pages. The crawlers fetch a wide range of web material, from documents and text pages to photos, audio, and video files. They only stop once they reach the boundary of the domain in which they are operating.
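The crawl described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular archiving product works: the `crawl` function, the `LinkExtractor` class, and the `fetch` callable are hypothetical names, and the "site" is an in-memory dictionary so the example runs without network access. A real crawler would also need to handle robots.txt, rate limiting, and non-HTML resources.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl that stays inside the seed page's domain.

    `fetch` is an assumed callable: it takes a URL and returns the page's
    HTML as a string, or None if the page cannot be retrieved.
    Returns a dict mapping each captured URL to its HTML.
    """
    domain = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    snapshot = {}

    while queue and len(snapshot) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        snapshot[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            # Stop at the domain boundary: external links are skipped.
            if urlparse(link).netloc != domain:
                continue
            if link not in seen:
                seen.add(link)
                queue.append(link)

    return snapshot


# In-memory stand-in for a live site, so the sketch runs offline.
SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://other.org/">Out</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="/team#top">Team</a>',
    "https://example.com/team": "<p>Team page</p>",
}

pages = crawl("https://example.com/", SITE.get)
print(sorted(pages))
```

The crawler captures the three example.com pages and ignores the link to other.org, because that link falls outside the seed page's domain.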

Transaction-Based Web Archiving

Transaction-based web archiving operates on the server side. This method of web archiving requires access to the web server hosting the content, so it needs collaboration and agreement with the server’s owner. In this approach, content that has never been viewed will not be archived; only web content that was viewed, even just once, will be archived. With this method, it is possible to record exactly what data was seen and when.
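One way to picture this is as a thin recording layer that sits in front of the web application. The sketch below uses Python's standard WSGI calling convention; the `TransactionArchiver` class, the `demo_app` site, and the plain list used as the archive are all hypothetical, and the whole response body is buffered in memory for simplicity, which a production system would likely avoid.

```python
from datetime import datetime, timezone


class TransactionArchiver:
    """WSGI middleware that records every response the server sends."""

    def __init__(self, app, archive):
        self.app = app
        self.archive = archive  # assumed to be any list-like store

    def __call__(self, environ, start_response):
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            # Remember the status line, then pass the call through.
            captured["status"] = status
            return start_response(status, headers, exc_info)

        # Buffer the full body so it can be stored exactly as served.
        body = b"".join(self.app(environ, recording_start_response))
        self.archive.append({
            "path": environ.get("PATH_INFO", ""),
            "served_at": datetime.now(timezone.utc).isoformat(),
            "status": captured.get("status"),
            "body": body,
        })
        return [body]


def demo_app(environ, start_response):
    """Tiny stand-in site with two pages."""
    site = {"/": b"home", "/pricing": b"pricing v1"}
    path = environ.get("PATH_INFO", "/")
    status = "200 OK" if path in site else "404 Not Found"
    start_response(status, [("Content-Type", "text/plain")])
    return [site.get(path, b"not found")]


archive = []
app = TransactionArchiver(demo_app, archive)

# Simulate one visitor viewing /pricing; the home page is never requested.
app({"PATH_INFO": "/pricing"}, lambda status, headers, exc_info=None: None)
print([entry["path"] for entry in archive])
```

Only `/pricing` ends up in the archive, along with the time it was served. The home page was never requested, so it is never recorded.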

Server-Side Web Archiving

This method of web archiving forgoes the HTTP interface and goes directly to the server. Basically, server-side web archiving directly copies files from the server. As with transaction-based web archiving, it requires the collaboration and consent of the server’s owner. The issue with this method is how to translate the copied scripts, database files, and templates into a usable archived website that can be easily navigated. However, the main benefit of this approach is that it copies and archives parts of the site that are inaccessible to client-side crawlers.
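At its simplest, the copying step could look like the Python sketch below. The `snapshot_site` function and the directory layout are assumptions made for illustration, and a temporary directory stands in for a real server's document root. Note that the result is a copy of the raw scripts and templates, not of rendered pages, which is exactly the translation problem described above.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def snapshot_site(document_root, archive_root):
    """Copy every file under document_root into a timestamped folder."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(archive_root) / stamp
    shutil.copytree(document_root, target)
    return target


# Demo against a throwaway directory standing in for a real server.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "www"
    (root / "templates").mkdir(parents=True)
    (root / "index.php").write_text("<?php echo 'hi'; ?>")
    (root / "templates" / "page.tpl").write_text("{{ body }}")

    saved = snapshot_site(root, Path(tmp) / "archive")
    copied = sorted(
        p.relative_to(saved).as_posix() for p in saved.rglob("*") if p.is_file()
    )
    print(copied)
```

Both the script and the template are copied, including the template file that a client-side crawler would never see directly.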

Because of website archiving, important, even historical, data from the Internet can be saved and preserved for future generations. For companies, however, archiving of web content is often a legal requirement. Financial services, for example, are required by law to keep detailed records of all content that appeared on company websites, just as they need to archive all other forms of customer communication. Protecting against false claims is another popular reason for archiving pages.

If you have a business and are looking for ways to archive your company’s website, whether for regulatory compliance or liability protection, then you can turn to PageFreezer Software Inc. We offer innovative website archiving tools and electronic records management solutions that are capable of capturing complex client-side generated JavaScript and AJAX frameworks, as well as password-protected sites. Contact us today to schedule a demo or request a quote.

Social Media Platforms Move to Make Posts “Ephemeral”


As with many things in life, nothing lasts forever on the internet.

Servers get turned off, websites get taken down, you hit one wrong button and history that has taken years of hard work to build is erased. But, just as there is no certainty on the persistence of anything, there is also no guarantee that something will completely disappear, either.

From inappropriate tweets posted in a blind rage and embarrassing comments on forums to ridiculous-looking photos from an era when outlandish outfits were the norm, things posted on the internet have a stubborn way of persisting. What’s more, it’s difficult to tell what will disappear and what will last. But precisely because of the persistent nature of internet posts, some social media sites are starting to look at making the eventual disappearance of content a fundamental feature.

On Ephemerality and Freedom

Social media companies are increasingly looking into intentionally ephemeral interactions as an antidote to the digital data hoarding that has been happening on the internet for the past 30 years. Intentional ephemerality, as first demonstrated by Snapchat’s auto-delete feature, lends a kind of security blanket to users. It provides an assurance that whatever they post on social media will not be used against them in the foreseeable future.

A study by the Social Media Lab at Cornell University confirms that for users, the assurance of a digital expiration date can be liberating. Survey respondents said that they are able to share more on apps that allow auto-deletion. Ephemerality allows people the freedom to share small peeks into the mundane details of their lives, from photos of pets to “really ugly selfies,” as they don’t have to worry about that content lurking somewhere in the digital realm, waiting for an inopportune moment to reappear.

Reducing Permanence on Social Media

Mark Zuckerberg alluded to this when recently explaining what he referred to as “reducing permanence” online. In a long public post, the Facebook CEO announced that the company will be focusing on building a more “privacy-focused” platform. He says he believes that the future of communication will significantly shift to encrypted services where people who seek privacy can be confident that their conversations will be secure, and that their messages won’t linger forever.

But while many experts think it may be a good thing that less data stays online permanently, a more ephemeral internet would present its own problems. Historians who compile records of this age, for example, may increasingly struggle to collect information from social media sites, and since so much public discourse takes place on these channels, this would be a significant loss.

Intentional ephemerality may also pose a challenge to organizations that use social media for both internal and external business communication. In the case of legal proceedings, an audit, or any situation where a company needs to protect itself or prove compliance, ephemerality can make pulling up an accurate history of relevant posts, tweets, and other content nearly impossible.

The Solution: Social Media and Website Archiving

PageFreezer offers a solution for archiving websites, including social media sites. Our tool offers a comprehensive records retention policy for your website and social media content. We provide a user-friendly and affordable platform that allows you to preserve your online presence in a regulation-compliant way. Learn more about our website archive solutions. Book a demo or call +1.888.916.3999 today.