Archiving the Socials: LinkedIn, Facebook, and Twitter
Several years ago, I deleted my profile on LinkedIn. I felt that the platform was drifting away from its original usefulness, and I was curious what the experience of deleting a social media account would be like.
It's been amazing!
But there were also drawbacks. And these drawbacks eventually led me to investigate Facebook and Twitter's processes, as well.
You see, before I deleted the account, I took a backup. I held on to that backup, trusting that it would contain the information I had willingly surrendered to a corporate behemoth. But as I later discovered: no such luck.
It turned out that the LinkedIn backup tool had a limitation. It could only deliver information that had been updated "recently". This meant that all the historical information I'd entered when I first created my profile ... was gone. Who updates that, right? It's history.
I'd looked into the backup's contents because I needed pertinent information about specific dates that I couldn't quite remember. And it wasn't there. I had to find it elsewhere.
Fast forward to today, when I was working on copying a backup of all my photos to a new external HD. I recalled the years-prior LinkedIn incident and became curious about what the backup procedures would look like for Facebook and Twitter, my two remaining social networks. I decided to proactively investigate their deliverables.
I started with Facebook.
Facebook Backups
A quick Google search found a guide on where to download an account archive.
It took me several minutes to find the "Settings" menu item. If I'd spent more time reading the Google results (and noticing that the word "Settings" was actually a link), maybe that would have gone quicker, but it felt extremely buried in the main interface. Once in there, I clicked to download an archive. Naturally, given the overwhelming amount of data involved, the system told me I would receive an email with a download link. Then, it dutifully began gathering the requested information.
Once I received the email, I initiated the download. The sheer size of the archive was mind-boggling. I mean, after years of using FB and posting high-resolution photos (sometimes over 10MB each), I had expected the archive to be over a hundred megabytes. The actual size?
39.5 Megabytes.
C'mon, son.
When I unzipped the archive, I was treated to several folders of data and some nice HTML pages that allowed me to navigate through it, starting with the index.htm in the main folder. It's a good thing I browsed through them, too, because at one point there was a warning that a specific video couldn't be downloaded, and a link was provided that would be valid only for 3 days. So I downloaded that 6MB video file separately. It was a video of my son first learning to walk - if I hadn't noticed the link, and didn't have my own backup, it might have been lost forever.
The data was fairly thorough:
- Profile information, including employment history & some family relationship info
- Page Likes (which, if you recall FB history, started out as "interests" and morphed into pages later in a really weird technical decision)
- A seemingly-thorough history of authenticated sessions & IP addresses
- A list of all friends and when they were befriended
- A list of advertisers with your contact info (in my case, somehow just Spotify and AirBnB)
- A very thorough list of all messages sent and received
- A list of your timeline updates, plus your Photos, "Synced Photos", and Videos
There were some gaps, though:
- Timeline updates (and presumably messages) didn't include any URLs or photos (although presumably Timeline photos would be included under the Photos section).
- Timeline updates from you are mixed in with timeline posts from your friends. Expect dozens of "Happy Birthday" posts with ZERO ATTRIBUTION.
- Your comments on other people's posts ARE NOT INCLUDED
- Friends didn't include any uniquely-identifying account information, just their name and date added (this probably is a good thing in some cases)
- No information from the Pages I own, just a list of their names
- Uploaded photos were not the original files. I don't think I saw any that were over 200 KB, and some of these were files that used to be over 10MB.
Overall, it was definitely better than the LinkedIn experience. But the drawbacks to Photo and Video backup are significant and unfortunate.
Advice: Don't use Facebook as your only place for saving photos - it's not a safe backup. And, when you do download a backup, make sure to scour it for links that expire, so you don't miss any crucial content.
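If clicking through every page by hand sounds tedious, here is a minimal Python sketch of one way to scour an unzipped archive for links that point outside of it. It makes some assumptions: that the archive's pages are plain .htm/.html files, and that anything worth rescuing shows up as an ordinary link with an http(s) address. The names LinkCollector and find_external_links are made up for this example. It can't tell which links expire, so treat the output as a list of things to check by hand.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_external_links(archive_dir):
    """Return sorted (page, url) pairs for links that leave the archive.

    Assumption: relative links stay inside the archive, while absolute
    http(s) links point back at the live site and may stop working.
    """
    root = Path(archive_dir)
    found = set()
    for page in root.rglob("*.htm*"):
        if not page.is_file():
            continue
        parser = LinkCollector()
        parser.feed(page.read_text(encoding="utf-8", errors="replace"))
        for url in parser.links:
            if url.startswith(("http://", "https://")):
                found.add((page.relative_to(root).as_posix(), url))
    return sorted(found)
```

Calling find_external_links on the folder where the archive was unzipped returns each page alongside the outside link it contains, which makes a warning like the one about my video much harder to miss.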
And now, onward to Twitter, my other major social network.
Twitter Backups
It was slightly easier to find the Download button on Twitter, if only because they make finding your account settings a bit easier than Facebook.
Twitter used the same concept of "We'll email you a link when we're done", which is good. The resulting zip file was again a bit smaller than I was expecting. I mean, with almost 10 years of tweets - over 36,700 - including images, videos, Vines, and gifs, I expected a substantial file size.
9.1 megabytes
C'mon, son.
After unzipping the file, I saw the main reason for this unexpectedly low file size. Gifs aren't animated! They're single frames of gifs. Also?
IMAGES AREN'T EVEN INCLUDED.
They're linked directly from the Twitter CDN. What happens if you delete your account? Do they stay on the CDN forever - until the heat-death of the Universe?
Despite that, the included index.html looked a bit more polished than Facebook's, and it included a handy month-by-month navigation interface.
However, other concerns soon came to the fore. For example, there wasn't an easy way to see what day each tweet was made. Using Chrome's inspector, I was able to browse the source and determine that the data was there (hidden in the metadata of an invisible icon). Also, for my replies, the tweet being replied to wasn't included. This severely impacts the cohesiveness of the conversational experience.
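Since the archive also ships its data as CSV, the dates can be pulled out without poking around in the page source. Here is a rough Python sketch, with a few assumptions labeled loudly: the column names "timestamp" and "text" are guesses about the CSV header (check the first line of the file and adjust), each timestamp is assumed to start with a YYYY-MM-DD date, and tweets_with_dates is a name invented for this example.

```python
import csv


def tweets_with_dates(csv_file, timestamp_column="timestamp", text_column="text"):
    """Return (day, text) pairs from an open CSV file of tweets.

    Assumptions: the header contains the two column names passed in,
    and each timestamp begins with a YYYY-MM-DD date, so its first ten
    characters are the calendar day.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        stamp = (row.get(timestamp_column) or "").strip()
        text = row.get(text_column) or ""
        rows.append((stamp[:10], text))
    return rows
```

To try it, open the CSV file from the archive with open(path, newline="", encoding="utf-8") and pass the file object in. The function takes a file object rather than a path so it can be tested on a few pasted lines before running it on ten years of tweets.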
Retweets were included, so it's a bit odd that the tweets I replied to weren't. Embedded retweets (or quoted reply retweets) were also not included.
Things Twitter got right:
- Great month-by-month navigation tool
- Including Retweets
- Providing JSON and CSV data files
Things not so much:
- Images AREN'T EVEN INCLUDED; they are linked from Twitter's CDN. Do they continue to exist after you delete your tweet/account?
- The linked images on the CDN aren't "originals"; they are lossy derivatives
- Your Followers/Following lists are not included.
- For that matter, neither are your Lists, if you've created any.
- Moments, DMs, and other things are not included
- Tweets being replied to are not included
- Embedded/quoted tweets are not included
- Dates/Times, while included in metadata, are not immediately visible in the viewing interface
My advice for Twitter users? Investigate a third-party API-driven backup tool if you're interested in a comprehensive archival record. But use the Twitter one to start, because it's kind of fun to be able to browse your whole history.
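Short of a full API-driven tool, a first step is simply collecting the image links that the archive points at, so the files can be fetched while they still exist. The Python sketch below is only that first step, under stated assumptions: that media links live on a host ending in twimg.com, that they appear in the archive's .js, .json, .csv, or .html files, and that some of those files escape slashes as "\/". The name find_media_urls is invented for this example.

```python
import re
from pathlib import Path

# Assumption: media links point at a host ending in "twimg.com".
MEDIA_URL = re.compile(r"https?://[\w.-]*twimg\.com/[^\s\"'<>\\,]+")


def find_media_urls(archive_dir, suffixes=(".js", ".json", ".csv", ".html")):
    """Return a sorted list of unique media URLs found in the archive."""
    urls = set()
    for path in Path(archive_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            # JSON-style files may write "/" as "\/"; undo that first.
            urls.update(MEDIA_URL.findall(text.replace("\\/", "/")))
    return sorted(urls)
```

The resulting list can be handed to any downloader. Keep in mind that these are still the lossy derivatives, not the originals, so this rescues what Twitter kept rather than what was uploaded.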
State of the Archive
It's entirely possible that LinkedIn has resolved my concerns about their backup process - it was, after all, several years ago. But given the way Facebook and Twitter are handling it, I am concerned that the only way to do a real backup from ANY social network is by accessing raw API calls.
That's the only way I can see to avoid losing out on useful information such as Timeline/Reply Context, Friends/Followers/Following, and even just metadata about individual posts (the lack of URLs and attribution for FB timeline updates is particularly troubling).
Most importantly, the way these services crush your photos is a strong, strong argument to back them up safely using a real backup tool, or an image sharing platform that respects originals.
Published: September 17, 2017