Resilience Fail (updated)

Quick question: where does this URL go?

http://tinyurl.com/ya8p9vg

How about this one?

http://bit.ly/DkXOW

Would you have guessed that the first goes to a Computerworld article about business-appropriate avatars, and the second goes to the previous post on Open the Future?

The use of URL-shortening services is a classic example of short-term need trumping long-term resilience. Shortened URLs:

  • are not human-readable, and even the versions with user-generated mnemonics are little better than crude tags;
  • don't provide contextual clues, which would offer a way to find the information later (if the article has expired, for example) by looking up relevant keywords or related concepts;
  • rely on the continued presence of the particular shortener; any downtime or disappearance kills potentially millions of links.

    That is, URL-shorteners violate three key principles of resilient design: they offer no transparency, no redundancy, and no decentralization. They're classic single points of failure.

    As a result, shortened URLs have little or no reference or archival value. A dead short URL is far worse than a dead standard URL, in fact, because (a) you have no way of getting contextual meaning, and (b) you can't even go look up the address on the Internet Archive. This is a real problem for those of us who think of the Internet as a tool for building knowledge. For better or for worse, services such as Twitter have gone from being ephemeral conversation media to being used as tools of collaborative awareness about the world. We can no longer assume that a link in a short message is of only transient value.

    Yet many of us (including me) rely heavily on shorteners when using URLs "conversationally," such as on Twitter or in an instant message chat. They take far fewer characters than a typical URL; in length-limited media such as Twitter, that's a critical advantage.

    So, in the immortal phrase, what is to be done?

    Given that the need for URL shortening will remain as long as we use character-limit media such as Twitter or SMS, I can think of a few steps that would help to return some of the information resilience to the system:

  • Embed shortening "behind the scenes" in Twitter and the like, so that senders just enter a full URL, and recipients see the full URL whenever possible. The full URL should show up on the web version, so that the real address gets archived.
  • Google, Bing, Yahoo, and the other search engines should auto-translate any shortened URLs they stumble upon when indexing pages, so that at the very least the cached version contains the full address. The Internet Archive should definitely be doing this.
  • All URL-shortening services should agree to make the records of short URL -> full URL links available to search and archival sites, under appropriate privacy conditions (e.g., all names/IP addresses of users stripped out, data only available if the company goes under, data only available after five years, users can choose to allow the URL link to expire).

    Any of these would be an enormous step forward, and the combination would make for a much more resilient system. Admittedly, all of these steps require a bit of coding work, and aren't going to be implemented overnight. However, nobody said resilience was easy -- just necessary.
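
To make the "bit of coding work" concrete, here is a minimal sketch of the expansion step an indexer or archiver might run over a page before storing it. It is an illustration under stated assumptions, not a description of how any search engine or the Internet Archive actually works: the host list, the function names, and the "(via ...)" output format are all made up for this example. The resolver is passed in as a parameter so that the lookup could come from a live HTTP request or from a shortener's published records.

```python
import re
import urllib.request
from typing import Callable, Optional

# Hypothetical list of shortener hosts; a real indexer would need a far longer one.
SHORTENER_HOSTS = ("tinyurl.com", "bit.ly", "tr.im")

SHORT_URL_RE = re.compile(
    r"https?://(?:%s)/[A-Za-z0-9_-]+" % "|".join(re.escape(h) for h in SHORTENER_HOSTS)
)


def resolve_via_http(short_url: str, timeout: float = 5.0) -> Optional[str]:
    """Ask the shortener where the link points; return None on network failure."""
    request = urllib.request.Request(short_url, method="HEAD")
    try:
        # urlopen follows redirects; geturl() reports the final address.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.geturl()
    except OSError:
        # Covers URLError/HTTPError and socket timeouts: the shortener is
        # down or gone, which is exactly the failure mode discussed above.
        return None


def expand_short_urls(
    text: str,
    resolve: Callable[[str], Optional[str]] = resolve_via_http,
) -> str:
    """Replace each recognized short URL with 'full (via short)'.

    Links that cannot be resolved are left untouched, so nothing is lost.
    """
    def replace(match: "re.Match[str]") -> str:
        short = match.group(0)
        full = resolve(short)
        return f"{full} (via {short})" if full else short

    return SHORT_URL_RE.sub(replace, text)
```

Keeping the short form next to the full address means the archived copy records both what was originally posted and where it pointed. Swapping in a dictionary lookup for `resolve` (for instance `known_links.get`, where `known_links` is a hypothetical table of short-to-full pairs) lets the same function work after the shortener itself has disappeared.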

    Comments

    Smart thinking, Jamais. Note that some services (like BudURL) allow custom creation of more transparent short URLs.

    I'm not sure this is actually a problem. Shortened URLs are used primarily in tweets, email, or other ephemeral settings - places one would not expect them to be permanent archivable fixtures.

    The real problem is with wtf people are thinking if they use them in a more permanent setting, or one intended to be archived.

    I don't really see this as a problem that needs to be solved, frankly. To steal from Kibo, archiving Twitter seems a bit like washing toilet paper. It is ephemeral by nature. Sure, you CAN archive it; if you do, your archival program should expand URLs. Problem solved.

    Howard-

    Your point is well made -- certainly adding additional layers of technology to un-shorten the shortening technology would allow for usable archives. And I think that usable archives are a valuable long-term asset, since Twitter and other micro-blogging streams serve as a priceless "humankind awareness signal" -- knowing what is cool to the group is almost as good as knowing the cool thing.

    I think the much more grave danger, that Jamais didn't really touch on, is how URL shorteners tend to take the inherent un-breakability of the internet, and reduce it to a single bottleneck that can go under, taking thousands or millions of links with it. The linkrot caused by the death of tr.im earlier this year must have been on the order of millions, if not hundreds of millions of links.

    If this service stuck to plain text links, or links shortened via some universally interpretable algorithm, then a similar disaster would require millions of sites to simultaneously go down.

    Thanks folks; I've updated the piece, reflecting some of these observations.

    One of my crackpot projects is a registry of string mappings that things like URL-shorteners could use to make an archival copy of their mappings. There are other cases where a simple string-string pair is an important piece of metadata that could also use it.

    One of these days...
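
    For what it's worth, the core of such a registry can be very small. The sketch below is only one guess at its shape: the `Mapping` record, its field names, and the JSON Lines dump format are assumptions made for illustration, not part of any existing service. Each record deliberately carries nothing about who created the link.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Mapping:
    key: str     # e.g. the short URL
    value: str   # e.g. the full URL it stands for
    source: str  # which service published the pair


def dump_registry(mappings: Iterable[Mapping]) -> str:
    """Serialize the pairs as JSON Lines: one self-contained record per line."""
    return "\n".join(json.dumps(asdict(m), sort_keys=True) for m in mappings)


def load_registry(dump: str) -> Dict[str, str]:
    """Rebuild a key -> value lookup from a JSON Lines dump."""
    lookup: Dict[str, str] = {}
    for line in dump.splitlines():
        if line.strip():  # tolerate blank lines in the dump
            record = json.loads(line)
            lookup[record["key"]] = record["value"]
    return lookup
```

    A plain-text, line-per-record dump is easy to mirror on several sites, so the archival copy does not become another single point of failure.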

    The bigger problem is linkrot in general, especially of material that really isn't ephemeral. People are getting slightly better at it, but the most valuable thing that a website has is its inbound links. Throwing them away because of a software upgrade or a re-org is like burning down a library with the books in it so you can build a new library.

    I see both threat and inevitability in this. As Led Zeppelin sang, "all things are born to die!" in one of the best songs ever, That's The Way. That includes links. Our incessant urge to catalog and forever preserve information runs directly counter to fundamental qualities of Nature, as a dynamic nonequilibrium, of all things having tenuous existence.

    On the other hand, a key force of resilience and the Panarchy is Remembering, all the more so as natural forces begin accelerating due to our collective activity as a geologic force. So the more that is preserved, the larger the pool of possible adaptive solutions for us and future generations, which is critical as uncertainty and potential thresholds loom. But virtual knowledge preserved without any corresponding use is really dead knowledge anyway.

    To preserve anything in this world, it must be by and through use.

    Creative Commons License
    This weblog is licensed under a Creative Commons License.