Friday, January 26, 2007

Wikipedia, you are the strongest link

The Networker

John Naughton examines the loop between Wikipedia and the major search engines and asks whether the encyclopedia is now as dominant as Google

There are two kinds of people in the world - those who think Wikipedia is amazing, wonderful, or inspiring; and those who simply cannot understand how a reference work compiled by thousands of 'amateurs' (and capable of being edited by any Tom, Dick or Harry) should be taken seriously. Brisk, vigorous and enjoyable arguments rage between these two camps, and provide useful diversion on long winter evenings.

What's more interesting is the way Wikipedia entries have risen in Google's page-ranking system so that the results of many searches now include a Wikipedia page in the first few hits. There are several reasons for this. One is the sheer size and comprehensiveness of the online encyclopedia (1.6m articles in English when I last checked). Another is the burgeoning trend whereby bloggers, when mentioning a person, place or product, link to the relevant page in Wikipedia to avoid a digression from their discourse. They use Wikipedia links, in other words, as footnotes. A third is the fact that if you add the word 'wiki' to any Google search, you will be

So there's a nice positive feedback loop between Wikipedia and the major search engines. The prominence of the online encyclopedia, however, also makes it very desirable to have a link from it to your site. This has not escaped the attention of spammers, who edit Wikipedia pages to include spurious links to their meretricious web properties.

Recently, Jimmy Wales, Wikipedia's co-founder, decreed that henceforth all outbound links on the site would carry a special HTML attribute ('No Follow', written rel="nofollow") - which tells search engines not to give those links any credit when ranking the pages they point to. Google searches will still lead users to Wikipedia pages, but the sites those pages link to will gain no ranking benefit from being cited there.
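
For the technically curious, here is a rough sketch of what the 'No Follow' marking amounts to in practice. In a page's HTML it appears as rel="nofollow" on a link, and a well-behaved crawler is expected to withhold ranking credit from any link so marked. The Python below is an illustration only: the names LinkSorter and sort_links and the sample page fragment are invented for the purpose, and real search engines are certainly more elaborate than this.

```python
from html.parser import HTMLParser


class LinkSorter(HTMLParser):
    """Collect the links on a page, separating those marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []    # links a crawler may credit
        self.nofollowed = []  # links the page asks crawlers not to credit

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "external nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def sort_links(html):
    """Return (followed, nofollowed) lists of href values found in html."""
    parser = LinkSorter()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollowed


# Hypothetical fragment: one outbound citation marked nofollow, one internal link.
SAMPLE = (
    '<p>See <a href="https://example.org/blog/ipod-stylus" rel="nofollow">'
    'the blog</a> and <a href="/wiki/IPod">iPod</a>.</p>'
)

followed, nofollowed = sort_links(SAMPLE)
print("credited:", followed)
print("not credited:", nofollowed)
```

The point of the sketch is that the link is still there for a human reader to click; only the machine that does the ranking is asked to look away.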

The implications of this development are being hotly debated on the net. One obvious risk is that Wikipedia essentially becomes the web's equivalent of a 'black hole', sucking in links from all over the web, but giving nothing back. Here's how one blogger, Amit Agarwal, put it:

'Say you discover a cool feature in the iPod (called Stylus) and blog about it. Tomorrow, the Wikipedia contributors append the details of iPod Stylus (your discovery) to the Wikipedia page on iPod. They do attribute your blog but search engines will never see that attribution (or read your blog via Wikipedia) because of the no follow tag.

'Now that Wikipedia enjoys higher credibility and trust ... the search algorithms will rank the Wikipedia iPod page higher than yours (for queries like iPod Stylus) because the search engines are not aware that Wikipedia's content is actually based on your blog page. Result, your site appears after Wikipedia in the "iPod Stylus" search results and you get less or no traffic while Wikipedia gets to enjoy all the fruits of your labor.'

It's not clear what should be done about this. The problem of 'link spam' is real and growing, so it's reasonable for Wikipedia to protect itself. Some people are saying that the encyclopedia is now so dominant that links to it should henceforth be ignored by the search engines. After all, you don't google Google (so to speak) to find it. Is Wikipedia now in the same league?

One of the most useful concepts in technology is the 'success-disaster'. This is a product or service that is so successful that it overwhelms the organisation that invented it. The term was coined by the late Roger Needham, the great Cambridge computer scientist, and one of the wisest men I ever met.

Web 2.0 is riddled with incipient success-disasters because new web services can be created with very little upfront investment and, if popular, tend to expand exponentially. That's why, if you're a successful Web 2.0 start-up, salvation depends on being acquired by someone big before you are overwhelmed by your inability to cope with exponential growth. Blogger, YouTube and Flickr are classic cases in point - the first two taken over by Google and the third by Yahoo! just as they were staggering under their self-induced loads.

But now a fascinating article by David Carr in Baseline magazine reveals that it really matters who your saviour is. He describes the struggle of MySpace engineers to cope with exponential growth. Their difficulties are formidable because supporting MySpace users' activities is a very demanding task. Engineers continually have to upgrade their back-end systems while supporting an ever-increasing body of subscribers. It's a bit like rebuilding the wings of a Boeing 747 in flight.

Ah yes, you say, but MySpace is owned by News Corporation - a big company with deep pockets. Surely it has the resources to cope? My reading of Carr's analysis is that while News Corp may have the money, it doesn't necessarily have the expertise. Managing a computing cluster of MySpace dimensions requires very sophisticated technical competencies. Only a handful of companies in the world - Google, Yahoo, Amazon, Microsoft and eBay - possess them. News Corporation doesn't.
