Once upon a time, knowledge was codified in books. As a method of information storage, books have several problems aside from physical storage, but the primary one is codification. This problem can be most commonly seen in a library.
If you’ve been to a library, you’ll know that shelves are arranged according to categories. Over here, Fiction. Over there, Crime Fiction. On those shelves, British History. On that shelf, European Travel.
And as you walk around, it seems to make some kind of sense. I’m interested in British History, so that’s where I should head. But in a lot of cases it is actually an extraordinarily arbitrary assignment. For example, if I’m interested in Henry VIII, I might find him not in ‘History’ but in ‘Biography’. Similarly, the Hundred Years’ War might fall within ‘British History’, ‘European History’, general ‘History’ or even ‘French History’ if the library is so minded. But the key characters might be covered in more detail in ‘Biography’, and there may be further reading in the travel sections.
Often there are no clear boundaries. The Woman in White by Wilkie Collins could be straight ‘fiction’ or ‘crime fiction’ depending on how you see it. And it is the personal nature of our views that colours attempts to categorise content. In Britain, we tend to consider ‘British History’ as a distinct entity from ‘European History’ and yet to the French, that distinction may make no logical sense.
The internet freed us from the tyranny of categorisation.
No longer did a book have to exist where a bibliographer chose to assign it; it could sit within a web of links and be made accessible through search. Perhaps the best example is Wikipedia. To find information there, you don’t spend a fruitless hour searching through a categorisation system, but simply type whatever you want into the search box… et voilà! The Hundred Years’ War entry comes up. And throughout, there are links to ancillary information – places, people, dates, battles – none of them needing anything more than a click to access, and without any recourse to esoteric knowledge about how some librarian has decided things ought to be ordered.
Search engines like Google allow us to access incalculable amounts of data from innumerable sources swiftly and accurately with nothing more than pidgin English and the click of a button. People decry the creation of what is snobbishly called ‘shallow knowledge’, but these are often the voices of gatekeepers: the kind of people who would protect knowledge under what they see as their own professional or expert curatorship.
In fact, early attempts to impose order on the web are still with us: DMOZ, Yahoo Directory, Best of the Web and so on. There, either through automation or human intervention, an attempt is made to assign every website into whatever category seems best.
Why is this so bad? Consider a site which sells gardening products, offers gardening tips and has a gardening forum. It could easily sit within some shopping category, information category or community category – but why would you place it into any particular one? Actually, you are interested in why your lawn isn’t green and what you can do about it. The answer could lie in a handy hint, a product or a friendly forum, and most likely in a journey that takes in all three. There is no simple category into which an answer or site can simply be put.
The proof of the pudding: when would you ever use a service like Yahoo Directory to find an answer to that question, when Google can offer you a hundred suggestions a minute?
And yet, for the past few years there have been numerous attempts to reimpose the spurious idea of curatorship onto the internet. The so-called ‘semantic web’ is one example, XML sitemaps another. Not content with having you fill your web page with content, the people behind the ‘semantic web’ wish you to add additional markup to that content.
The reason? Not so that humans can better understand your content, but that machines might be able to make decisions about how to treat it. Here is a piece of markup for ‘author’ so that machines can identify who wrote what. Here is a piece of markup for ‘navigation’, so that machines can tell which bit of the piece can be ignored for indexing purposes. What utter rot it is. We have escaped the dogma of categorisation, only to find it being reimposed by stealth by technologists who would prefer us to solve a problem that only exists in their heads.
The latest bright spark is ‘schema’. The merest look at the list of items immediately highlights the problem with such schemes: they are inherently limited by the imagination of the person creating the categorisation. Take the scope for the “AutomotiveBusiness” item, which is split into sub-types such as ‘AutoRepair’, ‘AutoBodyShop’ and ‘AutoPartsStore’.
Pop quiz: would a branch of KwikFit belong under ‘AutoRepair’, ‘AutoBodyShop’ or ‘AutoPartsStore’? Well, depending on what’s happened to your car, it could be any of them.
Drill down further, and the ludicrous nature of what is being proposed becomes clearer still. Having decided that KwikFit is, for the sake of argument, an AutoPartsStore, we are then intended to add markup for the contact person, the geographic co-ordinates, currencies accepted and a slew of other ancillary information which in all probability is already there: in the content.
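To make that concrete, here is a rough sketch in Python of the sort of structured data such a page would end up carrying, serialised as JSON-LD (one of the syntaxes schema.org supports). The type and property names (`AutoPartsStore`, `geo`, `currenciesAccepted`, `contactPoint`) come from the schema.org vocabulary; the helper function, branch name, phone number and co-ordinates are invented placeholders, not real data.

```python
import json


def build_branch_markup(name, latitude, longitude, currencies, telephone):
    """Return a schema.org description of one branch as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "AutoPartsStore",  # the single category we were forced to pick
        "name": name,
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
        "currenciesAccepted": currencies,
        "contactPoint": {
            "@type": "ContactPoint",
            "telephone": telephone,
            "contactType": "customer service",
        },
    }
    return json.dumps(data, indent=2)


# Placeholder values, invented purely for illustration.
markup = build_branch_markup(
    "Example Branch", 51.5, -0.12, "GBP", "+44 00 0000 0000"
)
print(markup)
```

Every value passed in here would normally already be sitting in the page's visible content, which is rather the point.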
And once our branch of KwikFit is so assigned, what happens when someone searches for something related to “AutoRepair”? Is our KwikFit branch excluded or included? Either way, it makes a mockery of the supposed benefit of categorisation, when you think about it.
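The dilemma can be sketched in a few lines of Python. This is a toy model, not how any real search engine works: the `Business` class, the sample listings, their descriptions and both lookup functions are hypothetical, made up purely to show what a single assigned category does to a business that could reasonably sit in several.

```python
from dataclasses import dataclass


@dataclass
class Business:
    name: str
    category: str     # the one label the markup forces us to choose
    description: str  # what the page content already says


# Invented sample data for illustration.
listings = [
    Business("KwikFit branch", "AutoPartsStore",
             "tyres, exhausts, brakes, repair and servicing"),
    Business("Hypothetical Body Works", "AutoBodyShop",
             "dent removal and resprays"),
]


def find_by_category(items, category):
    """Strict lookup: only businesses tagged with exactly this category."""
    return [b.name for b in items if b.category == category]


def find_by_content(items, term):
    """Content lookup: any business whose description mentions the term."""
    return [b.name for b in items if term.lower() in b.description.lower()]


print(find_by_category(listings, "AutoRepair"))  # [] (the branch is excluded)
print(find_by_content(listings, "repair"))       # ['KwikFit branch']
```

Keep the category match strict and the branch goes missing; loosen it so the branch is included and the label has stopped meaning anything.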
What the people behind Schema (and those who proselytise about it) are trying to do is to get you to shortcut their problems by adding more and more code to describe various on-page elements. Their problem is how to understand context, relevancy and importance. Their ‘solution’ is to try to get you to tag your content so they don’t have to work on the extremely hard problem of doing the same via algorithmic means.
Luckily, it will ultimately become an irrelevance. The number of people who implement Schema will never be anything more than vanishingly small, and any short-term boost in the search rankings that people see will soon come to an end as the inevitable sharks move in and start to abuse the idea of Schema (fake review scores, fake identities, misattributions and so on).
Don’t waste your brain energy on this walking dud for anything other than short term gain.