Microformats and Language Drift

I was sitting at the bar at the Fairmont Springs at lunch with Chaals and Danny Ayers (whom I’d previously known only through mutual friends and by reputation). Danny is an RDF guy, and I put to him the question I’d put to Harry Halpin last night (while watching Super Troopers). Harry likes the loose structure of microformats (though he acknowledges the utility of established ontologies for constrained domains like medicine and physics), and I wondered whether the linguistic model of exemplars might be useful in RDF and OWL to add some flexibility.

But if formal ontologies are too rigid, I think microformats are too loose. It’s great that people are tagging their content, and useful things can be done with those tags in the short term. But microformats are not immune to language drift: someone will see a tag, misgrok its meaning from context, and idiosyncratically misapply it to other content. The international, multilingual nature of the Web makes this worse.

For example, say that someone had tagged some content with the word “meme” 15 years ago; it would clearly have referred to Dawkins’s model of “idea evolution” (where a concept spreads not through accuracy, but through adaptation to its mental environment… an idea akin to Colbert’s “truthiness”). A few years ago, though, it spread into common use as a synonym for “fad”; so far, it retains some superficial similarity to Dawkins’s idea. In a few more years, it will probably be a very dated word (like “groovy”) and may well shift to a meaning like “old-fashioned”, at which point it will have completely lost its essential meaning. A diagram of Dawkins’s model tagged “meme” would then be misinterpreted, misindexed, and regarded with confusion by a future reader.

In the long haul, RDF provides a more time-proof solution by providing conceptual context, not just a cluster of words.
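To make the contrast concrete, here is a minimal sketch in plain Python (no RDF library) of the same diagram described two ways: once with a bare folksonomy tag, and once with RDF-style triples that point at a concept URI and attach the word “meme” only as a label. The resource and concept URIs under example.org are hypothetical; the Dublin Core and RDFS property URIs are the real vocabulary terms. It assumes, as the argument above does, that the concept URI is kept stable even if the word drifts.

```python
# Real vocabulary URIs (RDFS and Dublin Core terms).
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Hypothetical identifiers, for illustration only.
DIAGRAM = "http://example.org/dawkins-diagram"
MEME = "http://example.org/concepts/dawkins-meme"

# Folksonomy style: the resource carries a bare word and nothing else.
tags = {DIAGRAM: {"meme"}}

# RDF style: the resource points at a concept; the word is just a label on it.
triples = {
    (DIAGRAM, DCT_SUBJECT, MEME),
    (MEME, RDFS_LABEL, "meme"),
    (MEME, RDFS_COMMENT, "unit of cultural transmission (Dawkins)"),
}

def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

def describe_subjects(graph, resource):
    """Follow dct:subject links and return each linked concept's rdfs:comment."""
    return {
        comment
        for concept in objects(graph, resource, DCT_SUBJECT)
        for comment in objects(graph, concept, RDFS_COMMENT)
    }

print(tags[DIAGRAM])                       # only the word survives
print(describe_subjects(triples, DIAGRAM)) # the concept survives
```

If “meme” later came to mean something else, only the `rdfs:label` would need revisiting; the diagram would still be linked to the same concept URI and its description.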

One thought on “Microformats and Language Drift”

The day after posting this, I came up with a partial fix for this problem of language drift in semantics: a temporal marker. I believe this is already recorded on sites like del.icio.us, though I don’t know how closely it is tied to particular instances of use of a given term (or “tag”), nor how well it is exposed. But future SemWeb work should account for the point in time at which a term was applied, and so derive a fixed meaning for the term from that date.
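A minimal sketch of that temporal marker, assuming a hypothetical table that records, for each tag, the date from which each sense is taken to dominate. The dates and glosses below are illustrative placeholders, not a real lexicon. Each tagging records when it happened, and lookup picks the sense in force on that date.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical sense history: (start date, gloss), sorted by start date.
# Cutoff dates are illustrative placeholders only.
SENSE_HISTORY = {
    "meme": [
        (date(1976, 1, 1), "unit of cultural transmission (Dawkins)"),
        (date(2000, 1, 1), "fad; something spreading online"),
    ],
}

@dataclass(frozen=True)
class TaggedItem:
    resource: str    # URL of the tagged content
    tag: str         # the word as the tagger typed it
    tagged_on: date  # the temporal marker: when the tag was applied

def sense_at(item: TaggedItem) -> Optional[str]:
    """Return the sense in force when the tag was applied, or None if unknown."""
    history = SENSE_HISTORY.get(item.tag)
    if not history:
        return None
    starts = [start for start, _ in history]
    # Index of the last sense whose start date is on or before tagged_on.
    i = bisect_right(starts, item.tagged_on) - 1
    return history[i][1] if i >= 0 else None

old = TaggedItem("http://example.org/dawkins-diagram", "meme", date(1991, 6, 1))
print(sense_at(old))  # unit of cultural transmission (Dawkins)
```

The same word tagged in, say, 2005 would resolve to the later “fad” sense, so the older diagram keeps the meaning its tagger intended.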

I think it would be interesting to watch the spread of terms this way: the ebb and flow of meaning and cohesion as a word is adopted across the boundaries of social groups and language families (and less across geography than in the past). This whole hyperorthographic culture is so young that this is probably our first opportunity to do such a study. The next generations of linguists and symbolic logicians will have a treasure trove to dig through.

Reinventing Fire

Technology upside down and backwards