<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Reinventing Fire &#187; Microformats</title>
	<atom:link href="http://schepers.cc/category/microformats/feed" rel="self" type="application/rss+xml" />
	<link>http://schepers.cc</link>
	<description>Technology upside down and backwards</description>
	<lastBuildDate>Sun, 09 Oct 2011 19:47:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Formata Non Grata</title>
		<link>http://schepers.cc/formata-non-grata</link>
		<comments>http://schepers.cc/formata-non-grata#comments</comments>
		<pubDate>Sat, 10 Jul 2010 15:39:19 +0000</pubDate>
		<dc:creator>Schepers</dc:creator>
				<category><![CDATA[Metadata]]></category>
		<category><![CDATA[Microformats]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Semantics]]></category>
		<category><![CDATA[Standards]]></category>
		<category><![CDATA[SVG]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://schepers.cc/?p=160</guid>
		<description><![CDATA[<br/>Recently, a browser implementer asked me for examples of SVG. He was having trouble finding good examples of SVG in use, particularly as parts of an HTML document. This question has come up again and again, actually, and it always vexes me. I&#8217;ve been active in the SVG community for close to a decade, and [...]]]></description>
			<content:encoded><![CDATA[<br/><p>Recently, a browser implementer asked me for examples of SVG.  He was having trouble finding good examples of SVG in use, particularly as parts of an HTML document.  This question has come up again and again, actually, and it always vexes me.  I&#8217;ve been active in the SVG community for close to a decade, and I&#8217;ve seen thousands of amazing SVG files (and many more of mediocre to average quality), but somehow they seem to have disappeared or bitrotted over the years.  Some of those files only worked with the slightly-unstandard Adobe SVG Viewer, or didn&#8217;t quite work with Firefox&#8217;s incomplete support, I know, but surely not all of them.  Where is all the great SVG content I remember, the games and GUIs and design and development?  Where are all those files to be found?</p>
<p>I hear some browser implementers say that people just don&#8217;t use SVG.  Intuitively, this feels false to me, based on my own experience.  But could it be true?</p>
<p><span id="more-160"></span></p>
<p>The statistical insignificance of SVG is often cited by some people in the WHATWG, based on a large dataset of Web content indexed by Google.  In the WHATWG, where HTML5 started, great stock is placed on statistics, particularly <a href="http://code.google.com/webstats/index.html">those conducted</a> by the editor, Ian Hickson, a Google employee.</p>
<p style="text-align: center;"><a title="Based on a study of one BILLION documents!" href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-June/006726.html"><img class="aligncenter size-full wp-image-167" title="DrHTML5" src="http://schepers.cc/wp-content/uploads/2009/07/DrHTML5.png" alt="DrHTML5" /></a></p>
<p>There&#8217;s no question about it: HTML is the king of the Web.  I did some rough calculations, similar to claims I&#8217;ve heard before, by counting the number of returns for HTML files versus SVG files.  A <a title="SVG filetype search on Google" href="http://www.google.com/search?q=filetype%3Asvg">search for the filetype &#8220;.svg&#8221;</a> yields around 18,165,500 hits on Google.  (Note that this doesn&#8217;t count the false hits on the word &#8220;SVG&#8221; from St. Vincent and the Grenadines, Stan Van Gundy, the Sexy Valley Girls, or any of the numerous other bizarrities that the acronym stands for.)  SVG content makes up just 0.106% of all Web content, by my rough estimation.  Flash is almost 5 times as common as SVG.  That&#8217;s pretty grim for SVG.</p>
<p>But wait, let&#8217;s put that into perspective.  Flash is about 4.8 times more common than SVG.  HTML is roughly 912 times more common than SVG.  912 times.  Flash content comprises approximately 0.51% of all Web content, and HTML is roughly 189 times more common than Flash.  So, Flash is clearly much more popular than SVG (even when you consider that some large percentage of Flash content is actually just encapsulated video content, these days).  But that doesn&#8217;t mean that nobody&#8217;s using SVG.  Nearly 20 million documents is pretty impressive, actually, especially given the fact that SVG has been hindered by a lack of native support in browsers for most of its existence (and more recently, even poor support by the Adobe plugin for IE), and a lack of common authoring tools for dynamic content (Inkscape is an excellent vector editor, but it doesn&#8217;t yet do animation or interactivity).</p>
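<p>As a rough check of that arithmetic, here is a small Python sketch that recomputes the shares and ratios from the result counts in the comparison table further down.  The function and variable names are my own, made up for illustration, and the output is only as trustworthy as Google&#8217;s reported counts.  With the table&#8217;s totals, SVG comes out at about 0.106% of all results, and the HTML-to-SVG ratio at about 912.</p>

```python
# Result counts copied from the comparison table in this post.
counts = {
    "html": 16_574_700_000,   # html + php + asp + xhtml
    "flash": 87_923_300,      # swf + fla
    "svg": 18_165_500,        # svg + svgz
}
all_web_content = 17_081_255_598  # the table's "All Web Content" total


def share_percent(name):
    """Percentage of all counted results that belong to one file type."""
    return 100 * counts[name] / all_web_content


def times_more_common(a, b):
    """How many times more results file type a has than file type b."""
    return counts[a] / counts[b]


print(f"SVG share of all results:   {share_percent('svg'):.3f}%")
print(f"Flash share of all results: {share_percent('flash'):.3f}%")
print(f"Flash vs. SVG:  {times_more_common('flash', 'svg'):.1f} times")
print(f"HTML vs. SVG:   {times_more_common('html', 'svg'):.0f} times")
print(f"HTML vs. Flash: {times_more_common('html', 'flash'):.0f} times")
```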
<p>Eighteen million documents.  That&#8217;s a lot of files.  So, given that, why  is it so hard to find SVG content?</p>
<p>Maybe because the most popular search engine in the world, Google, doesn&#8217;t index SVG.</p>
<h3>Indexing SVG</h3>
<p>A long time ago, back in 2002, I made a <a title="Discussion of SVG text search and translation" href="http://schepers.cc/svgaccessibility.html">page</a> discussing my experiments with text search and translation.  The results were not very encouraging, but I reckoned it was just a matter of time.  I optimistically wrote to Google to encourage them to enable text search and translation of SVG files.</p>
<p>8 years down the line, things don&#8217;t seem to have changed much on that front.</p>
<p>To be fair, many SVG files don&#8217;t contain any text at all, not even a &lt;title&gt; element, so indexing them might not yield much.  But many other files do have at least a title, and SVG infographics and webapps usually have at least labels that might be meaningful as search terms.  Often SVG files are even text-heavy.</p>
<p>It&#8217;s not that Google doesn&#8217;t take note of the files&#8230; obviously, you can search for the filetype, or in the worst case, the specific file URL, and normally get back positive results.  But Google doesn&#8217;t seem to search the contents of the SVG files and present them in the relevant result set.  To test this, I tried searching for a few files that I knew to have indexable text content.</p>
<p>As an example, I looked for some SVG files on my little (long out-of-date) SVG promotion site, SVG-Whiz.com.  First, I searched for a file I knew to have a cogent block of text, my explanation of the distinctions between &#8216;display&#8217;, &#8216;visibility&#8217;, and &#8216;opacity&#8217;, called <a href="http://svg-whiz.com/svg/HideShow.svg">HideShow.svg</a>:  </p>
<p><object type="image/svg+xml" width="360" height="400" data="http://svg-whiz.com/svg/HideShow.svg">Please use a modern browser.</object></p>
<p>This file has been hosted on my site since 2003, I&#8217;ve gotten several positive comments about it, and a <a href="http://www.google.com/search?q=http%3A%2F%2Fsvg-whiz.com%2Fsvg%2FHideShow.svg">direct search for that file URL</a> turns up a few hits linking to it, so it seems like a reasonable candidate for indexing.  But what are the results of my in-site <a href="http://www.google.com/search?q=site%3Asvg-whiz.com%20opacity">Google search for the word &#8216;opacity&#8217;</a>?  Okay, that just turned up the explanation page linking to the SVG file in question.  Fair enough, maybe Google doesn&#8217;t treat SVG as a &#8220;document&#8221; file, only as an image.  So, how about an <a href="http://www.google.com/images?q=site%3Asvg-whiz.com%20opacity">image search for the same term</a>?  Nada.  So, maybe Google doesn&#8217;t consider SVG to be either a &#8220;document&#8221; or an &#8220;image&#8221;&#8230; let&#8217;s <a href="http://www.google.com/search?q=site%3Asvg-whiz.com+opacity+filetype%3Asvg">search for the word &#8216;opacity&#8217; in the site &#8216;svg-whiz.com&#8217; with the filetype &#8216;svg&#8217;</a>.  As specific as that is, at the time of writing, I got not a single resulting hit.</p>
<p>Google can find the files&#8230; why doesn&#8217;t it do something with them?</p>
<h3>Comparison of File Extension Frequency</h3>
<p>So, what criteria does Google use to decide which file types it is going to index?</p>
<p>The <a title="Google's Filetype FAQ" href="http://www.google.com/help/faq_filetypes.html">Google FAQ on search filetypes</a> lists 23 file extensions that it indexes, and says:</p>
<blockquote><p><span style="line-height: 28.5px;">There are 13 main file types searched by Google in              addition to standard web formatted documents in HTML. The most common              formats are PDF, PostScript, Microsoft Office formats [...] </span><span style="line-height: 28.5px;">Google is also scouring the Web for additional file            types that are very rare. You may see them pop up in your results from            time to time.</span><span style="line-height: 28.5px;"> [...] </span><span style="line-height: 28.5px;">PDF formatted files are the most popular after HTML files.            PostScript and Microsoft Word files are also fairly common. The other            file types are relatively uncommon by comparison. </span></p></blockquote>
<p>So, I took the liberty of conducting my own survey of the relative frequency of various filetypes, as collected by Google itself, by using the <span style="line-height: 28.5px;">&#8220;filetype:extension&#8221;</span> query term. I&#8217;m not totally convinced this is at all an accurate means to collect and analyze the data, but it&#8217;s what I had at hand.</p>
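<p>For anyone who wants to repeat the survey, this is roughly how the query URLs can be put together in Python.  It is only a sketch: the function name is hypothetical, the list of extensions is a small sample, and it builds the URLs without fetching anything or parsing result counts.</p>

```python
from urllib.parse import urlencode


def filetype_query_url(extension):
    """Build a Google search URL for the 'filetype:extension' query term."""
    query = urlencode({"q": f"filetype:{extension}"})
    return "http://www.google.com/search?" + query


# A few of the extensions compared in this post.
for extension in ("svg", "svgz", "swf", "pdf"):
    print(filetype_query_url(extension))
```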
<p>I put together a table that compares the different file types that Google explicitly mentions.  (I thought about representing the data as an SVG barchart, but I was afraid it wouldn&#8217;t be indexed&#8230; just kidding, the sheer volume of HTML files would make every other bar just a blip.)</p>
<p>I also threw in some other filetypes of interest, including some with functional similarity to SVG, such as Illustrator, PhotoShop, and Silverlight.  I expected non-Web filetypes such as Illustrator&#8217;s &#8220;*.ai&#8221; to be disproportionately underrepresented in the results compared to their actual usage, and that was indeed borne out; it&#8217;s hard to know what percentage of SVG files are intended for and presented on the Web (I&#8217;ve spoken to many Inkscape users who only use SVG for print or local hard-drive, which surprised me), but I would guess that it is far, far more heavily tilted toward Web usage&#8230; but I still thought it would be interesting to compare.</p>
<p>What did surprise me was how &#8220;*.svg&#8221; compared to such ubiquitous file extensions as &#8220;*.txt&#8221;, and those for Excel, PowerPoint, and the venerable PostScript.  To be frank, the results make me question my methodology, or perhaps the accuracy of Google&#8217;s reporting.</p>
<table style="text-align:right; background-color:#A2953A; border:none; border-collapse:collapse; border-spacing:0px;" border="0" cellpadding="2px">
<tbody>
<tr style="background-color:#414602; color:#FFFFE3; border-collapse:separate;">
<th style="text-align:center;">File Type</th>
<th style="text-align:center; border-left:#FFFFE3 2px solid;">File Extension</th>
<th style="text-align:center; border-left:#FFFFE3 2px solid;">Number of Results</th>
<th style="text-align:center; border-left:#FFFFE3 2px solid;">Introduction</th>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;" rowspan="5">HyperText Markup Language</th>
<td style="color:#7A7822;">total</td>
<td><strong>16,574,700,000</strong></td>
<td>1991</td>
</tr>
<tr>
<td>html</td>
<td>8,180,000,000</td>
<td></td>
</tr>
<tr>
<td>php</td>
<td>7,010,000,000</td>
<td></td>
</tr>
<tr>
<td>asp</td>
<td>1,370,000,000</td>
<td></td>
</tr>
<tr>
<td>xhtml</td>
<td>14,700,000</td>
<td></td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Adobe Portable Document Format</th>
<td>pdf</td>
<td><strong>281,000,000</strong></td>
<td>1993</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;" rowspan="3">Shockwave Flash</th>
<td style="color:#7A7822;">total</td>
<td><strong>87,923,300</strong></td>
<td>1996</td>
</tr>
<tr>
<td>swf</td>
<td>87,900,000</td>
<td></td>
</tr>
<tr>
<td>fla</td>
<td>23,300</td>
<td></td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Microsoft Word</th>
<td>doc</td>
<td><strong>58,500,000</strong></td>
<td>1983</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;" rowspan="3">Text</th>
<td style="color:#7A7822;">total</td>
<td><strong>32,757,000</strong></td>
<td>1982</td>
</tr>
<tr>
<td>txt</td>
<td>32,600,000</td>
<td></td>
</tr>
<tr>
<td>ans</td>
<td>157,000</td>
<td></td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;" rowspan="3">Scalable Vector Graphics</th>
<td style="color:#7A7822;">total</td>
<td><strong>18,165,500</strong></td>
<td>1999</td>
</tr>
<tr>
<td>svg</td>
<td>18,100,000</td>
<td></td>
</tr>
<tr>
<td>svgz</td>
<td>65,500</td>
<td></td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Microsoft Excel</th>
<td>xls</td>
<td><strong>12,500,000</strong></td>
<td>1982</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Microsoft PowerPoint</th>
<td>ppt</td>
<td><strong>7,790,000</strong></td>
<td>1984</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Rich Text Format</th>
<td>rtf</td>
<td><strong>5,130,000</strong></td>
<td>1987</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Adobe PostScript</th>
<td>ps</td>
<td><strong>2,440,000</strong></td>
<td>1982</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Adobe Illustrator</th>
<td>ai</td>
<td><strong>135,000</strong></td>
<td>1986</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">OpenOffice.org</th>
<td>odt</td>
<td><strong>135,000</strong></td>
<td>1999</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">MacWrite</th>
<td>mw</td>
<td><strong>35,600</strong></td>
<td>1984</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Adobe PhotoShop</th>
<td>psd</td>
<td><strong>21,800</strong></td>
<td>1988</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;" rowspan="3">Silverlight</th>
<td style="color:#7A7822;">total</td>
<td><strong>8,493</strong></td>
<td>2003</td>
</tr>
<tr>
<td>xaml</td>
<td>8,180</td>
<td></td>
</tr>
<tr>
<td>xap</td>
<td>313</td>
<td></td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">CorelDRAW</th>
<td>cdr</td>
<td><strong>8,310</strong></td>
<td>1989</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;" rowspan="4">Microsoft Works</th>
<td style="color:#7A7822;">total</td>
<td><strong>2,266</strong></td>
<td>1987</td>
</tr>
<tr>
<td>wks</td>
<td>1,190</td>
<td></td>
</tr>
<tr>
<td>wps</td>
<td>801</td>
<td></td>
</tr>
<tr>
<td>wdb</td>
<td>275</td>
<td></td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;" rowspan="9">Lotus 1-2-3</th>
<td style="color:#7A7822;">total</td>
<td><strong>1,981</strong></td>
<td>1983</td>
</tr>
<tr>
<td>wks</td>
<td>1,190</td>
<td></td>
</tr>
<tr>
<td>wk1</td>
<td>571</td>
<td></td>
</tr>
<tr>
<td>wki</td>
<td>89</td>
<td></td>
</tr>
<tr>
<td>wk3</td>
<td>47</td>
<td></td>
</tr>
<tr>
<td>wk4</td>
<td>41</td>
<td></td>
</tr>
<tr>
<td>wku</td>
<td>33</td>
<td></td>
</tr>
<tr>
<td>wk5</td>
<td>6</td>
<td></td>
</tr>
<tr>
<td>wk2</td>
<td>4</td>
<td></td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Microsoft Write</th>
<td>wri</td>
<td><strong>1,200</strong></td>
<td>1985</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">Lotus WordPro</th>
<td>lwp</td>
<td><strong>148</strong></td>
<td>1995</td>
</tr>
<tr style="background-color:#7A7822; border-top:#FFFFE3 5px solid;">
<th style="text-align:left; vertical-align:top;">All Web Content</th>
<td></td>
<td><strong>17,081,255,598</strong></td>
<td>1989</td>
</tr>
</tbody>
</table>
<p><strong>Caveat:</strong> I originally compiled this information a few months ago, and when rechecking it for accuracy, I got a surprising result.  Normally, the <a title="Google search for SVG files" href="http://www.google.com/search?q=filetype:svg">Google search for filetype:svg</a> returned 18,100,000 hits, but late one night, it returned only 2,000,000 hits; now, my most recent check showed around 4,300,000 hits.  Jumping around in the results for an explanation, I noticed that there is a lot of duplication of Wikipedia content, and since Wikipedia uses SVG, that might account for some discrepancy.  One possibility is that  the lower figure represents 2 million unique documents, which are duplicated in a lot of places; the same should be said of any HTML content, and probably to a lesser extent of Flash content.  I don&#8217;t know if this is the right conclusion, but it would be an interesting data point.  Even with the much more modest figure of 2 million documents, I still think that represents an impressive body of work, particularly in light of the fact that SVG documents are normally authored individually, not through forum or blog software, or exporting or reformatting of email and text content as HTML.</p>
<h3>Conclusions</h3>
<p>I don&#8217;t think this is some grand conspiracy by Google to suppress SVG.  Simple neglect is much more plausible.  They don&#8217;t seem to see the value in indexing SVG.  But the end result is the same: SVG seems to be statistically underrepresented in terms of access through Google searches, and thus, it is harder to find SVG content. </p>
<p>Relying on the results of a search engine that doesn&#8217;t index SVG to draw conclusions about how many people are using SVG is not statistically sound.   This is a bit like conducting a phone survey of English speakers in China, and concluding that nobody speaks with a Southern US accent.  I reckon y&#8217;all might could see the problem with that methodology if you lived here in North Carolina.</p>
<p>SVG is at least as plentiful on the Web, by Google&#8217;s own reckoning, as most other file types that Google does index. Search engines, Google included, should index SVG files. They should read the text inside the file, and if the file is referenced in an HTML page, they should associate those keywords with the SVG file, just as they do with raster images. SVG files should display in image searches, as well. Here&#8217;s a list of the kind of useful content that can be gleaned from most SVGs:</p>
<ul>
<li><strong>file name: </strong>while pretty primitive, many files give some clue as to their contents in their name</li>
<li><strong>text elements</strong><strong>: </strong>there are several elements in SVG that contain text to be rendered to the screen, including &lt;text&gt;, &lt;tspan&gt;, &lt;textPath&gt;, and &lt;textArea&gt;, and the content should be indexed as if it were text in any other format</li>
<li><strong>embedded HTML: </strong>HTML (and other markup) can be embedded inline in SVG, and search engines should look for that and index it as they would standalone HTML content</li>
<li><strong>links: </strong>Google, and probably most other modern search engines, give weight to files that are linked from other files, and files referenced from SVG content should benefit in the same way; the @xlink:title and @rel can help define the relationship between the files</li>
<li><strong>descriptive elements: </strong>like HTML, SVG has a &lt;title&gt; element that doesn&#8217;t display, but adds to the information about its parent file or element, and SVG also has a &lt;desc&gt; element for a longer description</li>
<li><strong>metadata: </strong>SVG can contain RDF, RDFa, microformats, and ARIA markup, which search engines are starting to pick up on these days; these metadata can reveal a lot about a file, from its license information to structured content (like calendars or dates or contact info) to intent (such as ARIA roles, which will soon be expanded to include things like different chart types)</li>
</ul>
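<p>To give a feel for how little work this is, here is a minimal Python sketch of pulling two of the kinds of content listed above (descriptive elements and text elements) out of an SVG document tree.  This is not how any real search engine does it; the function names and the sample document are invented for illustration, and the sample is assembled in code rather than read from a file.</p>

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
DESCRIPTIVE_TAGS = ("title", "desc")
# tspan and textPath live inside text elements, so itertext() picks them up.
TEXT_TAGS = ("text", "textArea")


def q(tag):
    """Return the namespace-qualified name ElementTree uses for an SVG tag."""
    return f"{{{SVG_NS}}}{tag}"


def build_sample():
    """Assemble a tiny, made-up SVG tree to run the extractor against."""
    svg = ET.Element(q("svg"))
    ET.SubElement(svg, q("title")).text = "Hiding and showing"
    ET.SubElement(svg, q("desc")).text = "Compares display, visibility, and opacity"
    text = ET.SubElement(svg, q("text"))
    text.text = "Mind the "
    ET.SubElement(text, q("tspan")).text = "gap"
    return svg


def indexable_strings(root):
    """Collect strings from descriptive and text elements, in document order per tag."""
    found = {"descriptive": [], "text": []}
    for tag in DESCRIPTIVE_TAGS:
        for element in root.iter(q(tag)):
            if element.text and element.text.strip():
                found["descriptive"].append(element.text.strip())
    for tag in TEXT_TAGS:
        for element in root.iter(q(tag)):
            # Join the element's own text with that of its children,
            # then collapse runs of whitespace.
            joined = " ".join("".join(element.itertext()).split())
            if joined:
                found["text"].append(joined)
    return found


print(indexable_strings(build_sample()))
```

<p>For a file on disk, the same function could be given <code>ET.parse(path).getroot()</code> instead of the sample tree.  Links, embedded HTML, and RDF metadata would each need their own handling, which this sketch leaves out.</p>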
<p>And the SVG Working Group would be happy to work with any search engine developers to make improvements to SVG 2 to help make indexing SVG content easier or more fruitful.</p>
<p>I&#8217;m not trying to pick on Google here (though I do note that a <a href="http://www.bing.com/search?q=svg+opacity+svg-whiz.com">Bing search for &#8216;svg opacity svg-whiz.com&#8217;</a> listed the SVG file as the first hit), I&#8217;m just noting a discrepancy and an opportunity for improving the experience that people have on the Web with regards to SVG.  At the very least, SVG should be recognized by Google as a legitimate file type, rather than a formata non grata.</p>
<h4>Update</h4>
<p>Rob Russell delivered great news to us in his SVG Open keynote.  As of August 31, 2010, <a href="http://googlewebmastercentral.blogspot.com/2010/08/google-now-indexes-svg.html?utm_source=feedburner&#038;utm_medium=twitter&#038;utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29">Google now indexes SVG</a> and delivers it in some search results.   Kudos to Google for stepping up!  I&#8217;m very pleased&#8230; solid results only 6 weeks later.  (I guess I should thank Slashdot, too.)</p>
]]></content:encoded>
			<wfw:commentRss>http://schepers.cc/formata-non-grata/feed</wfw:commentRss>
		<slash:comments>37</slash:comments>
		</item>
		<item>
		<title>Microformats and Language Drift</title>
		<link>http://schepers.cc/microformats-and-language-drift</link>
		<comments>http://schepers.cc/microformats-and-language-drift#comments</comments>
		<pubDate>Wed, 09 May 2007 21:25:03 +0000</pubDate>
		<dc:creator>Schepers</dc:creator>
				<category><![CDATA[Metadata]]></category>
		<category><![CDATA[Microformats]]></category>
		<category><![CDATA[Semantics]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.schepers.cc/?p=33</guid>
		<description><![CDATA[<br/>I was sitting at the bar with Chaals and Danny Ayers (who I&#8217;d previously only known through mutual friends and by reputation) at the Fairmont Springs at lunch. He&#8217;s an RDF guy, and I put to him the question I&#8217;d put to Harry Halpin last night (while watching Super Troopers); Harry likes the loose structure [...]]]></description>
			<content:encoded><![CDATA[<br/><p>I was sitting at the bar with Chaals and <a title="Danny's Blog" target="_blank" href="http://dannyayers.com/">Danny Ayers</a>  (who I&#8217;d previously only known through mutual friends and by reputation) at the Fairmont Springs at lunch.  He&#8217;s an RDF guy, and I put to him the question I&#8217;d put to Harry Halpin last night (while watching Super Troopers); Harry likes the loose structure of microformats (though he acknowledges the utility of established ontologies for constrained domains like medicine and physics), and I wondered if maybe the linguistic model of exemplars would be useful in RDF and OWL to add some flexibility.</p>
<p>But if formal ontologies are too rigid, I think microformats is too loose.  It&#8217;s great that people are tagging their content, and useful things can be done with these tags in the short term.  But microformats is not immune to language drift.  Someone will see a tag, misgrok the meaning from context, and idiosyncratically misapply it to other content.  This is exacerbated by the international and multi-language nature of the Web.</p>
<p>For example, let&#8217;s say that someone had tagged some content with the word &#8220;meme&#8221; 15 years ago; it would clearly have referred to Dawkins&#8217;s model of &#8220;idea evolution&#8221; (where a concept is spread not through accuracy, but through adaptation to its mental environment&#8230; an idea akin to Colbert&#8217;s &#8220;truthiness&#8221;).  But a few years ago, it spread into common use as a synonym for &#8220;fad&#8221;; so far, it retains some superficial similarity to Dawkins&#8217;s idea.  In a few more years, it will probably be a very dated word (like &#8220;groovy&#8221;) and may well shift to a meaning like &#8220;old-fashioned&#8221;; it would then have completely lost its essential meaning.  So, a diagram of Dawkins&#8217;s model tagged &#8220;meme&#8221; would then be misinterpreted, misindexed, and regarded with confusion by a future reader.</p>
<p>In the long haul, RDF provides a more time-proof solution by providing conceptual context, not just a cluster of words.</p>
]]></content:encoded>
			<wfw:commentRss>http://schepers.cc/microformats-and-language-drift/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SVG Text, Semantics, and Accessibility</title>
		<link>http://schepers.cc/svg-text-semantics-and-accessibility</link>
		<comments>http://schepers.cc/svg-text-semantics-and-accessibility#comments</comments>
		<pubDate>Tue, 07 Nov 2006 09:24:29 +0000</pubDate>
		<dc:creator>Schepers</dc:creator>
				<category><![CDATA[Accessibility]]></category>
		<category><![CDATA[Metadata]]></category>
		<category><![CDATA[Microformats]]></category>
		<category><![CDATA[Semantics]]></category>
		<category><![CDATA[SVG]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[W3C]]></category>

		<guid isPermaLink="false">http://www.schepers.cc/?p=11</guid>
		<description><![CDATA[<br/>In addition to geometric shapes, SVG has advanced graphical text capabilities. In SVG 1.1, there are several elements specifically designed for the presentation of text. At the most basic level, there is the &#60;text&#62; element, which can have child &#60;tspan&#62; elements that can be positioned and styled independently of the other text content, like this: [...]]]></description>
			<content:encoded><![CDATA[<br/><p>In addition to geometric shapes, SVG has advanced graphical text capabilities.   In SVG 1.1, there are several elements specifically designed for the presentation of text.  At the most basic level, there is the &lt;text&gt; element, which can have child &lt;tspan&gt; elements that can be positioned and styled independently of the other text content, like this:</p>
<p><iframe width="155" scrolling="no" height="51" frameborder="0" src="http://www.schepers.cc/svg/mindTheGap.svg">Please use FF1.5+, Opera 9+, or IE with an SVG plugin!</iframe></p>
<p>Then there are more advanced options, like rotated text<br />
<iframe width="20" scrolling="no" height="131" frameborder="0" src="http://www.schepers.cc/svg/rotatedText.svg">Please use FF1.5+, Opera 9+, or IE with an SVG plugin!</iframe>   or the &lt;textPath&gt; element</p>
<p><iframe width="487" scrolling="no" height="262" frameborder="0" src="http://www.schepers.cc/svg/textPath-simple.svg">Please use FF1.5+, Opera 9+, or IE with an SVG plugin!</iframe></p>
<p>The future of SVG text support holds still more.  In the next version of the SVG specification, <a target="_blank" title="Scalable Vector Graphics Tiny 1.2" href="http://www.w3.org/TR/SVGMobile12/">SVG Tiny 1.2</a>, there are even more useful text features, like the &lt;textArea&gt; element that automatically wraps text to a shape (rectangles only for mobile devices, but any shape at all in future versions of the specification).  There is also the new ability to make any text editable by simply adding an attribute to the text element.  And there are great features from <a title="Scalable Vector Graphics 1.1" href="http://www.w3.org/TR/SVG11/">SVG 1.1</a> (the current version) that are not yet widely implemented, such as SVG Fonts, which let you embed a font into an SVG file so the reader sees the page the way the author intended it, and the &lt;tref&gt; element that lets you directly quote text without duplicating it.  All these features will give more control to authors and give a better experience to users.</p>
<p>But is that enough?</p>
<p><span id="more-11"></span></p>
<h2>The Study of Meaning</h2>
<p>Semantics is the study of meaning; a cluster of dots in the shape of a word doesn&#8217;t mean anything to a machine, but real words and sentences can be connected to concepts, and that is why this is important in a graphical format like SVG.  All of these text features really are text, not just images of text.  They can be searched for, copied, indexed by search engines, and understood by <a title="Wikipedia article on screen readers" target="_blank" href="http://en.wikipedia.org/wiki/Screen_reader">screen readers</a>.  Screen readers are a kind of <a title="Wikipedia article on AT" target="_blank" href="http://en.wikipedia.org/wiki/Assistive_technology">assistive technology</a> that reads pages out loud for people with vision problems.  It&#8217;s not just for people; machines like it, too.  It has semantic value, and this moves us toward the <a title="Semantic Web homepage" target="_blank" href="http://www.semanticweb.org/">Semantic Web</a>.  There is also the &lt;switch&gt; element, which is an excellent way to provide different text content, such as your own translations, depending on the user&#8217;s computer <em>systemLanguage</em>.   Obviously, this adds meaning by disambiguating different possible translations that a computer-based translation might not catch (because people are smart, and computers are merely fast).  And it&#8217;s not just the visible text that provides meaning, but also the metadata (or &#8220;data about data&#8221;).   
SVG&#8217;s &lt;title&gt;, &lt;desc&gt;, and &lt;metadata&gt; elements are not rendered onto the screen (although the <a target="_blank" title="Opera Web Browser" href="http://www.opera.com/">Opera browser</a> does provide a tooltip for the title text), but can be indexed by search engines, and the use of script or special browser capabilities (such as native browser support for <a target="_blank" title="Resource Description Framework" href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, a special kind of markup that describes the relationship between terms used in the metadata) can provide extended functionality.  As you might expect, the &lt;title&gt; and &lt;desc&gt; elements are meant to be human-readable, while &lt;metadata&gt; elements more frequently contain computer code.</p>
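<p>To make the &lt;switch&gt; behavior a little more concrete, here is a simplified Python sketch of how a viewer might choose which translation to show.  It follows my reading of the SVG 1.1 <em>systemLanguage</em> rule (a child matches when a user language equals one of its listed languages, or is a prefix of one followed by a hyphen), and it ignores the other test attributes entirely.  The function names and the sample greetings are invented for illustration.</p>

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def q(tag):
    """Return the namespace-qualified name ElementTree uses for an SVG tag."""
    return f"{{{SVG_NS}}}{tag}"


def language_matches(attribute_value, user_languages):
    """True if any user language equals, or is a hyphen-delimited prefix of,
    one of the comma-separated languages in a systemLanguage value."""
    offered = [tag.strip().lower() for tag in attribute_value.split(",") if tag.strip()]
    for user in user_languages:
        user = user.lower()
        for language in offered:
            if language == user or language.startswith(user + "-"):
                return True
    return False


def pick_switch_child(switch, user_languages):
    """Return the first direct child whose systemLanguage test passes.
    A child without the attribute always passes, so it acts as a fallback."""
    for child in switch:
        value = child.get("systemLanguage")
        if value is None or language_matches(value, user_languages):
            return child
    return None


def build_sample_switch():
    """Assemble a made-up switch element with two translations and a fallback."""
    switch = ET.Element(q("switch"))
    for language, greeting in (("fr", "Bonjour"), ("es", "Hola"), (None, "Hello")):
        text = ET.SubElement(switch, q("text"))
        text.text = greeting
        if language is not None:
            text.set("systemLanguage", language)
    return switch


sample = build_sample_switch()
print(pick_switch_child(sample, ["fr"]).text)
print(pick_switch_child(sample, ["de"]).text)
```

<p>Because the first passing child wins, the untagged fallback has to come last; put it first and nobody would ever see the translations.</p>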
<h2>A Semantic Argument</h2>
<p>But in a recent post to <a target="_blank" title="SVG-Developers Mailing List at Yahoo Groups" href="http://groups.yahoo.com/group/svg-developers">SVG-Dev</a>, a representative from Adobe <a title="Leonard Rosenthol's post to SVG-Dev" href="http://tech.groups.yahoo.com/group/svg-developers/message/57280">pointed out</a> some possible shortcomings of SVG&#8217;s text capabilities compared to HTML or PDF.  Adobe is using SVG in its new <a title="Adobe Mars homepage" target="_blank" href="http://labs.adobe.com/wiki/index.php/Mars">Mars project</a>, which is an XML version of their popular printable document format.  Now, I don&#8217;t find HTML to have very much semantic content either (see this <a title="Post about semantics to the SVG mailing list" target="_blank" href="http://lists.w3.org/Archives/Public/www-svg/2004Nov/0296">very long post on semantics from a couple of years ago</a>), but Adobe&#8217;s Leonard Rosenthol points out that computers (specifically screen readers) can&#8217;t distinguish between paragraphs and columns of text in SVG, and particularly not tabular content (where each cell in a table has a context in both a row and a column).  I don&#8217;t think that paragraphs or columns are particularly semantic (and they certainly aren&#8217;t universal even among written languages).  They are largely cosmetic print conventions, and although of course that matters for printed formats, it&#8217;s not as important for accessibility or spoken-word understanding, nor for machine understanding.  Admittedly, paragraphs do represent some structure (I&#8217;m having flashbacks to composition class in high school here), but most often columns are simply a convenience of text flow within a constrained area and carry no distinction from one another; where they do, it is typically simple tabular data.  Tables are another matter, though, because of the specific associations involved in cell position.
There are other semantic orthographic properties of documents that matter more than print conventions, and which have sometimes been passed off as merely stylistic, such as emphatic text (represented by boldface, italics, or relative font sizes) that is indicated by tone in spoken language.  Acronyms, abbreviations, definitions, and quotations all have HTML representations (although they are too rarely used), and there is no equivalent of those tags in SVG (by design); but that meaning is limited to a specific knowledge domain, that of generic textual reference.  There are other specific domains that have highly structured tagsets just dripping with meaning, like <a title="Chemical Markup Language" target="_blank" href="http://www.xml-cml.org/">Chemical Markup Language (CML)</a>, <a title="Linguistics Markup Language" target="_blank" href="http://www.sanesense.org/lgml/">Linguistics Markup Language (LGML)</a>, <a title="Music Notation Markup" target="_blank" href="http://www.musicmarkup.info/">Music Markup Language (MML)</a>, <a title="Genetics Markup" target="_blank" href="http://www.mged.org/Workgroups/MAGE/mage.html">MicroArray and Gene Expression (MAGE)</a> (contrasted with the almost inevitable <a title="SourceForge Zefania Bible ML" target="_blank" href="http://sourceforge.net/projects/zefania-sharp/">Bible Markup Language</a>)&#8230; mapping, geography, mathematics, psychology, literature, sociology, physics, architecture&#8230; every area of human endeavor has a systematic structure that is (or will soon be) encoded in some kind of tagging scheme, whether that be in the form of XML, RDF, or some other format.  But these semantics are largely lost on the Web.</p>
<p>SVG has taken pains to avoid including specific semantics in the specification, in order to remain a neutral presentation format.  The only semantics are derived from the graphical geometry, which nevertheless can be extended to related domains like mapping and geography (and possibly feng shui).  But it would be ideal if the underlying structure carried all the rich semantics of the graphical data being presented: if a circle were identified as an atom, or a rectangle as a flowchart box; if your computer knew what we know when we look at something, or could even explain it to us when we don&#8217;t know what we&#8217;re looking at.  These are true problems that need a solution.  How can we add semantic properties to SVG, and to the Web in general?</p>
<h2>Technical Solutions</h2>
<h3>RDF</h3>
<p>I&#8217;ve already mentioned RDF.  This is one way to add extra information to the geometric presentation.  Let&#8217;s use molecules as an example (see <a title="Molecule SVG source" target="_blank" href="http://www.schepers.cc/svg/metadata/rdf/carbon_dioxide.txt">source</a>):</p>
<p><iframe width="500" scrolling="no" height="201" frameborder="0" src="http://www.schepers.cc/svg/metadata/rdf/carbon_dioxide.svg">Please use FF1.5+, Opera 9+, or IE with an SVG plugin!</iframe></p>
<p>The code would look something like this&#8230; I say <em>something like this</em> because A) this is only a snippet and B) I&#8217;m no RDF guru and I&#8217;m sure I got something wrong:</p>
<pre>&lt;svg xmlns='http://www.w3.org/2000/svg'
     xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
     xmlns:m='http://www.example.org/molecule#'&gt;
&lt;!-- the m: namespace URI is a placeholder for a chemistry vocabulary --&gt;

&lt;!-- a bond: the line carries its own metadata as a child element --&gt;
&lt;line x1='840' y1='170' x2='600' y2='170'&gt;
&lt;metadata&gt;
&lt;rdf:RDF&gt;
&lt;rdf:Description rdf:about="http://www.example.org/bond"&gt;
&lt;m:atom_link rdf:resource="http://www.example.org/carbon-oxygen" /&gt;
&lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;
&lt;/metadata&gt;
&lt;/line&gt;

&lt;!-- an atom: the metadata tags the whole group of circles --&gt;
&lt;g fill='gray'&gt;
&lt;metadata&gt;
&lt;rdf:RDF&gt;
&lt;rdf:Description rdf:about="http://www.example.org/carbon"&gt;
&lt;m:bonds_to&gt;
&lt;m:oxygen rdf:about="http://www.example.org/oxygen" /&gt;
&lt;/m:bonds_to&gt;
&lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;
&lt;/metadata&gt;
&lt;circle cx='400' cy='170' r='20'/&gt;
&lt;circle cx='400' cy='230' r='20'/&gt;
&lt;circle cx='600' cy='170' r='20'/&gt;
&lt;circle cx='600' cy='230' r='20'/&gt;
&lt;circle cx='500' cy='200' r='100' fill='url(#carbon_fill)'/&gt;
&lt;/g&gt;
&lt;/svg&gt;</pre>
<p>The &lt;metadata&gt; element as a child element is obviously associated with its parent, thus &#8220;tagging&#8221; it.  There is no established method for placing the &lt;metadata&gt; element to best associate collections of elements, but setting it as a child of a &lt;g&gt; (group) element containing all the relevant children seems a likely choice.  This allows us to say that circle is a carbon atom, and that line is a molecular bond.  On the downside, RDF is a bit verbose, and it isn&#8217;t ideal for tagging text.</p>
<p>Text content in RDF (or any given XML format) is not visible.  This is because SVG dictates that content inside unknown elements (that is, elements not defined in the SVG specification) is not rendered, even if it is also the child of a &lt;text&gt; element.</p>
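<p>To make that concrete, here is a small sketch of the situation just described.  The <em>m:note</em> element and its namespace are hypothetical, invented only for this example; going by the rule above, a viewer would draw the surrounding words but not the text inside the unknown element:</p>
<pre>&lt;svg xmlns='http://www.w3.org/2000/svg'
     xmlns:m='http://www.example.org/molecule#'
     width='300' height='60'&gt;
&lt;text x='10' y='35'&gt;
Carbon
&lt;!-- m:note is not defined by SVG, so its content is not rendered --&gt;
&lt;m:note&gt;atomic number 6&lt;/m:note&gt;
dioxide
&lt;/text&gt;
&lt;/svg&gt;</pre>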
<h3>Microformats</h3>
<p><a title="Microformats homepage" target="_blank" href="http://microformats.org/">Microformats</a> may be a good option here.  While most people associate the <em>class</em> attribute with CSS styling, both the <a title="HTML class attribute" target="_blank" href="http://www.w3.org/TR/html4/struct/global.html#h-7.5.2">HTML</a> and <a target="_blank" title="SVG class attribute" href="http://www.w3.org/TR/SVG/styling.html#ClassAttribute">SVG</a> specifications list one of its roles as &#8220;general purpose processing by user agents.&#8221;  I haven&#8217;t seen any microformats user agents (browsers or other programs) that specifically support SVG, but adding that support shouldn&#8217;t be hard.  The basic syntax and functionality would be the same.  Basically, you might say something like <em>&lt;text class="acronym"&gt;SVG&lt;title&gt;St. Vincent and the Grenadines&lt;/title&gt;&lt;/text&gt;</em>.</p>
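<p>Applied to graphics rather than text, the same idea might look something like the sketch below.  The class names (<em>molecule</em>, <em>bond</em>, <em>atom</em>, <em>carbon</em>, <em>oxygen</em>) are my own placeholders, not part of any published microformat, and the coordinates are arbitrary:</p>
<pre>&lt;svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 600 200'&gt;
&lt;g class='molecule'&gt;
&lt;title&gt;Carbon dioxide&lt;/title&gt;
&lt;!-- bonds are drawn first so that the atoms sit on top of them --&gt;
&lt;line class='bond' x1='100' y1='100' x2='300' y2='100' stroke='black'/&gt;
&lt;line class='bond' x1='300' y1='100' x2='500' y2='100' stroke='black'/&gt;
&lt;circle class='atom oxygen' cx='100' cy='100' r='40' fill='red'/&gt;
&lt;circle class='atom carbon' cx='300' cy='100' r='50' fill='gray'/&gt;
&lt;circle class='atom oxygen' cx='500' cy='100' r='40' fill='red'/&gt;
&lt;/g&gt;
&lt;/svg&gt;</pre>
<p>This is far more compact than the RDF version, but the meaning of each class name exists only by convention, which leads to the concerns below.</p>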
<p>A minor problem with this is that SVG Tiny 1.1 doesn&#8217;t have the class attribute, but this should work fine with SVG Full, and SVG Tiny 1.2 does include the class attribute specifically for uses other than styling (e.g. microformats).  A potential drawback of microformats themselves is that since they form such a loose ontology, I&#8217;m not sure they will scale up to complex contexts, and I think that they will be subject to the same semantic drift that natural languages suffer (where one word takes on a new meaning, like &#8220;decimate&#8221;); in fact, since there will be no immediate feedback, as is present in spoken language, I think the drift will be accelerated.  This makes for stale semantics down the line, but it will probably suffice for the short term (and who knows how long the content will last anyway?).</p>
<h3>XBL2</h3>
<p>Finally, there is a new technology being published by the W3C, one originating with Mozilla, called the <a title="eXtensible Binding Language Spec at W3C" target="_blank" href="http://www.w3.org/TR/xbl/">eXtensible Binding Language (XBL2)</a>.  Among other things, it allows an author to create templates that attach a presentation format, like HTML or SVG, to any arbitrary XML.  So you could take your Chemical Markup, or Genealogy Markup, or whatever, and create XBL SVG templates for it.  Whenever there is an element that matches one of your templates, the appropriate SVG (or whatever) markup is created and tied to the target element, preserving the original semantics while providing an appropriate visualization.  Since this is a new technology, the obvious drawback to this approach is that there aren&#8217;t yet browsers that support it.  Also, it may not be appropriate for mobile devices.  But I think this will be the rich option available within the next couple of years.</p>
<h2>Conclusions</h2>
<p>SVG has inherent accessibility features, and specific built-in geometric semantics.  With a little work, semantically rich documents and applications can be built using SVG in combination with the appropriate ontology and a semantic tagging mechanism.</p>
]]></content:encoded>
			<wfw:commentRss>http://schepers.cc/svg-text-semantics-and-accessibility/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

