37 thoughts on “Formata Non Grata”

  1. Good work, Doug.  Never attribute to malice that which is adequately explained by neglect.  At least Bing gets it right.

  2. While I find the premise of the article interesting, I find it very surprising that your search results were not compared with any other search engine results.  Without direct comparisons, these results are far too ambiguous to be meaningful.

  3. But what about *.xml files that contain svg? A file can use scalable vector graphics without being named as such.

  4. Wow.  I have been waiting years for SVG to catch hold, as I am amazed at its capability and potential.  Your information is interesting, at the very least.  There has to be a more sensible explanation than mere negligence for (1) no browsers building in support for a great technology and (2) the largest web indexer not indexing it.  So, what is it?

  5. I honestly believe that the reason SVG isn’t used more is lack of support in browsers. Most developers don’t want to use technology that requires their users to install plugins unless it offers significant advantages. SVG doesn’t have a significant enough advantage in most cases. If it were supported in all browsers without plugins, it would probably have significant enough advantages to be used.

    1. Caleb, I agree that that has been a bottleneck, but with IE9’s SVG support rounding out the native SVG support in all other modern browsers, and script libraries that transcode SVG into VML or Flash for older browsers, I think that limitation is a thing of the past.

  6. James, for some additional context, the comments about relative sparsity of SVG content relate to the results of a study of Google results, at http://code.google.com/webstats/index.html

    So, my post is to some degree a reflection on that study.

    That said, I’m not sure why you think it’s critical that I compare results from different search engines; that might yield some interesting data, but it wouldn’t address my specific question.  Google is the most widely used search engine, by far, so much so that it has become the Kleenex of web searches. The fact that Google doesn’t report SVG files in its search results is clearly relevant to the question of why it’s hard for people to find SVG content.

    If you were to do a more comprehensive analysis of how well a range of search engines index and report SVG files, and let me know what you find, I would be happy to link to your article.  It might help us glean some wisdom about how to make SVG more search-engine-friendly, and how search engines can best present SVG content in their results.

  7. Colin, I don’t think my research was at all comprehensive, but I suspect that most SVG files do have a “.svg” or “.svgz” file extension.

    As SVG content is mixed more with HTML, especially with script libraries like Raphaël, it will be harder still to detect the use of SVG, at least with the crude methods I have at my disposal.

    Both Google and Opera have done studies of the most commonly used elements, and that would be a much better metric.

    But even assuming that there is stealth SVG content out there lurking behind “.xml” (and assuming that searches for “.svg” aren’t treated as shorthand for MIME types), that content is still not presented in the search results, so it doesn’t help findability.
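One rough way to catch the “stealth SVG” Colin asks about is to look at a file’s root element rather than its extension. A minimal sketch in Python, assuming only the standard library’s `xml.etree.ElementTree`; the function name `is_svg_document` is hypothetical:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def is_svg_document(xml_text: str) -> bool:
    """Return True if the document's root element is <svg> in the SVG namespace.

    This inspects content, not the file name, so it also recognises SVG that
    was saved with a generic ".xml" extension. Malformed XML counts as "not SVG".
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags as "{namespace-uri}localname".
    return root.tag == f"{{{SVG_NS}}}svg"
```

The check is deliberately strict: an `<svg>` root without the SVG namespace declaration is rejected, because in an XML context a browser would not render it as SVG either. It does nothing for SVG mixed into HTML, which needs a separate scan of the page.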


  8. There are a lot of things Google should index but doesn’t. For example, Google doesn’t index man and info documents either, which makes them very hard to find, even though most of them are available somewhere on the web. I regularly need to read manuals for things that I don’t have installed on my computer (or whose manuals the creator of my Linux distribution has removed to save space). On their makers’ sites, these documents are usually inside some software package, repository, or revision control system, and are not viewable unless you download megabytes of other stuff or register as a user. Most of these documents are available as stand-alone files from other sources on the web, but they are nearly impossible to find because no search engine indexes them. You can’t even find them through Google Code Search, even though it indexes files inside revision control systems AND shows other files inside the same repositories as web pages.
    Inkscape is excellent for creating images, but it sucks as a tool for creating SVG. I’m not surprised if nobody uses Inkscape’s SVG files as their final product. Inkscape’s SVG files are not good for the web or for use with other applications (not even those saved with the “plain SVG” option in the save dialog), and Inkscape lacks features for making web-suitable SVGs. Inkscape SVG files are an intermediate file format, nothing else. Its predecessor, Sodipodi, is excellent for creating web SVG (or SVG suitable for other applications to use), but it has been in a half-finished state for years and is not even available in the Ubuntu repositories anymore.

  9. The problem is that Google only indexes documents, not the metadata within the documents.  I wouldn’t expect Google to pick out GIF comments, or EXIF headers, or ID3 tags, so I similarly don’t expect it to look “inside” an SVG.
    PDF, DOC, etc. are text-heavy document representations that are very useful to search.  The metadata in a GIF file isn’t (on a global scale), and getting at it requires reading every single GIF.  Now expand that to JPEG and everything else: instead of downloading HTML, CSS, JavaScript and “known” documents, you would have to download, analyse and publish details of every single JPEG, MP3, etc. once you widen your parameters to include SVG metadata results.  You can see what an ENORMOUS burden that would be, even on Google, and poor webmasters, who already complain if Google indexes them too often and sucks up their bandwidth, will be out for blood if the Googlebot suddenly starts downloading every 10 MB MP3, huge JPEG, etc. on their websites too.
    Google doesn’t even download every JPEG – the Image Search algorithms work on the data and keywords given in the HTML itself that links to the JPEG.  Only the images that actually appear in searches are slowly analysed for size data and thumbnail creation on a throttled “delay” basis – they don’t immediately download every JPEG they see and start chopping out the size of it.  The costs would be enormous.
    Similarly, the look-inside-a-PDF/DOC/ODT functionality is user-driven (i.e. when someone WANTS to view that PDF as HTML) and then cached.
    The exact same problem applies to Flash, in that Google can’t “see inside” the SWF files – it would take too much effort/bandwidth/processing to do it automatically, the user can’t “bookmark” it anyway, and there’s little of use in the millions of 25 MB+ SWF files out there.
    Indexing SVG specifically is a bit of a silly idea, and it would set a precedent that would see Flash, Java apps, video formats, MP3s, etc. all similarly indexed all the time.  Webmasters would cry foul, Google’s traffic usage would go through the roof, and people *still* wouldn’t bother to search for an in-file tag if, the majority of the time, they can’t see that tag when they look at the resulting page or file.
    It’s a silly idea, and all it could ever do would be to massively decrease the signal-to-noise ratio of the net in general.

    1. Lee, you seem to have a preconception about what an SVG file is likely to contain. Obviously, most SVGs are rather sparse on text, but many contain significant text or metadata, specifically information graphics and pages with layout options HTML+CSS doesn’t give you. Those graphics are definitely signal, not noise.
      The size of SVG images doesn’t normally approach that of binary formats like MP3 and JPEG, and the ratio of metadata and text to other markup is reasonably high in many SVGs, comparable to HTML. (BTW, you often wouldn’t need to download the entire binary file to extract metadata like ID3 in MP3s or XMP in JPEGs, so I don’t think your argument there holds water anyway; I’d be interested in what Google actually does there.)
      As for your comparisons with raster images and binary text documents, you are right that they are not necessarily downloaded and that they are user-driven… but you don’t seem to realize that they are indexed and displayed as part of the search results, otherwise you wouldn’t know about Google’s policy regarding them. SVG currently doesn’t seem to enjoy that same privilege, not even to the level of support given to rasters; even if Google merely treated SVGs on par with raster images, that would be an improvement.
      I appreciate your giving me food for thought, but I think you should also reexamine your premises about what SVG is and can be used for, and how readily and usefully it can be indexed.
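To illustrate the point that you often don’t need the whole binary file: an ID3v2 tag sits at the very start of an MP3 behind a 10-byte header that states the tag’s size, so a crawler could fetch just that header (for example with an HTTP Range request) and then only the bytes the tag occupies. A minimal sketch under those assumptions; the function name is hypothetical, it covers only ID3v2 (the older ID3v1 tag is a fixed 128 bytes at the end of the file), and it says nothing about what Google actually does:

```python
from typing import Optional


def id3v2_bytes_needed(header: bytes) -> Optional[int]:
    """Given the first 10 bytes of an MP3, return how many leading bytes hold
    the ID3v2 tag (header + tag body + optional footer), or None if absent."""
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    flags = header[5]
    size_bytes = header[6:10]
    # The size is "synchsafe": 4 bytes carrying 7 bits each, high bit always 0.
    if any(b & 0x80 for b in size_bytes):
        return None
    body_size = 0
    for b in size_bytes:
        body_size = (body_size << 7) | b
    # ID3v2.4 sets flag bit 0x10 when a 10-byte footer follows the tag body.
    footer = 10 if flags & 0x10 else 0
    return 10 + body_size + footer
```

A crawler could request the first 10 bytes, call this, and then request only that many leading bytes; the audio data itself never needs to be transferred.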

  10. An interesting fact about Google’s lack of interest in SVG:
    The Android web browser supports Flash (which I don’t give a damn about, but which is important to Google, which makes money serving annoying Flash ads), but amazingly does not support SVG!!
    iPhone does not support Flash, but the iPhone browser will render SVG no problem.
    What does that say about Google’s support for Open Standards?

  11. You’re only counting SVG that is in a separate file. You haven’t accounted for SVG that is embedded directly in the web page. SVG only really gets interesting when it is embedded directly in the page and you can treat it just like any other web page element. To index the presence of SVG properly, you would need to scan the individual web pages and look for SVG in the page. It’s like concluding that nobody uses “div” tags because you can’t find “div” files.

    The other problem you would have is that publicly accessible web sites are the least likely place to find SVG. There are still a lot of copies of MS IE in use, and that browser doesn’t know how to use SVG. If you are running an e-commerce site, then you need to write pages for the browser with the lowest capability, and that is MS IE.

    All of the other major browsers do use SVG however. If your web pages are used on an internal network where you can dictate which browser to use, then you can simply tell the users that they need any browser other than MS IE, and it will work.

    I have a software project that uses SVG as part of an industrial HMI system.

    Here’s a link to some demos:


    (The latest version of Apple Safari happens to have a bug that prevents the tank column indicators from updating, but the rest should work. There are no known bugs for current versions of Firefox or Opera or for older versions of Safari.)

    Note that I’m using XHTML files here – there are no SVG files to be found. However, those XHTML files are mainly SVG. There *are* SVG files involved, but you drag and drop them onto an Inkscape drawing and then embed that complete drawing into the web page. People are using this to monitor industrial processes, but they don’t put these web pages on the Internet for obvious security reasons. I know of several other HMI systems (some Free Software, some proprietary) that are also using SVG for their graphics.

    I’ve been doing this for several years and I’m very happy with the way SVG works. I have tested with Firefox, Opera, Chrome, Safari, Epiphany, and Midori and compatibility between browsers has been very, very, good for the sort of things that I want to do. There are occasional bugs (the new bug with Safari is a severe annoyance), and some browsers have come out with new features before others (animation is a new area), but the compatibility situation with SVG is no worse than it is with HTML or CSS. MS IE doesn’t work with SVG, but there are a lot of other AJAX features that I am using that won’t work in MS IE, so even if MS came out with full SVG support today, IE still wouldn’t be able to run these web pages.

    So, I’m very pleased with how SVG works. I’m also very pleased with how you can create SVG drawings using Inkscape, embed them in your web page, and then animate them with JavaScript and CSS.

    If there is a problem, it isn’t an SVG problem. It’s an MS IE problem, and SVG isn’t the only problem that MS IE has. If you want to do anything practical with advanced web features like SVG you need to forget about making it work with MS IE. Once you make that decision, a lot of possibilities open up that weren’t available to you before. The sort of applications that I am interested in are for use on controlled internal networks and users have no objection to installing or using a browser that has SVG built in. It’s usually a lot easier to install a browser that has SVG built in to it than it is to install and use any of the alternative technologies.
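The scan M Griffin describes, looking for SVG written directly into a page rather than for “.svg” files, can be sketched with Python’s built-in `html.parser`. This is an illustration under simplifying assumptions (the class and function names are hypothetical): it counts outermost `<svg>` elements in the served markup, so SVG that script libraries inject at run time, or prefixed `<svg:svg>` elements in XHTML, are not seen.

```python
from html.parser import HTMLParser


class InlineSvgCounter(HTMLParser):
    """Counts outermost <svg> elements written directly into an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # how many <svg> elements are currently open
        self.count = 0  # how many outermost <svg> elements have been seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <SVG> is caught too.
        if tag == "svg":
            if self.depth == 0:
                self.count += 1
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "svg" and self.depth > 0:
            self.depth -= 1


def count_inline_svg(html: str) -> int:
    parser = InlineSvgCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

Nested `<svg>` elements are counted once, as part of their outermost drawing; a self-closing `<svg/>` still counts, because the parser’s default `handle_startendtag` calls the start and end handlers in turn.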

    1. M Griffin, yeah, I realize that is one of many flaws in my methodology… I’d love to see someone do a more in-depth analysis of the situation. But presumably, any indexable SVG in (X)HTML actually will be indexed. This seems to bear out, too: a quick search for “svg translate circle” http://www.google.com/search?q=svg+translate+circle gives me pretty much exactly what I was looking for: my 2003 SVG translation/accessibility experiment page, containing both an externally referenced SVG circle with some text and an inline version, was the 3rd hit, with the summary text including “This is a circle”, a phrase from the inline SVG.

      But I don’t think most traditional uses of SVG were inline; this may change because of script libs like Raphaël that inject SVG directly into the DOM, and because HTML5 better defines the behavior for SVG inline in HTML. Will that make SVG more or less “findable”? I guess what’s important is that the content will be more findable, regardless of the markup used.

  12. Your page is up to #1 now (no doubt Slashdot has an influence on that). However, the search finds your web page, and the text “This is a circle” is right on the page. I think that’s what it found.

    I don’t disagree with you about whether Google should index SVG files. For example, I can think of lots of applications where someone might want to search for SVG files related to pumps. It might also be a good idea to index it as a selectable image type.

    As for traditional uses of SVG, I think they fall into two categories. One is line drawings for Wikipedia (and Wikipedia derivatives). They serve up SVG to save bandwidth (as compared to PNG or JPEG files) and to give a better quality image. The other use is as an off-line graphics format. *Those* files from the second category just don’t typically end up in a place where Google can see them.

    The new category of use seems to be in AJAX type web applications where the page isn’t on the public Internet. Perhaps I’m in an unusual field, but over the last couple of years a number of unrelated systems using SVG have popped up in applications involving monitoring of machinery. I don’t know of *anyone* using Flash for this type of thing (although a few people might be doing one-off applications with it). What SVG seems to be competing with there is Java applets and ActiveX components. It’s not something that you would typically see on the public Internet however. It works fine though on controlled internal networks.

    As for injecting SVG into the DOM using Javascript, my experience with that has been that it works only for very simple cases. If you want to do anything more complicated than that (as in the demos that I pointed to) you really need a graphical tool where you can position the SVG elements by dragging them around. I tried doing this using traditional web development techniques, and it’s just too slow and cumbersome. Text and paragraphs in HTML can just flow wherever they need to, but complex graphics need to be correctly positioned relative to other elements. Tweaking X and Y coordinates in a text editor just takes too long.

    The demos that I pointed to in my earlier post actually use several techniques. The first one uses a drawing where the graphics are individual clip art elements which I assembled and positioned using Inkscape. I created the individual pieces of art in a text editor (I’m not an artist, so that was easier for me), but the final assembly was done with Inkscape. The complete drawing (with multiple layers) was then pasted as is into the web page. It’s about 175 kB of text, so it’s not really practical to tweak it by hand. Other people are doing things like drawing entire process plants with Inkscape and then animating the finished drawing.

    The second and third demos were created a different way. I created SVG “prototypes” and then pasted them into the drawing as assemblies. I then clone them and insert them where needed under the control of JavaScript.

    I tried injecting basic SVG elements into the DOM, and I tried using traditional web techniques. It was like trying to build a house out of toothpicks. I can’t even imagine how people who talk about doing this sort of thing with “canvas” expect that idea to work with complex graphics. I just don’t see a realistic alternative to SVG in these sorts of applications available anywhere as an open standard. We either use SVG or it will be the 1990s in the web world forever.

    The *biggest* problem that I had when I started using SVG was finding good examples about how to do what I wanted. I had to do a lot of research and experiments to make things work, and I had a lot of failed efforts before getting something I was satisfied with. I’m not a web 2.0 campaigner or an SVG promoter, just someone who was trying to solve a very practical problem and thought that a web browser was a good solution if I could just figure out how to do some nice graphics. Better visibility for SVG in searches would probably have helped me a lot. I can’t argue with you on that point.

    I think two things have held SVG back up to this point. One was the market share held by MS IE. That has fallen to the point that IE can now (for some applications) be ignored. The other was that computers and web browsers had to get fast enough to make doing complex things with SVG practical. I think we really only got to that point a couple of years ago.

    I don’t want to blather on too much about this, but it’s a topic that interests me.
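The prototype-and-clone approach described above happens in the browser with JavaScript (the DOM’s `cloneNode(true)`). As a stand-in, here is the same clone-and-position idea sketched with Python’s `xml.etree.ElementTree`; the function name, the `prototype_id` parameter and the translate-based placement are assumptions for illustration, not the commenter’s code:

```python
import copy
import xml.etree.ElementTree as ET


def place_clones(svg_text, prototype_id, positions):
    """Deep-copy the element with id=prototype_id once per (x, y) position,
    move each copy with a translate() transform, and append it to the root."""
    root = ET.fromstring(svg_text)
    proto = root.find(f".//*[@id='{prototype_id}']")
    if proto is None:
        raise KeyError(f"no element with id {prototype_id!r}")
    for i, (x, y) in enumerate(positions):
        clone = copy.deepcopy(proto)
        # Give each copy its own id; ids of nested children are NOT rewritten here.
        clone.set("id", f"{prototype_id}-{i}")
        # Replaces any transform the prototype itself carried.
        clone.set("transform", f"translate({x},{y})")
        root.append(clone)
    return root
```

SVG’s own `<use>` element is an alternative that references the prototype instead of copying it, though individual instances are then generally harder to modify one by one from script.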

  13. Doug, great article. It is a real shame that Google doesn’t index SVG files. The W3C went out of its way back in 1998-2001 to ensure that SVG files would be an easy target for search engines.
    It would probably take Google less time to implement at least partially useful searching of SVG files than it would take to read this post and everyone’s comments. The SVG language places all textual content in XML text nodes, and everything that isn’t textual content is stored in attributes. As a result, Google and other search engines could extract reasonably useful search information from files with the extension *.svg (or *.svgz if the server is using HTTP 1.1 for compression) without XML parsing, simply by extracting all text between the characters “>” and “</”. This would provide immediate value with little effort. For better searches, Google could actually parse the XML files and then extract the XML text nodes.
    For finding hyperlinks in an SVG file, there is also a simple string-based approach and a more robust XML approach. The simple approach would be to look for the string xlink:href= and then extract anything between matching single or double quotes. (I expect that 99% of SVG files use the namespace prefix xlink:.) The more robust approach would be to parse the XML into a DOM and look for all href attributes in the XLink namespace.
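Both approaches from the two paragraphs above can be sketched in a few lines of Python. This is a minimal illustration, not anything Google runs; the function names are hypothetical, and `load_svg_text` assumes UTF-8 and recognises an .svgz file by the gzip magic number:

```python
import gzip
import re
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def load_svg_text(raw: bytes) -> str:
    """Decode an .svg or .svgz payload (.svgz is gzip-compressed SVG).
    Assumes UTF-8 for simplicity."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


def text_by_string_scan(svg: str) -> list[str]:
    """Quick approach: keep whatever sits between '>' and '</'."""
    return [t.strip() for t in re.findall(r">([^<]*)</", svg) if t.strip()]


def text_by_xml_parse(svg: str) -> list[str]:
    """Robust approach: parse the XML and collect every text node."""
    root = ET.fromstring(svg)
    return [t.strip() for t in root.itertext() if t.strip()]


def hrefs_by_string_scan(svg: str) -> list[str]:
    """Quick approach: assumes the conventional 'xlink:' prefix."""
    pattern = r"""xlink:href\s*=\s*(["'])(.*?)\1"""
    return [m.group(2) for m in re.finditer(pattern, svg)]


def hrefs_by_xml_parse(svg: str) -> list[str]:
    """Robust approach: any prefix bound to the XLink namespace works."""
    root = ET.fromstring(svg)
    return [el.get(XLINK_HREF) for el in root.iter() if el.get(XLINK_HREF) is not None]
```

With markup like `<text>Hello <tspan>World</tspan></text>`, the string scan returns only “World”, while parsing recovers both words, which is the practical argument for the XML route. Likewise, only the parsing version finds links when a file binds the XLink namespace to a prefix other than `xlink:`.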

  14. Doug, well expressed… SVG has never enjoyed the attention it deserves, but slowly, over time, it has gained increasing support. Even MS is supporting it now, with the next IE. What does Bing do in this regard? Maybe if Google listens, they will do better at indexing the web.

  15. Along the lines of what @Jon said, your article made me wonder how Google treated other XML applications and XML docs themselves. Could this be a larger issue of Google not playing well with XML in general?
    Doing a quick search on file types of XML, RDF, and RSS, while those files were indexed, the content served wasn’t necessarily what I would expect (lots of HTML/XHTML masquerading in files not ending in .html — perhaps server-end fudging?).
    Does anyone know if Google knows how to read XML text nodes in general? Seems like, at simplest/dumbest, Google could just look for non-attribute text in XML-based docs. While not the best solution obviously, it would at least be recorded.

  16. You can find many SVGs with Google’s image search option. In the cache, Google shows a PNG version, but there is a link to the original site.
    Embedding SVG in HTML markup is pretty cool for web designers. Detecting it should be easy for Google, because there is always an svg tag.  Currently the limitation is that modern browsers only display this content in XHTML (IE9 and Firefox 4 will change this at the end of this year).
    Inkscape is a great tool. New in version 0.47, you can save the SVG as Optimized SVG, which cleans up the code. In 2-3 simple steps you can round the floats in transforms and paths.
    Looking for examples? Take a look:

    http://www.batik-gbr.de  (Inlinecode in (x)html, all graphic made with Inkscape)
    http://www.pixelfans.de/test/svg/browsercheck.svg (a browsercheck without Javascript)
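The float rounding mentioned above can be illustrated on path data. This is a sketch of the idea only, not Inkscape’s implementation; `round_path_data` and `format_number` are hypothetical names. It re-emits tokens separated by spaces so that rounding cannot run two numbers together:

```python
import re

# A path command letter, or a number (optional sign, decimals, exponent).
TOKEN = re.compile(r"[MmZzLlHhVvCcSsQqTtAa]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def format_number(value: float, digits: int) -> str:
    text = f"{round(value, digits):.{digits}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")  # "5.00" -> "5", "2.50" -> "2.5"
    return "0" if text == "-0" else text


def round_path_data(d: str, digits: int = 2) -> str:
    """Round every number in an SVG path 'd' string and re-emit the tokens
    separated by single spaces, so rounding can never merge two numbers."""
    out = []
    for tok in TOKEN.findall(d):
        out.append(tok if tok.isalpha() else format_number(float(tok), digits))
    return " ".join(out)
```

For example, `round_path_data("M10.333,20.5 L30.0001-4.2z")` gives `"M 10.33 20.5 L 30 -4.2 z"`. Fewer digits means smaller files but more visible drift on large drawings, so the precision is a judgment call. The tokenizer also does not split arc flags written without separators (e.g. `a5 5 0 015 5`), which the path grammar allows.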

  17. Certainly no solution, but maybe a temporary helper/catalyst idea:
    Pushing HTML+PNG versions into the Google index
    – Minimize the information lost in the translation,
    – Have a link to the original
    – Have a SVGoogle.com/….original_SVG_URL
    – Have scripts, crawlers and community fill database
    – Maybe Wikipedia people can help (host)
    PS: you have some PHP code at the bottom of this page?

  18. I’m not surprised that a lot of people see Inkscape primarily as a tool for local files and printing – it’s often billed as an Open Source equivalent to Illustrator, and I’ll wager that there are also far more people creating Illustrator files for print than for the web.
    Given the current situation regarding IE and SVG files, I suspect that even web-savvy Inkscape users often see it as a tool for creating PNGs to use online, rather than using the SVGs directly. For example, my own webcomic, “The Greys”, is created entirely in Inkscape, but for the sake of cross-browser compatibility our primary output is a rendered PNG image of each strip.
    We do also make the Inkscape SVG files available for download – and we embed plenty of metadata, including license info and full transcripts, in the hopes that one day Google will index them. They already render pretty well in Firefox and other non-IE browsers, but the lack of support for filters in IE9 will mean that it will be some years yet before we can switch to SVG as our primary format on the web site.
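For embedded metadata like this to help, an indexer has to pull it out. A minimal sketch, assuming the RDF/Dublin Core layout Inkscape typically writes (metadata > rdf:RDF > cc:Work > dc:* elements); the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

NS = {
    "svg": "http://www.w3.org/2000/svg",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
DC = "{http://purl.org/dc/elements/1.1/}"


def read_work_metadata(svg: str) -> dict[str, str]:
    """Return Dublin Core fields (title, creator, description, ...) describing
    the drawing, keyed by local name. Nested values such as
    dc:creator/cc:Agent/dc:title are flattened to their text."""
    root = ET.fromstring(svg)
    # First element inside rdf:RDF, e.g. <cc:Work rdf:about="">.
    work = root.find("svg:metadata/rdf:RDF/*", NS)
    if work is None:
        return {}
    fields = {}
    for child in work:
        if child.tag.startswith(DC):
            # Join all nested text and collapse runs of whitespace.
            text = " ".join("".join(child.itertext()).split())
            if text:
                fields[child.tag[len(DC):]] = text
    return fields
```

Only direct children of the work description are read, so the creator’s nested `dc:title` ends up under `creator` rather than overwriting the drawing’s own title. License links (`cc:license` with an `rdf:resource` attribute) are outside the Dublin Core namespace and are not collected by this sketch.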

    1. MarkC, have you considered proactively publishing in SVG with a PNG fallback? It wouldn’t be any more work, and it would still work for IE users. It would also send a message about SVG being ready for prime time… the more people do that, the higher the confidence in others it inspires.

  19. Hi Doug, long page, please excuse me if I missed some paragraphs hidden in there.
    First, beware those “pages found” numbers from Google… as you noted towards the end of the essay, these will change with the wind. This has been a longstanding issue. Bottom line: just don’t trust those “X pages found” numbers.
    Secondly, it’s interesting that you tried “filetype:svg”. In the past I don’t recall getting results for this, but today I do. SVG is not listed in Google’s arguments for the “filetype:” restriction: http://www.google.com/help/faq_filetypes.html I’m not sure what’s changing here.
    Third, are we sure that Google Websearch does not find the text within .SVG files? I did some quick testing with terms like “filetype:svg maori” and “filetype:svg maori -inurl:maori”, and seem to have found some SVG files where the term “maori” is only within the content, not within the URL. I may have erred here though, but the results seem more heartening than was suggested in the prior conclusion…?

    1. Hi, John, you’re right of course that the number of results are ephemeral… it’s just the only tool I could think of that I had ready access to.

      As for the “maori” results, if you look closely, you’ll see that the links in the results are to the HTML pages that describe those SVG files, not to the SVG files themselves. In fact, while the pages do link to SVG files, the rendered image on the page itself is a PNG. And in the “Maori earth oven” SVG file, the word “Maori” doesn’t appear anywhere other than the file name. So I’m pretty sure that my conclusion is sound.

      But I do think there’s a bright side: Google could change its treatment of SVG files. Even simple steps could be a real improvement, and just as people improved their HTML pages for SEO, so can we promote similar best practices for SVG files.

  20. “MarkC, have you considered proactively publishing in SVG with a PNG fallback?”
    I did consider it early on, but decided against it for a couple of reasons:
    1) We use WordPress with the ComicPress theme for the site for speed and simplicity, and I don’t know of a straightforward way to implement SVG+fallback code in that environment without creating the sort of maintenance issues we’re trying to avoid in the first place
    2) Even amongst browsers that do display SVG, not all renderers are equal. I usually test with Firefox which does a reasonable job of rendering the comics, but not perfect (when compared with the results we see in Inkscape). As we consider our primary output to be the comic itself, with promoting Inkscape and SVG as secondary to that, we’d rather have a “fixed” version of each strip that renders essentially identically across browsers.
    I’d like to use the SVG as the primary output eventually – the file sizes (especially when gzipped) are usually smaller for a start, not to mention the possibilities offered by SMIL and JavaScript*, but the browser support and associated tools just aren’t good enough yet. I will continue to keep an eye on the situation though, and hope to be able to revise that position one day.
    * As an incentive to download the SVG files, each of our comics contains one or more Easter Eggs, which are usually only accessible in the SVG. Some of those eggs do use JS if the file is loaded in a browser, and we’re experimenting with SMIL-based Easter Eggs which will probably be used once Firefox 4 is released.

  21. MarkC, cool – I like to hide a couple of Easter eggs in my SVG comics too.  I haven’t figured out the right way to automate the rasterize, fallback, embedding thing, but I found it pretty straightforward to just do this:
    &lt;object style=”padding-left:75px” width=”600″ height=”900″ type=”image/svg+xml” data=”http://www.codedread.com/comics/005.svgz”&gt; … put img referencing png here … &lt;/object&gt;
    directly in the WordPress editor.  As long as it’s one line, I don’t think WP should munge up the elements…
    See http://www.codedread.com/blog/archives/2010/04/01/005/

  22. Ergh, I never know when angle brackets will get escaped or not.  That should be
    <object style="padding-left:75px" width="600" height="900" type="image/svg+xml" data="http://www.codedread.com/comics/005.svgz">[img element with png fallback here]</object>

  23. As a newbie to SVG, I find examples by searching for the string I’m looking for, such as ‘path d=’, and I get some nice hits.  The search results are based on the content of the document.

    1. Susan, yes, great point. It shows that Google apparently does index the contents of SVG files, and could present them in their search results in more useful ways. Your specific search is clever and useful for a particular kind of search, but it should be more broadly applicable.

  24. Jeff, the WordPress side of the problem is more down to using the ComicPress theme which inserts our comic images directly without providing access to the code.
    There’s also the mixed capabilities of SVG-enabled browsers to consider. It’s not too bad right now, but when IE9 lands anyone using it to view the SVG versions of our comics won’t get any of the filters appearing, for example. The object tag lets you fall back to PNG when the browser doesn’t support SVG at all, but to fallback when there is SVG support but it lacks some essential features would require quite a bit more work.
    Hopefully in a post-IE9 future ComicPress, or something similar, will make it easier to automate this process. Perhaps let you upload an SVG file, auto-parse it for essential requirements, and auto-fallback based on the browser capabilities. I suppose I can dream…
    Even where support is present the rendering isn’t always the same, of course – the text on your comics (love the OuijaPad, BTW) tends to leak out of the speech bubbles on my Ubuntu+Firefox system, and I’ve noticed similar issues with my own comics when viewed in a browser.
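The “auto-parse it for essential requirements” half of that wish can be sketched with the standard library. Everything here is hypothetical: the function names, the list of checked features (filters, per the IE9 example above, plus SMIL animation elements), and the `unsupported` set, which a real tool would have to derive from the requesting browser:

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"

# Hypothetical list of features a site might treat as "essential".
CHECKED_ELEMENTS = {"filter", "animate", "animateTransform", "animateMotion", "set"}


def features_used(svg: str) -> set[str]:
    """Names of checked SVG features that appear anywhere in the file."""
    found = set()
    for el in ET.fromstring(svg).iter():
        if el.tag.startswith(SVG) and el.tag[len(SVG):] in CHECKED_ELEMENTS:
            found.add(el.tag[len(SVG):])
        # Filters may be applied by attribute or, as Inkscape often does, via style.
        style = (el.get("style") or "").replace(" ", "")
        if el.get("filter") or "filter:url(" in style:
            found.add("filter")
    return found


def needs_png_fallback(svg: str, unsupported: set[str]) -> bool:
    """True if the file uses any feature the target browser lacks."""
    return bool(features_used(svg) & unsupported)
```

A publishing tool could run `features_used` once at upload time and store the result, so the per-request decision is just a set intersection.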

  25. Hi! I am a webmaster trends analyst here at Google and one of the things that I do is to get our engineers in touch with issues like this :). This is an interesting blog post and a great discussion about SVG files. I’m not completely in the picture on SVG files and how their contents might be indexable or useful in web-search, but I’ll certainly pass this on to the teams here at Google to review.
    If you have more compelling use-cases where the contents in SVG files would be highly relevant (but not findable in the search results at the moment), it would be great to hear about them. I realize that adoption always depends on a number of factors; I know our teams are aware of SVG content, but maybe there’s something here that can help speed things up on our side as well :).

    1. John, thanks for replying! I will pull together some examples, and get back to you. Optimizing SVG searchability should be a collaborative effort between search engines, the SVG Working Group, authoring tool vendors, and content creators, so we are happy to have an ongoing discussion with the Google search team.

  26. SVG is great. Even this fairly extensive post doesn’t cover all aspects of SVG. SVG enjoys similar love to other open source and Web 2.0 technologies, and its potential is still largely untapped. SVG will most likely remain a developer’s technology, even though it helps publish great web content. For example, a relative newcomer to SVG can quickly design some compelling SVG content without learning Flash programming or CSS+JavaScript hacks or, worse, publishing text as a JPEG.
    In the early SVG days I stumbled onto TkZinc and got very excited by it, but that hasn’t progressed. Anyway, all the things I’m writing about are a fairly hard slog to discover on the web using a search engine. Getting to as-yet-undiscovered SVG content (even a blog about SVG such as this) is a losing proposition from the outset. You have to be prepared to cast your search net far and wide, and spend time wading through false hits. No wonder the general internet public has little or no exposure to, or say about, SVG. In the end, the proof will be in the pudding, so to speak, when SVG content becomes more discoverable.
    Better indexing of SVG web content by Google would be most welcome. Just like Web 2.0/AJAX, Wikipedia, YouTube and many other great developments, this sort of thing will be driven by people who have courage and can reach the right buttons.

Comments are closed.