{"id":393,"date":"2004-08-24T23:06:30-04:00","date_gmt":"2004-08-25T07:06:30+00:00","guid":{"rendered":"http:\/\/peterjanes.ca\/wordpress\/?p=393"},"modified":"2006-04-30T14:10:47-04:00","modified_gmt":"2006-04-30T19:10:47+00:00","slug":"xmdp-style-robot-profile","status":"publish","type":"post","link":"https:\/\/peterjanes.ca\/blog\/2004\/08\/24\/xmdp-style-robot-profile\/","title":{"rendered":"XMDP-style Robot&nbsp;Profile"},"content":{"rendered":"<div class='e-content'><blockquote cite=\"http:\/\/ln.hixie.ch\/?start=1093386514&amp;count=1\" title=\"Ian Hickson\"><p>I&#8217;m thinking that HTML should have an element that basically says <q>content within this section may contain links from external sources; just because they are here does not mean we are endorsing them.<\/q><\/p><\/blockquote>\r\n\r\n<p>I&#8217;m not convinced an HTML extension is necessary or desirable.  Instead, I think this might be better handled through a <a href=\"http:\/\/www.tbray.org\/ongoing\/When\/200x\/2004\/08\/19\/BackDoor\">back door<\/a> approach.<\/p>\r\n\r\n<blockquote cite=\"http:\/\/www.markcarey.com\/googleguy-says\/archives\/discuss-google-superbots-coming.html\" title=\"&#8220;GoogleGuy&#8221;\"><p>&#8230;imagine allowing a div that lets you block out links or sections of a page not to index\/follow.<\/p><\/blockquote>\r\n\r\n<p>It&#8217;s a simple idea, really: use an <a href=\"http:\/\/gmpg.org\/xmdp\/\">XHTML MetaData Profile<\/a> and specially-named classes to indicate whether the content of an element should be considered in some way <cite class=\"term\">undesirable<\/cite> to a robot.  This has several advantages over creating a new element, particularly that it preserves backward and forward compatibility with HTML and XHTML: it&#8217;s just a use of an ability that&#8217;s there already.  
It also has an advantage over <a href=\"http:\/\/bradchoate.com\/weblog\/2002\/02\/18\/restricting-google\">specially-formatted comments<\/a> in that it doesn&#8217;t require <cite class=\"term\">tag-soup<\/cite> handling of XML documents.<\/p>\r\n\r\n<p>Like <a href=\"http:\/\/gmpg.org\/xfn\/\">XFN<\/a>, the use of special classnames would be indicated by an HTML <a href=\"http:\/\/www.w3.org\/TR\/html401\/struct\/global.html#profiles\">metadata profile<\/a>.  It might be desirable to have the classnames <a href=\"http:\/\/www.w3.org\/2002\/12\/namespace\">namespaced<\/a>&#8211;really just another use of the profile attribute&#8211;or otherwise made unique so they could be identified without the profile, but for now I&#8217;ll stick with the simple case.<\/p>\r\n\r\n<p>On to some examples.  The first shows the use of the profile <samp>http:\/\/example.org\/ignore<\/samp>, which indicates that the content marked with an <samp>ignore-content<\/samp> class is not to be used in indexing the page.<\/p>\r\n<pre><code>&lt;head <strong>profile=\"http:\/\/example.org\/ignore\"<\/strong>&gt;\r\n...\r\n<em>&lt;div <strong>class=\"ignore-content\"<\/strong>&gt;There once was a man from Nantucket...&lt;\/div&gt;<\/em>\r\n&lt;p&gt;This is not about <em>&lt;span <strong>class=\"ignore-content\"<\/strong>&gt;porn&lt;\/span&gt;<\/em>.&lt;\/p&gt;<\/code><\/pre>\r\n\r\n<p>Next, let&#8217;s mark some links as not to be followed; <samp>example.{tld}<\/samp> links shouldn&#8217;t be followed anyway, but this will reinforce that.  
The text is fair game, though.<\/p>\r\n\r\n<pre><code>&lt;head <strong>profile=\"http:\/\/example.org\/ignore\"<\/strong>&gt;\r\n...\r\n&lt;p <strong>class=\"ignore-links\"<\/strong>&gt;This is <em>&lt;a href=\"http:\/\/example.com\/bogus\"&gt;a bogus link&lt;\/a&gt;<\/em>\r\nand so is <em>&lt;a href=\"http:\/\/example.net\/bogus\"&gt;this&lt;\/a&gt;<\/em>.&lt;\/p&gt;<\/code><\/pre>\r\n\r\n<p>Finally, we&#8217;ll cause an entire page to be ignored, similar to the <samp>&lt;meta name=&quot;robots&quot; content=&quot;noindex,nofollow,noarchive&quot;&gt;<\/samp> convention.<\/p>\r\n\r\n<pre><code>&lt;head <strong>profile=\"http:\/\/example.org\/ignore\"<\/strong>&gt;\r\n...\r\n&lt;body <strong>class=\"ignore-content ignore-links\"<\/strong>&gt;\r\n<em>&lt;p&gt;The &lt;a href=\"http:\/\/example.com\/\"&gt;hot girls&lt;\/a&gt; cooled off with a glass of ice water.&lt;\/p&gt;<\/em><\/code><\/pre>\r\n\r\n<p>So, there it is.  Is this worth following up?  Would <a href=\"http:\/\/gmpg.org\/\">GMPG<\/a> be interested?  (I think it follows <a href=\"http:\/\/gmpg.org\/principles\">their principles<\/a>.)<\/p>\r\n\r\n<p>These are a few of the references and sources for this proposal that haven&#8217;t been linked above.  
In no particular order, and subject to expansion:<\/p>\r\n<ul>\r\n<li><a href=\"http:\/\/simon.incutio.com\/archive\/2004\/05\/11\/approved#comment9\">Jim Winstead<\/a> and other commenters<\/li>\r\n<li>Tim Bray&#8217;s <a href=\"http:\/\/www.tbray.org\/ongoing\/When\/200x\/2003\/10\/15\/StillNoWebSite\"><cite class=\"title\">There&#8217;s Still No Such Thing as a Web Site<\/cite><\/a> and <a href=\"http:\/\/www.tbray.org\/ongoing\/When\/200x\/2003\/07\/30\/OnSearchTOC\"><cite class=\"title\">On Search<\/cite><\/a> (particularly <a href=\"http:\/\/www.tbray.org\/ongoing\/When\/200x\/2003\/07\/29\/SearchMeta\"><cite class=\"title\">Metadata<\/cite><\/a>)<\/li>\r\n<li><a href=\"http:\/\/www.robotstxt.org\/wc\/exclusion.html#meta\">Robots Exclusion <samp>META<\/samp> Tag<\/a><\/li>\r\n<li><ins datetime=\"2004-08-25T20:28:00-05:00\"><a href=\"http:\/\/www.lachy.id.au\/blogs\/log\/2004\/08\/link-relationships\">Lachlan Hunt<\/a> has another take.  It&#8217;s more focused on links, which is what Hixie&#8217;s original post was about, and incorporates even more metadata about each link.  
<del datetime=\"2004-08-27T19:52:00-05:00\">I&#8217;m not sure a lot of that metadata describes <cite class=\"term\">relationships<\/cite>, which is part of the reason I went with classes instead of <samp>rel<\/samp>.<\/del><\/ins> <ins datetime=\"2004-08-27T19:52:00-05:00\">I&#8217;ve seen the light, thanks to the discussion below.<\/ins><\/li>\r\n<\/ul>\r\n\r\n<p><a href=\"http:\/\/creativecommons.org\/licenses\/by\/2.0\"><img decoding=\"async\" src=\"http:\/\/creativecommons.org\/images\/public\/somerights.gif\" style=\"width:88px; height:31px; border:none\" alt=\"Creative Commons License\" \/><\/a> This item is licensed under a <a href=\"http:\/\/creativecommons.org\/licenses\/by\/2.0\">Creative Commons License<\/a>.<\/p><\/div><div class=\"syndication-links\"><\/div>","protected":false},"excerpt":{"rendered":"I&#8217;m not convinced an HTML extension just to mark content as unendorsed is necessary or desirable.  Instead, I think this might be better handled through a <a href=\"http:\/\/www.tbray.org\/ongoing\/When\/200x\/2004\/08\/19\/BackDoor\">back door<\/a> approach: <a 
href=\"http:\/\/gmpg.org\/xmdp\/\">XMDP<\/a>.","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"mf2_syndication":[],"venue_id":0},"categories":[3],"tags":[],"kind":false,"_links":{"self":[{"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/posts\/393"}],"collection":[{"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/comments?post=393"}],"version-history":[{"count":0,"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/posts\/393\/revisions"}],"wp:attachment":[{"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/media?parent=393"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/categories?post=393"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/peterjanes.ca\/blog\/wp-json\/wp\/v2\/tags?post=393"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}