I’m thinking that HTML should have an element that basically says content within this section may contain links from external sources; just because they are here does not mean we are endorsing them.
I’m not convinced an HTML extension is necessary or desirable. Instead, I think this might be better handled through a back-door approach.
…imagine allowing a div that lets you block out links or sections of a page not to index/follow.
It’s a simple idea, really: use an XHTML MetaData Profile and specially named classes to indicate that a robot should disregard the content of an element in some way. This has several advantages over creating a new element, particularly that it preserves backward and forward compatibility with HTML and XHTML: it simply uses a capability that’s there already. It also has an advantage over specially formatted comments in that it doesn’t require tag-soup handling of XML documents.
As with XFN, the use of special classnames would be indicated by an HTML metadata profile. It might be desirable to have the classnames namespaced (really just another use of the profile attribute) or otherwise made unique so they could be identified without the profile, but for now I’ll stick with the simple case.
On to some examples. The first shows the use of the profile http://example.org/ignore, which indicates that the content marked with an ignore-content class is not to be used in indexing the page.
<head profile="http://example.org/ignore">
...
<div class="ignore-content">There once was a man from Nantucket...</div>
<p>This is not about <span class="ignore-content">porn</span>.</p>
Next, let’s mark some links as not to be followed; example.{tld} links shouldn’t be followed anyway, but this will reinforce that. The text is fair game, though.
<head profile="http://example.org/ignore">
...
<p class="ignore-links">This is <a href="http://example.com/bogus">a bogus link</a>
and so is <a href="http://example.net/bogus">this</a>.</p>
Finally, we’ll cause an entire page to be ignored, similar to the <meta name="robots" content="noindex,nofollow,noarchive"> convention.
<head profile="http://example.org/ignore">
...
<body class="ignore-content ignore-links">
<p>The <a href="http://example.com/">hot girls</a> cooled off with a glass of ice water.</p>
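To make the robot’s side of this concrete, here is a minimal Python sketch of how a crawler might honor the two classes. It is only an illustration under my own assumptions, not part of the proposal: the names IgnoreAwareParser and scan are hypothetical, the classes are honored only when the head declares the http://example.org/ignore profile, ignore-links on an anchor itself also suppresses that anchor, and ignore-content has no effect on whether links are followed.

```python
from html.parser import HTMLParser

PROFILE = "http://example.org/ignore"  # profile URI used in the examples above
# Elements that never have an end tag, so they must not be kept on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class IgnoreAwareParser(HTMLParser):
    """Hypothetical robot-side parser: collects indexable text and followable links."""

    def __init__(self):
        super().__init__()
        self.profile_seen = False  # classes only count once the profile is declared
        self.stack = []            # (tag, ignore_content, ignore_links) per open element
        self.text = []
        self.links = []

    def _ignoring(self, index):
        # True if any open element carries the flag stored at this tuple index.
        return any(entry[index] for entry in self.stack)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # The profile attribute is a space-separated list of URIs.
            self.profile_seen = PROFILE in (attrs.get("profile") or "").split()
        classes = (attrs.get("class") or "").split() if self.profile_seen else []
        if tag not in VOID:
            self.stack.append(
                (tag, "ignore-content" in classes, "ignore-links" in classes))
        if tag == "a" and attrs.get("href") and not self._ignoring(2):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; stray end tags are skipped.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self._ignoring(1):
            self.text.append(data)


def scan(html):
    """Return (indexable text, list of links to follow) for one page."""
    parser = IgnoreAwareParser()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.text).split())  # collapse whitespace
    return text, parser.links


if __name__ == "__main__":
    page = ('<head profile="http://example.org/ignore"></head>'
            '<p class="ignore-links">This is '
            '<a href="http://example.com/bogus">a bogus link</a>\n'
            'and so is <a href="http://example.net/bogus">this</a>.</p>')
    print(scan(page))  # ('This is a bogus link and so is this.', [])
```

Tracking the flags on a stack of open elements is what lets a class on the body cover the whole page, while a class on a single span covers only that span. A real robot would also need sturdier handling of unclosed elements than this sketch attempts.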
So, there it is. Is this worth following up? Would GMPG be interested? (I think it follows their principles.)
This item is licensed under a Creative Commons License.