XMDP-style Robot Profile

I’m thinking that HTML should have an element that basically says content within this section may contain links from external sources; just because they are here does not mean we are endorsing them.

I’m not convinced an HTML extension is necessary or desirable. Instead, I think this might be better handled through a back door approach.

…imagine allowing a div that lets you block out links or sections of a page not to index/follow.

It’s a simple idea, really: use an XHTML MetaData Profile and specially-named classes to indicate whether the content of an element should be considered in some way undesirable to a robot. This has several advantages over creating a new element, particularly that it preserves backward and forward compatibility with HTML and XHTML: it’s just a use of an ability that’s there already. It also has an advantage over specially-formatted comments in that it doesn’t require tag-soup handling of XML documents.

Like XFN, the use of special classnames would be indicated by an HTML metadata profile. It might be desirable to have the classnames namespaced (really just another use of the profile attribute) or otherwise made unique so they could be identified without the profile, but for now I’ll stick with the simple case.

On to some examples. The first shows the use of the profile http://example.org/ignore, which indicates that the content marked with an ignore-content class is not to be used in indexing the page.

<head profile="http://example.org/ignore">
...
<div class="ignore-content">There once was a man from Nantucket...</div>
<p>This is not about <span class="ignore-content">porn</span>.</p>
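To make the robot's side of this concrete, here is a rough sketch of how an indexer might honour the class. It is only an illustration under several assumptions: the profile URI and class name are the placeholder ones from the example above, the names `IndexableText` and `indexable_text` are invented for this sketch, and the markup is assumed to be well-formed enough that every non-void start tag has a matching end tag.

```python
from html.parser import HTMLParser

# Placeholder profile URI and class name from the example above (not a real spec).
PROFILE = "http://example.org/ignore"
IGNORE_CONTENT = "ignore-content"

# Elements that never have a closing tag, so they must not change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IndexableText(HTMLParser):
    """Collect page text, skipping elements marked ignore-content."""

    def __init__(self):
        super().__init__()
        self.profile_active = False  # True once <head> declares the profile
        self.skip_depth = 0          # > 0 while inside an ignored element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # The profile attribute may hold several space-separated URIs.
            self.profile_active = PROFILE in (attrs.get("profile") or "").split()
        if tag in VOID:
            return
        classes = (attrs.get("class") or "").split()
        if self.skip_depth:
            self.skip_depth += 1     # nested element inside an ignored one
        elif self.profile_active and IGNORE_CONTENT in classes:
            self.skip_depth = 1      # start ignoring at this element

    def handle_endtag(self, tag):
        if self.skip_depth and tag not in VOID:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def indexable_text(html):
    """Return the whitespace-normalised text a robot would index."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Note that the class is only honoured when the head declares the profile; without it, `ignore-content` is treated as an ordinary classname and the text is indexed as usual.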

Next, let’s mark some links as not to be followed; example.{tld} links shouldn’t be followed anyway, but this will reinforce that. The text is fair game, though.

<head profile="http://example.org/ignore">
...
<p class="ignore-links">This is <a href="http://example.com/bogus">a bogus link</a>
and so is <a href="http://example.net/bogus">this</a>.</p>
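A crawler could treat `ignore-links` in much the same way. The following is a sketch rather than a reference implementation: the profile URI and class name are the placeholders used above, `FollowableLinks` and `followable_links` are names made up for the illustration, and I am assuming that a link which carries the class itself is ignored along with any links nested inside a marked element.

```python
from html.parser import HTMLParser

# Placeholder profile URI and class name from the example above (not a real spec).
PROFILE = "http://example.org/ignore"
IGNORE_LINKS = "ignore-links"

# Elements that never have a closing tag, so they must not change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class FollowableLinks(HTMLParser):
    """Collect hrefs, skipping links inside elements marked ignore-links."""

    def __init__(self):
        super().__init__()
        self.profile_active = False  # True once <head> declares the profile
        self.skip_depth = 0          # > 0 while inside an ignore-links element
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.profile_active = PROFILE in (attrs.get("profile") or "").split()
        if tag in VOID:
            return
        classes = (attrs.get("class") or "").split()
        if self.skip_depth:
            self.skip_depth += 1
        elif self.profile_active and IGNORE_LINKS in classes:
            self.skip_depth = 1
        # Only queue the link when we are not inside an ignored region.
        if tag == "a" and not self.skip_depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.skip_depth and tag not in VOID:
            self.skip_depth -= 1


def followable_links(html):
    """Return the hrefs a robot honouring the profile would follow."""
    parser = FollowableLinks()
    parser.feed(html)
    parser.close()
    return parser.links
```

The surrounding text is untouched by this pass, which matches the intent that the text stays fair game while the links are not followed.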

Finally, we’ll cause an entire page to be ignored, similar to the <meta name="robots" content="noindex,nofollow,noarchive"> convention.

<head profile="http://example.org/ignore">
...
<body class="ignore-content ignore-links">
<p>The <a href="http://example.com/">hot girls</a> cooled off with a glass of ice water.</p>

So, there it is. Is this worth following up? Would GMPG be interested? (I think it follows their principles.)


This item is licensed under a Creative Commons License.

First Mac tip

I got rid of the Internet Connect icon!

While trying to get my Mac’s VPN connection to work, I configured several L2TP and PPTP connections in OS X’s Internet Connect control panel. I later discovered that they don’t work with the Cisco VPN software we use at work (nasty words to Cisco) and so I removed them, but the toolbar icon didn’t disappear.

I knew the icon wasn’t there originally, so I was fairly sure it could be removed. Because it didn’t disappear by itself, I decided to experiment.

Menus? Nope.

Preferences? No.

Dragging the icon? No… but hold on, the Option key seems to be a favourite of the single-mouse-button crowd.

How about Option-dragging the icon? Hey, it moved!

How about Option-dragging it to the trash? Bingo, no more icon!

So, thumbs down to Apple for not making the icon disappear in the first place, but thumbs up for making it at least somewhat intuitive to get rid of. (Although describing it as intuitive may be a stretch… others have had the same problem and I haven’t been able to find anyone else’s answer.)