Google off

Google on! Google off! Google on, Google off… the Googler!

But seriously.

ckaminski of the Web Standards Project recently highlighted the redesign of AT&T’s home page and developer Joe D’Andrea’s informative discussion of the project. One thing D’Andrea doesn’t mention, however, is the presence of the two comment-style directives <!--googleoff: index--> and <!--googleon: index-->.

Question answered: as Joe explains, “Those directives are not used by Google proper but rather our own slice-o-Google*, a mighty spiffy search appliance.” Thanks, Joe! I wonder why this is an option only on the Google Search Appliance and isn’t turned on web-wide.

I’ve seen googleoff/on on the aforementioned AT&T site and on several CBC news items (which also use the argument all). It’s also visible on a number of unrelated sites whose improperly formatted comments expose the directives as page text; I suspect this means many more sites use them correctly, where they can’t be seen.

I wrote about the benefits of a content-level robot exclusion scheme in August. googleoff/on is quite similar in concept, although it lives outside the markup and so requires a separate parser for XML-based content. (On the tag-soup web, though, it’s just another special case.) It also invites comparison with rel="nofollow"; interestingly, googleoff/on: follow could be used to accomplish exactly the same thing.
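To make the “separate parser” point concrete, here is a minimal Python sketch of how an indexer might honor these directives by scanning the raw text rather than a parsed document tree. It is not how Google or the appliance actually does it: the function name strip_googleoff, the whitespace tolerance, the choice to treat only index and all as suppressing text, and the depth counter that lets pairs nest are all assumptions for illustration.

```python
import re

# Matches <!--googleoff: index--> / <!--googleon: all--> style comments.
# Whitespace tolerance and case-insensitivity are assumptions of this sketch.
DIRECTIVE = re.compile(r"<!--\s*google(off|on)\s*:\s*(\w+)\s*-->", re.IGNORECASE)

# Arguments assumed to hide text from the index (hypothetical choice).
SUPPRESSING = {"index", "all"}


def strip_googleoff(html: str) -> str:
    """Return html with every googleoff/googleon-delimited region removed.

    A depth counter (an assumption about nesting semantics) keeps text only
    while no suppressing 'off' directive is outstanding. An unmatched
    googleoff hides the rest of the page. The directive comments themselves
    are always dropped; directives with other arguments change nothing else.
    """
    kept = []
    depth = 0
    pos = 0
    for match in DIRECTIVE.finditer(html):
        if depth == 0:
            kept.append(html[pos:match.start()])  # visible text before this directive
        pos = match.end()
        action, arg = match.group(1).lower(), match.group(2).lower()
        if arg not in SUPPRESSING:
            continue  # e.g. follow would affect links, not indexed text
        if action == "off":
            depth += 1
        elif depth > 0:
            depth -= 1
    if depth == 0:
        kept.append(html[pos:])  # trailing visible text
    return "".join(kept)
```

The counter is only one guess at how nested pairs behave; an implementation could just as plausibly let a single googleon close every open region.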

Whether it works is another question. For the moment I’m inclined to say it doesn’t: a search for the CBC article above that includes a term present only in its googleoff: index or googleoff: all blocks still finds the page, while a control search that includes a word not on the page at all returns no hits.

Still, it’s worth an experiment, so in the hope that this will someday make its way to the web at large, I’ve added the directives to the comment forms, navigation links, and other non-content sections of my weblog pages. Currently a search for petroglyphs wordpress on this domain returns only three results, and I hope that number won’t grow as Google respiders all the pages that now announce the presence of WordPress.

If by some chance content exclusion is ever turned on, it might be interesting to run several other tests. The index argument is a keyword from the robots meta tag, so perhaps follow would also work. (I’m presuming that all stands in for index,follow and that googleoff implies the no prefix on those arguments, as in noindex and nofollow.) Also, because the comments sit outside the markup, it should be possible to nest or otherwise intertwine them.

rel="nofollow" broken

A few months ago, Google and several other search and aggregation companies introduced rel="nofollow" tagging. Rather than rehash the arguments over that, I’ll simply point to Lachlan Hunt’s cogent analysis of nofollow and add one point: the nofollow relationship should have been defined in a metadata profile as an additional link type.

Whatever the flaws in the implementation, Google et al. are at least well-intentioned: comment spam is harmful and needs to be stopped. Only a few attempts have ever made it past my various blocking mechanisms and onto my weblog, but the bandwidth spammers consume hunting for pages they can exploit is double or triple that used by this site’s legitimate visitors. Perhaps if there were a way to keep them from finding comment forms in the first place….

Blocking WordPress wp-content listings

This Google search for wp-content directory listings should interest, and concern, anyone who has recently set up a WordPress weblog. In short, it shows that everything in those directories (themes, plugins, images, whatever) is reachable from a single common point of reference. If this worries you, you might want to limit access; if you run Apache, just add:

Options -Indexes

to a .htaccess file in that directory. Other directories in a standard install that may raise the same concern are wp-images and wp-includes; both can be restricted the same way. The standard wp-admin directory includes an index.php file that will generally be served in place of a generated listing, but any of its subdirectories without their own index file are still open to browsing, so it might not be a bad idea to block wp-admin too. If you don’t want to track every subdirectory that might appear, add the line above to a .htaccess file in your main WP directory; .htaccess directives apply to the directory they sit in and to everything beneath it.
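If you manage several installs, the step above can be scripted. This is a minimal Python sketch, assuming a hypothetical block_listings helper and a WordPress root path you supply; it appends Options -Indexes to an .htaccess file in each of the directories named above that actually exists, leaving files that already contain the line alone. It also assumes the server’s AllowOverride setting permits Options in .htaccess files; otherwise the line is either ignored or, depending on the configuration, causes a server error.

```python
from pathlib import Path

# Directories from a standard WordPress install, as listed in the post.
DIRECTORIES = ("wp-content", "wp-images", "wp-includes", "wp-admin")
DIRECTIVE = "Options -Indexes"


def block_listings(wp_root, directories=DIRECTORIES):
    """Append 'Options -Indexes' to .htaccess in each existing directory.

    Returns the list of .htaccess paths that were created or changed.
    Files that already contain the directive on its own line are skipped.
    """
    changed = []
    root = Path(wp_root)
    for name in directories:
        folder = root / name
        if not folder.is_dir():
            continue  # e.g. wp-images may be absent from some installs
        htaccess = folder / ".htaccess"
        existing = htaccess.read_text() if htaccess.exists() else ""
        if DIRECTIVE in (line.strip() for line in existing.splitlines()):
            continue  # already blocked
        if existing and not existing.endswith("\n"):
            existing += "\n"  # keep the new directive on its own line
        htaccess.write_text(existing + DIRECTIVE + "\n")
        changed.append(htaccess)
    return changed
```

Appending rather than overwriting preserves any rewrite rules or other settings an existing .htaccess may already hold.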

All new

Yep, it’s different. I’m finally a WordPress user. The only reason it’s taken this long is that I’ve had the silly idea that I’d have enough time to redo the original Petroglyphs stylesheet… which, as you can see, I’ve given up on for the time being.

There may be glitches as I redo some of the Movable Type plugins I was using. Please bear with me.