Wednesday, 29 November 2006

Google Under the Hood - Part One

What Bot?

There is a lot of confusion surrounding the various Google bots . . . those "spiders" or "crawlers" that mysteriously come to your web site in the middle of the night, when no-one's looking, and poke around to see what's there . . . well, not quite, but they do visit your web sites. However, there is not just one Googlebot. There are, in fact, several bots, each of which has a specific job. If you take a look in your log files or your analytics software, you may well see the tell-tale signs of a visit from one of these creatures, since each bot announces itself by name. Below is a field guide to these Google animals, their names and purposes.
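
If you would rather not eyeball the log files yourself, a few lines of Python can do the spotting for you. What follows is only a rough sketch: it assumes your server writes the common "combined" log format (with the user agent as the last quoted field), the function names are my own invention, and the substrings it looks for are simply the bot names quoted in the field guide below. The sample log line is made up for illustration.

```python
import re

# Bot names as quoted in the field guide below. Order matters: the more
# specific names must be tested before the plain "Googlebot" substring.
BOT_SIGNATURES = [
    ("adsbot-google", "AdsBot"),
    ("mediapartners-google", "MediaBot"),
    ("googlebot-image", "Googlebot-Image"),
    ("googlebot-mobile", "Googlebot-Mobile"),
    ("googlebot", "Googlebot"),
]


def classify_user_agent(user_agent):
    """Return a label for a Google bot user agent, or None for anything else."""
    ua = user_agent.lower()  # match case-insensitively
    for needle, label in BOT_SIGNATURES:
        if needle in ua:
            return label
    return None


def count_bot_visits(log_lines):
    """Tally visits per bot, assuming the user agent is the last quoted field."""
    counts = {}
    for line in log_lines:
        quoted_fields = re.findall(r'"([^"]*)"', line)
        if not quoted_fields:
            continue  # not a line we know how to read
        label = classify_user_agent(quoted_fields[-1])
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical log line, not taken from a real server.
    sample = [
        '192.0.2.1 - - [29/Nov/2006:03:12:45 +0000] '
        '"GET /logo.png HTTP/1.1" 200 5120 "-" "Googlebot-Image/1.0"',
    ]
    print(count_bot_visits(sample))
```

Bear in mind that a user agent string is just text that any visitor can fake, so treat the tally as a hint about who has been calling rather than as proof.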


This bot is a relatively new member of the family of Google bots and goes by the name of "AdsBot-Google." If you are an AdWords advertiser, this bot may well have visited you recently. This bot is one of the major players in Google's new "landing page quality" initiative and its sole purpose is to check the quality of your landing pages. How a bot does that, I have no idea, but it does just that!


A close cousin to the AdsBot is the MediaBot. Whereas AdsBot is for AdWords clients, MediaBot is for AdSense publishers. This bot goes by the name "MediaPartners-Google/2.1" and is the bot that crawls the sites of AdSense publishers in order to determine which ads are relevant to the site's content. So, if you have relevant ads on your site, at some point you've been visited by the MediaBot. There is a degree of confusion among those aware of this bot as to whether or not it plays a role in Google's main search index. I have concluded that it does play a role but, in my opinion, a relatively insignificant one. Officially, MediaBot's functionality is completely separate from the main Google search index. However, it seems clear that Google uses cached copies of pages retrieved by MediaBot to update its main search index, but only when Googlebot would have visited the site anyway. In other words, it just uses the MediaBot cache to save time and to save bandwidth on the host server. Thus, it appears that there is still absolutely no advantage to be gained (at least in terms of Google's organic search results) by having AdSense on your site.

The three remaining bots all serve related purposes, but each has a distinct task to fulfill.

This bot, officially known as "Googlebot-Image", crawls sites in order to index the images found on those sites. Thus, this is the bot that primarily drives Google's image searches. I have read that the ImageBot is a relatively infrequent visitor, perhaps visiting only once every six months or so. Thus, if you have images that you desperately want to be indexed by Google, you may have to be pretty patient!

This is the little sibling of the main Googlebot, described below, and it crawls web pages for the Google mobile index. It goes by the name of "Googlebot-Mobile."


Last, but by no means least, we come to the mysterious Googlebot. This bot is the Gemini of Google's bots, manifesting two separate personas, each with a distinct function: Freshbot and Deepbot.

"Freshbot," as its name suggests, simply looks for fresh content to index, so it is speedy but rather shallow, quickly moving from site to site in search of something new to devour. In sharp contrast to this is its big sibling, Deepbot. If Freshbot is the Christopher Columbus of bots, looking for new places, never before seen, then Deepbot is the Lewis & Clark or David Livingstone, exploring the inner depths of known lands. Thus, Deepbot is anything but shallow and is really the main driving force behind Google's organic search results. Deepbot is an omnivorous "deep crawler" as it tries to follow every link it comes across and download as many pages as it can for Google to index.