Googlebot gave up
There have been some rumblings lately about the DMOZ home page being removed from the index. I don’t pay too much attention to DMOZ, but in this case it was interesting. I started following the various threads in the webmastering/SEO community diligently, as I’ve seen this “lost my homepage” behavior many times in GWHG. I even made an appeal on behalf of the unfortunate webmasters, which was ignored.
Matt Cutts, the true ambassador to the webmaster, came through and answered the question, even taking time from electronic cat gadgets and their pedometers to do so.
Hey all, I dug into this a little bit with the help of a couple crawl folks. It looks like when Googlebot tried to fetch http://www.dmoz.org/, we got a 301 redirect back to http://www.dmoz.org/ . It looks like that self-loop has been going on for several days. We were last able to fetch the root page successfully on Sept. 10th, but from that point on DMOZ was returning these 301-to-itself pages, and after a few days Googlebot gave up on trying to fetch the url.
This makes sense. Each time Googlebot hit the page, it got a 301 response saying that the new location was the very page it had just requested. When that information reached the normal process that handles 301s, it probably just faulted out. Since (normally) nothing else on a page is loaded after a 301, they would have to remove the page, as they’d have no data for it.
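To make that concrete, here is a rough Python sketch of how a crawler might notice a redirect that points back at itself. This is not how Googlebot works (I have no idea what its internals look like); `follow_redirects`, the injected `fetch` callable, and `dmoz_like` are made-up names for illustration, and the fake fetcher simply imitates the 301-to-itself response described in the quote.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(start_url, fetch, max_hops=10):
    """Follow redirects until a normal response, a loop, or max_hops.

    `fetch` is a hypothetical callable: url -> (status_code, location_or_None).
    Returns (outcome, chain), where outcome is one of
    "ok", "loop", "no-location", or "too-many-hops".
    """
    seen = set()
    chain = [start_url]
    url = start_url
    for _ in range(max_hops):
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES:
            return "ok", chain
        if not location:
            # A redirect status with no Location header: nowhere to go.
            return "no-location", chain
        target = urljoin(url, location)  # resolve relative Location values
        chain.append(target)
        if target in seen:
            # The redirect points at a URL we already requested.
            return "loop", chain
        url = target
    return "too-many-hops", chain


def dmoz_like(url):
    """Fake fetcher (assumption): always answers 301 back to the home page."""
    return 301, "http://www.dmoz.org/"


outcome, chain = follow_redirects("http://www.dmoz.org/", dmoz_like)
print(outcome, chain)
```

With a loop like this there is never a page body to index, only the same redirect over and over, which fits the idea that the crawler eventually has nothing to keep for that URL.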
Here’s the odd thing
When I first heard of this, several days ago, I visited the DMOZ site and viewed it just fine. Normally you can’t view a page that redirects to itself, as this example I’ve set up shows, though exactly what happens depends on your browser. Internet Explorer will just sit there and spin, Firefox will eventually give you an error message, and an online tool will let you know that there is an error.
Pure Conjecture
Matt Cutts has been doing this a long time and is probably the best at speaking around issues when he needs to (protecting secrets, toeing the company line, etc.), but there has never been any appearance of him being anything less than truthful, so by default I will dispel the idea that he was giving us bad information. So how is it that I don’t see a 301 redirect, and no one in any of the discussions mentions that the page won’t load, and yet Googlebot sees the behavior?
- All things considered, the simplest explanation is usually the best: perhaps the 301 redirect was shown only briefly, at times when Googlebot happened to visit the site, and not long enough for anyone else to take note of it.
- They somehow managed to return a 301 response code, but not the redirect location. This is something I tried to simulate on many platforms but could not. The browsers and tools I used all seemed to expect the redirect location and either defaulted to one or erred out. Google, on the other hand, doesn’t actually CRAWL anything; they just hit the page and come back with whatever they saw. I don’t know enough about how the interwebby works to really say whether this is a possibility or not; it is, after all, pure conjecture.
- They were cloaking their 301, showing it only to Googlebot (or other bots, for that matter) and not to regular users with a browser or to requests from outside Google’s IP range.
- Perhaps the 301 was referrer based, and it showed the redirect only when there was no referrer. Googlebot, since she runs on a predetermined schedule of URLs to crawl, would not send a referrer.
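A couple of these guesses can at least be poked at from the outside. The sketch below, using only Python’s standard `http.client`, requests a URL once per "profile" without following redirects and reports the status code and `Location` header for each. The profile names and header values are my own illustrative assumptions, and `probe` and `compare` are hypothetical helper names. It cannot test IP-based cloaking (the request still comes from your own address), and it only shows what the server returns at the moment you run it.

```python
import http.client
from urllib.parse import urlsplit

# Hypothetical request profiles; the header values are illustrative only.
PROFILES = {
    "browser": {
        "User-Agent": "Mozilla/5.0",
        "Referer": "http://www.google.com/",
    },
    "googlebot-ua": {
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)",
    },
    "no-referrer": {"User-Agent": "Mozilla/5.0"},
}


def probe(url, headers, timeout=10):
    """Send one GET without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        # Location is None when the server sent no such header.
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def compare(url):
    """Probe the URL once per profile and collect the results by name."""
    return {name: probe(url, headers) for name, headers in PROFILES.items()}


# Example usage (makes live requests, so it is left commented out):
# for name, result in compare("http://www.dmoz.org/").items():
#     print(name, result)
```

If the "googlebot-ua" or "no-referrer" profile came back with a 301 while the "browser" profile got a 200, that would point toward the user-agent or referrer theories. A 301 with a `Location` of `None` would be the "response code but no redirect" case.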
Any other ideas that I am too simple to see?

