25th September 2007

Googlebot gave up

There have been some rumblings lately around the fact that the DMOZ home page was removed from the index. I don’t pay too much attention to DMOZ, but in this case it was interesting. I started to follow various threads in the webmastering/SEO community diligently, as I’ve seen this “lost my homepage” behavior many times in GWHG. I even made an appeal on behalf of the unfortunate webmasters, which was ignored.

Matt Cutts, the true ambassador to the webmaster, came through and answered the question, even taking time from electronic cat gadgets and their pedometers to do so.

Hey all, I dug into this a little bit with the help of a couple crawl folks. It looks like when Googlebot tried to fetch http://www.dmoz.org/, we got a 301 redirect back to http://www.dmoz.org/ . It looks like that self-loop has been going on for several days. We were last able to fetch the root page successfully on Sept. 10th, but from that point on DMOZ was returning these 301-to-itself pages, and after a few days Googlebot gave up on trying to fetch the url.

This makes sense: when Googlebot hit the page, it would get a 301 response saying that the new location was the very page it had just requested. When that information reached the normal process that handles 301s, it probably just faulted out. Since nothing else on a page is (normally) processed after a 301, they would have to remove the page, as they’d have no data for it.
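To make the self-loop concrete, here is a minimal Python sketch of how a crawler might follow redirects and bail out on a loop. This is my own illustration, not how Googlebot actually works; the names `follow_redirects`, `fetch`, and `self_loop` are hypothetical, and `self_loop` merely imitates the response Matt described.

```python
from typing import Callable, Optional, Tuple

# A response reduced to what matters here: (status code, Location header or None).
Response = Tuple[int, Optional[str]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url: str, fetch: Callable[[str], Response], max_hops: int = 5) -> str:
    """Follow redirects until a non-redirect response arrives; raise on a loop."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            raise RuntimeError(f"redirect loop at {url}")
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES:
            return url  # final destination reached
        if not location:
            raise RuntimeError(f"{status} without a Location header at {url}")
        url = location
    raise RuntimeError("too many redirects")


def self_loop(url: str) -> Response:
    """Hypothetical stand-in for a site that 301s its home page to itself."""
    return (301, "http://www.dmoz.org/")
```

Calling `follow_redirects("http://www.dmoz.org/", self_loop)` raises on the second hop, because the URL it is told to fetch next is one it has already fetched. A crawler in that position has no content to index, which fits the "gave up" description.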

Here’s the odd thing

When I first heard of this, several days ago, I visited the DMOZ site and viewed it just fine. Depending on your browser, you can’t view a page that redirects to itself, as this example I’ve set up shows. Internet Explorer will just sit there and spin, Firefox will eventually give you an error message, and an online header-checking tool will let you know that there is an error.
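For anyone who would rather check the raw response than trust a browser, here is a rough Python sketch that makes a single request and reports the status and `Location` header without following the redirect. The function name `fetch_once` and the default user-agent string are made up for this example; it relies only on the standard library’s `http.client`, which does not follow redirects on its own.

```python
import http.client
from urllib.parse import urlsplit


def fetch_once(url: str, user_agent: str = "header-check-sketch/0.1", timeout: float = 10.0):
    """Issue one GET and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        # Rebuild the request target from the path and optional query string.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

A page that redirects to itself would come back as a 301 whose `Location` is the same URL that was requested, which is exactly what a browser refuses to render.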

Pure Conjecture

Matt Cutts has been doing this a long time and is probably the best at speaking around issues when he needs to (protecting secrets, toeing the company line, etc.), but he has never appeared to be anything less than truthful, so by default I will dismiss the idea that he was giving us bad information. So how is it that I can’t see a 301 redirect, and no one in any of the discussions mentions that the page won’t load ANYWHERE, yet Googlebot sees the behavior?

  1. All things considered, the simplest explanation is usually the best. Perhaps the 301 redirect was shown only briefly, when Googlebot happened to visit the site, but not long enough for anyone else to take note of it.
  2. They somehow managed to return a 301 response code, but not the redirect itself. This is something I tried to simulate on many platforms but could not. The browsers and tools I used all seemed to expect the redirect location and either defaulted to one or erred out. Google, on the other hand, doesn’t actually CRAWL anything; it just hits the page and returns with whatever it saw. I don’t know enough about how the interwebby works to really say whether this is a possibility or not. It is, after all, pure conjecture.
  3. They were cloaking their 301, showing it only to Googlebot (or other bots, for that matter) and not to regular users with a browser or to anyone outside Google’s IP range.
  4. Perhaps the 301 was referrer based, and when there was no referrer it showed the redirect. Googlebot, since she runs on a predetermined schedule of URLs to crawl, would not send a referrer.
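To spell out what theories 2 through 4 would look like on the server side, here is a small Python sketch. It is pure guesswork in code form: `choose_response` and the `mode` names are invented for illustration, nothing here is known about DMOZ’s actual setup, and the cloaking case checks only the user-agent string, not Google’s IP range.

```python
from typing import Dict, Optional, Tuple

HOME = "http://www.dmoz.org/"


def choose_response(headers: Dict[str, str], mode: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a home page request under one hypothetical setup."""
    user_agent = headers.get("User-Agent", "")
    referer = headers.get("Referer")

    if mode == "no_location":
        # Theory 2: a 301 status line with no redirect target at all.
        return 301, None
    if mode == "cloak_user_agent" and "Googlebot" in user_agent:
        # Theory 3: only requests that identify themselves as Googlebot get the redirect.
        return 301, HOME
    if mode == "no_referrer" and not referer:
        # Theory 4: requests that arrive without a Referer header get the redirect.
        return 301, HOME
    # Everyone else sees the page normally.
    return 200, None
```

Under `"cloak_user_agent"`, a request carrying Googlebot’s user-agent gets the self-redirect while a Firefox request gets a 200, which would explain why nobody browsing the site noticed anything. Under `"no_referrer"`, though, anyone typing the URL straight into the address bar would also hit the loop, which counts against theory 4.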

Any other ideas that I am too simple to see?


This entry was posted on Tuesday, September 25th, 2007 at 11:55 am and is filed under Google, Matt Cutts, SEO, Webmastering.

There are currently 5 responses to “Googlebot gave up”


  1. On September 25th, 2007, Webado said:

    Well, I haven’t tried sending through a 301 header without also sending a redirection. But I have sent through 404 headers while continuing to serve a page. A robot will get a 404 and not bother with the page’s content. I suppose the same thing could be done with a 301. Just add this at the top of a page:

    and then continue merrily with the actual content.

    Why would they do it? Who knows … self-sabotage? Error?

  2. On September 25th, 2007, Webado said:

    Oh pooh!

    My carefully crafted php script got annihilated LOL but my typos were left behind ;)

    Well, it was:

    header("HTTP/1.1 301 Moved Permanently");

  3. On September 25th, 2007, John Honeck "JLH" said:
    Webado, I’ve managed to create just such a page. It shows a 301 as the response, but then lets the user see the content. It’s at:

    http://www.jlh-design.com/PHPredirect-broken.php

    Checking the headers shows the 301 response.

    http://oyoy.eu/page/headers/?full=1&url=http://www.jlh-design.com/PHPredirect-broken.php

    Is it possible that DMOZ could have screwed up their code this much?

  4. On September 26th, 2007, Webado said:

    It’s funny, I am not getting your page using IE7 (“Internet Explorer cannot display the page …”) but I am getting it in Firefox, with the header shown as “301 OK”. Oy-oy believes it’s an actual redirection and times out after 2 such “redirections”. Googlebot is probably getting misled the same way.

    Maybe DMOZ had a redirection to take care of canonical form or index back to root and messed it up for a while. There’s always a rookie webmaster everywhere, isn’t there? Been there, done that … LOL

    But look at http://oyoy.eu/page/headers/?full=1&url=http%3a%2f%2fdmoz.org%2f .

    See how for dmoz.org it shows the server as ArtBlast/3.5.5 and after the 301 redirection to the www form it’s now an Apache/2.0.59 (Unix) server… Huh???

  5. On September 26th, 2007, John Honeck "JLH" said:
    I’d imagine they need a lot of load balancing servers just to keep up with the editors deleting their competition’s submissions, they sure don’t need them for approving new submissions.

    Then again, it is run by AOL, so that can mean just about anything is possible.
