


Googlebot Errors

Looking for "ht" extension instead of "htm"

   
6:02 am on Jan 29, 2003 (gmt 0)

10+ Year Member



I am getting some hits from Googlebot where it is looking for example.ht pages instead of example.htm. I analyzed the error pages and reviewed the pages where the robot could have picked up those bad links, but everything there seems to be OK.

Any suggestions?

6:13 am on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Incorrect external links, mayhap?

7:30 am on Jan 29, 2003 (gmt 0)

10+ Year Member



Thank you for your reply, but I do not see any backlinks from other sites. Maybe they are on low-PR pages. I really doubt that anyone linked to those pages, as they contain only limited information.

I will be checking my access logs for referrals from specific sites this month.

7:46 am on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This happened to my site yesterday also. Googlebot requested a file called deny.htm, but the real filename was deny.html. Googlebot came close to requesting a file prohibited by robots.txt and triggering a spider trap. :)

Actually, Googlebot has been very well behaved lately and I only posted this in the event there is a problem that you should know about. I thought the last character being stripped from the extension was curious because AltaVista's Scooter had similar problems a few months ago (on a massive scale).

7:51 am on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<added>I just rechecked the logs and Googlebot is still looking for files with a .htm extension. I think you have got a problem there. I'll email the details to search quality(at)google.com</added>

7:53 am on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Thanks, Key_Master. We'll check it out.

9:45 am on Jan 29, 2003 (gmt 0)

10+ Year Member



It might be a Google thing; I have the same. Googlebot has been looking for *.ht files on my site today. I will be checking my site, but I have had no problems before.

9:50 am on Jan 29, 2003 (gmt 0)

5+ Year Member



Same here, some examples:

64.68.82.46 - - [29/Jan/2003:11:36:17 -0500] "GET /projects.ht HTTP/1.0" 302 275 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

should be projects.htm

this also occurs on directories:

64.68.82.51 - - [29/Jan/2003:11:35:13 -0500] "GET /tristan HTTP/1.0" 301 299 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

should be the directory /tristan/
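The log lines above appear to be in Apache's "combined" format (host, ident, user, [time], "request", status, bytes, "referer", "user-agent"). To pull Googlebot's requests out of a log and see which paths and status codes it got, a short script helps. This is a minimal sketch assuming that format; a server with a custom LogFormat would need a different pattern, and the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Pattern for the Apache "combined" log format (an assumption about these logs):
# host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    path: str
    status: int
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Return the interesting fields of one log line, or None if it doesn't match."""
    m = COMBINED_LOG.match(line.strip())
    if m is None:
        return None
    return Hit(m["host"], m["path"], int(m["status"]), m["agent"])


def googlebot_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield parsed hits whose user-agent string mentions Googlebot."""
    for line in lines:
        hit = parse_line(line)
        if hit is not None and "Googlebot" in hit.agent:
            yield hit
```

Fed the first example line from the post above, `parse_line` gives path `/projects.ht` and status `302`, which makes both the truncated path and the unexpected 302 easy to spot across a whole log.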

11:25 am on Jan 29, 2003 (gmt 0)

10+ Year Member



Just wanted to confirm that I've encountered a similar phenomenon. Freshbot requested a sizable number of pages and in each case the final character of the URL was missing.
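Several posts describe the same pattern: the requested URL is a real URL with its final character dropped (.ht for .htm, deny.htm for deny.html, /tristan for /tristan/). Given a list of the site's real paths, such requests can be picked out mechanically. This is a minimal Python sketch; the function names are hypothetical, and where the list of known paths comes from (a directory walk, a site map) is left as an assumption.

```python
from typing import Iterable, Iterator, Optional, Tuple


def truncated_match(requested: str, known_paths: Iterable[str]) -> Optional[str]:
    """Return a known path equal to `requested` plus exactly one trailing
    character (e.g. '/projects.ht' -> '/projects.htm'), or None."""
    for path in sorted(known_paths):  # sorted so the result is deterministic
        if len(path) == len(requested) + 1 and path.startswith(requested):
            return path
    return None


def truncation_report(requested_paths: Iterable[str],
                      known_paths: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (requested, probably_meant) for each request that is not a real
    page but becomes one when a single trailing character is added back."""
    known = set(known_paths)
    for req in requested_paths:
        if req in known:
            continue  # a correct request, nothing to report
        match = truncated_match(req, known)
        if match is not None:
            yield req, match
```

If most of the 404s (or 302s) in a crawler's requests show up in this report, the problem is likely on the crawler's side rather than a bad link on the site.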
6:19 pm on Jan 29, 2003 (gmt 0)

10+ Year Member



By the way, I've already seen the deep-crawl Googlebot (IP beginning with 216) on my index page.

Anybody else?

9:54 pm on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Just talked to someone about this. Problem found, and should be solved now. Post an update here if you see any problems from now on. Thanks for reporting this! :)

7:57 am on Jan 30, 2003 (gmt 0)

10+ Year Member



Hi Tristan,

It looks like your "404 - Not found" redirection isn't set up properly, as the line:
64.68.82.46 - - [29/Jan/2003:11:36:17 -0500] "GET /projects.ht HTTP/1.0" 302 275 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

shouldn't show a 302 header ...

When using the "ErrorDocument 404 ..." directive in your .htaccess file, don't use a full URL; otherwise you'll never return the correct 404 header. Use something like:
ErrorDocument 404 /myerrorfile.htm

The 302 header means "temporarily moved" ... are your files located on a cluster?

Dan
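A browser follows a redirect silently, so the 302 Dan describes is easy to miss when testing by hand. One way to check is to request a page that shouldn't exist and look at the raw status code. This is a minimal sketch using Python's standard http.client, which does not follow redirects; the host and path in the usage line are placeholders, and the diagnostic wording is illustrative.

```python
import http.client
from typing import Optional, Tuple


def probe(host: str, path: str, port: int = 80,
          timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a HEAD request without following redirects and return
    (status code, Location header or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def diagnose(status: int, location: Optional[str]) -> str:
    """Interpret the response to a request for a page that should not exist."""
    if status == 404:
        return "ok: missing pages return 404"
    if 300 <= status < 400:
        return (f"redirect ({status}) to {location}: "
                "ErrorDocument probably points at a full URL")
    return f"unexpected status {status}"
```

Usage, with a hypothetical host: `print(diagnose(*probe("www.example.com", "/no-such-page.htm")))`. With `ErrorDocument 404 /myerrorfile.htm` the probe should see 404; with a full URL it would see a 3xx and a Location header.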

11:32 am on Jan 30, 2003 (gmt 0)

10+ Year Member



If anyone needs another reason why Google is number one, read this thread carefully. When Microsoft is notified of a problem with their software, they generally take months to resolve it, not a few hours. For a company of Google's size, this response time is phenomenal.

It's not just the quality of the search results y'know...

11:44 am on Jan 30, 2003 (gmt 0)

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Very nice jetboy_70! ;)

Totally, absolutely correct.

7:19 pm on Jan 30, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Well spotted, hetzeld!

Use ErrorDocument 404 /myerrorfile.htm,
not ErrorDocument 404 [example.com...]

This can cause serious trouble with search engines, the least of which is that your custom 404 page will start showing up in the SERPs.

See the Apache Core Features [httpd.apache.org] documentation for details.

Jim

8:35 pm on Jan 30, 2003 (gmt 0)

5+ Year Member



hey, thanks a lot, dudes!
gonna fix that in my .htaccess files now

again: thanks!

8:46 pm on Jan 30, 2003 (gmt 0)

5+ Year Member



I just changed my .htaccess to contain the lines:

ErrorDocument 400 /
ErrorDocument 403 /
ErrorDocument 404 /
ErrorDocument 500 /

(before it was "ErrorDocument 404 [<mysite.com...]")

but now when I test the 404 redirection, for example by going to
[<mydomain>.com...]

I get the / page, but the address in my address bar doesn't change to [<mydomain>.com...]
Is it possible to give the correct 404 headers AND change the address to [<mydomain>.com...]?

Since this isn't really on topic anymore, feel free to sticky me

thanks a lot!

10:57 am on Jan 31, 2003 (gmt 0)

10+ Year Member



Hi Tristan,

You won't be able to change the apparent URL (the one in your browser's address bar) using the ErrorDocument directive.

To achieve this, you could use mod_rewrite with an external redirect ([R] flag).
I've used this a few times for "file not found - 404" errors, since a fairly trivial rule handles that case, but I'm not sure this URL rewrite could be applied to all error codes (especially the 500 code).

Dan
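The trade-off behind Tristan's question can be shown with a toy model. A single HTTP response carries exactly one status code, so for the missing URL the server must choose: answer 404 at the requested URL (the address bar stays put, which is what a local ErrorDocument path does), or answer with a redirect (the address bar changes, but the missing URL now reports 3xx instead of 404). This is a minimal Python WSGI sketch of the two behaviors, not Apache itself; the `PAGES` table and page names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical site; "/" doubles as the error page, as in "ErrorDocument 404 /".
PAGES: Dict[str, bytes] = {
    "/": b"<h1>Home</h1>",
    "/projects.htm": b"<h1>Projects</h1>",
}
ERROR_PAGE = "/"


def error_document_app(environ, start_response):
    """Like a local 'ErrorDocument 404 /': the error page's body is served at
    the requested URL with a 404 status, so the address bar does not change."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        start_response("200 OK", [("Content-Type", "text/html")])
        return [PAGES[path]]
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [PAGES[ERROR_PAGE]]


def redirect_app(environ, start_response):
    """Like an external redirect ([R] flag): the browser is sent to the error
    page, so the address bar changes, but the missing URL itself answers 302."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        start_response("200 OK", [("Content-Type", "text/html")])
        return [PAGES[path]]
    start_response("302 Found", [("Location", ERROR_PAGE)])
    return [b""]


def call(app: Callable, path: str) -> Tuple[str, Dict[str, str], bytes]:
    """Invoke a WSGI app directly and capture (status, headers, body)."""
    captured: Dict[str, object] = {}

    def start_response(status: str, headers: List[Tuple[str, str]], exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body
```

Calling both apps with `/projects.ht`, the first reports `404 Not Found` and leaves the URL alone, while the second reports `302 Found` with `Location: /`. In the redirect case, the status a crawler sees next is whatever the target page returns; if that target is the home page, it is a 200, which is the situation hetzeld and jdmorgan warn about.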

 
