Okay, this one's got me stumped, and the question might be better off in the .htaccess forum, the robots.txt forum, or even the forum forum ... but since the bizarre behavior is specific to googlebot, here goes:
I was looking through my logs and found that almost every Googlebot request had the form:
66.249.66.45 - - [07/Jan/2005:22:31:13 -0800] "GET /subdirectory//viewtopic.php?t=2906 HTTP/1.1" 200 5439 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Note the double slash. No other GET requests (including those from other bots, e.g., Yahoo) have this format. I've looked through my site's link structure: the links in this particular forum (phpBB) are all of the form <a href="viewtopic.php...>, and I can't find any double-slashed links anywhere on the site (using a browser). Yet Googlebot somehow thinks they're there!
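For anyone who wants to check their own logs for the same thing, here is a rough sketch of how the double-slash requests could be pulled out. It assumes the Apache combined log format (as in the line above); the function name double_slash_hits and the regular expression are my own illustration, not anything from a standard tool.

```python
import re

# Assumed layout (Apache combined log format):
# host ident user [time] "METHOD target PROTOCOL" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def double_slash_hits(lines, agent_substring="Googlebot"):
    """Return (host, target) pairs for requests whose path contains '//'.

    Only lines whose user-agent contains agent_substring are reported.
    Lines that do not match the assumed log layout are skipped.
    """
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        # Look at the path only, ignoring the query string.
        path = match.group("target").split("?", 1)[0]
        if "//" in path and agent_substring in match.group("agent"):
            hits.append((match.group("host"), match.group("target")))
    return hits


sample = [
    '66.249.66.45 - - [07/Jan/2005:22:31:13 -0800] '
    '"GET /subdirectory//viewtopic.php?t=2906 HTTP/1.1" 200 5439 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(double_slash_hits(sample))
```

Passing an open log file instead of the sample list should work the same way, since the function only iterates over lines.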
What's worse is that Googlebot is ignoring the corresponding robots.txt entries for this site. For example, if the following is blocked:
Disallow: /subdirectory/memberlist.php
I see Googlebot happily getting:
/subdirectory//memberlist.php!
Of course this goes for all of the disallows in robots.txt. And it's *only* happening with Googlebot -- other bots have normal GET requests and respect the robots.txt file.
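My working guess at why the disallows get skipped: if a crawler compares each Disallow value against the requested path as a literal prefix (which is how I understand the original robots.txt convention), then the single-slash rule simply never matches the double-slash path. The sketch below only illustrates that guess. The helper is_disallowed is hypothetical, and I have no idea how Googlebot's matcher works internally.

```python
def is_disallowed(path, disallow_rules):
    """Return True if any non-empty Disallow value is a literal prefix of path.

    This is a simplified model of prefix matching only: no wildcards,
    no Allow lines, no per-user-agent sections.
    """
    return any(bool(rule) and path.startswith(rule) for rule in disallow_rules)


rules = ["/subdirectory/memberlist.php"]

# The normal URL is covered by the rule.
print(is_disallowed("/subdirectory/memberlist.php", rules))   # True

# The double-slash URL is not, because the strings differ at the second slash.
print(is_disallowed("/subdirectory//memberlist.php", rules))  # False

# Adding an explicit double-slash rule covers it under this model.
rules_with_extra = rules + ["/subdirectory//memberlist.php"]
print(is_disallowed("/subdirectory//memberlist.php", rules_with_extra))  # True
```

Under that model, every disallowed path would need its own double-slash variant to be covered.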
As a temporary fix, I've added the following entries to robots.txt, for example:
Disallow: /subdirectory/memberlist.php
Disallow: /subdirectory//memberlist.php
That's meant to stop Googlebot from acting on these strange double-slash requests. But three questions remain...
1) Any ideas on how this might have come to be?
2) Are the oddball entries now in my robots.txt (to defend against the odd Googlebot requests) going to cause problems?
3) Will Googlebot see duplicate content on my site? Or will it equate /subdirectory/viewtopic.php with /subdirectory//viewtopic.php?
Thanks in advance for any and all ideas on this one.
Best regards, Dave.