|Googlebot crawling // and others|
How do I 301 that?
| 12:55 am on Nov 10, 2005 (gmt 0)|
Jagger 3 gave me a bit of hope that the canonical problems can finally be fixed with 301s, but lately I've been seeing Gbot crawl // and mysite.com/?string_that_never_existed, and it's getting a 200 response.
Sample from my logs:
18.104.22.168 - - [08/Nov/2005:05:19:47 -0500] "GET /?&config=935&stage=7182 HTTP/1.1" 200 12565 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
22.214.171.124 - - [08/Nov/2005:05:24:02 -0500] "GET // HTTP/1.1" 200 12565 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
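For anyone wanting to pull such requests out of their own logs, here is a rough Python sketch. It assumes the Apache combined log format shown in the sample above; the function name and the three "looks non-canonical" heuristics (doubled slashes, any query string, a slash after a file name) are illustrative assumptions, not a definitive test.

```python
import re

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def suspicious_googlebot_hits(lines):
    """Return request targets that Googlebot fetched with a 200 and that look non-canonical."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in combined log format
        if "Googlebot" not in m.group("agent") or m.group("status") != "200":
            continue
        path, _, query = m.group("target").partition("?")
        # Heuristics (assumptions): doubled slashes, any query string,
        # or a slash appended after a file name such as /page.php/
        if "//" in path or query or re.search(r"\.\w+/$", path):
            hits.append(m.group("target"))
    return hits
```

Feeding it the lines of an access log (for example, from an open file object) returns the list of odd request targets, which can then be checked against the 301 rules already in place.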
Should I wait it out or try adding even more 301s to .htaccess? I'm already 301ing old .shtml to .php and www to non-www.
With Jagger 3 my inner pages are finally out of supplemental, and the right URLs are listed... except my main page. It's still showing URL-only with the www. Could it be because of a duplicate content penalty from these?
I've checked and nowhere do I link like that, ran a link checker and everything's fine.
Small rant to let off steam... Whatever happened to "design for visitors and forget the search engines"? Lately it seems all I do is try to get Google to index properly. The other SEs can spider my site properly without needing their hands held the entire way. What happened to Google?!?
| 4:22 pm on Nov 11, 2005 (gmt 0)|
It's also crawled pages like mysite.com/page.php/. Also, despite a 301 from www to non-www, the main page can only be found with the www, and it's still URL-only.
Why, and what should I do?
Has anyone else seen it crawling nonsense URLs?
| 5:30 pm on Nov 11, 2005 (gmt 0)|
I ran into this a few weeks ago. There was a report of this problem from about a year ago, but no resolution was ever posted. The upshot seemed to be it was a Google bug, but who knows? Anyway, I installed a 301 and Google seems to have absorbed it. I haven't spotted the bad requests for a while anyway.
| 5:46 pm on Nov 11, 2005 (gmt 0)|
Someone has fed Google some trash directed at your site.
Who did the feeding, I don't know.
But once Google has a URL, it goes out and asks the server: "Hey server, do you have anything for me at this URL?"
If the server answers "200, and here it is", Google is happy and stashes what it got under that URL as content to be indexed.
It appears that servers answer 200 when they shouldn't, or that they don't come with clear setup instructions, or that they have poor default setups.
Until Google stopped ignoring query strings, this made little difference; now you get duplicate content as a result.
With the trailing slash, the duplicate content issue comes into play again.
The same problem can occur in other situations as well.
You might wish to visit the Apache forum and go through the postings there.
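To make the idea concrete, here is a minimal Python sketch of the decision a server could make before answering 200: map every variant of a URL to a single canonical form and 301 anything that differs. It is not Apache configuration, only an illustration of the logic. The host mysite.com and the .shtml to .php and www to non-www rules come from this thread; the function name, the ALLOWED_QUERY_PATHS set, and the assumption that no page on the site takes a query string are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: no page on this site legitimately takes a query string.
# Add paths here if some do.
ALLOWED_QUERY_PATHS = set()


def canonical_redirect(url):
    """Return the URL to 301 to, or None if `url` is already canonical."""
    parts = urlsplit(url)

    # www -> non-www, as in the thread
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Collapse repeated slashes: "//" -> "/"
    path = re.sub(r"/{2,}", "/", parts.path) or "/"

    # Drop a slash appended after a file name: /page.php/ -> /page.php
    if re.search(r"\.\w+/$", path):
        path = path.rstrip("/")

    # Old .shtml pages now live at .php
    if path.endswith(".shtml"):
        path = path[: -len(".shtml")] + ".php"

    # Discard query strings the site never uses
    query = parts.query if path in ALLOWED_QUERY_PATHS else ""

    target = urlunsplit((parts.scheme, host, path, query, ""))
    return None if target == url else target
```

With these assumptions, canonical_redirect("http://mysite.com//") and canonical_redirect("http://www.mysite.com/?&config=935&stage=7182") both point at http://mysite.com/, so only one URL ever answers 200 for the home page.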
| 6:41 pm on Nov 11, 2005 (gmt 0)|
Ah, OK... that's what I thought. Back to the 301s then :S
Thanks for the advice.
| 3:04 am on Nov 15, 2005 (gmt 0)|
Work with the standards. The Net is made up of standards.
Google needs to adapt, not you.
| 4:20 am on Nov 15, 2005 (gmt 0)|
> Google needs to adapt, not you.
That's fine if you're not losing visitors and money because of this problem.
For an Apache solution: [webmasterworld.com...]