Forum Moderators: Robert Charlton & goodroi
I have a page on my site http://example.com/page1.shtml
and another http://example.com/page8.shtml
Googlebot is asking for http://example.com/page1.shtml/page8.shtml, and the server is serving page1.shtml with an HTTP 200 status code.
So that means I now have a page1.shtml and a page1.shtml/page8.shtml with the same contents, right? And this didn't just happen once, but quite a few times, each time with different pages, e.g., page3.shtml/page14.shtml.
Where in the heck is Googlebot coming up with these crazy URLs, and any idea how to use mod_rewrite to put a stop to it?
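To make the goal concrete, here is a minimal Python sketch of the check a rewrite rule would need to perform: spot any request that has extra path segments after a .shtml file and work out the clean URL to redirect to (or to answer with a 404). It is only an illustration of the logic, not a server configuration. The function name `canonical_path` is hypothetical, and the sketch assumes every real page on the site is a single file ending in .shtml. On Apache 2.x, the `AcceptPathInfo` directive is also worth a look, since it controls whether trailing path info like this is accepted at all.

```python
import re
from typing import Optional

# Assumption: every real page is a single file ending in ".shtml", so anything
# after the first ".shtml" segment is unwanted trailing path info.
_EXTRA_PATH = re.compile(r"^(/(?:[^/]+/)*?[^/]+\.shtml)/.+$")


def canonical_path(request_path: str) -> Optional[str]:
    """Return the clean path to redirect to, or None if the request is already clean."""
    match = _EXTRA_PATH.match(request_path)
    return match.group(1) if match else None


if __name__ == "__main__":
    for path in ("/page1.shtml", "/page1.shtml/page8.shtml", "/page3.shtml/page14.shtml"):
        print(path, "->", canonical_path(path))
```

A request for /page1.shtml/page8.shtml maps back to /page1.shtml, while a plain /page1.shtml request is left alone.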
[edited by: tedster at 4:15 am (utc) on Oct. 8, 2006]
[edit reason] use example.com, de-link [/edit]
Googlebot is asking for http://example.com/page1.shtml/page8.shtml, and the server is serving page1.shtml with an HTTP 200 status code.
Perhaps you have an incorrect relative URL on your site that points to page8.shtml (perhaps from page1.shtml).
Then that URL would be indexed as such. A base href tag would solve this if that were the issue.
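As a rough illustration of how a relative link can produce this kind of URL, the Python sketch below resolves the same href against two base URLs using the standard library's `urljoin`. The trailing-slash variant of the base URL is an assumption made for the example; nothing in the thread confirms that such a link exists on the site.

```python
from urllib.parse import urljoin


def resolve(base_url: str, href: str) -> str:
    """Resolve a relative href the way a browser or crawler would."""
    return urljoin(base_url, href)


if __name__ == "__main__":
    # Normal case: the relative link replaces the last path segment.
    print(resolve("http://example.com/page1.shtml", "page8.shtml"))
    # -> http://example.com/page8.shtml

    # Assumed case: the page was reached with a trailing slash, so the
    # relative link is appended beneath "page1.shtml/".
    print(resolve("http://example.com/page1.shtml/", "page8.shtml"))
    # -> http://example.com/page1.shtml/page8.shtml
```

If the server answers http://example.com/page1.shtml/ with the contents of page1.shtml, every relative link on that copy resolves one level too deep, which would match the pattern in the logs.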
[edited by: CainIV at 4:39 am (utc) on Oct. 8, 2006]
I was wondering if it was some sort of system to look for duplicates, with the extra text being a random check for unique content.
It's left me wondering as well. At one time I went through my hyperlinks in case I had screwed one up and included some text in the link, but nope, the problem doesn't seem to be at my end :)
Now how wrong is that?
I've disallowed that directory for now, so there won't be any reason to try to index anything. I'm going to redo that section at some point anyway, so might as well let it be dormant until that time.
The pages in this section do have an include file; perhaps that is the problem. Is an iframe preferred over an include?
The correct URL = page.asp?iData=4&iCat=3&menID=2
Two days ago G rearranged the query string to page.asp?iCat=4&iData=3&menID=2, which will create duplicate content.
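One way to check whether two query strings are really the same parameters in a different order is to compare their name/value pairs with the order ignored. The Python sketch below does that with the standard library; the helper name `same_parameters` is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def same_parameters(url_a: str, url_b: str) -> bool:
    """True if both URLs share a path and the same name/value pairs, in any order."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return a.path == b.path and sorted(parse_qsl(a.query)) == sorted(parse_qsl(b.query))


if __name__ == "__main__":
    correct = "page.asp?iData=4&iCat=3&menID=2"
    crawled = "page.asp?iCat=4&iData=3&menID=2"
    reordered = "page.asp?iCat=3&iData=4&menID=2"

    print(same_parameters(correct, reordered))  # True: same pairs, new order
    print(same_parameters(correct, crawled))    # False: values sit on other names
```

For the two URLs quoted above the comparison returns False: the crawled URL pairs iCat with 4 and iData with 3, so the names have changed places while the values stayed put, which is more than a simple reordering.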
Today it was crawling with the wrong string and 'invented' iCat numbers which don't exist, so each request returned a 500 error.
And no, those numbers never existed, and the pages with the correct query string have been crawled time and time again, so there is no reason for G to suddenly make up its own brand of string.
PS: G has now been crawling uninterrupted from 16:40:45 till 17:02:28, creating 100 errors along the way.
[edited by: Staffa at 4:08 pm (utc) on Oct. 8, 2006]