Google indexed the same page twice.

Cap and no cap.

twilight47

6:45 pm on Oct 13, 2003 (gmt 0)

Google recently indexed one of our main pages with both:

www.widgets.com/Blue.html

and

www.widgets.com/blue.html

I assume this may be due to a link that uses the wrong case in the page name. Obviously Google sees this as a different page, but will Google correct this on its own?

ciml

8:40 pm on Oct 13, 2003 (gmt 0)

Those are different URLs, and there's no reason why they can't have different content (although this may change now that Google treats "/index.html" in links as "/").

If Google finds identical content, then expect the two URLs to be merged in Google. If the content changes between the times the two URLs are fetched (e.g. an upload, or dynamic content), then expect them to remain separate.
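One simplified way to picture this merging is to hash each fetched body and group URLs whose bodies hash identically. This is only an illustration of the idea, not Google's actual duplicate-detection method. The `group_by_content` function and the sample pages are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_content(pages):
    """Group URLs whose fetched bodies are byte-for-byte identical.

    pages: dict mapping URL -> body (bytes) as fetched at crawl time.
    Returns a list of URL groups; a group with more than one URL is a
    candidate for merging into a single listing.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values()]


# Hypothetical crawl results: both case variants returned the same bytes,
# while a third page has different content.
pages = {
    "www.widgets.com/Blue.html": b"<html>Blue widgets</html>",
    "www.widgets.com/blue.html": b"<html>Blue widgets</html>",
    "www.widgets.com/red.html": b"<html>Red widgets</html>",
}
for group in group_by_content(pages):
    print(group)
```

If the page changes between the two fetches, the hashes differ and the URLs land in separate groups. This mirrors the "expect them to remain separate" case.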

I would be inclined to use lower case for all links and, if you're worried, perhaps to issue HTTP status 301 redirects from /Blue.html to /blue.html.
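Here is a minimal sketch of the 301 approach, written as a Python WSGI wrapper. Any request whose path contains upper-case letters is permanently redirected to the lower-case path. It assumes the page is (or will be) stored under its lower-case name. `lowercase_redirect` and `serve_page` are hypothetical names. On a real site the same rule would more likely live in the web server configuration.

```python
def lowercase_redirect(app):
    """Wrap a WSGI app so mixed-case paths get a 301 to the lower-case URL."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if path != path.lower():
            # Lower-case only the path; leave any query string untouched.
            location = path.lower()
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently",
                           [("Location", location),
                            ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        return app(environ, start_response)
    return wrapper


def serve_page(environ, start_response):
    # Hypothetical stand-in for whatever actually serves the page.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>Blue widgets</html>"]


application = lowercase_redirect(serve_page)
```

With this in place, /Blue.html, /BLUE.HTML and every other variant answer with a 301 pointing at /blue.html, so only one URL returns the page itself. The relative Location value is permitted by current HTTP specifications. Older clients expected an absolute URL.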

twilight47

9:46 pm on Oct 13, 2003 (gmt 0)

Those are different URLs, and there's no reason why they can't have different content (although this may change now that Google treats "/index.html" in links as "/").

The problem with this is that the only page that exists is www.widgets.com/Blue.html.

The other doesn't exist...except in Google's index.

ciml

11:09 pm on Oct 13, 2003 (gmt 0)

You mean that /blue.html returns a 404 error?

If the server header [webmasterworld.com] is 404 then /blue.html should disappear as soon as it's been crawled. If it's /robots.txt excluded then it will remain whether it returns 404 or not.
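To see what status a URL actually returns, which is the same information a server header checker shows, you can send a HEAD request with Python's standard `http.client`. `http.client` does not follow redirects, so a 301 shows up as a 301 rather than as the final page. The function name `check_status` and the host in the commented example are placeholders.

```python
import http.client


def check_status(host, path, port=80, timeout=10):
    """Send a HEAD request; return (status, reason, headers) without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.reason, dict(resp.getheaders())
    finally:
        conn.close()


# Example (placeholder host):
# status, reason, headers = check_status("www.widgets.com", "/blue.html")
# print(status, reason, headers.get("Location"), headers.get("Content-Location"))
```

The result for /blue.html can be read as follows:
- A 200 means the server is serving a page at that URL, whatever the file on disk is called.
- A 404 means it is not.
- A 301 shows, in its Location header, where the URL points.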

twilight47

1:06 am on Oct 14, 2003 (gmt 0)

It doesn't get a 404 error message. It goes straight to /Blue.html when you input /blue.html, and there is no 301 redirect or /blue.html page on the server. I can't figure it out.

mack

1:13 am on Oct 14, 2003 (gmt 0)

Has your server perhaps been configured to handle case in this way?

I think this can be done in Apache using httpd.conf.

Mack.

BigDave

1:25 am on Oct 14, 2003 (gmt 0)

I bet the problem is that it is a windoze server. Microsoft can't tell the difference between upper and lower case in file names.
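BigDave's explanation can be illustrated with a small simulation. A server backed by a case-insensitive file system matches the requested name without regard to case. As a result, /blue.html finds Blue.html and is served with a 200 instead of a 404. The `resolve` function and the file list are hypothetical. On a real Windows server this matching happens in the file system, not in code like this.

```python
def resolve(requested, files, case_sensitive):
    """Return the on-disk file name a request maps to, or None (a 404)."""
    name = requested.lstrip("/")
    if case_sensitive:
        return name if name in files else None
    # Case-insensitive: compare case-folded names, as a case-insensitive
    # file system effectively does.
    for f in files:
        if f.casefold() == name.casefold():
            return f
    return None


files = ["Blue.html", "index.html"]
print(resolve("/blue.html", files, case_sensitive=True))   # None -> 404
print(resolve("/blue.html", files, case_sensitive=False))  # Blue.html -> 200
```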

mack

1:36 am on Oct 14, 2003 (gmt 0)

Good point BigDave. In order for Googlebot to find the page with the wrong case someone must be linking to you using the wrong case.

It might be worth doing a 301 to point it in the right direction.

Mack.
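A mixed-case URL only gets crawled if something links to it, so it can help to audit your own pages for links whose path is not all lower case. This sketch uses the standard-library `html.parser`. The `MixedCaseLinkFinder` class name and the sample HTML are hypothetical. It can only check pages you have, not other sites that link to you.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class MixedCaseLinkFinder(HTMLParser):
    """Collect <a href> values whose path contains upper-case letters."""

    def __init__(self):
        super().__init__()
        self.bad_links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                path = urlsplit(value).path  # ignore scheme/host case
                if path != path.lower():
                    self.bad_links.append(value)


html = '<a href="/blue.html">ok</a> <a href="http://www.widgets.com/Blue.html">bad</a>'
finder = MixedCaseLinkFinder()
finder.feed(html)
print(finder.bad_links)  # ['http://www.widgets.com/Blue.html']
```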

ciml

10:31 am on Oct 14, 2003 (gmt 0)

Sorry twilight47, "The other doesn't exist...except in Google's index" does not agree with "It doesn't get a 404 error message. It goes to the straight to the /Blue.html when you imput the /blue.html and there is no 301 redirect or /blue.html page on the server".

Leaving the server technology to one side (although I agree with mack and BigDave), when a WWW agent (browser, robot, whatever) connects to /blue.html it is returned a page; therefore it does exist. I'd be willing to wager that /BLuE.hTmL will exist too, if it's requested.
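The /BLuE.hTmL point generalises. On a case-insensitive server, every letter in the path can be sent in either case, so a path with n letters has 2^n spellings that all return the same page. For "blue.html", which has 8 letters, that is 2^8 = 256 URLs. The sketch below enumerates them. `case_variants` is a hypothetical helper.

```python
from itertools import product


def case_variants(path):
    """Yield every upper/lower-case spelling of path (non-letters unchanged)."""
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in path]
    for combo in product(*choices):
        yield "".join(combo)


variants = list(case_variants("/blue.html"))
print(len(variants))             # 256
print("/BLuE.hTmL" in variants)  # True
```

To a crawler each of these is a distinct URL. That is why a search engine cannot safely assume they are the same page without fetching them.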

On other servers, /Blue.html, /blue.html and /BLuE.hTmL might serve different content, so Google should not assume that they're the same (even though they might make that assumption one day).

When you use Brett's server headers [webmasterworld.com] tool for /blue.html, do you see a "Content-Location" header value for /Blue.html? I don't think it'll help now, but it's good practice.
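Here is a sketch of what the response for a case variant might carry under the Content-Location suggestion. The page is served as normal, but with a Content-Location header naming the canonical URL. It is written as a hypothetical WSGI app; `serve_with_content_location` and `CANONICAL_PATH` are illustrative names. Whether any search engine acts on the header is a hope, not a given.

```python
CANONICAL_PATH = "/Blue.html"  # hypothetical: the one URL you want indexed


def serve_with_content_location(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path.lower() != CANONICAL_PATH.lower():
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found\n"]
    headers = [("Content-Type", "text/html")]
    if path != CANONICAL_PATH:
        # Reached via a case variant: name the canonical location.
        headers.append(("Content-Location", CANONICAL_PATH))
    start_response("200 OK", headers)
    return [b"<html>Blue widgets</html>"]
```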

My hope is that the Content-Location will one day be used by search engines to avoid this problem without URL guessing-games.