O'Reilly & Associates Website V1.1h and canonical issues

         

doughayman

2:17 pm on Feb 19, 2007 (gmt 0)

10+ Year Member



Hi guys,

Slightly off-topic, but I thought that this was the best home for this question.

I seem to have a canonical capitalization issue with Googlebot and my Windows 2000-based server, and am looking for suggestions to remedy it.

I am running an old (but stable) webserver (O'Reilly and Associates Website Professional V1.1h) under a Windows 2000 environment.

I have a subdirectory architecture, with a multitude of sites under my domain (this has been in existence for 10 years or so). The structure looks like:

www.domain.com/sub1/HomePageName1.htm
www.domain.com/sub2/HomePageName2.htm
.
.
etc.

Although Googlebot usually picks up these home page names with appropriate capitalization (e.g., HomePageName1.htm), on occasion, Googlebot fetches the page without capitalization (e.g., homepagename1.htm). Although I do NOT have a separate page in this folder without the capitalization, Googlebot believes that it finds it OK, since I see a return code of 200.
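To make the symptom concrete, here is a minimal sketch of how such requests could be spotted in server logs. It assumes the log entries have already been parsed into (path, status) pairs; the `CANONICAL_PATHS` list and the function name `find_case_variants` are hypothetical and only for illustration, not part of WebSite Pro.

```python
# Hypothetical list of canonical URLs, as they are linked and indexed.
CANONICAL_PATHS = ["/sub1/HomePageName1.htm", "/sub2/HomePageName2.htm"]


def find_case_variants(requests, canonical_paths=CANONICAL_PATHS):
    """Return (requested, canonical) pairs for requests that were served
    with status 200 but whose path differs from a canonical path only
    in letter case."""
    # Map each lowercased path back to its canonical spelling.
    by_lower = {p.lower(): p for p in canonical_paths}
    variants = []
    for path, status in requests:
        canonical = by_lower.get(path.lower())
        if status == 200 and canonical is not None and path != canonical:
            variants.append((path, canonical))
    return variants
```

Run against parsed log entries, this would list every fetch such as homepagename1.htm that a case-insensitive file system answered with a 200.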

Whenever Googlebot fetches the page without capitalization (this happens maybe once every 2 months or so), the ranking for this page gets severely impacted AND the page rank for HomePageName1.htm goes from its normal PR3 to PR0. This is obviously due to some sort of Duplicate Content Penalty, as Google believes that I have 2 pages with identical content, while under Windows 2000 I can only have one filename, devoid of case sensitivity.

Moving over to IIS or Apache is not an option for me. Does anyone have a server solution for me, in this environment, that would prohibit Google from fetching these pages with case-sensitivity in mind?

Thanks in advance.

centime

2:31 pm on Feb 19, 2007 (gmt 0)

10+ Year Member



I am not familiar with the web server you say you're running, but I think you should consider the following:

1, Google page rank, shown to the public as Toolbar page rank (TBPR), is in no way dependent on your server setup.

2, TBPR & PR are based on the incoming links to each particular webpage

3, TBPR, which Google exports from the internal PR about 4 times a year, can show different values depending on when you open your browser, since each time you do, the Google toolbar could connect to a different datacenter or server. Not all the datacenters are fully updated at any given time, so you might sometimes see previous values for TBPR.

4, Microsoft Windows server/workstation O/S products use case-insensitive directory paths, and some believe this can cause dup content issues. However, the impact can be lessened by consistently using lowercase URLs and directory paths in your web pages and when requesting inbound links.

5, Why aren't you using IIS 5 or IIS 6? They come free with Win 2k & 2003 Server.

Cheers

doughayman

2:43 pm on Feb 19, 2007 (gmt 0)

10+ Year Member



The Google PR is anecdotal, and is really not relevant to the problem.

4) Yes, Windows is case insensitive, and I agree (in hindsight) that all URLs and directory paths should be lowercase. However, I have URLs with mixed upper/lower case which are already indexed in Google that way. Removing these pages and then changing the URL filename nomenclature to lowercase is NOT an option. Google will penalize you for up to 6 months on removal of a page/re-entry. This is well known, and I cannot afford to go this route.

5) I have lots of custom-written CGI apps, tailored to this webserver, and having to move to a new webserver and re-write these apps would incur a serious "time" penalty.

I guess I'm looking for a cheap and easy way to avoid Google spidering me in all lowercase, when in fact my existing URL naming conventions make use of both upper and lower case.
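One common way to approach this kind of problem is to answer wrong-case requests with a 301 redirect to the canonical spelling instead of a 200. The following is only a sketch of that decision logic, under the assumption that requests could be routed through a script (for example one of the existing CGI apps); whether WebSite Pro V1.1h allows that for static pages is not established here, and the name `canonical_redirect` is hypothetical.

```python
def canonical_redirect(request_path, canonical_paths):
    """Decide how to answer a request on a case-insensitive server.

    Returns a (status, location) tuple:
      (200, None)      the path is spelled exactly as the canonical URL
      (301, canonical) the path matches only when case is ignored
      (404, None)      no canonical URL matches at all
    """
    # Map each lowercased path back to its canonical spelling.
    by_lower = {p.lower(): p for p in canonical_paths}
    canonical = by_lower.get(request_path.lower())
    if canonical is None:
        return (404, None)
    if request_path == canonical:
        return (200, None)
    # Same file, different capitalization: point the crawler at the
    # canonical URL so only one spelling is treated as the real page.
    return (301, canonical)
```

With the canonical list containing /sub1/HomePageName1.htm, a request for /sub1/homepagename1.htm would yield a 301 pointing at the mixed-case URL, so the mixed-case URLs already indexed stay untouched.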