Capitalization is splitting my PR!?

how can I fix this?

Craig_F

1:00 pm on Oct 29, 2004 (gmt 0)

I've never seen this before:

PR3 - [exampledomain.com...]
PR5 - [exampledomain.com...]

what can I do to fix this? I didn't even know this page was/could be a PR5 before I stumbled across this.

Brett_Tabke

1:03 pm on Oct 31, 2004 (gmt 0)

are you sure your server isn't actually spitting out different content on the two urls?
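One way to check this is to download both case variants and compare a hash of the response bodies. The following is only a minimal Python sketch: the function names are made up for illustration, and it compares raw bytes, so pages that embed timestamps or session IDs would be reported as different even when they are effectively the same page.

```python
import hashlib
from urllib.request import urlopen


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def same_content(body_a: bytes, body_b: bytes) -> bool:
    """True when two response bodies are byte-for-byte identical."""
    return body_digest(body_a) == body_digest(body_b)


def fetch(url: str) -> bytes:
    """Download a URL and return its raw body (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read()
```

To use it, call `fetch()` once for each of the two URLs and pass the results to `same_content()`.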

Craig_F

1:52 pm on Nov 1, 2004 (gmt 0)

I just checked to be sure, the content is the same on each url.

Strange: the original URL is fine now, with both versions showing PR5, but now another URL has the problem. With caps it's PR3; without caps it's PR0.

What else might be causing this?

Sanenet

2:09 pm on Nov 1, 2004 (gmt 0)

Cached PR on the toolbar would be my guess. The upper case value would be new, so it would force the toolbar to double check the PR value.

That assumes the toolbar caches PR values (no reason why it shouldn't) and that it differentiates between lower and upper case in the URL (again, no reason why not).

internetheaven

10:53 pm on Nov 1, 2004 (gmt 0)

> PR3 - [exampledomain.com...]
> PR5 - [exampledomain.com...]
If at all possible, try to track down the site that is linking to you using the capitalised version (or the un-capitalised one, whichever you want rid of) and get them to change it. I can't say I've ever heard of Google indexing a capitalised version of a page unless it was linked to using that version.

RFranzen

5:25 am on Nov 2, 2004 (gmt 0)

Servers based on Microsoft software suffer from permanent case confusion. File "greenwidgets.htm" is the same as file "GreenWidgets.htm" is the same as "GREENWIDGETS.HTM". On Unix/Linux servers, those would be three unique files. AFAIK, Google has to treat them as different, on the assumption that the page is being served by a computer that understands case.

As InternetHeaven suggested, find out who is linking to you with the wrong case. Make sure it isn't you -- quite likely if you are used to the sloppy Microsoft naming paradigm.

One reason I prefer being hosted by Unix/Linux machines is to avoid this very problem. If someone (e.g., you) links to a page of yours, and uses the wrong case, they find out really quick. The link won't work.

-- Rich
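The same distinction exists at the URL level: in URL syntax the scheme and host name are case-insensitive, while the path is case-sensitive. Here is a small Python sketch of that comparison. `canonical_key` is a hypothetical helper and the example.com URLs are placeholders, not the URLs from this thread.

```python
from urllib.parse import urlsplit


def canonical_key(url: str) -> tuple:
    """Build a comparison key that folds case only where URLs ignore it."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme; .hostname lowercases the host.
    # The path is kept exactly as written because its case is significant.
    return (parts.scheme, parts.hostname or "", parts.path)


a = canonical_key("http://www.example.com/greenwidgets.htm")
b = canonical_key("HTTP://WWW.EXAMPLE.COM/greenwidgets.htm")
c = canonical_key("http://www.example.com/GreenWidgets.htm")

print(a == b)  # host and scheme case do not matter
print(a == c)  # path case does, so these count as two URLs
```

A crawler that keys pages this way will store `a` and `c` as separate entries even when one server file answers both requests.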

webnewton

10:04 am on Nov 2, 2004 (gmt 0)

Well, this could happen if there had been link building with the caps on.

[exampledomain.com...]
When you remove www from the above URL, the PR might change again.

[exampledomain.com...]
This is strange, but this is how it happens. So it's essential that all the link building is done on one identical URL.

mark1615

10:54 pm on Nov 2, 2004 (gmt 0)

Are the pages identical other than the capitalization? If so could this be a duplicate content penalty with the older page getting the higher PR?

internetheaven

11:50 pm on Nov 2, 2004 (gmt 0)

> If so could this be a duplicate content penalty with the older page getting the higher PR?

Are your statements regarding "penalties" and "older pages" simply in relation to duplicate pages within a site, or are you suggesting that those are factors between sites as well?

There is a long list of reasons why Google could not apply such a penalty across competing sites, but I concede that it may be possible for Google to make such a bold attack on duplication within a single site.

From the way the original message is worded though, I would assume that he is talking about the same page and that the concern is that Google has "created" two pages whereas he believes there is only one.

DotBum

3:17 pm on Nov 3, 2004 (gmt 0)

If your server is running Apache and you have access to its conf files, it might be worth looking into an Apache rewrite rule that issues a 301 redirect.

HTH
DB
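The logic behind such a redirect can be sketched independently of the server. The Python below is only an illustration of the decision (it is not Apache configuration, and `lowercase_redirect` is a made-up name): if the requested path contains uppercase letters, answer with a 301 pointing at the all-lowercase form, otherwise serve the page as usual. It assumes the lowercase form is the one you want to keep, and it should be given the path only, since query strings may be case-sensitive.

```python
from typing import Optional, Tuple


def lowercase_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) when the path needs redirecting, else None."""
    lowered = path.lower()
    if lowered != path:
        # Permanent redirect, so all links are credited to a single URL.
        return 301, lowered
    return None


print(lowercase_redirect("/Widget-Store/"))
print(lowercase_redirect("/widget-store/"))
```

With a rule like this in place, only one spelling of each URL ever returns the page itself, so there is nothing left to be indexed twice.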

encyclo

3:27 pm on Nov 3, 2004 (gmt 0)

The following is a pure guess: as others have hinted, I'll assume you're running your site on a Windows server.

Google stores its index on Linux servers, not Windows. On Linux servers, "Widgets.htm" and "widgets.htm" are two different files, whereas on Windows, they are the same. The fact that your site is hosted on Windows is not as important as the fact that the index is parsed and categorized as case-sensitive file names.

That would give you a duplicate content problem within the cached Google index, even if it doesn't occur on your server, because the retrieved files co-exist in the cache, whereas they don't (and can't) on your server. In your example, as you're talking about directory names rather than file names, it could be that the entire directory is duplicated.

Craig_F

4:06 pm on Nov 3, 2004 (gmt 0)

> On Linux servers, "Widgets.htm" and "widgets.htm" are
> two different files, whereas on Windows, they are the same

I just checked this hosting account and it is Linux, so I'm still at a loss as to why this is happening.

Any chance this could be due to some faulty mod rewrite work?

RFranzen

4:31 pm on Nov 3, 2004 (gmt 0)

> PR3 - [exampledomain.com...]
> PR5 - [exampledomain.com...]
Craig,
If the server is Linux, then do both forms actually exist on the site? If you can access them both, then there are either two directories, or something at your server is allowing sloppy file paths through.
-- Rich
<add>
Just thought of something else. It could be a Linux server, but it might be using the Microsoft filesystem FAT32 or NTFS. In either case, the filesystem only pretends to differentiate "Wiget-Store" from "wiget-store", but in fact treats them as the same entity. In other words, it is not offering you the full benefits of Linux/Unix.
</add>
[edited by: RFranzen at 4:40 pm (utc) on Nov. 3, 2004]
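If you have shell or script access to the host, you can test how the filesystem under a given directory treats case instead of guessing. This is a rough Python sketch and `is_case_sensitive` is a hypothetical helper: it creates a temporary file with a mixed-case name and checks whether the all-lowercase spelling of that name also resolves. It needs write permission in the directory being tested.

```python
import os
import tempfile


def is_case_sensitive(directory: str) -> bool:
    """Report whether names differing only in case are distinct here."""
    # The "CaseProbe" prefix guarantees the name contains uppercase letters.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        folder, name = os.path.split(probe.name)
        folded = os.path.join(folder, name.lower())
        # If the lowercase spelling finds the same file, case is being ignored.
        return not os.path.exists(folded)


print(is_case_sensitive(tempfile.gettempdir()))
```

Pass the document root of the site rather than the temp directory to test the filesystem that actually serves the pages.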

encyclo

4:36 pm on Nov 3, 2004 (gmt 0)

> Any chance this could be due to some faulty mod rewrite work?

It's probably not faulty, just not dealing with the precise situation you have here. If you are using mod_rewrite to make the directory name from a variable, and both the capitalized and non-capitalized words call the same page on your server, then you've created a situation where your URLs have become case-insensitive (because you're not using "real" file or directory names) even though your server is running Linux.

However, Google is caching plain HTML files, so it can't see the difference between a mod_rewrite URL and a static one (which is the whole idea of using mod_rewrite in the first place).

You may need to adjust your rules to account for this situation - but the complexity depends on what kind of URL rewriting you are doing. Also, as I said, my "theory" is pure guesswork, so it might not be that at all.

Craig_F

5:16 pm on Nov 3, 2004 (gmt 0)

> then do both forms actually exist on the site?

Not sure what you mean. There is only one page that either version of that URL points to.

> might be using the Microsoft filesystem FAT32

Not sure about that one, I don't think so, but I'll check.

> However, Google is caching plain HTML files, so it can't see the difference between a mod_rewrite URL and a static one (which is the whole idea of using mod_rewrite in the first place).
> You may need to adjust your rules to account for this situation - but the complexity depends on what kind of URL rewriting you are doing. Also, as I said, my "theory" is pure guesswork, so it might not be that at all.

I think you might be onto something there, but how can I know for sure?
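One way to find out is to request both spellings without following redirects and look at the status codes that come back. The Python sketch below is only a starting point: `head_status` and `diagnose` are illustrative names, the interpretation covers just the common cases, and some servers answer HEAD requests differently from GET.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url: str) -> int:
    """Send one HEAD request (redirects are not followed); return the status."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


def diagnose(status_a: int, status_b: int) -> str:
    """Interpret the status codes of two case variants of one URL."""
    pair = {status_a, status_b}
    if pair == {200}:
        return "both served directly: URLs are effectively case-insensitive"
    if 200 in pair and pair & {301, 308}:
        return "one variant permanently redirects to the other"
    if 200 in pair and 404 in pair:
        return "only one variant exists: paths are case-sensitive"
    return "inconclusive"
```

Call `head_status()` on each variant and feed the two results to `diagnose()`. If both come back 200 on a Linux host, the rewrite rules are the likely reason.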

mark1615

11:16 pm on Nov 3, 2004 (gmt 0)

My understanding - maybe wrong - is that there are pages on the site that are identical but for the capitalization. If that is the case then my comments were to suggest that there might be a "penalty" for the duplication. We have seen this before where some evidence suggests that G doesn't like duplicate pages. Not really sure though because there are always other factors.