Changing case in filenames shows different pagerank green bar
webdude
posted 8:00 pm on Apr 16, 2008 (gmt 0)

Maybe this has been covered, but I can't seem to find it anywhere. I am also wondering if I should revert back, since things seem to have changed... Anyway...

I was perusing the forum and came across some posts that talked about whether to use upper case or lower case in filenames. For example... Red-Widget.html or red-widget.html. I could have sworn I saw it suggested to use lower case, though I am not sure why. It was just a quick blip in a "how to name files" type thread.

So why not upper case? I almost always use lower case, but I do have a site where I named the files with upper case filenames.

So today I was doing some work and noticed that the green PageRank bar was greyed out on all the files that I had renamed. I am running IIS, so case in filenames really doesn't matter from a platform point of view, but what's up with the grey bar? Did I do something I shouldn't have done? I am concerned because one of the filenames I renamed is a forum page that has pretty much 13,000 pages indexed in G. All of these pages now show a grey bar. If I go to the address bar and physically change the filename back to upper case... voila! The green bar comes back and shows the normal rank (well, as normal as that green bar can be, if you know what I mean ;-).

So I checked about 20 key phrases and words to see if I actually had lost any ranking. Nope, everything seems normal. What was #1 last week is still #1 today. What was #11 last week is still #11 today. What was #249 is still that darn ol' #249 :-0

So, does the case really matter? Am I hosed and should I switch it back? Do I sit tight and see what happens? Do I completely get out of the SEO biz and sell shoes?

Thanks for any input!

 

tedster
posted 5:18 am on Apr 17, 2008 (gmt 0)

Filenames are case-sensitive - and that's not just Google's decision, that's the way many operating systems work by default. From the W3C website [w3.org]:

URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always consider that URLs are case-sensitive.

So in particular, page.html and Page.html are considered different URLs, because technically they can serve different content. The same goes for example.com/directory/ and example.com/Directory/.

If your server is not set to be case sensitive (as with Windows IIS in its default configuration), you very much need to be careful with the case of any file paths you establish. I feel the best practice is always and only to use lower case.

webdude
posted 12:55 pm on Apr 17, 2008 (gmt 0)

Okay, I am now using lower case. The question is: should I revert back? There seems to be no effect on the positions of these pages in the SERPs; only the green bar has changed.

webdude
posted 2:10 pm on Apr 17, 2008 (gmt 0)

In fact, and I am sure this is not related, a couple of search terms actually improved by a couple of spots.

webdude
posted 1:56 pm on May 23, 2008 (gmt 0)

Good thing this was a throw-away site. In the past week, all the files that were changed to lower case either went down by 3 or 4 spots or disappeared completely. They stayed gone until yesterday, when they popped back into the SERPs again, usually about 15 spots lower than they were before.

I should have listened to you, tedster!

The conclusion I have come to is that changing the case of a filename is pretty much like changing the filename itself, or pretty close. I changed the case of these filenames back to what they were originally... we'll see what happens now. The great experiment continues!

Isn't experimenting with Google grand?

webdude
posted 6:26 pm on Jul 8, 2008 (gmt 0)

Okay...

I am resurrecting this post. I have a question. Some of the pages on this site are now listed twice, as duplicate content, in GWT. Since Google considers these pages to be different URLs, do you think it is possible to remove just the lowercase version while leaving the uppercase intact?

http://example.com/Test.html - keep
http://example.com/test.html - remove

This could get frightening, eh?

[edited by: tedster at 7:45 pm (utc) on July 8, 2008]
[edit reason] use example.com [/edit]

tedster
posted 7:57 pm on Jul 8, 2008 (gmt 0)

changing the case of a filename is pretty much like changing the filename itself

It's exactly like changing the filename, except for the non-standard handling of case on some server software.

do you think it is possible to remove just the lowercase while leaving the uppercase intact?

I'm assuming that you're using an IIS server, correct? The only solid way I know of to address the issue is with a third party plug-in called ISAPI Rewrite. It gives an IIS server functionality that mimics .htaccess on Apache. However, it's not free - and the cost may be more than you want to spend on a "throw-away" domain.

The challenge you face is how to make your server case sensitive so that you can request a URL removal using either robots.txt, a meta robots tag, or a 404 response. In other words, your server needs to respond in a case-sensitive manner to requests for those URLs, or else you cannot request removal without nuking all the case variations, too.

The other alternative would be to make sure all your internal linking is lower case. Then you can hope that Google sorts it out over time, at least to the point of no ranking problems for you.

Google is definitely aware of the mixed case issue. Many large corporations use IIS and have this same challenge. So Google does try to work with it. But as with any similar situation, proactively getting your server technology straightened out is the best approach.

[edited by: tedster at 2:24 pm (utc) on July 9, 2008]

webdude
posted 2:18 pm on Jul 9, 2008 (gmt 0)

Well,

I think that if Google, in its infinite wisdom, is going to consider upper and lower case as separate files, it should provide an easy way to remove them.

I did do an experiment with the Analyze robots.txt tool in GWT, though, that I found very interesting. I may try this for real to see how it goes.

It seems that robots.txt does follow case, at least as far as the analyzer goes. I added this to the robots.txt file and ran two URLs to see what would happen...

User-agent: Googlebot
Disallow: /Test.html

And then, to test URLs against this robots.txt file, I ran...

http://example.com/test.html

and

http://example.com/Test.html

Test.html was Blocked by line 22: Disallow: /Test.html

and

test.html was allowed.

I would assume this means I could use the robots.txt file to block just the offending URLs with the changed case?

And if that is the case (no pun intended), it states on the removal page...

To block a page or image from your site, do one of the following, and then submit your removal request:

* Make sure the content is no longer live on the web. Requests for the page or image you want to remove must return an HTTP 404 (not found) or 410 status code.

* Block the content using a meta noindex tag.

* Block the content using a robots.txt file.

Okay, now the big question... if I add the upper case file to the robots.txt file and Google honors case in robots.txt, will the removal tool also honor case and remove just the upper case file? Has anyone ever tried anything like this? Does anybody care? ;-)

tedster
posted 2:38 pm on Jul 9, 2008 (gmt 0)

I think that if Google, in its infinite wisdom, is going to consider upper and lower case as separate files

We should be very clear that case sensitivity in the filepath was not Google's decision - not in any way. That standard goes back to the earliest days of Internet technology, and the W3C quote I included above underscores the issue. Microsoft's decision to ignore that standard, and to continue to ignore that standard even today, is the problem.

Not only are case variations in the filepath technically different URLs, there are even websites that serve different content for those differing URLs. Google wrote its software based on the standard, because it had to.

By this point in time, I expected that Google would have the problem 100% licked. But I was assuming that an IIS server's content for each case variation would be identical, down to the last bit - and that is apparently not a safe assumption, either.

I've never done the robots.txt experiment, but for an IIS server, that's the only possibility. It "should" work out, as long as you don't have to remove a bajillion URLs via that route. Still, it would not relieve you of the need for continued vigilance for as long as your site is live.

g1smd
posted 2:42 pm on Jul 9, 2008 (gmt 0)

I would go for all lower-case URLs as being the only valid versions.

In Apache it is very easy to block URLs that contain any characters in the wrong case, such as when those URLs are the key to a database entry.

It's a couple of lines in .htaccess. I have no idea if IIS has anything similar, other than installing ISAPI Rewrite or some such...

In .htaccess this would fail (403) any request with an upper case letter in it:

RewriteCond %{REQUEST_URI} [A-Z]
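# the [F] flag in the next rule answers 403 Forbidden (this assumes RewriteEngine On is already in effect)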
RewriteRule .* - [F]

In .htaccess this would 404 any request with an upper case letter in it:

RewriteCond %{REQUEST_URI} [A-Z]
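# rewrite internally to a path that does not exist, so the server answers 404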
RewriteRule .* /this-file-does-not-exist [L]

Code is untested. Use at own risk. There may be other factors that I haven't considered. Yada. yada. yada.

[edited by: g1smd at 2:51 pm (utc) on July 9, 2008]

tedster
posted 2:45 pm on Jul 9, 2008 (gmt 0)

Not natively, at least not within the limits of any sane amount of effort. Only with ISAPI Rewrite does it become easy. Strictly speaking it "could" be done without the third party plug-in, but you'd have to develop your own parallel version of the plug-in's code!

webdude
posted 2:50 pm on Jul 9, 2008 (gmt 0)

I understand the IIS thing and I wholeheartedly agree with you. This did start out as an experiment several months ago: I changed the case on several files to see what the effect would be, and now I have duplicate content. Step 2 of the experiment is to see if I can get rid of it. We are not talking about many pages... just 7. I usually name all files in lower case. I am going to add one file with the offending case to robots.txt, then request it with the removal tool, and see what happens. I'll keep you informed.

g1smd
posted 2:59 pm on Jul 9, 2008 (gmt 0)

Are these static pages or dynamic?

If they are dynamic (or if there is a way to prepend a script to the beginning of a static page), then you could add a few lines of code that simply test what URL was requested and issue a redirect to the right URL if the request contains anything in the wrong case.

Basic Logic:

$URL_requested = {$SERVER_VARIABLE_Request_URI};
$URL_lower_case = string_to_lowercase($URL_requested);
IF ($URL_requested !== $URL_lower_case)
THEN
{send HTTP_HEADER: "Status: 301 Moved Permanently";
send HTTP_HEADER: "Location: $URL_lower_case";};

That would also mean that you retain traffic that comes to your site when that traffic is requesting the wrong URL.
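On Apache, a minimal sketch of that same logic with mod_rewrite would look like the lines below. As with my rules above, this is untested, and it assumes you can edit httpd.conf, because a RewriteMap cannot be declared in .htaccess:

# in httpd.conf or the vhost: define the built-in lowercasing map
RewriteMap lc int:tolower

# in the vhost or in .htaccess:
RewriteEngine On
# any upper case letter in the path triggers a 301 to the lowercased path
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]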

oddsod
posted 3:54 pm on Jul 9, 2008 (gmt 0)

I had exactly this problem with a site some time ago. I pulled a lot of hair out and spent a lot of time researching the options before deciding to go with ISAPI Rewrite. Then I discovered the hosting company wouldn't allow it. In the end I realised that I didn't really need to be on a Windows host anyway and just moved the site somewhere I could use a good .htaccess.

Life has been a lot easier since :)

pageoneresults
posted 4:01 pm on Jul 9, 2008 (gmt 0)

For /CaseChallenges/ on Windows using ISAPI_Rewrite...

#convert all upper case to lower
RewriteCond URL ([^?]+[[:upper:]][^?]*).*
RewriteHeader X-LowerCase-URI: .* $1 [CL]

RewriteCond Host: (.+)
RewriteCond X-LowerCase-URI: (.+)
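# then issue a permanent (RP) redirect to the same host with the lowercased path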
RewriteRule [^?]+(.*) http\://$1$2$3 [I,RP]

You can make Windows behave just like Apache if you have the right tools at hand. The above is for Version 2.0 of ISAPI_Rewrite. Version 3.0 utilizes .htaccess syntax, which I'm still learning. I like 2.0 because I've worked with it for years and can easily cut and paste different rules when needed.

rocco
posted 4:05 pm on Jul 9, 2008 (gmt 0)

g1smd's way is the way to go. If this site is of any value, then you should use 301 redirects and hold off on robots.txt for now.

One of my sites has had its paths completely rewritten twice, everything done with 301s and no problems. The site gets 50,000 visitors a day from Google alone, so this is real advice.

1.) Make sure everything works in the case you want it to work (e.g. lower case).
2.) Add a 301 redirect from every upper-case page to its lower-case file (see the sketch below).
3.) Try to turn on case sensitivity on your server, so you can avoid accidents.
4.) Do not use robots.txt for this issue, and do not request removal.
5.) Depending on PageRank, this will be worked out in 1-3 months, in my experience.

Please note that, like others in this thread, I am not using a Microsoft web server, so I cannot help you with IIS code. My advice is to do some research on this topic before you start, because this knowledge is of major importance.
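For illustration only, on an Apache host step 2 can be as simple as one mod_alias line per page. The filenames here are hypothetical, and on IIS you would need ISAPI_Rewrite or similar for the same effect:

Redirect 301 /Red-Widget.html http://www.example.com/red-widget.html
Redirect 301 /Another-Page.html http://www.example.com/another-page.html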

webdude
posted 7:17 pm on Jul 9, 2008 (gmt 0)

Well, this is a good piece of information to know. It worked like a charm and took less than 3 hours to remove the uppercase filename. The lowercase version stayed intact and is still listed in the SERPs. So it looks like you can remove filenames according to case... who'd a thunk it, eh?

[edited by: webdude at 7:54 pm (utc) on July 9, 2008]

g1smd
posted 7:37 pm on Jul 9, 2008 (gmt 0)

It should work like that, but I would never risk it. The problem may still return if there are external sites still promoting the incorrectly-cased URL.

My preferred long-term solution is to 301 redirect to the correct URL, so that I retain traffic coming to the site from that incorrect listing, while that incorrect listing still exists.

By adding the redirect, I also get to keep all the traffic from other sites with the wrong case in their link out, traffic from bookmarks with the wrong case, and so on.

At the same time, the redirect forces the correct URL to show up in the browser window, so halting the spread of the wrong URL being cut and pasted to other sites as new links.

Additionally, for a site that currently has the "wrong" URL, implementing the redirect shows that site what the correct URL actually is, should they ever run a tool like Xenu LinkSleuth over their site. Should you simply return a 404 for the incorrect URL, they might just delete your link, rather than correct it.
