Forum Moderators: Robert Charlton & goodroi


URL Case Sensitivity and Canonicalization

         

engine

10:44 am on Sep 29, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



John Mueller has presented a short video on case sensitivity, crawling and canonicalization.

NickMNS

3:33 pm on Sep 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Clear and concise. Good advice. But the production is a bit cringy.

aristotle

8:21 pm on Sep 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



When I created my first website in 2004, I didn't know anything about canonicalization, probably hadn't even heard of it, and used the relative link index.html in all the internal links from other pages to the home page. In other words, I didn't use /index.html or http://example.com/index.html

Later I noticed that some other people had created links to my home page from their sites using http://example.com/ as the url, and was puzzled by it. Eventually I figured it out, but I've never changed the original internal links on that site.

But Google's search results show it as a bare http://example.com without anything else.

lucy24

11:45 pm on Sep 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



“But Google's search results show it as a bare http://example.com without anything else.”

There is no difference, anywhere, at any time, between example.com/ with slash and example.com without slash. Google knows this, and so does your browser. The trailing slash only matters when it is example.com/dir vs. example.com/dir/. (The Applebot is one of a handful of robots that really wants everything to be extensionless, so it is constantly picking up directory redirects. This exasperates me.)
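To make the slash point concrete, here is a minimal Python sketch. It assumes plain http URLs and uses only the standard library's urllib.parse; the helper name normalize_root is made up for illustration. An empty path is rewritten to "/", so the two root forms collapse into one, while /dir and /dir/ stay distinct.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_root(url: str) -> str:
    """Return url with an empty path replaced by "/".

    normalize_root is a hypothetical helper for illustration only.
    Non-empty paths are left exactly as they are.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # empty path and "/" name the same resource
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# The two root forms end up identical.
print(normalize_root("http://example.com") == normalize_root("http://example.com/"))

# Below the root, the trailing slash still makes a different URL.
print(normalize_root("http://example.com/dir") == normalize_root("http://example.com/dir/"))
```

The first comparison is true and the second is false, which is why a request for /dir usually gets redirected to /dir/ when dir is a directory.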

One of my pages dates back to when I had URLs with a visible “index.html”. (I changed them all in, I think, 2012.) To this day I get the occasional redirected human because the index.html form is mentioned in one or two message boards--which I don't belong to, so I can’t hunt down the administrator and ask for an edit.

I’ve still got a scattering of pages with names in CamelCase. It’s a mismatch with the rest of the site, but it is absolutely not worth the trouble of renaming them when nothing else is changing. No, bingbot, not even when you persist in requesting "camelcase.html" and getting a 404.
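A rough Python sketch of why the lowercase request misses, assuming a server that compares paths case-sensitively (as mine evidently does; many Windows-hosted setups do not). The function name urls_match is made up, and the comparison ignores details such as userinfo and default ports. The host part is case-insensitive, the path is not.

```python
from urllib.parse import urlsplit


def urls_match(a: str, b: str) -> bool:
    """Compare two URLs: scheme and host ignore case, path and query do not.

    urls_match is a hypothetical helper; it assumes URLs without userinfo,
    so lowercasing the whole netloc is safe.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        and pa.netloc.lower() == pb.netloc.lower()
        and pa.path == pb.path      # case-sensitive on purpose
        and pa.query == pb.query
    )


# Host case does not matter.
print(urls_match("http://EXAMPLE.com/CamelCase.html", "http://example.com/CamelCase.html"))

# Path case does: this is the request that ends in a 404.
print(urls_match("http://example.com/CamelCase.html", "http://example.com/camelcase.html"))
```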

aristotle

12:55 am on Sep 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



“One of my pages dates back to when I had URLs with a visible “index.html”. (I changed them all in, I think, 2012.) To this day I get the occasional redirected human because the index.html form is mentioned in one or two message boards”

Seeing that led me to immediately wonder why these need to be redirected. Most likely I just need to think about it some more, but at the moment I'm confused.

phranque

7:44 am on Sep 30, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



“Seeing that led me to immediately wonder why these need to be redirected. Most likely I just need to think about it some more, but at the moment I'm confused.”

index.html (or index.htm, or index.cgi, etc.) are typical default directory index document names, but which one is used is a technical implementation detail and is irrelevant to the content.
The fact that there are several "typical" names for the default directory index document is a strong indication that you shouldn't expose your technical implementation in public URLs, and further that you shouldn't shackle your content's URL to a specific technical implementation that may change.
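As a rough illustration of the redirect target, here is a Python sketch that maps any URL ending in a directory index document to the bare directory URL. It is a sketch under assumptions, not any particular server's behavior: the INDEX_NAMES set and the canonical_url function are made up for this example, and a real site would normally do this in its server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of index document names; adjust to whatever the server uses.
INDEX_NAMES = {"index.html", "index.htm", "index.cgi"}


def canonical_url(url: str) -> str:
    """Strip a trailing index document name so the URL ends at the directory.

    canonical_url is a hypothetical helper. URLs that do not end in an
    index document are returned unchanged, except that an empty path
    becomes "/".
    """
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if filename in INDEX_NAMES:
        path = directory + "/"   # /dir/index.html -> /dir/
    else:
        path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


for url in (
    "http://example.com/index.html",
    "http://example.com/dir/index.htm",
    "http://example.com/dir/page.html",
):
    target = canonical_url(url)
    # A request would be redirected only when the canonical form differs.
    print(url, "->", target if target != url else "(no redirect)")
```

With this mapping, the public URL stays the same if the index document is later renamed or generated by something else entirely.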

aristotle

10:34 am on Sep 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



phranque -- yes, what you said is correct for "best practices".