
Forum Moderators: Robert Charlton & goodroi


Canonicalizing URLs for similar pages

     
3:49 am on Aug 4, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 21, 2002
posts: 1560
votes: 0


http://example.com
http://example.com/
http://example.com/index.html
http://example.com/index.php
https://example.com
https://example.com/
https://example.com/index.html
https://example.com/index.php

Are all these URLs the same in the eyes of Google? I.e., can all these URLs exist at once, or should they all redirect to https://example.com? If so, what is the correct method to use: concatenation, canonical or 301?

[edited by: Robert_Charlton at 4:13 am (utc) on Aug 4, 2018]
[edit reason] Changed domain.com to example.com to disable autolinking [/edit]

4:33 pm on Aug 4, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 15, 2003
posts:960
votes: 34


I always recommend using a rel="canonical" tag on a site's home page. It's the most important page on your site, so you want to make sure it gets indexed correctly. The two factors that most commonly affect the canonicalization of the home page are the subdomain prefix (i.e. 'www.example.com' vs. 'example.com') and the protocol (i.e. 'http' vs. 'https'). In addition, your site should automatically respond to requests for the variants with 301 redirects to the canonical URL. So: pick the canonical URL for your home page, install the rel="canonical" tag, set up the appropriate redirects, and make sure that your sitemap.xml file and all of your internal links use that URL.
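
To make the "pick one URL and point everything at it" step concrete, here is a minimal Python sketch. It assumes the chosen canonical form is https with the bare host example.com (the opposite choice works the same way), and the names canonicalize and canonical_link_tag are made up for this illustration, not part of any library. It only handles URLs that belong to the one site.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

CANONICAL_SCHEME = "https"       # assumption: the site is served over https
CANONICAL_HOST = "example.com"   # assumption: bare domain preferred over www
INDEX_FILES = ("index.html", "index.php")


def canonicalize(url: str) -> str:
    """Map any variant of one of this site's URLs to its canonical form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Drop a trailing index file name so /index.html and / become the same URL.
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"
    # Scheme and host are forced to the chosen values; the fragment is dropped.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the rel="canonical" element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonicalize(url))}">'
```

The same function can be used to generate sitemap.xml entries and internal links, so every reference to the page agrees on one URL.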
5:13 pm on Aug 4, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15954
votes: 898


Short answer: yes and no.

Slightly longer answer: To a very slight degree, they all count as Duplicate Content. As a general policy, you shouldn't let the same content be reachable by more than one route.

Your example includes things that are very common, things that would never occur in real life--and things that shouldn't occur:

index.html vs. index.php: only one or the other of these can be your actual directory index file, set by the DirectoryIndex directive in Apache, or equivalent in other servers. Google--specifically--will occasionally try “index.html”, though probably only if the site has at some time returned content for this form. They won't try “index.php” unless they have reason to believe URLs in this form actually exist on your site.

/directory/ vs. /directory/index.html: Google may* try both--and humans will link to whatever URL they happened to end up on--so make sure you have an index redirect in place.

/directory/ vs. /directory: Again, Google will try both. If they are real, physical directories, your server will handle the redirect; if they are pseudo-directories created by a CMS or your own rewriting, make sure there is a redirect.

example.com/ vs. example.com: This one (only) doesn’t matter. Human browsers will use or omit the final slash as they see fit; all requests are sent to the same place.

example.com vs. www.example.com: Google will try both. Pick the one you like--there is absolutely no solid evidence that one is “better” than the other--and redirect the other.

http vs. https: THIS IS BAD. If a site is secure, all http requests** should be redirected to https. (If a site is not secure, and doesn't pretend to be, it may theoretically still be reachable by https, depending on whether the server is listening on both ports, but modern browsers will do everything in their power to prevent humans from getting through.)


* I checked logs before posting. Google occasionally asks for index.html on my personal site, which did not have an index redirect at the beginning. It has been several years since they asked for index.html on a site that has always had the index redirect; obviously they must have tried at least once or they wouldn't know not to bother.

** I’ve found by experiment that it’s best to let robots.txt requests go through as-is, because some law-abiding agents get confused. But redirect everything else.
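
The rules in this post can be sketched as one decision function. This is only an illustration in Python, not server configuration: redirect_target is a hypothetical name, the canonical host example.com is an assumption, and the /directory vs. /directory/ case is left out because it depends on what actually exists on the server.

```python
from typing import Optional

CANONICAL_HOST = "example.com"   # assumption: the non-www host was chosen
INDEX_FILES = ("index.html", "index.php")


def redirect_target(scheme: str, host: str, path: str) -> Optional[str]:
    """Return the URL to answer with a 301, or None to serve the request as-is."""
    # example.com and example.com/ are the same request: the path is "/" either way.
    path = path or "/"
    # Footnote **: let robots.txt requests go through without a redirect.
    if path == "/robots.txt":
        return None
    # Index redirect: /directory/index.html becomes /directory/
    new_path = path
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        new_path = head + "/"
    if scheme == "https" and host.lower() == CANONICAL_HOST and new_path == path:
        return None  # already the canonical form
    # Fix scheme, host and index file in a single hop rather than a chain.
    return f"https://{CANONICAL_HOST}{new_path}"
```

For example, redirect_target("http", "www.example.com", "/index.html") gives "https://example.com/", while redirect_target("https", "example.com", "/") gives None.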
5:17 pm on Aug 4, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4595
votes: 375


Are all these URLs the same in the eyes of google?
Only in the sense that they are all URLs.

Does Google see these URLs as the same site? No. Being able to access or link to the same content at different URLs is not a good thing. The rel="canonical" tag is better than nothing, but on its own it only tells Google which page to index. Relying on the tag alone can dilute your site's strength, and the variants may still be viewed as duplicate content. A 301 redirect sends all visitors to the preferred page every time, and since Google never gets content from those "other" URLs (each request is answered with a 301 response pointing to the preferred version), they will concentrate on the one you prefer. It helps eliminate visitor confusion as well.

This is the reason that Google tells us to add each version of a property in GSC (Google Search Console). For example, changing your site to use https without adding that https version to your GSC account can have surprising results if it is not done correctly.
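
One way to check that the variants really answer with a 301, rather than relying on the tag alone, is a small audit like the Python sketch below. Everything here is illustrative: audit_variants and fake_fetch are made-up names, and fake_fetch is a stand-in with invented responses. A real check would replace it with a function that requests each URL without following redirects and returns the status code and Location header.

```python
from typing import Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Optional[str]]  # (status code, Location header or None)


def audit_variants(
    variants: List[str],
    canonical: str,
    fetch: Callable[[str], Response],
) -> Dict[str, str]:
    """Classify each URL variant by how the server answers it."""
    report = {}
    for url in variants:
        status, location = fetch(url)
        if url == canonical:
            report[url] = "ok" if status == 200 else f"canonical returned {status}"
        elif status == 301 and location == canonical:
            report[url] = "ok"
        elif status == 200:
            # Same content reachable by a second route.
            report[url] = "duplicate: served without redirect"
        else:
            report[url] = f"unexpected: {status} -> {location}"
    return report


def fake_fetch(url: str) -> Response:
    """Stand-in for a real HTTP request; the responses are invented for the demo."""
    table = {
        "https://example.com/": (200, None),
        "http://example.com/": (301, "https://example.com/"),
        "https://example.com/index.html": (200, None),
    }
    return table[url]


report = audit_variants(
    ["https://example.com/", "http://example.com/", "https://example.com/index.html"],
    "https://example.com/",
    fake_fetch,
)
```

With these invented responses, the index.html variant is the one flagged, since it returns content instead of a redirect.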

3:39 am on Aug 5, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


A canonical should only be used when the duplicate page shares sufficient content with the canonical page. Most of the main text content of a duplicate page must also appear in the canonical page. [webmasters.googleblog.com...]
10:25 am on Aug 6, 2018 (gmt 0)

New User from IN 

joined:Aug 6, 2018
posts: 4
votes: 0


Your question confused me a little at first, but it is actually not confusing.
http://example.com
http://example.com/
=== It's the same page. Some browsers show the trailing slash and some don't.

http://example.com/index.html
http://example.com/index.php
=== If there is an index.html file in your site's root folder, that is the page that will run, not the others such as http://example.com/index.php or http://example.com/.
Your website will run only with http://example.com/index.html.

https://example.com
https://example.com/
=== If your website has SSL, it will display with https in the URL. So why are we comparing it with http?

https://example.com/index.html
https://example.com/index.php
=== Only index.html will run. If you rename or delete index.html, then index.php will work.
 
