Canonical URL Issues - including some new ones


jdMorgan - 7:26 pm on Aug 12, 2008 (gmt 0)


The problem is not whether FQDN URLs or any of these other variations function. The problem is that they create more than one URL for a unique resource. This thread only makes sense in that context, and this is the reason for the thread title.

Google and most of the other search engines use back-end processes to "de-duplicate" or "canonicalize" URLs, and often infer the "correct" single URL for a resource. But we have many cases posted here of webmasters who say that the "wrong URL" is showing up in search results. Or they post that their home page in "www" is PR4, while their non-www home page is PR3, indicating that PageRank has been 'split' across these two hostnames.

They're not understanding that preventative server-side measures are required.

Then there's the issue of exploits. A site could potentially respond to both HTTP and HTTPS, to www and non-www, to a trailing dot on the hostname, to a trailing port number on the hostname, and to "index.php" vs. "/". It could also use 'virtual subdirectories' for SEO-friendly keyword-in-URL URLs without enforcing a particular 'closed set' of values for that URL-path-part, and accept practically-infinite query string variations. Given all that, the number of URLs that could be used to reach a particular resource can indeed grow to be practically infinite -- limited only by the server settings which limit the length of the HTTP request header. A malicious competitor could potentially dilute the PageRank of your important pages with a bit of "creative linking."
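To make the multiplication concrete, here is a rough Python sketch that enumerates a small sample of those variations for one home page. The hostname example.com, the "ref" query parameter, and every list below are made-up illustrations, not values from this thread.

```python
from itertools import product

# Hypothetical site; every value here is illustrative only.
SCHEMES = {"http": 80, "https": 443}       # scheme -> its default port
HOSTS = ("example.com", "www.example.com")
PATHS = ("/", "/index.php")
QUERIES = ("", "?ref=a", "?ref=b")         # a tiny sample of arbitrary query strings


def variants():
    """Yield every URL in this sample that would reach the same home page."""
    for (scheme, default_port), host, dot, explicit_port, path, query in product(
        SCHEMES.items(), HOSTS, ("", "."), (False, True), PATHS, QUERIES
    ):
        # Optionally spell out the default port, e.g. ":80" for http.
        port = f":{default_port}" if explicit_port else ""
        yield f"{scheme}://{host}{dot}{port}{path}{query}"


urls = list(variants())
print(len(urls))  # 2 * 2 * 2 * 2 * 2 * 3 = 96
```

Even this tiny sample gives 96 distinct URLs for one page, and each additional query string a linker invents adds another 32.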

In cases where large numbers of URLs resolve to a single resource, there are several dangers:

  • The "wrong" URL listed in search results.
  • PageRank/Link-popularity split among various URLs, reducing the rank of 'the' page.
  • So-called "duplicate-content penalties" --really a filter, IMO-- applied to resources with "too many" URLs.
  • User confusion (e.g. broken on-page visited-link highlighting).

So the whole point of this thread (and several others that Tedster cited above) is that one resource (e.g. one "page") should have one and only one URL by which it is accessible, and all other "valid" variants of that URL should be redirected to that single canonical URL.
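As a rough illustration of that rule, the Python sketch below maps any accepted variant to one canonical URL and reports when a redirect would be needed. It is only a sketch under assumed policy: the canonical scheme and host, the index file names, and the allowed query keys are all hypothetical choices for a made-up example.com site, and on a real server this logic would normally live in the server configuration or front controller.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed policy for a hypothetical site; adjust every value to your own setup.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}
INDEX_FILES = ("index.php", "index.html")
ALLOWED_PARAMS = {"id", "page"}  # the 'closed set' of meaningful query keys


def canonical_url(url):
    """Return the single canonical URL for `url`, or None for a foreign host."""
    parts = urlsplit(url)
    # .hostname is lowercased and excludes the port; drop any trailing dot too.
    host = (parts.hostname or "").rstrip(".")
    if host not in KNOWN_HOSTS:
        return None
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # "/dir/index.php" -> "/dir/"
            break
    # Keep only recognised parameters, in a stable order.
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED_PARAMS
    ))
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, query, ""))


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    target = canonical_url(url)
    return target if target is not None and target != url else None
```

For example, redirect_target("http://example.com.:80/index.php?ref=a&id=7") gives "https://www.example.com/?id=7", while the canonical URL itself gives None, so it is served directly with no redirect.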

    Jim


    Thread source: http://www.webmasterworld.com/google/3718246.htm
    Brought to you by WebmasterWorld: http://www.webmasterworld.com