
Google SEO News and Discussion Forum

This 64 message thread spans 3 pages.
Canonical Tag vs. Block in Robots.txt
Planet13

Msg#: 4302977 posted 3:48 pm on Apr 23, 2011 (gmt 0)

Hi there, Everyone:

The product pages on my ecommerce web site are (by default) available via multiple versions of the URL (namely, a long query string version, and a short version).

For years, I have simply blocked the long query string URLs via the robots.txt file (The long query string URLs have a "virtual" directory in the URL, so I just block that virtual directory).
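For context, a block of that kind is a one-line disallow in robots.txt (the directory name below is a hypothetical stand-in for the "virtual" directory in the long URLs):

```
# robots.txt - keep crawlers out of the long query-string versions
# ("/cart-view/" is a hypothetical stand-in for the virtual directory)
User-agent: *
Disallow: /cart-view/
```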

But with "trust" being such an important issue after the Panda updates, I wonder if it might be better to unblock those URLs in robots.txt and just let the canonical tag take care of it.

In Webmaster Tools, under Crawl Diagnostics, it lists something like 700 URLs blocked by robots.txt. If that is something Google is measuring, I can't help but think they are somehow using that information for something.

 

Planet13

Msg#: 4302977 posted 7:45 pm on Apr 23, 2011 (gmt 0)

Hmmm... according to Google, it looks like the canonical link tag might be the way to go. From the following post at [googlewebmastercentral.blogspot.com...]

One item which is missing from this list is disallowing crawling of duplicate content with your robots.txt file. We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods. Instead, use the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. If access to duplicate content is entirely blocked, search engines effectively have to treat those URLs as separate, unique pages since they cannot know that they're actually just different URLs for the same content.


Anyone have any opinions before I drop the robots.txt disallow and rely on the canonical tag?
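For anyone following along, the rel="canonical" link element Google's post refers to goes in the <head> of every duplicate version and points at the preferred URL (the URL below is hypothetical):

```html
<!-- Emitted on both the long query-string version and the short version -->
<link rel="canonical" href="http://www.example.com/widgets/blue-widget.html" />
```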

g1smd

Msg#: 4302977 posted 7:58 pm on Apr 23, 2011 (gmt 0)

The 301 redirect is the way to go. It forces the URL in the user's browser address bar to be "right", and that limits the number of new links appearing that point to "wrong" URLs.

The canonical tag is less reliable. In particular, users' browsers will continue to show the incorrect URLs, and you may still gain new links to the "wrong" URL.
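On Apache, a 301 of the long query-string version onto the short version can be sketched with mod_rewrite along these lines (the paths and the "id" parameter are assumptions, not the OP's actual setup):

```apache
RewriteEngine On
# If the request carries the long query string (e.g. ?id=123)...
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# ...301 it to the short URL; the trailing "?" drops the query string
RewriteRule ^virtual-dir/product\.php$ /product/%1? [R=301,L]
```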

Planet13

Msg#: 4302977 posted 8:13 pm on Apr 23, 2011 (gmt 0)

Hi there, g1smd:

thanks for the advice.

So you don't think Google will suspect any trickery if I suddenly take a couple of hundred URLs that were previously blocked with robots.txt and 301 them to the correct canonical page?

I will have to see if I can even do that with my ecommerce system. The problem is that during the checkout process, shoppers have to use the long URLs, so I will have to make sure the method leaves the long checkout URLs intact.

g1smd

Msg#: 4302977 posted 8:32 pm on Apr 23, 2011 (gmt 0)

Adding a redirect generally means the long URLs can no longer be used at all.

There's one way you might be able to fix it: redirect long-URL HTTP requests but not long-URL HTTPS requests, and also require a logged-in user for all HTTPS requests.
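As a sketch (Apache mod_rewrite, with a hypothetical directory name), that HTTP/HTTPS split might look like:

```apache
RewriteEngine On
# Only redirect plain HTTP requests; HTTPS (checkout) is left alone
RewriteCond %{HTTPS} !=on
RewriteRule ^virtual-dir/(.*)$ /$1 [R=301,L]
```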

aristotle

Msg#: 4302977 posted 8:41 pm on Apr 23, 2011 (gmt 0)

I've read that Google might give less trust to a site that has an excessive number of redirects. In some cases it's an indication of major revamping, which could be a long-term negative mark against a site.

I don't know what the best choice would be in your situation. But whatever you do, it might be best to do it gradually if you can.

Planet13

Msg#: 4302977 posted 3:05 am on Apr 24, 2011 (gmt 0)

@aristotle:

I've read that Google might give less trust to a site that has an excessive number of redirects.


That isn't particularly good news for me... I already used 301 redirects to move half the content to a different site.

But whatever you do, it might be best to do it gradually if you can.


Thanks for the advice. Unfortunately, I don't know if I will be able to do that either. It turns out ALL the long URLs share the same directory (which is currently blocked in robots.txt), so unfortunately it is an all-or-nothing operation.

deadsea

Msg#: 4302977 posted 11:44 am on Apr 25, 2011 (gmt 0)

I have a site with about 20 million pages and about 100 million URLs that redirect. Googlebot regularly crawls a couple of hundred thousand status-200 pages and a million status-301 pages. It ranks fine in Google, both before and after Panda. It's not the number of redirects that Google would object to; it's redirecting a large number of previously indexed pages all at once.

incrediBILL

Msg#: 4302977 posted 11:54 am on Apr 25, 2011 (gmt 0)

How can you 301 redirect valid paths in your store that are required for it to function?

If you do redirect, it should only be for a spider, not a human visitor, as long as the product is still active.

Slap a canonical in the header. I use them and they work pretty decently, and you could simply put NOINDEX in the header for the URL variations to avoid a lot of redundant indexing.

deadsea

Msg#: 4302977 posted 12:06 pm on Apr 25, 2011 (gmt 0)

If you do redirect, it should only be for a spider, not a human visitor as long as the product is still active.


Google specifically says that you should not do that: it is cloaking.

incrediBILL

Msg#: 4302977 posted 3:55 pm on Apr 25, 2011 (gmt 0)

Google specifically says that you should not do that: it is cloaking.


It's only cloaking if you do something deceptive. Keeping Google from indexing every combination and permutation of a single ecommerce page with product options is hardly deceptive.

aakk9999

Msg#: 4302977 posted 5:22 pm on Apr 25, 2011 (gmt 0)

I've read that Google might give less trust to a site that has an excessive number of redirects.

My take was always that this referred to internal 301 redirects (where an internal link on the site triggers a 301), but that having a mass of external 301 redirects is fine.

In some cases it's an indication of major revamping, which could be a long-term negative mark against a site.

I disagree. I do not think that many 301s implemented because of a major revamp result in a long-term negative mark against a site. Where a revamped site dropped substantially, it was more a matter of the site changing structure, not executing the URL migration properly (and hence losing link juice), or dropping content that was ranking previously or that supported other pages that ranked. Where the structure of the site improved after a revamp/redesign, I have only ever seen a positive effect, despite redirecting 2000+ "old style" URLs.

tedster

Msg#: 4302977 posted 5:39 pm on Apr 25, 2011 (gmt 0)

My experience aligns with aakk9999's. When I've seen problems, it's because the redirects were not executed properly, or other technical issues with the URLs made a mess that was hard to untangle.

Redevelopment of an existing site is a common occurrence, and though it has its challenges, it doesn't necessarily mean there will be lengthy problems because of "too many redirects". What can cause problems is many redirects chained together for one single URL request.

arikgub

Msg#: 4302977 posted 6:09 pm on Apr 25, 2011 (gmt 0)

I have a scenario in mind where a 301 cannot be applied. Let's say we have URLs like

http://example.com/widget.php?color={color}

where {color} is some widget characteristic (e.g. color) that is passed via a GET request.

Now assume that widget.php is the default product page for widgets, but that, optionally, the {color} parameter can be used to determine the widget image and the widget color attribute displayed to the user.

On one hand, the pages are not exactly the same - they cannot be 301'ed without changing the user experience. On the other hand, they are almost certainly considered duplicate content by Google.

What would you do?

If you choose to set the default page version (widget.php without any GET parameters) as your canonical, then I am wondering how Google treats the incoming links to the different variations of the page. Is it equivalent to having all the pages 301'ed to the default page version in this regard?
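For the record, the canonical-tag version of that setup would have widget.php emit the same link element on every ?color= variant, pointing at the parameter-free URL:

```html
<!-- Output by widget.php regardless of the ?color= parameter -->
<link rel="canonical" href="http://example.com/widget.php" />
```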

deadsea

Msg#: 4302977 posted 6:22 pm on Apr 25, 2011 (gmt 0)

I would choose the canonical tag over the 301 in that situation.

Is it equivalent to having all the pages 301'ed to the default page version in this regard?


Based on what I know about the canonical tag, it is equivalent to a 301 in that situation as far as Google is concerned.

aakk9999

Msg#: 4302977 posted 6:24 pm on Apr 25, 2011 (gmt 0)

If the only difference is the widget image and widget colour attribute shown to the user, then in my opinion this would be a good place to use the canonical tag.

When pages are canonicalised by implementing the canonical tag, incoming links to a non-canonical version should count towards the declared canonical version of the page. There *could* be some PR juice loss though (as there also is with a 301).

incrediBILL

Msg#: 4302977 posted 6:52 pm on Apr 25, 2011 (gmt 0)

There *could* be some PR juice loss though (as there also is by using 301).


There *could* be some PR loss if Google mistakenly thinks you're trying to spam 200 copies of the same page for SEO purposes and slaps them all down as supplemental.

You're damned if you do, you're damned if you don't; pick one and roll the dice.

arikgub

Msg#: 4302977 posted 7:11 pm on Apr 25, 2011 (gmt 0)

thanks all

There *could* be some PR loss if Google mistakenly thinks you're trying to spam 200 copies of the same page for SEO purposes


But the canonical tag is exactly what allows you to come clean, you are effectively telling Google that these pages are all the same thing, no?

g1smd

Msg#: 4302977 posted 7:35 pm on Apr 25, 2011 (gmt 0)

Yes, the canonical tag is the right thing to use in some circumstances.

In many other cases it is not. By allowing users to still access non-canonical URLs and see content *at* that URL, you will continue to gain links pointing to non-canonical URLs. The redirect stops that behaviour.

Sgt_Kickaxe

Msg#: 4302977 posted 7:37 pm on Apr 25, 2011 (gmt 0)

- Canonical will tell Google which pages you want indexed.
- Robots.txt will tell Google which pages cannot be indexed.
- A 301 redirect between identical pages will force the use of just one version.
- Not creating multiple versions of a page will solve all problems from an indexing perspective.

If you can't achieve the last ideal, then canonical is probably the best way to go. With canonical you will direct search traffic to the right version but not forcefully cut off or block any potential good the other versions are creating, if any, such as incoming links or internal PageRank flow. Redirects may do well in limited use, but I'm not sure you want 700 redirects in place raising questions about trust.

It's better to say "this is the right page to index" than to try to forcefully block content.

[edited by: Sgt_Kickaxe at 7:39 pm (utc) on Apr 25, 2011]

fabulousyarn



 
Msg#: 4302977 posted 7:38 pm on Apr 25, 2011 (gmt 0)

Hi All - g1smd - so are you saying that if you have pages that are almost identical, but you need users to access them because they have, say, a different color number or image, it's best to use the canonical? And in that case, are you risking being marked as spam? I have been testing both varieties - (1) canonical, and (2) editing the color pages to have content specific to that color - to see what works. Nothing conclusive yet.

deadsea

Msg#: 4302977 posted 8:12 pm on Apr 25, 2011 (gmt 0)

Another option is to have a single canonical URL for the product. Allow users to change the color and such through Javascript and AJAX within the page. 301 redirect the URLs with the color= parameter back to the canonical. You could even support the ability for users to bookmark or email the URL with a color choice in it by putting the parameters after the hash (#):

http://example.com/widget.php#color=blue

Google doesn't crawl anything after the hash, and your server never sees anything after the hash, but the Javascript on the page can see it and do the correct AJAX/Javascript work on page load.
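A minimal sketch of the client-side half of that idea (the element id and image path are hypothetical):

```html
<script>
  // Read e.g. #color=blue from the URL; the server never sees this part
  var m = window.location.hash.match(/color=(\w+)/);
  if (m) {
    // Swap in the image for the chosen color (id and path are made up)
    document.getElementById('widget-img').src = '/images/widget-' + m[1] + '.jpg';
  }
</script>
```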

g1smd

Msg#: 4302977 posted 8:28 pm on Apr 25, 2011 (gmt 0)

- Robots.txt will tell Google which pages cannot be indexed.

It sort of does that, except that Google will continue to list those URLs as URL-only entries in the SERPs. The meta robots noindex tag is a better option, the canonical tag an improvement again, and the 301 redirect is often, but not always, the best option.

incrediBILL

Msg#: 4302977 posted 9:43 pm on Apr 25, 2011 (gmt 0)

Another option is to have a single canonical url for the product. Allow the users to change the color and such through Javascript and AJAX within the page.


A more web-friendly, low-tech, old-school, quick-and-dirty solution that might work with minimal changes to the site is to select the product options with a POST instead of a GET. The current parameters pass from page to page as you change them, but no new URLs are created.

No new URLs, no problems for Google indexing the site.

Then you could eliminate all the old URLs with 301s and rel=canonical, and be done for good.
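A POST-based option picker of that sort is just an ordinary form (the names below are hypothetical); the chosen color travels in the request body, so no crawlable URL variant is created:

```html
<form method="post" action="/widget.php">
  <select name="color">
    <option value="blue">Blue</option>
    <option value="green">Green</option>
  </select>
  <input type="submit" value="Show this color" />
</form>
```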

Bewenched

Msg#: 4302977 posted 10:19 pm on Apr 25, 2011 (gmt 0)

I've read that Google might give less trust to a site that has an excessive number of redirects.

Not particularly good news for us either, since manufacturers often supersede parts, so instead of deleting a part, we 301 it to the new one.

zehrila

Msg#: 4302977 posted 10:22 pm on Apr 25, 2011 (gmt 0)

What do you do if you have 3 versions of the same URL, served through the same PHP file? E.g.:

1: example.com/Green-Blue-Widgets.html <--- Actual URL
2: example.com/green-Blue-Widgets.html <--- Note the lowercase "g" in "green". I don't know how Google pulled that out.
3: example.com/Green-Blue Widgets.html <--- Note the space between "Blue" and "Widgets".

Now, would adding the canonical tag below in the PHP file sort out this duplication issue?

<link rel="canonical" href="http://example.com/Green-Blue-Widgets.html" />

This will show the canonical tag in all 3 versions of the URL - the 2 bad ones and the 1 right one. Wondering if that's right?

aakk9999

Msg#: 4302977 posted 10:24 pm on Apr 25, 2011 (gmt 0)

so instead of deleting a part, we 301 it to the new one


This should be OK as long as you are not linking internally to the "old part page" URL (i.e. the old part page is either removed from the site or replaced with a replacement part page URL)

aakk9999

Msg#: 4302977 posted 10:32 pm on Apr 25, 2011 (gmt 0)

What to do if you have 3 versions of same URL, served through same php file.

I would personally use a 301 redirect to redirect 2. and 3. to 1.

Despite the above, I would also set up the canonical tag as a kind of "catch-all" - just in case some other URL misspelling pops up.

example.com/green-Blue-Widgets.html <--- look at small letter for word Green. I don't know how google pulled that out.

Are you on an IIS server? IIS is case-insensitive.

There might have been (or might still exist) a misspelled URL on your site. Or perhaps someone typed the URL into the address bar and Google picked it up via the Google toolbar, or someone misspelled the URL when linking to it - there are many ways Google can pick up an unwanted URL variant of a page.

<added>
example.com/Green-Blue Widgets.html <--- See the space between Blue and Widgets

Thinking about it - perhaps you have a problem with your back end, because why would the above URL return the same content as the URL with a hyphen between the words? Maybe you are using a "fluffy" URL, where the page content is obtained from part of the URL and the rest of the URL can be anything. In that case you definitely need the canonical tag on your site, but on top of it I would also catch as many unwanted URL variants as I can and 301 redirect them to the proper version.
</added>
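Assuming an Apache server, the two known bad variants could be 301'ed with rules along these lines (IIS would need its own rewrite syntax):

```apache
RewriteEngine On
# Lowercase-g variant -> canonical
RewriteRule ^green-Blue-Widgets\.html$ /Green-Blue-Widgets.html [R=301,L]
# Space-instead-of-hyphen variant -> canonical (quoted because of the space)
RewriteRule "^Green-Blue Widgets\.html$" /Green-Blue-Widgets.html [R=301,L]
```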

zehrila

Msg#: 4302977 posted 10:41 pm on Apr 25, 2011 (gmt 0)

aakk9999: a 301 would be a long job; I am not much into coding, and as you said there might be other spelling or similar mistakes, and such URLs can pop up.

I will get someone to add a 301 redirect for 2 and 3, but for now, do you think the canonical tag in my main PHP file is an okay thing to do?

Note that the canonical tag will appear in all 3 URLs in view source. Is it okay to do this for now while I figure out a way to 301 redirect 2 and 3 to 1?

aakk9999

Msg#: 4302977 posted 10:50 pm on Apr 25, 2011 (gmt 0)

Note that canonical tag will appear in all 3 urls in view source.

This is fine. Having a canonical tag that points to the page's own URL does not do any harm.

Is it okay to do it for now and mean while i try to figure a way to 301 2 and 3 to 1?

Yes, I would do this.
