
Difference between parameter exclusion and robots.txt?

   
7:09 am on Mar 21, 2014 (gmt 0)



Hi All,

What's the difference between setting a parameter in Google Webmaster Tools to 'No. Doesn't affect page content', so that URLs with that query string will not be crawled, and blocking the same query string through robots.txt? Are they the same?

What if I want the pages with this query string that appear in Google search to be removed? Blocking them through robots.txt will still leave the pages in Google's index, but with a snippet saying 'A description for this result is not available because of this site's robots.txt – learn more'.

Adding a canonical tag or noindex is not practical when there are n number of pages. How would you handle such situations? Will adding the parameter exclusion through Google Webmaster Tools help?
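
For reference, the kind of robots.txt rule I mean looks like this (just a sketch, with 'sort' as a hypothetical parameter name; Googlebot supports the * wildcard):

User-agent: Googlebot
# block any URL whose query string contains sort= (hypothetical parameter)
Disallow: /*?*sort=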


Thanks for the help!
12:34 pm on Mar 21, 2014 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It seems to help. I use it for things like sort options on ecommerce sites.
1:01 pm on Mar 21, 2014 (gmt 0)



Thanks @netmeg

I was wondering whether the already indexed pages will still show up in Google's results with something like the robots.txt message ('A description for this result is not available because of this site's robots.txt – learn more'), or whether they will vanish from the index completely?

Thanks again
2:12 pm on Mar 21, 2014 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Since I implemented it, I haven't run across a URL with a parameter in a regular search. They're probably still in there, but I think Google takes it as a signal that it's supposed to show the canonical.
8:10 pm on Mar 23, 2014 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Must admit, for any of my pages that use parameters I simply stick in a canonical tag, and I never see the parameter URLs in Google.
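
For the OP, the tag itself is just this (a sketch; example.com and the path are placeholders for your clean URL):

<!-- in the <head> of every parameterized variant, pointing at the clean URL -->
<link rel="canonical" href="http://example.com/widgets/">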
4:53 am on Mar 24, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



I use the parameter settings in GWT and Google ignores my setting to crawl none of them. I can't say whether they index them, but they won't stop crawling them, and they complain about hitting 403s when they try to follow tracking URLs and 404s when they try to use a search results URL. You'd think they could understand that a results page doesn't exist at some URL until a search is performed. I have added the parameters to robots.txt to see if that helps them get it right, and it does seem to help to have them in both places. Mind you, this helps with crawling; I don't keep up with everything they index or don't.
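
For what it's worth, the robots.txt lines I mean look something like this (a sketch; 'utm_source' and 's' are stand-ins for my actual tracking and search parameters):

User-agent: *
# hypothetical tracking parameter
Disallow: /*?*utm_source=
# hypothetical on-site search parameter
Disallow: /*?*s=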
12:24 pm on Mar 24, 2014 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I usually noindex search results pages.
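
That's the standard robots meta tag (a sketch, with noindex,follow so Google drops the page from results but still follows its links):

<!-- on internal search results pages only -->
<meta name="robots" content="noindex,follow">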
2:22 pm on Mar 24, 2014 (gmt 0)

5+ Year Member



On one site I was forced to use robots.txt, and the results were a surprise.

Webmaster Tools reported 400 million URLs BLOCKED. That same day the number of indexed URLs also increased.
[mecagoenlos.com ]

I ran another test on another, smaller site, and the same thing happened.
[mecagoenlos.com ]
7:18 am on Mar 27, 2014 (gmt 0)



Thanks for the input guys!

It looks like we should weigh all the possibilities and implement them together to make sure we are fine.

So, in my case, I should exclude the parameters through Google Webmaster Tools and also add rel=canonical from the parameter URLs to the unique pages.

But if the Google parameter exclusion works the same way as a robots.txt exclusion, then the search engine robots will never be able to find the rel=canonical on the pages I've excluded/blocked, which would make the whole process worthless.

So I should add rel=canonical first, and only after making sure the pages no longer appear in search results should I add the parameter exclusions, to avoid such a mess in the future?

Thanks again.
12:05 pm on Mar 27, 2014 (gmt 0)



The parameter settings in GWT are guidance for Google on how to allocate crawling resources on your site; in other words, they are crawling guidance, not indexing guidance. No matter which setting you choose, it will not affect how Google indexes your pages. For example, if those pages are already in the index, changing the setting will not remove them from it.

If you are going to implement rel=canonical or noindex on those pages, I would suggest that for some time you keep the GWT parameter setting at 'crawl all'. That setting ensures Google revisits the pages and sees your indexing instructions, meaning your canonical or noindex tag.
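
If editing templates across n pages is the blocker, one option worth knowing about (not mentioned above, so take it as a suggestion) is sending noindex as an HTTP header at the server level. A sketch for Apache 2.4 with mod_headers enabled, again with 'sort' as a hypothetical parameter:

# Apache 2.4: add the header to every response whose
# query string contains sort= (hypothetical parameter)
<If "%{QUERY_STRING} =~ /(^|&)sort=/">
    Header set X-Robots-Tag "noindex, follow"
</If>

Google treats the X-Robots-Tag header the same way as the robots meta tag, so this scales without touching page templates.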
7:30 am on Mar 31, 2014 (gmt 0)



Thanks @simon :-)
 
