
Forum Moderators: goodroi


How do I block TagIDs?



7:30 am on Apr 9, 2013 (gmt 0)

Hi all,

Could you please help me with blocking pages from being indexed through Robots.txt?

This is for a blog where tags and their values are assigned automatically and shown on the home page. When a tag is clicked, it goes to a page like


So, I want to exclude all of those tag URLs from being crawled and indexed by Google. I just want to exclude everything that contains '?tagid='. How do I do that? I see that I could block every URL that has '?' in it through robots.txt, but I am concerned that it might block other important pages too.

Could you please help with this?

Thank you for all and any help :-)


11:22 am on Apr 9, 2013 (gmt 0)

phranque (WebmasterWorld Administrator)

This document details how Google handles the robots.txt file:
http://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

you'll want to use the Disallow: directive.

The [path] value, if specified, is to be seen relative from the root of the website for which the robots.txt file was fetched (using the same protocol, port number, host and domain names). The path value must start with "/" to designate the root.

this means the crawler matches the rule against the URL to be requested from left to right, starting from the leading / which is the document root directory.

you'll need to answer these questions before you write a robots.txt file:
do you want to exclude exactly /abcprompt.aspx or all paths?
do you want to exclude only urls with exactly one parameter that is tagid or any query string with the tagid parameter?
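to make the difference concrete, here's a sketch of the narrow and broad variants (taking /abcprompt.aspx from above as the example page; adjust to your actual paths):

```
User-agent: Googlebot
# narrow: block the tagid query string on this one page only
Disallow: /abcprompt.aspx?tagid=

# broad: block any URL, on any path, whose query string starts with tagid
Disallow: /*?tagid=
```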

and here's the other problem - you can use robots.txt to exclude googlebot from crawling but you can't use it to prevent google from indexing any urls it discovers.
if you want to control indexing you will have to allow crawling of the url and provide either a meta robots noindex element in the document head or a X-Robots-Tag HTTP Response header with a noindex value.
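as a sketch, a page that should stay crawlable but out of the index would carry something like this in its head:

```html
<head>
  <!-- page may be crawled, but must not be indexed -->
  <meta name="robots" content="noindex">
</head>
```

or, for non-HTML responses, the equivalent HTTP response header: `X-Robots-Tag: noindex`.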

Robots meta tag and X-Robots-Tag HTTP header specifications - Webmasters - Google Developers:
http://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag


11:34 am on Apr 9, 2013 (gmt 0)

g1smd (WebmasterWorld Senior Member)

Is there a link to the bare URL from within the site?

If not, the page content will never be spidered and indexed.


1:27 pm on Apr 10, 2013 (gmt 0)

Thank you so much for that link :-)

To put it simply: if I used this in my robots.txt file, would it work?

Disallow: /*tagid

Thanks :-)
Yes, the tags are available on the home page like on any other blog with tags; the URL values are assigned to the tags automatically.

The original page is example.com/something.aspx. Tags to this page will lead to example.com/something.aspx?tagid=22 or something

Thanks again


6:39 pm on Apr 10, 2013 (gmt 0)

tedster (WebmasterWorld Senior Member)

Disallow: /*tagid

That looks technically correct, as long as the character string "tagid" doesn't appear in any URLs except the ones you want to block from crawling. If you say "Disallow: /*?tagid" instead, the "?" limits the rule to query string parameters - that might be even safer.
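A quick way to sanity-check that difference is a small script. This is a simplified sketch of Googlebot's left-to-right prefix matching with "*" wildcards, not the full robots.txt spec, and the example path /tagid-archive.aspx is made up for illustration:

```python
import re

def rule_matches(rule, url):
    """Simplified Googlebot-style matching: the rule must match the URL
    path+query from the left, with '*' matching any run of characters."""
    pattern = re.escape(rule).replace(r'\*', '.*')
    return re.match(pattern, url) is not None

# the bare rule matches 'tagid' anywhere in the URL...
print(rule_matches('/*tagid', '/something.aspx?tagid=22'))   # True
print(rule_matches('/*tagid', '/tagid-archive.aspx'))        # True (unintended!)

# ...while requiring the '?' limits it to query strings
print(rule_matches('/*?tagid', '/something.aspx?tagid=22'))  # True
print(rule_matches('/*?tagid', '/tagid-archive.aspx'))       # False
```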

Another step you can take is to use your Webmaster Tools account to tell Google to ignore the "tagid" parameter. Look under the Configuration > URL Parameters section.


6:40 am on Apr 11, 2013 (gmt 0)


Thank you so much for your help. I will replace 'tagid' with '?tagid' then, and I will set it up through Google Webmaster Tools as well :-)