Forum Moderators: goodroi

Disallowing & in URL


Wolvereness

10:49 am on Jul 12, 2005 (gmt 0)



I'm setting up a robots.txt (I don't have one yet) and the only thing I want it to do is allow all bots but disallow any URL that contains the & symbol. How would I go about the syntax? I want it to index the pages with ?, because that is where all my content is, but my interactive pages always have extra GET variables.

jimbeetle

2:54 pm on Jul 12, 2005 (gmt 0)




Hi Wolvereness, welcome to Webmaster World.

Robots.txt syntax [robotstxt.org] is very, very simple and does not support wildcard characters or pattern matching in paths (the * is allowed only in the User-agent line, where it simply means "all user agents"). About the only way you can use robots.txt to restrict access to pages with dynamic URLs is to put them in their own directory. Then you can have something like this:

User-agent: *
Disallow: /dynamic-content/
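
(One caveat worth noting: some major crawlers, Googlebot among them, support a nonstandard * wildcard extension in Disallow paths. If you only care about those crawlers, a rule like the one below would block any URL containing an ampersand. This is an extension, not part of the robots.txt standard, so other robots will ignore it or treat it literally:)

User-agent: Googlebot
Disallow: /*&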

If this isn't doable, another tack you might want to look into is including a Robots META tag [robotstxt.org] in the head of each of your dynamically generated pages:

<meta name="robots" content="noindex,nofollow">

This actually has an advantage. While Google does not index pages disallowed by robots.txt, it can still list any URL it discovers. So, if you have a directory protected by robots.txt and links pointing to those pages, Google will return the pages in the SERPs as URL-only listings. Using noindex,nofollow is the only way to make sure Google doesn't list the pages at all.
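
Since the poster's content pages carry a single ? variable while the interactive pages add extra GET variables joined by &, the META tag approach can be automated in the page template. A minimal sketch (the function name and the "& in the query string" rule are illustrative assumptions, not something from this thread):

```python
def robots_meta(query_string):
    """Return the Robots META tag for a page, based on its raw query
    string. Pages with more than one GET variable (an '&' in the
    query string) are marked noindex,nofollow; content pages with a
    single '?param=value' stay indexable."""
    if "&" in query_string:
        return '<meta name="robots" content="noindex,nofollow">'
    return '<meta name="robots" content="index,follow">'

# Single-variable content page stays indexable:
print(robots_meta("page=about"))
# Interactive page with extra GET variables gets blocked:
print(robots_meta("page=forum&sort=date&reply=1"))
```

The template would emit the returned tag inside each page's head, so every dynamically generated page declares its own policy without touching robots.txt.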