Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Robots.txt and regular expressions
Can they use them?

 3:23 pm on Jun 4, 2012 (gmt 0)

I have read conflicting stuff on this all over the internet and even here in the forums. I am simply trying to get rid of a series of search pages with dynamic parameters and want to use robots.txt to do this. Can it accept any regex? SEOmoz (http://www.seomoz.org/learn-seo/robotstxt) says it can:

Pattern Matching

Google and Bing both honor two regular expressions that can be used to identify pages or sub-folders that a SEO wants excluded. These two characters are the asterisk (*) and the dollar sign ($).

* - which is a wildcard that represents any sequence of characters
$ - which matches the end of the URL

but most other sites I see say they will not.

Any ideas?



 4:03 pm on Jun 4, 2012 (gmt 0)

The * used in this way is not a Regular Expression, so be careful how you talk about it.

If you want to include directives that are intended for specific search engines, you can use any syntax that they say they recognize. But if you want an all-purpose "Which part of 'disallow' didn't you understand?" then stick to the minimalist form.

I don't know about Bing, but Google ignores "crawl-delay" even though I'm sure it understands it perfectly well.


 4:57 pm on Jun 4, 2012 (gmt 0)

Yeah, I was talking about a page like page.aspx, which then has page.aspx/color=black and so on for more refinements. So I was going to add

Disallow: /page.aspx*

Does this sound correct?



 5:27 pm on Jun 4, 2012 (gmt 0)

Never use * at the end of the pattern. The pattern "matches from the left" and is a "prefix match".

Use * only near the beginning or in the middle of the pattern.

Disallow: /this disallows anything beginning example.com/this so the * is not needed.

Disallow: /*that disallows URL requests like example.com/<something-or-anything>that as a prefix.

$ ending is needed only when you need an exact match.
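
For anyone who wants to test patterns before publishing them, here is a minimal Python sketch of the matching described above: a prefix match from the left, * standing for any sequence of characters, and a trailing $ pinning the end of the URL. The function name rule_matches is made up for illustration, and this only approximates what the search engines document; it is not their actual code.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a Disallow pattern match this URL path (plus query string)?"""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then let each * stand for "any sequence of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives the prefix ("from the left") behaviour.
    return re.match(regex, path) is not None

print(rule_matches("/this", "/this-and-more"))         # prefix match, no * needed
print(rule_matches("/this*", "/this-and-more"))        # trailing * changes nothing
print(rule_matches("/*that", "/something/that/else"))  # * in place of the leading part
print(rule_matches("/this$", "/this/more"))            # $ demands an exact end
```

The second call is the point of "never use * at the end": in this sketch a trailing * matches exactly the same URLs as the pattern without it.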

 5:41 pm on Jun 4, 2012 (gmt 0)

Thanks for the update g1smd. However, with the example I put above I would need to match whatever comes after page.aspx, and there isn't a real end that I can put on it because it is dynamic. So the page could be


Is there a way to match what I am talking about above with the *? thanks!


 5:47 pm on Jun 4, 2012 (gmt 0)

Do you need to match some query strings and not others?
Is this for one particular aspx page or for all aspx pages?

If you need to block all query strings for one particular .aspx page then the prefix match for disallowing example.com/page.aspx?<anything> is
Disallow: /page.aspx?
It's a prefix match. You don't need a * here.

If you want to block any .aspx page with any query string, e.g. block example.com/<anything>.aspx?<anything> then use:
Disallow: /*.aspx?
The * is needed only in place of the page name.

Never use * at the end of the pattern.
Use * only near the beginning or in the middle of the pattern.

If you wanted to block requests for exactly example.com/page.aspx without query strings but allow the same page with query strings you would use
Disallow: /page.aspx$
or, to do the same for every .aspx page,
Disallow: /*.aspx$
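
As a worked check of the four rules above, the following Python sketch runs each one against a few sample URLs. The helper matches and the sample paths are assumptions for illustration (with .aspx spelled out in the rules); the helper treats * as any characters, a trailing $ as end of URL, and everything else as a prefix match, which only approximates what the search engines do.

```python
import re

def matches(pattern, path):
    # Hypothetical helper: * = any characters, trailing $ = end of URL, otherwise prefix match.
    body = pattern[:-1] if pattern.endswith("$") else pattern
    regex = ".*".join(map(re.escape, body.split("*")))
    if pattern.endswith("$"):
        regex += "$"
    return re.match(regex, path) is not None

rules = ["/page.aspx?", "/*.aspx?", "/page.aspx$", "/*.aspx$"]
# Sample paths, made up for this example.
paths = ["/page.aspx", "/page.aspx?color=black", "/other.aspx?size=9"]

# For each rule, collect the sample paths it would block.
results = {rule: [p for p in paths if matches(rule, p)] for rule in rules}
for rule, blocked in results.items():
    print(f"Disallow: {rule:<12} blocks {blocked}")
```

Under these assumptions the two ? rules catch only the URLs that carry a query string, and the two $ rules catch only the bare /page.aspx.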

 6:37 pm on Jun 4, 2012 (gmt 0)

Thanks g1smd, I think I understand now. Since it is all query strings that go with page.aspx, I will use Disallow: /page.aspx? and it will match all of the additional query strings added to it. Correct?

Thanks again I really appreciate your help!


 6:42 pm on Jun 4, 2012 (gmt 0)

The pattern is a prefix match (matches from the left) so the rule
Disallow: /page.aspx?
matches any request that BEGINS example.com/page.aspx? with anything or nothing after the question mark.

 6:52 pm on Jun 4, 2012 (gmt 0)

Thanks g1smd you have really helped a lot!


 7:48 pm on Jun 4, 2012 (gmt 0)

The devil is in the details.

It's especially important to define "exactly" what you want to do in plain English before you even begin to think about any code.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved