Msg#: 4461252 posted 3:23 pm on Jun 4, 2012 (gmt 0)
I have read conflicting advice on this all over the internet and even here in the forums. I am simply trying to get rid of a series of search pages with dynamic parameters and want to use robots.txt to do this. Can it accept any regex? On SEOmoz (http://www.seomoz.org/learn-seo/robotstxt) it says they will:
Google and Bing both honor two regular expressions that can be used to identify pages or sub-folders that a SEO wants excluded. These two characters are the asterisk (*) and the dollar sign ($).
* - which is a wildcard that represents any sequence of characters
$ - which matches the end of the URL
Msg#: 4461252 posted 4:03 pm on Jun 4, 2012 (gmt 0)
The * used in this way is not a Regular Expression, so be careful how you talk about it.
If you want to include directives that are intended for specific search engines, you can use any syntax that they say they recognize. But if you want an all-purpose "Which part of 'disallow' didn't you understand?" then stick to the minimalist form.
I don't know about Bing, but Google ignores "crawl-delay" even though I'm sure it understands it perfectly well.
Msg#: 4461252 posted 5:41 pm on Jun 4, 2012 (gmt 0)
Thanks for the update, g1smd. However, with the example I put above I would need to match whatever comes after page.aspx, and there isn't a real end that I can put on it because it is dynamic. So the page could be
Msg#: 4461252 posted 5:47 pm on Jun 4, 2012 (gmt 0)
Do you need to match some query strings and not others? Is this for one particular aspx page or for all aspx pages?
If you need to block all query strings for one particular .aspx page, then the rule for disallowing example.com/page.aspx?<anything> is Disallow: /page.aspx? It's a prefix match. You don't need a * here.
If you want to block any .aspx page with any query string, e.g. block example.com/<anything>.aspx?<anything>, then use: Disallow: /*.aspx? The * is needed only in place of the page name.
Never use * at the end of the pattern. Use * only near the beginning or in the middle of the pattern.
If you wanted to block requests for exactly example.com/page.aspx without query strings, but allow the same page with query strings, you would use Disallow: /page.aspx$ or Disallow: /*.aspx$
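If it helps to see the rules above in action, here is a rough Python sketch that mimics the matching as described in this thread: plain prefix match, * as "any sequence of characters", and a trailing $ as "end of URL". The function names robots_pattern_to_regex and is_blocked are made up for illustration; this is not how any search engine actually implements it, and it ignores Allow rules and rule precedence entirely.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex (illustrative only).

    Assumed semantics, taken from the discussion above:
      - the pattern is matched as a prefix of the URL path + query string
      - '*' stands for any sequence of characters
      - a trailing '$' anchors the match at the end of the URL
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, url_path: str) -> bool:
    """Return True if url_path would be matched by a Disallow pattern."""
    # re.match anchors at the start of the string, which gives the prefix match.
    return robots_pattern_to_regex(pattern).match(url_path) is not None


if __name__ == "__main__":
    checks = [
        ("/page.aspx?", "/page.aspx?q=shoes"),
        ("/page.aspx?", "/page.aspx"),
        ("/*.aspx?", "/search/results.aspx?page=2"),
        ("/page.aspx$", "/page.aspx"),
        ("/page.aspx$", "/page.aspx?q=shoes"),
        ("/page.aspx?*", "/page.aspx?q=shoes"),  # trailing * adds nothing
    ]
    for pattern, path in checks:
        print(f"Disallow: {pattern:<14} {path:<30} blocked={is_blocked(pattern, path)}")
```

The last check shows why a trailing * is pointless: because matching is already by prefix, /page.aspx?* blocks the same URLs as /page.aspx? does.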
Msg#: 4461252 posted 6:37 pm on Jun 4, 2012 (gmt 0)
Thanks g1smd, I think I understand now. Since it is all query strings that go with page.aspx, I will use Disallow: /page.aspx? and it will match all of the additional query strings added to it. Correct?