| 8:30 pm on Mar 30, 2011 (gmt 0)|
Seems the answer to my first question might be :
which matches :
according to : [code.google.com...]
I assume it will match : /index.php?anyparameters
I'm still looking for an answer to the second question. Is it a case of disallowing ?a=
then allowing ?a=blah:text ?
| 1:42 am on Mar 31, 2011 (gmt 0)|
I guess I need to make it clearer :
exclude this form :
keep (allow / index) this form :
How do I do this in robots.txt ?
I've updated the URL creation to be more descriptive, but unless I can remove the old-format URLs I will face duplicate content / title issues. (There are too many to block one by one.)
| 4:04 pm on Mar 31, 2011 (gmt 0)|
My attempts to remove pages of this form :
are not working.
I also used the Remove URL tool in Webmaster Tools, but these pages still show in the Google index. What am I doing wrong ?
| 4:11 pm on Mar 31, 2011 (gmt 0)|
I believe the trailing wildcard in this instance is ignored.
I'm not certain but I think your use of wildcards is incorrect.
|I would like to block ALL these pages, no matter what the query string is (?....) |
Block or remove pages using a robots.txt file
| 7:01 pm on Mar 31, 2011 (gmt 0)|
I don't want to block every page, just those with index.php?something
Ideally I would keep index.php with no query string so that www.mydomain.com still appears in the index. Or will it anyway? Or is it the same as www.mydomain.com/index.php ?
| 8:24 pm on Mar 31, 2011 (gmt 0)|
Pattern matching in robots.txt is prefix matching "from the left".
Where a wildcard is used it is only needed "on the left" or "in the middle".
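To make the "from the left" idea concrete, here is a minimal Python sketch of that style of matching. It assumes the wildcard extensions as Google documents them ('*' for any run of characters, '$' to anchor the end of the URL); `pattern_matches` is a made-up helper name, not part of any crawler or library, and other robots may behave differently.

```python
import re


def pattern_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the URL path.

    Assumed semantics: the pattern is compared from the left of the path,
    '*' stands for any run of characters, and a trailing '$' pins the
    match to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters (including '?') and turn each '*' into '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match only anchors at the start, which is exactly prefix matching.
    return re.match(regex + ("$" if anchored else ""), path) is not None


# A trailing wildcard adds nothing, because the match is already a prefix match.
print(pattern_matches("/index.php?", "/index.php?a=1"))   # True
print(pattern_matches("/index.php?*", "/index.php?a=1"))  # True
# A wildcard "on the left" or "in the middle" does change what is matched.
print(pattern_matches("/*?", "/forum/view.php?id=3"))      # True
print(pattern_matches("/index.php?", "/index.php"))        # False
```

Note that '?' is treated as an ordinary character here; only '*' and '$' get special handling.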
| 10:35 pm on Mar 31, 2011 (gmt 0)|
I mean the '?' to be a literal part of the URL (the start of the query string).
Where can I read more on pattern matching for robots.txt ?
The robotstxt.org site says nothing about it.
| 11:27 pm on Mar 31, 2011 (gmt 0)|
The pattern matching is simple.
| 12:24 am on Apr 1, 2011 (gmt 0)|
So to allow index.php but disallow index.php?<anything>
? (used = r.t. ? in case ? has a special meaning)
Will this do what I want ?
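One way to sanity-check a rule like this before uploading it is a small Python sketch such as the one below. It assumes a single hypothetical rule, `Disallow: /index.php?`, treats '?' as a literal character, and uses the longest-matching-rule precedence that Google describes; `RULES`, `pattern_matches` and `is_allowed` are illustrative names, not a real API.

```python
import re

# Hypothetical rule set: (directive, path pattern) pairs as they would
# appear under a "User-agent: *" line.
RULES = [("disallow", "/index.php?")]


def pattern_matches(pattern, path):
    """Prefix-match a pattern; '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(path, rules=RULES):
    """Apply the longest matching rule; on a tie, Allow beats Disallow."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if pattern_matches(pattern, path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that no rule matches is crawlable by default.
    return True if best is None else best[1]


print(is_allowed("/index.php"))      # True
print(is_allowed("/"))               # True
print(is_allowed("/index.php?a=1"))  # False
```

Under these assumptions no Allow line is needed for the bare index.php, because a URL that matches no Disallow pattern is allowed by default.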
to allow articles (index.php?option=article&...) but not forum stuff (index.php?option=forum&...)
I could use :
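For the articles-versus-forum case, a hypothetical rule set (an illustration only, not the snippet from the post above) can be checked with a short Python sketch. It assumes crawlers that honour `Allow` and pick the longest matching pattern, as Google documents; the parser ignores User-agent grouping for brevity, and all function names are made up for this example.

```python
import re

# Hypothetical robots.txt: block every query-string URL on index.php,
# then re-allow the article URLs with a longer, more specific pattern.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?
Allow: /index.php?option=article
"""


def parse_rules(text):
    """Collect (directive, pattern) pairs; User-agent groups are ignored here."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def pattern_matches(pattern, path):
    """Prefix-match a pattern; '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(path, rules):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    matched = [(len(p), d == "allow") for d, p in rules if pattern_matches(p, path)]
    return max(matched)[1] if matched else True


rules = parse_rules(ROBOTS_TXT)
print(is_allowed("/index.php?option=article&id=5", rules))   # True
print(is_allowed("/index.php?option=forum&topic=2", rules))  # False
print(is_allowed("/index.php", rules))                       # True
```

One caveat with this approach: because matching runs from the left, `option=article` has to be the first parameter after the '?'. A URL such as /index.php?id=5&option=article would fall under the Disallow rule.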