Forum Moderators: goodroi
User-agent: Googlebot
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
site:example.com shows link #2 as www.example.com/feed/
Google Webmaster Tools shows that URL is NOT in the robots exclusion. Why is my wildcard not working? Disallow: /*/feed/$
I cannot seem to locate Google's official policy on robots.txt wildcards and other optional tokens.
I also have a URL in Google that I want out, and would like to use robots.txt to do so. The URL is
/?p&paged=16
I take it that would be an impossible URL to block? It has been 404'd for ages, but Google hits it all the time.
Finally, in WordPress, I have URLs of:
/category/personal/page/2/
/page/3/
I am no longer sure how the second URL is accessed, but they are in the SERPs. Is there any good reason I should even let Google crawl those pages, and would:
Disallow: /page/
Disallow: /category/
those two rules take care of them for me?
I do have <meta name="robots" content="noindex,follow"/> in each of the above cases, put into the page dynamically, but perhaps the robots file is a bit more forceful?
There is no wildcard or regular-expression support for filenames in the original robots.txt standard (so the '$' isn't doing any good either).
The official Google answer about pattern matching in robots.txt is here [google.com...]
Current robots file:
User-agent: Googlebot
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$

site:example.com shows link #2 as www.example.com/feed/
Google Webmaster Tools shows that URL is NOT in the robots exclusion. Why is my wildcard not working? Disallow: /*/feed/$
There is no way "/*/feed/" would match "/feed/". Even if "*" = "", you're still trying to match "//feed/" against "/feed/".
Try this instead:
Disallow: */feed/$
Disallow: */feed/rss/$
Disallow: */trackback/$
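You can sanity-check the difference with a small sketch of Google-style pattern matching, where '*' matches any run of characters and a trailing '$' anchors the end of the path. (This `robots_match` helper is hypothetical, written just to illustrate the matching rules; it is not Google's actual parser.)

```python
import re

def robots_match(pattern: str, path: str) -> bool:
    # Translate a Google-style robots.txt path pattern into a regex:
    # '*' becomes '.*' (any run of characters), and a trailing '$'
    # anchors the match at the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    regex = "^" + regex + ("$" if anchored else "")
    return re.search(regex, path) is not None

# The leading "/" in "/*/feed/$" must match before the "*" does,
# so "/feed/" never qualifies:
print(robots_match("/*/feed/$", "/feed/"))                    # False
print(robots_match("/*/feed/$", "/category/personal/feed/"))  # True
# Dropping the leading "/" lets "*" match the empty string:
print(robots_match("*/feed/$", "/feed/"))                     # True
```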
But as others have said, wildcards are understood only by Google and Yahoo.