Current robots file:
User-agent: Googlebot
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
A site:example.com search shows result #2 as www.example.com/feed/
Google Webmaster Tools reports that this URL is NOT blocked by robots.txt. Why is my wildcard rule not working?
Disallow: /*/feed/$
I cannot seem to locate Google's official policy on robots.txt wildcards and other optional tokens.
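A minimal sketch of how that rule would be evaluated, assuming Google-style matching: * matches any run of characters (including none), a trailing $ anchors the end of the URL, and everything else is a literal match from the start of the path. The helper names rule_to_regex and is_blocked are made up for illustration; this is not Google's actual parser.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into a regex (assumed semantics)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    # Without "$" the rule is a prefix match, so the end is left open.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    """True if the rule matches the path, starting at its first character."""
    return rule_to_regex(rule).match(path) is not None


print(is_blocked("/*/feed/$", "/feed/"))                    # False
print(is_blocked("/*/feed/$", "/category/personal/feed/"))  # True
print(is_blocked("/feed/$", "/feed/"))                      # True
```

Under these assumptions, /*/feed/$ needs a slash, then anything, then another slash before feed/, so the root-level /feed/ falls through. A separate rule such as Disallow: /feed/$ would be needed to cover it.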
I also have a URL in Google's index that I want removed, and I would like to use robots.txt to do so. The URL is:
/?p&paged=16
I take it that would be an impossible URL to block? It has returned a 404 for ages, but Googlebot still requests it all the time.
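A rough sketch of how such a rule might be tested, assuming the crawler compares each Disallow value against the path plus the query string and treats a rule without wildcards as a plain prefix. Both of those are assumptions about the crawler; the helper names path_and_query and blocked_by_prefix are hypothetical.

```python
from urllib.parse import urlsplit


def path_and_query(url: str) -> str:
    """Return the part of the URL a rule is assumed to be compared against."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def blocked_by_prefix(rule: str, url: str) -> bool:
    """Plain prefix match, with no wildcard handling."""
    return path_and_query(url).startswith(rule)


rule = "/?p&paged=16"
print(blocked_by_prefix(rule, "http://www.example.com/?p&paged=16"))   # True
print(blocked_by_prefix(rule, "http://www.example.com/?p&paged=160"))  # True
print(blocked_by_prefix(rule, "http://www.example.com/?paged=16"))     # False
```

Because this is a prefix match, the same rule would also catch /?p&paged=160 and similar URLs. If that matters, the rule would need an end anchor, which only helps with crawlers that support $.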
Finally, in WordPress, I have URLs such as:
/category/personal/page/2/
/page/3/
I am no longer sure how the second URL is reached, but both are in the SERPs. Is there any good reason I should even let Google crawl those pages, and would:
Disallow: /page/
Disallow: /category/
those two rules take care of it for me?
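One way to check the two rules locally is Python's standard urllib.robotparser, which does plain prefix matching with no wildcard support. That is enough here because neither rule uses * or $. The first two URLs are the ones from this post; the last two are made-up controls. This shows only what a simple prefix matcher decides, not what Google ends up indexing.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Googlebot",
    "Disallow: /page/",
    "Disallow: /category/",
]

parser = RobotFileParser()
parser.parse(rules)  # parse the lines directly instead of fetching a file

urls = [
    "http://www.example.com/category/personal/page/2/",
    "http://www.example.com/page/3/",
    "http://www.example.com/category/personal/",  # control: first archive page
    "http://www.example.com/about/",              # control: ordinary page
]

for url in urls:
    # can_fetch() returns True when crawling is allowed for that agent.
    print(url, parser.can_fetch("Googlebot", url))
```

Only the /about/ control comes back as allowed. Note that Disallow: /category/ is a prefix rule, so it blocks the first page of every category archive as well as the paginated ones.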
I do have <meta name="robots" content="noindex,follow"/> on each of the above pages, inserted dynamically, but perhaps the robots.txt file is a bit more forceful?