Forum Moderators: goodroi

Disallowing with robots.txt

robots.txt, disallow certain urls

     
10:49 pm on Dec 14, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:June 23, 2005
posts: 63
votes: 0


Hi all, I've pasted my robots.txt below. I think I have a problem I need to take care of. When I run my sitemap generator (I use GSite Crawler), it crawls this kind of URL on my site forever:

http://www.example.com/?month=-7 and that goes on to /?month=nnnn

That's when I stopped it. I assume (sorry, newbie) this is a calendar in WordPress. I have WP in the root and also a bunch of static pages in folders. I also assume that if this crawler does that, the Google spider attempts the same, and that this would be negative (?) for my site. I added this line to my robots.txt:

"Disallow: /?month*" since I understand * is a wildcard and thought that would stop it, but it doesn't.

Below is my robots.txt. I got it from a WordPress website that said it would be the best. Any advice for my "problem"?

Thanks for any advice.

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: */feed
Disallow: /category/*/*
Disallow: */trackback
Disallow: */*/trackback
Disallow: /*?*
Disallow: /*?
Disallow: /?month*
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

[edited by: encyclo at 12:53 am (utc) on Jan. 13, 2008]
[edit reason] switched to example.com [/edit]

7:44 pm on Dec 15, 2007 (gmt 0)

Problem solved (I think...). I used the WP XML sitemap plugin instead and added my external sites manually.

12:36 am on Jan 13, 2008 (gmt 0)

Senior Member

g1smd

joined:July 3, 2002
posts:18903
votes: 0


The partial URL in robots.txt is matched "from the left", that is, as a prefix of the requested URL path, so there is no point whatsoever in having a wildcard at the extreme right of a Disallow statement.
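
To illustrate the point, here is a minimal Python sketch of left-anchored rule matching. It assumes the wildcard extensions that major crawlers such as Googlebot document ("*" matches any run of characters, a trailing "$" anchors the end of the URL); the function name rule_matches is made up for this example, and individual crawlers, including the sitemap generator mentioned above, may behave differently.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics (not guaranteed for every crawler): the pattern is
    matched from the left, '*' matches any run of characters, and a
    trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors at the start, which gives prefix matching.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    url = "/?month=-7"
    for rule in ("/?month", "/?month*", "/*?", "/*?*"):
        print(rule, rule_matches(rule, url))
```

Under these assumptions all four rules match /?month=-7, so "Disallow: /?month" does the same job as "Disallow: /?month*", and the existing "Disallow: /*?" line in the posted file would already cover the calendar URLs for any crawler that honours wildcards.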