
Forum Moderators: goodroi


Disallowing with robots.txt

robots.txt, disallow certain urls

     
10:49 pm on Dec 14, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:June 23, 2005
posts: 63
votes: 0


Hi all, I've pasted my robots.txt below. I think I have a problem I need to take care of. When I run my sitemap generator (I use GSiteCrawler), it crawls this kind of URL on my site forever:

http://www.example.com/?month=-7 and that goes on to /?month=nnnn

That's when I stopped it. I assume (sorry, newbie) this is a calendar in WordPress. I have WP in the root and also a bunch of static pages in folders. I also assume that if this crawler does that, the Google spider attempts the same, and that this would be negative (?) for my site. I added this line to my robots.txt:

"Disallow: /?month*" As I understand it, * is a wildcard and that should stop it, but it doesn't.

Below is my robots.txt. I got it from a WordPress website that said it would be the best. Any advice for my "problem"?

Thanks for any advice.

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: */feed
Disallow: /category/*/*
Disallow: */trackback
Disallow: */*/trackback
Disallow: /*?*
Disallow: /*?
Disallow: /?month*
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

[edited by: encyclo at 12:53 am (utc) on Jan. 13, 2008]
[edit reason] switched to example.com [/edit]

7:44 pm on Dec 15, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:June 23, 2005
posts: 63
votes: 0


Problem solved (I think...). I used the WP XML sitemap plugin instead and added my external sites manually.

12:36 am on Jan 13, 2008 (gmt 0)

Senior Member

g1smd (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month)

joined:July 3, 2002
posts:18903
votes: 0


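The partial URL in a robots.txt rule is matched "from the left", i.e. as a prefix of the requested path, so there is no point whatsoever in having a wildcard at the extreme right of a Disallow statement. "Disallow: /?month" already covers everything that "Disallow: /?month*" would.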
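To illustrate the "matched from the left" point, here is a minimal Python sketch of prefix matching with the * wildcard extension. The function names rule_to_regex and is_blocked are made up for this example, and the sketch assumes a crawler that supports * and $ the way the big search engines document it; it is not the code of any real crawler, and it says nothing about whether a particular tool honors wildcards at all.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Turn a robots.txt path rule into a regex (hypothetical helper).

    Assumption: '*' means "any run of characters" and a trailing '$'
    means "end of URL". Everything else is taken literally.
    """
    anchored_end = rule.endswith("$")
    if anchored_end:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))


def is_blocked(rule: str, path: str) -> bool:
    """True if the rule matches the path, starting from the left."""
    # re.match anchors at the start of the string only, so the rule
    # behaves as a prefix unless it ends in '$'.
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    path = "/?month=-7"
    for rule in ("/?month", "/?month*", "/*?", "/feed"):
        print(rule, "->", is_blocked(rule, path))
```

With and without the trailing *, the /?month rule gives the same answer for every path, which is why the wildcard on the right adds nothing. A wildcard on the left or in the middle (as in */feed) does change what is matched, because the rule would otherwise only match paths that start with /feed.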
 
