Forum Moderators: goodroi

Disallowing with robots.txt

robots.txt, disallow certain urls

     
10:49 pm on Dec 14, 2007 (gmt 0)

10+ Year Member



Hi all, I have pasted my robots.txt below. I think I have a problem I need to take care of. When I use my sitemap generator (GSiteCrawler), it crawls this kind of URL on my site forever:

http://www.example.com/?month=-7 and that goes on to /?month=nnnn

That's when I stopped it. I assume (sorry, newbie) this is a calendar in WordPress. I have WP in the root and also a bunch of static pages in folders. I also assume that if this crawler does that, the Google spider attempts the same, and that this would be negative (?) for my site. I added this line to my robots.txt:

"Disallow: /?month*" since I understand * is a wildcard and that this would stop it, but it doesn't.

Below is my robots.txt. I got it from a WordPress website that said it would be the best. Any advice for my "problem"?

Thanks for any advice.

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: */feed
Disallow: /category/*/*
Disallow: */trackback
Disallow: */*/trackback
Disallow: /*?*
Disallow: /*?
Disallow: /?month*
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

[edited by: encyclo at 12:53 am (utc) on Jan. 13, 2008]
[edit reason] switched to example.com [/edit]

7:44 pm on Dec 15, 2007 (gmt 0)

10+ Year Member



Problem solved (I think...). I used the WP XML sitemap plugin instead and added my external sites manually.

12:36 am on Jan 13, 2008 (gmt 0)

g1smd (WebmasterWorld Senior Member, 10+ Year Member)



The partial URL in robots.txt is matched "from the left", so there is no point whatsoever in having a wildcard at the extreme right of a Disallow statement: "Disallow: /?month" already blocks every URL that starts with /?month.
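
To illustrate the left-anchored matching, here is a minimal Python sketch. It assumes the wildcard behaviour that the major search engines document (a rule is matched from the start of the path, * matches any run of characters, a trailing $ anchors the end). The function name rule_matches is made up for this example, and a crawler that only follows the original robots.txt convention, possibly including some sitemap generators, may treat * as a literal character instead.

```python
import re


def rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt Allow/Disallow pattern matches a URL path.

    Assumed semantics (hypothetical helper, not any crawler's actual code):
    - the rule is matched from the left (start of the path),
    - '*' matches any sequence of characters,
    - a trailing '$' anchors the match to the end of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match only matches at the beginning of the string (left-anchored).
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    paths = ["/?month=-7", "/?month=1234", "/about/", "/feed"]
    for path in paths:
        # With and without the trailing '*', the verdict is identical.
        with_star = rule_matches("/?month*", path)
        without_star = rule_matches("/?month", path)
        print(path, with_star, without_star)
```

Under the same assumptions, the existing "Disallow: /*?" line in the posted file already matches /?month=-7, so the extra /?month* rule adds nothing for a crawler that understands wildcards.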
 
