
Sitemaps, Meta Data, and robots.txt Forum

    
Disallowing with robots.txt
robots.txt, disallow certain urls
jim_knopf




msg:3528170
 10:49 pm on Dec 14, 2007 (gmt 0)

Hi all, I've pasted my robots.txt below. I think I have a problem I need to take care of. When I run my sitemap generator (I use GSiteCrawler), it crawls this kind of URL on my site forever:

http://www.example.com/?month=-7 and that goes on to /?month=nnnn

That's when I stopped it. I assume (sorry, newbie) this is a calendar in WordPress. I have WP in the root and also a bunch of static pages in folders. I also assume that if this crawler does that, the Google spider attempts the same, and that this would be negative (?) for my site. I added this line to my robots.txt:

"Disallow: /?month*" since I understand * is a wildcard and that this would stop it, but it doesn't.

Below is my robots.txt. I got it from a WordPress website that said it would be the best. Any advice for my "problem"?

Thanks for any advice.

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: */feed
Disallow: /category/*/*
Disallow: */trackback
Disallow: */*/trackback
Disallow: /*?*
Disallow: /*?
Disallow: /?month*
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

[edited by: encyclo at 12:53 am (utc) on Jan. 13, 2008]
[edit reason] switched to example.com [/edit]

 

jim_knopf




msg:3528608
 7:44 pm on Dec 15, 2007 (gmt 0)

Problem solved (I think...). I used the WP XML sitemap plugin instead and added my external sites manually.

g1smd




msg:3546901
 12:36 am on Jan 13, 2008 (gmt 0)

The partial URL in robots.txt is matched "from the left" so there is no point whatsoever in having wildcards at the extreme right of the disallow statement.
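
To illustrate the "matched from the left" point, here is a minimal Python sketch of a pattern matcher. It assumes the wildcard extensions that some crawlers (Googlebot among them) honor: * matches any run of characters and a trailing $ anchors the end of the URL. The original robots.txt standard defines plain prefix matching only, so a crawler that does not implement the extension may treat * as a literal character. The names rule_to_regex and is_blocked are made up for this example and do not come from any library.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Disallow path pattern into a regex (assumed wildcard semantics).

    '*' matches any run of characters, a trailing '$' anchors the end of the
    path, and everything else is taken literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where a '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the path (including any query string) matches the pattern."""
    # re.match anchors at the start only, which is the "from the left" match.
    return rule_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    paths = ["/?month=-7", "/?month=1234", "/page/?month=3", "/about/"]
    for rule in ["/?month", "/?month*", "/*?"]:
        for path in paths:
            print(rule, path, is_blocked(rule, path))
```

Under these assumptions "/?month" and "/?month*" give the same result for every path, which is why the trailing wildcard adds nothing. The "/*?" rule already in the posted file would cover any URL containing a query string, but only for a crawler that understands wildcards.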
