
Sitemaps, Meta Data, and robots.txt Forum

robots.txt need help

Msg#: 4535330 posted 5:03 pm on Jan 11, 2013 (gmt 0)


I need help. I see that Google crawls some bad links that do not exist on my website.

I want to block the URLs that contain %.

If I add this to my robots.txt file:

Disallow: %

will it work, or will it disturb other URLs that do not contain %?

Please give your suggestions and feedback.

Thanks a lot.



Msg#: 4535330 posted 9:52 am on Jan 15, 2013 (gmt 0)


Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. Obeying robots.txt is voluntary, but the major search engines generally respect what it asks them not to crawl.

A robots.txt file contains entries such as:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
User-agent: Googlebot
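
To see how rules like these are read, here is a minimal sketch using Python's standard `urllib.robotparser`. The `example.com` URLs are hypothetical, and the trailing `User-agent: Googlebot` line is left out because no rules for it are shown above.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, minus the unfinished Googlebot group.
rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
]

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical URLs, used only to illustrate prefix matching.
blocked = parser.can_fetch("*", "http://example.com/tmp/file.html")
allowed = parser.can_fetch("*", "http://example.com/index.html")

print("tmp page fetchable:", blocked)
print("index page fetchable:", allowed)
```

Each Disallow line is a plain path prefix here, so anything under /cgi-bin/ or /tmp/ is refused and everything else is allowed.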

[edited by: engine at 9:30 am (utc) on Jan 16, 2013]
[edit reason] see WebmasterWorld TOS [/edit]


WebmasterWorld Senior Member lucy24

Msg#: 4535330 posted 10:17 pm on Jan 15, 2013 (gmt 0)

You are looking in the wrong place.

The problem is not that Google is crawling URLs you don't want it to know about. The problem is that it is inventing URLs that don't exist. If it only does this now and then, it is normal: Google is testing for "soft 404" responses. But if it happens very often, you need to figure out where it is getting these imaginary URLs. There are other threads discussing this problem.


WebmasterWorld Senior Member g1smd

Msg#: 4535330 posted 10:20 pm on Jan 15, 2013 (gmt 0)

Disallow: /*%

will disallow crawling of URLs that contain a %.
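
A rough way to compare the two patterns is a small matcher like the sketch below. It assumes Google-style wildcard rules ('*' matches any run of characters, a trailing '$' anchors the end, and the pattern is compared from the start of the URL path). The function name `rule_matches` and the sample paths are made up for illustration; real crawlers may differ in details.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern matches `path` from its start.

    Assumption: '*' is a wildcard and a trailing '$' anchors the end,
    as in Google-style robots.txt matching.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# Hypothetical paths: one clean, two containing a percent sign.
paths = ["/page.html", "/page%20name.html", "/a/b?q=%E2%9C%93"]

for rule in ("%", "/*%"):
    for path in paths:
        print(f"Disallow: {rule:4} {path:22} matched={rule_matches(rule, path)}")
```

Under this model a bare `%` never matches, because every path starts with `/`, while `/*%` matches only the paths that contain a percent sign and leaves the others alone.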

However, as stated above, that's the right answer to the wrong question.

If the requests are met with a 404 or 410 response then there is nothing further to do.

If your server is returning '200 OK' then you have a bigger problem to fix.
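
To check which case applies, you can request one of the invented URLs and look at the status code. The sketch below uses only Python's standard library; `fetch_status` and `assess` are hypothetical helper names, and the URL in the usage comment is a placeholder. Note that `urlopen` follows redirects, so a redirect that ends on a normal page will be reported as 200.

```python
import urllib.error
import urllib.request

def fetch_status(url: str) -> int:
    """Return the final HTTP status code the server sends for `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return err.code

def assess(status: int) -> str:
    """Map a status code for a nonexistent URL to the advice given above."""
    if status in (404, 410):
        return "ok: nothing further to do"
    if status == 200:
        return "problem: nonexistent URL answered with 200 OK"
    return "check: unexpected status, look at the server configuration"

# Usage (placeholder URL, replace with one of the bad URLs from your logs):
# print(assess(fetch_status("http://example.com/no-such-page%25")))
```

Some servers answer HEAD differently from GET, so if the result looks odd, repeat the check with a normal GET request.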
