
Forum Moderators: goodroi


robots.txt need help

5:03 pm on Jan 11, 2013 (gmt 0)

New User

joined:Dec 21, 2012
posts: 7
votes: 0


Hello.

I need help. I see that Google is crawling some bad links that don't exist on my website,

like

http://example.com/test%p%12%5%.html

I want to block the URLs that contain %.

If I add this to my robots.txt file:

Disallow: %

will this work correctly, or will it also affect other URLs that don't contain %?

Please give your suggestions and feedback.

Thanks a lot.
9:52 am on Jan 15, 2013 (gmt 0)

New User

joined:Jan 12, 2013
posts: 5
votes: 0


Hi,

robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. It is by no means mandatory for search engines, but well-behaved crawlers generally obey what they are asked not to do.

A basic robots.txt file contains something like:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
User-agent: Googlebot

[edited by: engine at 9:30 am (utc) on Jan 16, 2013]
[edit reason] see WebmasterWorld TOS [/edit]

10:17 pm on Jan 15, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


You are looking in the wrong place.

The problem is not that Google is crawling URLs you don't want it to know about; the problem is that it is inventing URLs that don't exist. If it only does this now and then, it is normal: Google is testing for "soft 404" responses. But if it is happening very often, you need to figure out where it is getting these imaginary URLs. There are other threads discussing this problem.
10:20 pm on Jan 15, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow: /*%

will disallow crawling of URLs with a % in them.
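To see why the wildcard version works where a bare `Disallow: %` would not, here is a minimal sketch of Google-style pattern matching, where `*` matches any run of characters and a trailing `$` anchors the pattern to the end of the path. This is an illustrative reimplementation, not Google's actual code; the function name is made up for the example.

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt Disallow pattern matches a URL path.

    Sketch of the Google-style extensions: '*' matches any sequence of
    characters, and a trailing '$' anchors the match to the end of the
    path. Matching is anchored at the start of the path, which is why a
    rule must begin with '/' (or '/*') to match anything.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"          # wildcard: any run of characters
        else:
            regex += re.escape(ch)  # everything else is literal
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# The invented URL from this thread is caught by the wildcard rule,
# while a normal URL without % is left alone.
print(robots_pattern_matches("/*%", "/test%p%12%5%.html"))  # True
print(robots_pattern_matches("/*%", "/normal-page.html"))   # False
```

Note that `Disallow: %` on its own would be treated as a literal path starting with `%`, which never matches a path beginning with `/`; the leading `/*` is what makes the rule apply anywhere in the URL.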

However, as stated above, that's the right answer to the wrong question.


If the requests are met with a 404 or 410 response, then there is nothing further to do.

If your server is returning '200 OK' for these non-existent URLs, then you have a bigger problem to fix.
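To check which case applies, you can request one of the invented URLs and look at the status code the server actually returns. A minimal sketch using Python's standard library (the URL below is the hypothetical example from this thread):

```python
import urllib.error
import urllib.request

def check_status(url: str) -> int:
    """Return the HTTP status code the server sends for a URL.

    urllib raises HTTPError for 4xx/5xx responses, so we catch it and
    read the code from the exception instead of the response object.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

# e.g. check_status("http://example.com/test%p%12%5%.html")
# 404 or 410: nothing further to do.
# 200: your server is serving a page for a URL that should not exist.
```

Some servers answer HEAD differently from GET, so if the result looks suspicious, repeat the check with a normal GET request.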