
Forum Moderators: goodroi


robots.txt need help

5:03 pm on Jan 11, 2013 (gmt 0)

New User

joined:Dec 21, 2012
posts: 7
votes: 0


Hello.

I need help. I see that Google is crawling some bad links that don't exist on my website,

like

http://example.com/test%p%12%5%.html

I want to block the URLs that contain %.

If I add this to my robots.txt file:

Disallow: %

will this work correctly, or will it also affect other URLs that don't contain %?

Please give your suggestions and feedback.

Thanks a lot.
9:52 am on Jan 15, 2013 (gmt 0)

New User

joined:Jan 12, 2013
posts: 5
votes: 0


Hi,

robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. It is by no means mandatory for search engines, but well-behaved crawlers generally obey what they are asked not to do.

A basic robots.txt file contains something like:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
User-agent: Googlebot

[edited by: engine at 9:30 am (utc) on Jan 16, 2013]
[edit reason] see WebmasterWorld TOS [/edit]

10:17 pm on Jan 15, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


You are looking in the wrong place.

The problem is not that Google is crawling URLs you don't want it to know about; the problem is that it is inventing URLs that don't exist. If it only does this now and then, it is normal: Google is testing for "soft 404" responses. But if it is happening very often, you need to figure out where it is getting these imaginary URLs. There are other threads discussing this problem.
10:20 pm on Jan 15, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow: /*%

will disallow crawling of URLs with a % in them.
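To see why the wildcard version works where a bare `Disallow: %` would not, here is a minimal sketch of Google-style pattern matching, where `*` matches any run of characters and a trailing `$` anchors the pattern to the end of the path. This is an illustrative reimplementation, not Google's actual code; the function name is made up for the example.

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt Disallow pattern matches a URL path.

    Sketch of the Google-style extensions: '*' matches any sequence of
    characters, and a trailing '$' anchors the match to the end of the
    path. Matching is anchored at the start of the path, which is why a
    rule must begin with '/' (or '/*') to match anything.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"          # wildcard: any run of characters
        else:
            regex += re.escape(ch)  # everything else is literal
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# The invented URL from this thread is caught by the wildcard rule,
# while a normal URL without % is left alone.
print(robots_pattern_matches("/*%", "/test%p%12%5%.html"))  # True
print(robots_pattern_matches("/*%", "/normal-page.html"))   # False
```

Note that `Disallow: %` on its own would be treated as a literal path starting with `%`, which never matches a path beginning with `/`; the leading `/*` is what makes the rule apply anywhere in the URL.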

However, as stated above, that's the right answer to the wrong question.


If the requests are met with a 404 or 410 response, then there is nothing further to do.

If your server is returning '200 OK' for these non-existent URLs, then you have a bigger problem to fix.
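To check which case applies, you can request one of the invented URLs and look at the status code the server actually returns. A minimal sketch using Python's standard library (the URL below is the hypothetical example from this thread):

```python
import urllib.error
import urllib.request

def check_status(url: str) -> int:
    """Return the HTTP status code the server sends for a URL.

    urllib raises HTTPError for 4xx/5xx responses, so we catch it and
    read the code from the exception instead of the response object.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

# e.g. check_status("http://example.com/test%p%12%5%.html")
# 404 or 410: nothing further to do.
# 200: your server is serving a page for a URL that should not exist.
```

Some servers answer HEAD differently from GET, so if the result looks suspicious, repeat the check with a normal GET request.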