
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt need help
a4tech




msg:4535332
 5:03 pm on Jan 11, 2013 (gmt 0)

Hello.

I need help. I see that Google is crawling some bad links that do not exist on my website.

For example:

http://example.com/test%p%12%5%.html

I want to block the URLs that contain %.

If I add this to my robots.txt file:

Disallow: %

will this work properly, or will it disturb other URLs that do not contain %?

Please give your suggestions and feedback.

Thanks a lot.

 

simran001




msg:4536134
 9:52 am on Jan 15, 2013 (gmt 0)

Hi,

Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. Obeying robots.txt is by no means mandatory for search engines, but they generally respect what they are asked not to crawl.

A basic robots.txt file contains only lines like these:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
User-agent: Googlebot

[edited by: engine at 9:30 am (utc) on Jan 16, 2013]
[edit reason] see WebmasterWorld TOS [/edit]

lucy24




msg:4536313
 10:17 pm on Jan 15, 2013 (gmt 0)

You are looking in the wrong place.

The problem is not that Google is crawling URLs you don't want it to know about. The problem is that it is inventing URLs that don't exist. If it does this only now and then, that is normal: it is testing for "soft 404" responses. But if it is happening very often, you need to figure out where it is getting these imaginary URLs. There are other threads discussing this problem.

g1smd




msg:4536314
 10:20 pm on Jan 15, 2013 (gmt 0)

Disallow: /*%

will disallow crawling of URLs with a % in.
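
To see why the leading /* matters, here is a rough Python sketch of Google-style wildcard matching, where * matches any run of characters and a rule is otherwise a prefix match from the start of the path. The function names rule_to_regex and is_blocked are made up for illustration; this is not Googlebot's actual matcher, just an approximation of the documented behaviour.

```python
import re

def rule_to_regex(rule_path: str) -> re.Pattern:
    """Turn a robots.txt path rule into a regex (illustrative helper).

    Assumes Google-style wildcards: '*' matches any run of characters,
    a trailing '$' anchors the end of the URL, and everything else is
    matched literally from the start of the path.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule_path: str, url_path: str) -> bool:
    """Return True if a single Disallow rule would match url_path."""
    if not rule_path:
        # An empty Disallow line blocks nothing.
        return False
    # re.match anchors at the start of the path, giving prefix semantics.
    return rule_to_regex(rule_path).match(url_path) is not None

if __name__ == "__main__":
    for rule in ("%", "/*%"):
        for path in ("/test%p%12%5%.html", "/about.html", "/caf%C3%A9.html"):
            print(rule, path, is_blocked(rule, path))
```

Under these assumptions, a bare "Disallow: %" matches nothing, because every URL path starts with /. "Disallow: /*%" leaves URLs without a % alone, but it also catches legitimately percent-encoded URLs such as /caf%C3%A9.html, so check whether your site has any of those first.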

However, as stated above, that's the right answer to the wrong question.


If the requests are met with a 404 or 410 response then there is nothing further to do.

If your server is returning '200 OK' then you have a bigger problem to fix.
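
A quick way to check which case applies is to request one of the bogus URLs yourself and look at the status code. Below is a minimal Python sketch using only the standard library. The helper names fetch_status and verdict are invented for this example, and it assumes your server answers HEAD requests; if it does not, switch the method to GET.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

def fetch_status(url: str) -> int:
    """Return the final HTTP status code for url (illustrative helper).

    Note: urlopen follows redirects, so a 301 to a normal page
    is reported as that page's status.
    """
    req = Request(url, method="HEAD")
    try:
        with urlopen(req, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return err.code

def verdict(status: int) -> str:
    """Map a status code to the advice given in this thread."""
    if status in (404, 410):
        return "ok: nothing further to do"
    if status == 200:
        return "problem: server returns 200 for a non-existent URL"
    return "unclear: inspect this response manually"

if __name__ == "__main__":
    # The URL is the placeholder from the first post; substitute one from your own logs.
    url = "http://example.com/test%p%12%5%.html"
    print(url, "->", verdict(fetch_status(url)))
```

If the verdict is a 200, fix the server so that non-existent URLs return a real 404 before worrying about robots.txt.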


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved