Robots.txt help needed for Googlebot

simonmc

5:04 pm on Aug 6, 2006 (gmt 0)

10+ Year Member



Hi,

I can't seem to block robots from indexing links that have query strings appended.

Let me give you an example:

I want to block this link:

/forum/ftopic1234.html&watch=topic&start=0&sid=23453455dter34

The ftopic part and the sid part have many combinations.

I have in my robots text file:

Disallow: /forum/ftopic*.html&watch=topic&start=0&sid=*$

Alas, this does not seem to work, and the bot just follows these links and indexes them.

Can you see where I am going wrong? This is mainly for Google.

AjiNIMC

6:50 pm on Aug 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is your robots.txt sending the proper headers? Also try a robots.txt validator.

jay5r

7:11 pm on Aug 6, 2006 (gmt 0)

10+ Year Member Top Contributors Of The Month



The Google Sitemaps tool lets you try different variations of robots.txt against different URLs to see whether they're blocked or not. Once you find a variation that works, you post it on your site.

tedster

10:52 pm on Aug 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



An observation -- the robots.txt standard does not allow wildcard characters, so a proper validator will choke on that line. However, Google has extended the standard and it does support wildcards. See:

[google.com...]

It's probably a good idea to go over what Google says on that page very carefully.
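For what it's worth, a rule like this can also be checked offline before it goes live. Here is a minimal sketch in Python, assuming the wildcard semantics are that `*` matches any run of characters and a trailing `$` marks the end of the URL. The helper names `rule_to_regex` and `is_blocked` are made up for illustration and are not part of any Google tool, and the second and third candidate rules are hypothetical variations, not recommendations.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow path with wildcards into a regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is a literal prefix.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(rule: str, path: str) -> bool:
    """True if the path matches the rule under the assumed semantics."""
    return rule_to_regex(rule).match(path) is not None

url = "/forum/ftopic1234.html&watch=topic&start=0&sid=23453455dter34"
candidates = [
    "/forum/ftopic*.html&watch=topic&start=0&sid=*$",  # rule from the original post
    "/forum/ftopic*&sid=",                             # hypothetical shorter variant
    "/forum/ftopic*.html$",                            # hypothetical: clean topic URLs only
]
for rule in candidates:
    print(rule, "->", is_blocked(rule, url))
```

Under these assumed semantics the rule from the original post does match the example URL, so it may be worth checking other things as well, such as whether the Disallow line sits under a User-agent line that applies to Googlebot. This sketch is only an approximation and is no substitute for Google's own checker.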

I am wondering if two asterisks might be more than Google can handle accurately at present. All their examples have only one. You might consider whether you can dynamically generate a meta robots tag for the <head> of any URLs you want to see excluded.
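The meta robots idea could look something like the following. This is a rough sketch in Python, not taken from any particular forum package: the function name `robots_meta_tag` and the `EXCLUDED_PARAMS` set are assumptions for illustration, and the choice of `noindex,follow` is just one possible value.

```python
import re
from urllib.parse import parse_qs

# Hypothetical set of parameters that mark a URL as one to keep out of the index.
EXCLUDED_PARAMS = {"sid", "watch"}

def robots_meta_tag(request_uri: str) -> str:
    """Return a noindex meta tag if the URI carries an excluded parameter."""
    # The URLs in this thread append parameters with '&' straight after
    # '.html', so split on the first '?' or '&' instead of relying on '?'.
    parts = re.split(r"[?&]", request_uri, maxsplit=1)
    params = parse_qs(parts[1]) if len(parts) == 2 else {}
    if EXCLUDED_PARAMS.intersection(params):
        return '<meta name="robots" content="noindex,follow">'
    return ""

# The template would print this inside <head> for each request.
print(robots_meta_tag("/forum/ftopic1234.html&watch=topic&start=0&sid=23453455dter34"))
print(repr(robots_meta_tag("/forum/ftopic1234.html")))
```

One thing to keep in mind with this approach: the bot has to be able to fetch the page in order to see the tag, so these URLs should not also be disallowed in robots.txt.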