Forum Moderators: goodroi


Validation


Dexie

9:37 am on Apr 25, 2005 (gmt 0)

10+ Year Member



Hi all,

Just tried to validate the robots.txt file below:

1
2 User-agent: *
3 Disallow:
4
5
6 User-agent: Googlebot-Image
7 Disallow: /
8
9
10 User-agent: *
11 Disallow: /images/
12
13
14 User-agent: Yahoo-MMCrawler
15 Disallow: /

and got these warnings:

warning: Multiple wildcard User-agent values. These should be combined under one User-agent/Disallow pair.
line 10: warning: Duplicate User-agent value seen: *. It is unknown how spiders will react to duplicates.

Can anyone suggest a better way to write the above that still gives the same result, and also stops others from viewing my logs?

Any help appreciated.

Sev.

ncw164x

9:52 am on Apr 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



try this

User-agent: Googlebot-Image
Disallow: /

User-agent: Yahoo-MMCrawler
Disallow: /

User-agent: *
Disallow: /images/

To stop anyone from viewing your site's logs you are best advised to password-protect that directory; robots.txt only keeps out well-behaved spiders, and anything else will access the directory anyway.
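A common way to do that on an Apache server is HTTP basic auth via an .htaccess file in the logs directory. A minimal sketch (the password-file path and realm name here are placeholders, not from the thread):

```apacheconf
# .htaccess placed in the logs directory (Apache)
AuthType Basic
AuthName "Logs - authorised users only"
# Password file created beforehand with: htpasswd -c /home/user/.htpasswd myname
AuthUserFile /home/user/.htpasswd
Require valid-user
```

Unlike robots.txt, this is enforced by the server, so it works against badly behaved bots and humans alike.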

Dexie

4:45 pm on Apr 27, 2005 (gmt 0)

Hi ncw164x, just to let you know that validated without any probs, so thanks for the help.

One thing, so that I can hopefully understand: what was incorrect/duplicated about the original code?

ncw164x

5:45 pm on Apr 27, 2005 (gmt 0)

You had this duplicate line, which is incorrect:
User-agent: *

You were allowing all robots access, but then banning two of them.

This gives all robots access to all of your site:
User-agent: *
Disallow:

but then you disallow this robot:
User-agent: Googlebot-Image
Disallow: /

and this one:
User-agent: Yahoo-MMCrawler
Disallow: /

and you disallow all robots from your image directory:
User-agent: *
Disallow: /images/

So by removing the duplicate line and the record giving access to all robots (access is the default, so you don't need it anyway), you are left with a file that validates.
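The principle at work here - each robot obeys only the single record that best matches it, with * as the fallback - can be sketched in Python. This is a toy illustration of the matching rule, not code from any real crawler:

```python
def applicable_rules(agent, records):
    """Return the Disallow paths from the first record naming this
    agent; fall back to the '*' record if no record matches."""
    fallback = []
    for user_agents, disallows in records:
        if agent.lower() in (ua.lower() for ua in user_agents):
            return disallows          # a record naming the robot wins
        if "*" in user_agents:
            fallback = disallows      # remember the wildcard record
    return fallback

# The corrected robots.txt, as (user-agents, disallows) records:
records = [
    (["Googlebot-Image"], ["/"]),
    (["Yahoo-MMCrawler"], ["/"]),
    (["*"], ["/images/"]),
]

print(applicable_rules("Googlebot-Image", records))  # ['/']
print(applicable_rules("SomeOtherBot", records))     # ['/images/']
```

This is why the duplicate * record was a problem: a robot reading the file picks one record, so two wildcard records leave it unclear which one applies.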

glad to be of help

Dexie

6:08 pm on Apr 27, 2005 (gmt 0)

The lightbulb just shone a tad brighter - got it.

Many thanks for your help and patience ;-)

Much appreciated.

Dexie

6:50 pm on Apr 27, 2005 (gmt 0)

Just a thought. I can see now that I boobed on that first line in the earlier code, but isn't the first record below correct, because I want to tell all robots that they can roam the site, provided the other three instructions are adhered to? Or is there another instruction that says ALL robots are welcome to roam the site, apart from those three provisos? Because I definitely want all other robots to roam the site apart from those three exceptions.

Any help appreciated.

Sev.

User-agent: *
Disallow:

User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /images/

User-agent: Yahoo-MMCrawler
Disallow: /

ThomasB

9:05 am on Apr 28, 2005 (gmt 0)

User-agent: *
Disallow: /images/

User-agent: Googlebot-Image
Disallow: /

User-agent: Yahoo-MMCrawler
Disallow: /

There you go.

Dexie

9:58 am on Apr 28, 2005 (gmt 0)

Many thanks Thomas, got that from the earlier post, and am thankful for that, but would it *also* be useful to put a command in there to let all other robots know that they are welcome?

ncw164x

10:05 am on Apr 28, 2005 (gmt 0)

The robots.txt file is for disallowing spiders, not allowing them, so if you are banning any bots you don't need:

User-agent: *
Disallow:

You can have a blank file just to stop getting 404 errors in your site's log file, and it will still give access to all robots.

Dexie

10:35 am on Apr 28, 2005 (gmt 0)

Another lightbulb's just gone on - I thought the robots.txt file was there to *improve* your chances of robots visiting your website? Not sure where I got that from.

Many thanks for the help again.

ncw164x

9:30 pm on Apr 28, 2005 (gmt 0)

Quick Recap....

You don't have to have the file if you don't want to, but you will then see 404 errors in your site's log file, because the file was requested but was not available.

Having a blank file will stop the 404 errors.

Or, in your case, you want to disallow two robots and disallow access to your image directory, hence this:
User-agent: Googlebot-Image
Disallow: /

User-agent: Yahoo-MMCrawler
Disallow: /

User-agent: *
Disallow: /images/

click...I hear another light bulb go on ;)

To create a robots.txt file:
Create a text file with a word processor or HTML editor, containing the required records like the example above
Save the file as robots.txt
Upload the robots.txt file to the root directory using your FTP software in ASCII mode
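As a footnote, you can sanity-check the finished file before uploading it with Python's standard-library robots.txt parser (the module and its behaviour are stdlib facts; the file paths in the examples are made up):

```python
from urllib import robotparser

rules = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: Yahoo-MMCrawler
Disallow: /

User-agent: *
Disallow: /images/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The two image crawlers are shut out everywhere...
print(rp.can_fetch("Googlebot-Image", "/photo.jpg"))  # False
print(rp.can_fetch("Yahoo-MMCrawler", "/photo.jpg"))  # False
# ...everyone else is only kept out of /images/
print(rp.can_fetch("SomeOtherBot", "/page.html"))     # True
print(rp.can_fetch("SomeOtherBot", "/images/a.gif"))  # False
```

If those answers match what you intended, the file is doing its job.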

Dexie

6:14 am on Apr 29, 2005 (gmt 0)

Many thanks for the info, it is appreciated, but I think I might not have explained it well enough before. I've been using a robots.txt file for about three years and know how to upload it. I guess the succinct question is: we've now got the correct instructions on how to disallow, but I thought it was also a good idea to show all the other robots out there, in the robots.txt file, that they are allowed?

larryhatch

6:23 am on Apr 29, 2005 (gmt 0)

All other robots are AUTOMATICALLY allowed.
There are no 'allow' commands, that's like ordering a dog to breathe. - Larry

Dexie

6:51 am on Apr 29, 2005 (gmt 0)

Many thanks Larry, I always thought that the robots.txt was for ALLOWING as well, but it obviously isn't! You learn something new every day, (the dog thing wasn't new though ;-))

Many thanks to you people.

alexo

12:18 am on May 20, 2005 (gmt 0)

Is this valid code?

User-agent: *
Disallow: /images/

User-agent: Googlebot
Crawl-delay: 10

I tried to validate it and got "We're sorry, this robots.txt does NOT validate."

ThomasB

1:27 pm on May 20, 2005 (gmt 0)

Crawl-delay is a proprietary extension, not part of the robots.txt standard - that's why it doesn't validate.

alexo

12:11 am on May 21, 2005 (gmt 0)

< The reason is that crawl-delay is a proprietary standard, that's why it doesn't validate. >

Does this mean that it will work for Googlebot?

ThomasB

4:14 pm on May 21, 2005 (gmt 0)

It's not mentioned in their Googlebot FAQ [google.com] so I guess it won't work.