Looking for complicated borderline Robots.txt

   
2:48 am on Apr 5, 2001 (gmt 0)

brett_tabke (WebmasterWorld Administrator)



If anyone has some really complex robots.txt files they could send me to run through the new validator over at the other site, I would appreciate it. Need some test fodder...

1:11 pm on Apr 5, 2001 (gmt 0)

10+ Year Member



Something to keep you busy!~:)

(common)
[google.com...]
[microsoft.com...]
[dmoz.org...]
[northernlight.com...]

(others)
[klug-suchen.de...]
[searchcode.de...]
[polk.ucdavis.edu...]
[global-positioning.com...]

(more complex)
[tardis.ed.ac.uk...]
[searchtools.com...]
[searchenginewatch.com...]
[wdvl.com...]
[suchfibel.de...]

9:12 am on Apr 9, 2001 (gmt 0)

brett_tabke (WebmasterWorld Administrator)



Ack - those are all simple ones. I *was* talking about 2-3k bloated robots.txt files like this one:

(warning: 400k):
[greatrentals.com]

And I'd double check the scholarship to this school:
[physiology.uthscsa.edu]

And here is an example of too much leisure time:
[goddethroned.diaryland.com]

Well, since I couldn't find anyone willing to fess up, I broke out the spider and had a go:

[searchengineworld.com...]

Thanks
Brett

Xoc

10:14 am on Apr 9, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm starting to think that the W3C needs to standardize robots.txt. Since there isn't a spec, any complicated robots.txt can be interpreted by a spider however it feels is right. Another spider could interpret it in an entirely different way. Seems like they should be able to do that spec in less than a year! :)
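
To make the ambiguity concrete, here is a rough Python sketch of two spiders reading the same rules differently. Everything in it is an assumption for illustration: the rule set, the paths, and the two matching strategies (first matching rule wins vs. longest matching prefix wins) are hypothetical and not taken from any particular spider, and the Allow line is an extension that not every spider honors.

```python
# Hypothetical rule set for one user-agent: (directive, path prefix) in file order.
RULES = [
    ("disallow", "/archive/"),
    ("allow", "/archive/public/"),
]


def first_match(rules, path):
    """Spider A (assumed behavior): the first rule whose prefix matches decides."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched, so crawling is permitted


def longest_match(rules, path):
    """Spider B (assumed behavior): the longest matching prefix decides."""
    best = None
    for directive, prefix in rules:
        if path.startswith(prefix):
            if best is None or len(prefix) > len(best[1]):
                best = (directive, prefix)
    return True if best is None else best[0] == "allow"


path = "/archive/public/index.html"
print(first_match(RULES, path))    # False: blocked by the earlier Disallow
print(longest_match(RULES, path))  # True: the more specific Allow wins
```

Same file, same URL, opposite answers. Swapping the order of the two rules changes what Spider A does but not Spider B, which is exactly the kind of thing a spec would have to pin down.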

Brett, nice job with the robots page. (The ODP page was nice too!)