
Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Looking for complicated borderline Robots.txt
Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 99 posted 2:48 am on Apr 5, 2001 (gmt 0)

If anyone has some really complex robots.txt files they could send me to run through the new validator over at the other site, I would appreciate it. Need some test fodder...

 

JuniorHarris

10+ Year Member



 
Msg#: 99 posted 1:11 pm on Apr 5, 2001 (gmt 0)

Something to keep you busy!~:)

(common)
[google.com...]
[microsoft.com...]
[dmoz.org...]
[northernlight.com...]

(others)
[klug-suchen.de...]
[searchcode.de...]
[polk.ucdavis.edu...]
[global-positioning.com...]

(more complex)
[tardis.ed.ac.uk...]
[searchtools.com...]
[searchenginewatch.com...]
[wdvl.com...]
[suchfibel.de...]

Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 99 posted 9:12 am on Apr 9, 2001 (gmt 0)

Ack, those are all simple ones. I *was* talking about bloated 2-3k robots.txt files like this one:

(warning: 400k):
[greatrentals.com]

And I'd double check the scholarship to this school:
[physiology.uthscsa.edu]

And here is an example of too much leisure time:
[goddethroned.diaryland.com]

Well, since I couldn't find anyone willing to fess up, I broke out the spider and had a go:

[searchengineworld.com...]

Thanks
Brett

Xoc

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 99 posted 10:14 am on Apr 9, 2001 (gmt 0)

I'm starting to think that the W3C needs to standardize robots.txt. Since there isn't a formal spec, a spider can interpret any complicated robots.txt however it sees fit, and another spider could interpret the same file in an entirely different way. Seems like they should be able to do that spec in less than a year! :)
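
To make the ambiguity concrete, here is a minimal sketch of two plausible ways a spider might resolve overlapping rules: take the first matching rule in file order, or take the matching rule with the longest path. The rule set, the paths, and both function names are made up for illustration and are not taken from any real spider or from any file linked in this thread.

```python
# Hypothetical rules for a single user-agent, in file order:
# (directive, path prefix). Not taken from any real robots.txt.
RULES = [
    ("Disallow", "/docs/"),
    ("Allow", "/docs/public/"),
]


def first_match(rules, path):
    """Apply the first rule whose prefix matches the path, in file order."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "Allow"
    return True  # no rule matched: allowed by default


def longest_match(rules, path):
    """Apply the matching rule with the longest (most specific) prefix."""
    best = None
    for directive, prefix in rules:
        if path.startswith(prefix):
            if best is None or len(prefix) > len(best[1]):
                best = (directive, prefix)
    return True if best is None else best[0] == "Allow"


path = "/docs/public/index.html"
print(first_match(RULES, path))    # Disallow /docs/ is hit first
print(longest_match(RULES, path))  # Allow /docs/public/ is more specific
```

Under these assumptions the same file blocks the page for one spider and permits it for the other, which is exactly the kind of disagreement a written spec would settle.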

Brett, nice job with the robots page. (The ODP page was nice too!)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved