leadegroot

msg:4113819 | 10:11 am on Apr 11, 2010 (gmt 0) |
If your robots.txt file would be empty, you don't have to have one, but I tend to keep an empty one anyway just to keep the 404s for it out of the error log. If your only entry is 'this bot has full access', then you don't need the file at all. (Note that the file is called robots.txt, with an 's'; if you misname the file, the bots will not find it.)
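A quick way to confirm that an empty file and a 'full access' file mean the same thing is Python's standard-library urllib.robotparser. This sketch parses both from strings instead of fetching them. The agent name AnyBot and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text supplied as a string (no network fetch)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


EMPTY = ""                                 # a zero-byte robots.txt
FULL_ACCESS = "User-agent: *\nDisallow:\n"  # an empty Disallow value disallows nothing

for label, text in [("empty", EMPTY), ("full access", FULL_ACCESS)]:
    rp = parser_for(text)
    print(label, rp.can_fetch("AnyBot", "https://example.com/any/page.html"))
# empty True
# full access True
```

When fetching over the network with read(), the same parser treats a 401 or 403 response as 'disallow everything' and any other 4xx response, such as the 404 for a missing file, as 'allow everything'. That matches the point above: having no file at all already means full access.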
Edge

msg:4113852 | 1:31 pm on Apr 11, 2010 (gmt 0) |
User-agent: Googlebot
Disallow: /bunch-of-stuff
Allow: /

User-agent: Googlebot-Mobile
Disallow: /bunch-of-stuff
Allow: /

User-agent: Mediapartners-Google
Disallow: /bunch-of-stuff
Allow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: asterias
Disallow: /

User-agent: aibot
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: asterias
Disallow: /

User-agent: BackDoorBot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: Bloodhound
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: Bullseye
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: CherryPickerSE
Disallow: /

User-agent: CherryPickerElite
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: Crescent Internet ToolPak
Disallow: /

User-agent: combine
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Down2Web
Disallow: /

User-agent: dumbot
Disallow: /

User-agent: e-collector
Disallow: /

User-agent: Email
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: Enterprise_Search
Disallow: /

User-agent: es
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: Francis
Disallow: /

User-agent: FreeFind
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: grub
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: Googlebot
Disallow: /*.gif$

User-agent: Hatena Antenna
Disallow: /

User-agent: Harvest
Disallow: /

User-agent: Heritrix
Disallow: /

User-agent: hloader
Disallow: /

User-agent: htmlgobble
Disallow: /

User-agent: httplib
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: JavaBee
Disallow: /

User-agent: JoBo
Disallow: /

User-agent: Java
Disallow: /

User-agent: Jetbot/
Disallow: /

User-agent: Jetbot
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Larbin
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: moget
Disallow: /

User-agent: naver
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Nutch
Disallow: /

User-agent: Offline
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: PentonMediabot
Disallow: /

User-agent: psbot
Disallow: /

User-agent: ProPowerBot
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: Robofox
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: SiteVigil
Disallow: /

User-agent: Sohu
Disallow: /

User-agent: tarspider
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: Teleport Pro
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Twiceler
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: w3mir
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: webbandit
Disallow: /

User-agent: WebCapture
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: webmirror
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: Webster
Disallow: /

User-agent: Web Downloader
Disallow: /

User-agent: WebFetcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: Wget
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebWasher
Disallow: /

User-agent: webvac
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: Xenu's Link Sleuth
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: Zeus Link Scout
Disallow: /
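The file above names asterias twice and opens a second Googlebot group (the /*.gif$ rule) far down the list. Parsers disagree about repeats: RFC 9309 says all groups matching a crawler must be combined, while simpler parsers such as Python's urllib.robotparser use only the first matching group, so that second Googlebot group may be silently ignored. A small check like the following can flag repeated names before the file goes live. It is a sketch that only looks at User-agent lines, and the excerpt is trimmed from the file above.

```python
from collections import Counter


def duplicate_user_agents(robots_txt: str) -> list:
    """List user-agent names (lowercased) that appear on more than one User-agent line."""
    counts = Counter()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            counts[value.strip().lower()] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# A trimmed excerpt of the file above.
EXCERPT = """\
User-agent: Googlebot
Disallow: /bunch-of-stuff
Allow: /

User-agent: asterias
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: asterias
Disallow: /

User-agent: Googlebot
Disallow: /*.gif$
"""

print(duplicate_user_agents(EXCERPT))  # ['asterias', 'googlebot']
```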
tangor

msg:4113929 | 5:43 pm on Apr 11, 2010 (gmt 0) |
Edge... in my experience, if Googlebot is allowed in, Google assumes all the rest of its little bots are allowed too, so if you DON'T want Googlebot-Mobile (for example), you have to disallow that one explicitly. I generally whitelist (allow) a handful of useful bots and disallow all others. It makes for a much shorter robots.txt!
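The whitelist approach can be sketched like this, again checked with Python's urllib.robotparser. Googlebot and bingbot are only stand-ins for whichever 'useful bots' you choose. The catch-all * group then shuts out every other crawler, including ones that did not exist when the file was written.

```python
from urllib.robotparser import RobotFileParser

# Whitelist: name the crawlers you want (these two are examples only),
# then block everyone else with the catch-all group at the end.
WHITELIST = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(WHITELIST.splitlines())

for agent in ["Googlebot", "bingbot", "HTTrack", "SomeNewScraper"]:
    print(agent, rp.can_fetch(agent, "https://example.com/articles/1.html"))
# Googlebot True
# bingbot True
# HTTrack False
# SomeNewScraper False
```

Keep in mind that Python's parser matches names by substring, so here a crawler calling itself Googlebot-Mobile would also fall under the Googlebot group. Also, robots.txt is only a request: badly behaved bots can ignore it.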
Edge

msg:4114502 | 6:56 pm on Apr 12, 2010 (gmt 0) |
Each Google bot crawls different data/media. There are places I don't want a particular bot to go and other places that are OK for it, so I need different rules for each Google bot.
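Both posts fit roughly how Google describes its crawlers: each one follows the most specific User-agent group that matches it and otherwise falls back to a more general one. The sketch below models that with a hypothetical list of tokens per crawler (most specific first, with * as the last resort). It combines repeated groups and picks the longest matching path, letting Allow win a tie. It handles plain path prefixes only, not the * and $ wildcards, so it is an illustration rather than a reimplementation of Google's parser.

```python
# Simplified model of per-crawler group selection. The token lists in CRAWLERS
# are hypothetical; only plain path prefixes are supported (no * or $ wildcards).

def parse_groups(robots_txt):
    """Split robots.txt into (agents, rules) groups; rules are (is_allow, path)."""
    groups, agents, rules = [], [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field == "allow", value))
    if agents:
        groups.append((agents, rules))
    return groups


def select_rules(groups, tokens):
    """Use the first token (most specific first) that has a group, else the * group."""
    for token in [t.lower() for t in tokens] + ["*"]:
        hits = [rules for agents, rules in groups if token in agents]
        if hits:
            # Repeated groups for the same token are merged into one rule list.
            return [rule for rules in hits for rule in rules]
    return []  # no matching group at all: nothing is disallowed


def is_allowed(rules, path):
    """Longest matching rule path wins; on a tie, Allow beats Disallow."""
    best_len, allowed = -1, True
    for is_allow, rule_path in rules:
        if rule_path and path.startswith(rule_path):  # an empty Disallow matches nothing
            if len(rule_path) > best_len or (len(rule_path) == best_len and is_allow):
                best_len, allowed = len(rule_path), is_allow
    return allowed


SITE = """\
User-agent: Googlebot
Disallow: /bunch-of-stuff
Allow: /

User-agent: Googlebot-Image
Disallow: /
"""

CRAWLERS = {  # hypothetical: each crawler's tokens, most specific first
    "Googlebot": ["Googlebot"],
    "Googlebot-Image": ["Googlebot-Image", "Googlebot"],
    "Googlebot-Mobile": ["Googlebot-Mobile", "Googlebot"],
}

groups = parse_groups(SITE)
for name, tokens in CRAWLERS.items():
    rules = select_rules(groups, tokens)
    print(name, is_allowed(rules, "/photos/cat.jpg"), is_allowed(rules, "/bunch-of-stuff/page"))
# Googlebot True False
# Googlebot-Image False False
# Googlebot-Mobile True False
```

Under this model Googlebot-Mobile inherits the Googlebot rules because the file has no group of its own for it (tangor's point), while Googlebot-Image gets its own rules (Edge's point).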