verify the robots.txt
Tell me whether the following robots.txt is correct or has any errors.
selenagomez

msg:4144380 | 3:39 pm on May 31, 2010 (gmt 0) | Hi, I have uploaded the following text as robots.txt. Please check it and tell me if there is any mistake in it. My site is SEO friendly and I have awesome SEO. AWESOME! So I am worried. Please help. The code for robots.txt is as follows:

User-agent: *
Disallow: /uploads/
Disallow: /backup/
Disallow: /cgi-bin/
Disallow: /basket/
Disallow: /cpanel/
Disallow: /dle_config.php
Disallow: /admin.php
Disallow: /autobackup.php
Allow: /
Host: www.example.com

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: TightTwatBot
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

thanks :)
|
jdMorgan

msg:4144429 | 5:34 pm on May 31, 2010 (gmt 0) | This construct:

User-agent: *
Disallow: /uploads/
Disallow: /backup/
Disallow: /cgi-bin/
Disallow: /basket/
Disallow: /cpanel/
Disallow: /dle_config.php
Disallow: /admin.php
Disallow: /autobackup.php
Allow: /

tells bots not to fetch eight specific URL-paths, but then overrides that by telling them to fetch "everything." The end result is that this policy record accomplishes nothing at all. I would suggest leaving out the "Allow" completely as, if I understand your intent, it is not needed.

Your file then spends many lines disallowing bad-bots which will not pay any attention to robots.txt. I'd suggest that you monitor all the 'bots in your list, and delete the disallows for the ones that don't obey them anyway. You can and should take care of them in other ways, such as serving them a 403-Forbidden response using code in .htaccess or in your scripts. (Be sure to allow all clients, including bad-bots, to fetch robots.txt itself, and if you use a custom 403 error document, be sure to allow all clients, even bad-bots, to fetch that page. Otherwise, you create an "infinite loop" which is NOT good for your server.)

You should put your policy records in order from most-specific to least, with specific 'bots listed first, and ending up with the "User-agent: *" record.

Be aware that not all 'bots understand "Allow," "Host," "Crawl-delay" and other semi-proprietary directives. Although robots are *supposed to* ignore directives that they do not understand, these semi-proprietary directives should be included only in policy records directed to the robots that do understand them if you want your site's robots.txt implementation to be robust.

Jim
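
For illustration only, here is a minimal sketch of the layout suggested above: specific records first, the "User-agent: *" record last, and no "Allow" line. It assumes, purely hypothetically, that Wget and Teleport turned out to be 'bots that do obey robots.txt on this site; nothing in this thread confirms that, so check your own logs before keeping any record.

```text
# Hypothetical: keep only the 'bots your logs show actually obey robots.txt
User-agent: Wget
Disallow: /

User-agent: Teleport
Disallow: /

# Catch-all record goes last; the "Allow: /" line is dropped
User-agent: *
Disallow: /uploads/
Disallow: /backup/
Disallow: /cgi-bin/
Disallow: /basket/
Disallow: /cpanel/
Disallow: /dle_config.php
Disallow: /admin.php
Disallow: /autobackup.php
```

The "Host" line is left out of the catch-all record in this sketch, on the reasoning that it belongs only in a record addressed to a robot that understands it.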
|
selenagomez

msg:4144695 | 5:16 am on Jun 1, 2010 (gmt 0) | Oh! Thanks. So I should remove the following line, shouldn't I?
|
vijayseo

msg:4146815 | 11:05 am on Jun 4, 2010 (gmt 0) | jdMorgan, thanks for your reply. You have given correct information.
|
tangor

msg:4147812 | 8:38 pm on Jun 6, 2010 (gmt 0) | Might I suggest a white list approach? Managing a list of bad bots with disallows they won't honor takes a significant amount of time. White list the bots you allow and disallow all other bots. The bots I let in make a pretty short list! Then looking at your logs for a few weeks will tell you which non-compliant bots need to be banned via .htaccess.

I spent two years chasing bad bots and got ulcers. Three years ago I switched to white listing and sleep so much better!
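
A rough sketch of what such a white list could look like in robots.txt, under stated assumptions: the two named 'bots are placeholders chosen for illustration, not the actual list used by anyone in this thread. An empty "Disallow:" value means nothing is disallowed for that record.

```text
# Placeholder white list: substitute the 'bots you actually want
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

# Every other robot is asked to stay out entirely
User-agent: *
Disallow: /
```

A robot that matches a specific record uses that record and not the catch-all, so any paths you want hidden from the white-listed 'bots (such as /admin.php) would have to be listed in their own records. Robots that ignore robots.txt are unaffected by this file and still need to be handled on the server.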
|