Home / Forums Index / WebmasterWorld / Website Analytics - Tracking and Logging
Forum Library, Charter, Moderators: Receptional & mademetop

Website Analytics - Tracking and Logging Forum

    
Verify the robots.txt
Tell me whether the following robots.txt is correct or whether it has any errors
selenagomez



 
Msg#: 4144378 posted 3:39 pm on May 31, 2010 (gmt 0)

Hi,

I have uploaded the following robots.txt to my site. Please check it and tell me if there is any mistake in it.

My site is SEO-friendly and I don't want to break that, so I am worried. Please help.


The contents of robots.txt are as follows:
User-agent: *
Disallow: /uploads/
Disallow: /backup/
Disallow: /cgi-bin/
Disallow: /basket/
Disallow: /cpanel/
Disallow: /dle_config.php
Disallow: /admin.php
Disallow: /autobackup.php
Allow: /

Host: www.example.com

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: TightTwatBot
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: Flaming AttackBot
Disallow: /



Thanks :)

 

jdMorgan

WebmasterWorld Senior Member (10+ Year Member, Top Contributor of All Time)



 
Msg#: 4144378 posted 5:34 pm on May 31, 2010 (gmt 0)

This construct:

User-agent: *
Disallow: /uploads/
Disallow: /backup/
Disallow: /cgi-bin/
Disallow: /basket/
Disallow: /cpanel/
Disallow: /dle_config.php
Disallow: /admin.php
Disallow: /autobackup.php
Allow: /

tells bots not to fetch eight specific URL-paths, but then overrides that by telling them to fetch "everything." The end result is that this policy record accomplishes nothing at all.

I would suggest leaving out the "Allow" completely as, if I understand your intent, it is not needed.

Your file then spends many lines disallowing bad bots that will not pay any attention to robots.txt. I'd suggest that you monitor all the 'bots in your list and delete the Disallows for the ones that don't obey them anyway. You can and should take care of those in other ways, such as serving them a 403-Forbidden response using code in .htaccess or in your scripts. Be sure to allow all clients (including bad bots) to fetch robots.txt itself, and if you use a custom 403 error document, be sure to allow all clients (even bad bots) to fetch that page as well; otherwise you create an "infinite loop," which is NOT good for your server.
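A minimal .htaccess sketch of that idea, assuming Apache 2.2 with mod_setenvif; the bot names are illustrative picks from the list above, and /403.html is a hypothetical custom error document:

```apache
# Flag known bad bots by User-Agent (names illustrative, from the list above)
SetEnvIfNoCase User-Agent "WebZip|larbin|psbot|NetAnts" bad_bot

# Serve 403-Forbidden to flagged clients
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# But let everyone -- including bad bots -- fetch robots.txt and the
# custom 403 page itself, to avoid the infinite-loop problem
ErrorDocument 403 /403.html
<FilesMatch "^(robots\.txt|403\.html)$">
  Order Allow,Deny
  Allow from all
</FilesMatch>
</FilesMatch>` can be replaced with `<Files>` sections if you prefer one file per section.
```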

You should put your policy records in order from most-specific to least, with specific 'bots listed first, and ending up with the "User-agent: *" record.
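For example, using 'bots already named in this thread, the skeleton would look like this:

```
# Specific 'bot records first...
User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

# ...and the catch-all record last:
User-agent: *
Disallow: /uploads/
Disallow: /backup/
```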

Be aware that not all 'bots understand "Allow," "Host," "Crawl-delay," and other semi-proprietary directives. Although robots are *supposed to* ignore directives that they do not understand, these semi-proprietary directives should be included only in policy records directed at the robots that do understand them if you want your site's robots.txt implementation to be robust.
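For instance, "Host" is a directive recognized by Yandex, so if you keep it, it is safer inside a record addressed to that robot specifically (www.example.com is a placeholder here):

```
User-agent: Yandex
Disallow: /backup/
Host: www.example.com
```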

Jim

selenagomez



 
Msg#: 4144378 posted 5:16 am on Jun 1, 2010 (gmt 0)

Oh!
Thanks.
So I should remove the following line:
Allow: /

Is that right?

vijayseo



 
Msg#: 4144378 posted 11:05 am on Jun 4, 2010 (gmt 0)

jdMorgan, thanks for your reply. You have given correct information.

tangor

WebmasterWorld Senior Member (5+ Year Member, Top Contributor of All Time, Top Contributors of the Month)



 
Msg#: 4144378 posted 8:38 pm on Jun 6, 2010 (gmt 0)

Might I suggest a whitelist approach? Managing a list of bad bots with Disallows they won't honor is a significant use of time. Whitelist the bots you allow and disallow all other bots. The list of bots I let in is pretty short! Then looking at your logs for a few weeks will tell you which non-compliant bots need to be banned via .htaccess.

I spent two years chasing bad bots and got ulcers. Three years ago I switched to whitelisting, and I sleep so much better!
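The whitelist approach above can be sketched and sanity-checked with Python's standard-library robots.txt parser. This is a minimal sketch, not a recommended bot list: the allowed names (Googlebot, Slurp) and the example URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A whitelist-style robots.txt: named bots may crawl everything
# (an empty Disallow means "allow all"), everyone else is shut out.
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(WHITELIST_ROBOTS_TXT.splitlines())

# Whitelisted bots are allowed; an unlisted bot falls through to
# the catch-all record and is refused.
print(rp.can_fetch("Googlebot", "http://www.example.com/page.html"))      # True
print(rp.can_fetch("SomeRandomBot", "http://www.example.com/page.html"))  # False
```

Note that this only tells you how a compliant parser reads the file; as discussed above, genuinely bad bots ignore robots.txt and have to be blocked in .htaccess.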
