If you don't have access to your logs, go out and get them!
192.168.2.1 - - [23/Feb/2005:11:22:24 +0100] "GET /phpMyAdmin HTTP/1.0" 401 468 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0"
Often, the server only keeps the last 7 to 20 days' worth of logs. Your host may have archived the older ones; if you need those, ask your host for them.
There is usually one log file per day, and each line in the file records the date and time that someone accessed your website, their IP address and user agent, what file they asked for, the status of that request, and so on.
If someone accessed a page of your site, and that page contained 3 images, then there would be 4 entries in the log file for that one page view. The log file can get very big for a high-traffic site.
An analysis program can read the data from that file and produce statistics for you: things like the number of unique users, the number of pages accessed, users per country or ISP, and much more.
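To give a flavour of what such a program does, here is a minimal sketch in Python (not ASP) that tallies hits, unique IPs, and the busiest user agents from a combined-format log like the line above; the file name and the regex are my own assumptions, not from any particular package:

import re
from collections import Counter

# Combined Log Format: ip identd user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

hits = 0
ips = Counter()
agents = Counter()

with open("access.log") as log:              # example path
    for line in log:
        m = LINE_RE.match(line)
        if not m:
            continue                         # skip malformed lines
        hits += 1
        ips[m.group("ip")] += 1
        agents[m.group("agent")] += 1

print(hits, "hits from", len(ips), "unique IPs")
for agent, count in agents.most_common(10):  # ten busiest user agents
    print(count, agent)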
I searched the net for some ASP code to do this and came up with something called XAgent, which seems to do the job, but their site has either been hijacked or is now under construction, so I haven't been able to find any example code yet.
[msdn.microsoft.com...]
To physically block spiders or IPs, you need to set file and directory permissions. One way to do this on Apache is to set these up in the .htaccess file, as sketched below. I assume that IIS boxes have some sort of equivalent functionality.
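For example, on Apache with mod_setenvif and mod_access enabled, a few lines like these in .htaccess will turn away one bad user agent and one IP; the agent name and the IP below are placeholders, not recommendations:

# Flag any request whose User-Agent contains "EmailSiphon" (example name)
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot

Order Allow,Deny
Allow from all
# Refuse flagged user agents and one example IP address
Deny from env=bad_bot
Deny from 192.168.2.1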
Thanks for the feedback, but I am soooooo lost! Sorry, everything you have explained seems like a foreign language to me. I am new at all of this, so I apologize for my ignorance. However, I really do need help in creating an effective robots.txt to block those I do not want indexing or searching my pages. I am not using Apache, and in all honesty my provider is giving me Windows-based services.
***********************
The robots.txt file should be created with Unix line endings! Most good text editors will have a Unix mode, or your FTP client *should* do the conversion for you. Do not attempt to use an HTML editor that does not specifically have a text mode to create a robots.txt file. (If your editor won't cooperate, a couple of lines of script can write the file for you; see the sketch after this note.)
***********************
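As a fallback, here is a minimal Python sketch that writes the file with LF (Unix) line endings on any platform; the rules string is just a placeholder, not a recommendation:

# newline="\n" forces Unix (LF) line endings, even when run on Windows
rules = "User-agent: *\nDisallow: /cgi-bin/\n"   # placeholder rules

with open("robots.txt", "w", newline="\n") as f:
    f.write(rules)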
*************************
# Robots.txt file from [searchengineworld.com...]
#
# Built from text file
[info.webcrawler.com...]
#
# This restricts access to only known and registered robots.
#
User-agent: Mozilla/3.0 (compatible;miner;mailto:miner@miner.com.br)
Disallow:
User-agent: WebFerret
Disallow:
User-agent: Due to a deficiency in Java it's not currently possible to set the User-agent.
Disallow:
User-agent: 'Ahoy! The Homepage Finder'
Disallow:
User-agent: Arachnophilia
Disallow:
User-agent: ArchitextSpider
Disallow:
User-agent: ASpider/0.09
Disallow:
User-agent: AURESYS/1.0
Disallow:
User-agent: BackRub/*.*
Disallow:
User-agent: Big Brother
Disallow:
User-agent: BlackWidow
Disallow:
User-agent: BSpider/1.0 libwww-perl/0.40
Disallow:
User-agent: CACTVS Chemistry Spider
Disallow:
User-agent: Digimarc CGIReader/1.0
Disallow:
User-agent: Checkbot/x.xx LWP/5.x
Disallow:
User-agent: CMC/0.01
Disallow:
User-agent: combine/0.0
Disallow:
User-agent: conceptbot/0.3
Disallow:
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow:
User-agent: root/0.1
Disallow:
User-agent: CS-HKUST-IndexServer/1.0
Disallow:
User-agent: CyberSpyder/2.1
Disallow:
User-agent: Deweb/1.01
Disallow:
User-agent: DragonBot/1.0 libwww/5.0
Disallow:
User-agent: EIT-Link-Verifier-Robot/0.2
Disallow:
User-agent: Emacs-w3/v[0-9\.]+
Disallow:
User-agent: EmailSiphon
Disallow:
User-agent: EMC Spider
Disallow:
User-agent: explorersearch
Disallow:
User-agent: Explorer
Disallow:
User-agent: ExtractorPro
Disallow:
User-agent: FelixIDE/1.0
Disallow:
User-agent: Hazel's Ferret Web hopper,
Disallow:
User-agent: ESIRover v1.0
Disallow:
User-agent: fido/0.9 Harvest/1.4.pl2
Disallow:
User-agent: Hämähäkki/0.2
Disallow:
User-agent: KIT-Fireball/2.0 libwww/5.0a
Disallow:
User-agent: Fish-Search-Robot
Disallow:
User-agent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca)
Disallow:
User-agent: Robot du CRIM 1.0a
Disallow:
User-agent: Freecrawl
Disallow:
User-agent: FunnelWeb-1.0
Disallow:
User-agent: gcreep/1.0
Disallow:
User-agent: GetURL.rexx v1.05
Disallow:
User-agent: Golem/1.1
Disallow:
User-agent: Gromit/1.0
Disallow:
User-agent: Gulliver/1.1
Disallow:
User-agent: AITCSRobot/1.1
Disallow:
User-agent: wired-digital-newsbot/1.5
Disallow:
User-agent: htdig/3.0b3
Disallow:
User-agent: HTMLgobble v2.2
Disallow:
User-agent: IBM_Planetwide,
Disallow:
User-agent: gestaltIconoclast/1.0 libwww-FM/2.17
Disallow:
User-agent: INGRID/0.1
Disallow:
User-agent: IncyWincy/1.0b1
Disallow:
User-agent: Informant
Disallow:
User-agent: InfoSeek Robot 1.0
Disallow:
User-agent: Infoseek Sidewinder
Disallow:
User-agent: InfoSpiders/0.1
Disallow:
User-agent: inspectorwww/1.0
Disallow:
User-agent: 'IAGENT/1.0'
Disallow:
User-agent: IsraeliSearch/1.0
Disallow:
User-agent: JCrawler/0.2
Disallow:
User-agent: Jeeves v0.05alpha (PERL, LWP, lglb@doc.ic.ac.uk)
Disallow:
User-agent: Jobot/0.1alpha libwww-perl/4.0
Disallow:
User-agent: JoeBot,
Disallow:
User-agent: JubiiRobot
Disallow:
User-agent: jumpstation
Disallow:
User-agent: Katipo/1.0
Disallow:
User-agent: KDD-Explorer/0.1
Disallow:
User-agent: KO_Yappo_Robot/1.0.4(http://yappo.com/info/robot.html)
Disallow:
User-agent: LabelGrab/1.1
Disallow:
User-agent: LinkWalker
Disallow:
User-agent: logo.gif crawler
Disallow:
User-agent: Lycos/x.x
Disallow:
User-agent: Lycos_Spider_(T-Rex)
Disallow:
User-agent: Magpie/1.0
Disallow:
User-agent: MediaFox/x.y
Disallow:
User-agent: MerzScope
Disallow:
User-agent: NEC-MeshExplorer
Disallow:
User-agent: MOMspider/1.00 libwww-perl/0.40
Disallow:
User-agent: Monster/vX.X.X -$TYPE ($OSTYPE)
Disallow:
User-agent: Motor/0.2
Disallow:
User-agent: MuscatFerret
Disallow:
User-agent: MwdSearch/0.1
Disallow:
User-agent: NetCarta CyberPilot Pro
Disallow:
User-agent: NetMechanic
Disallow:
User-agent: NetScoop/1.0 libwww/5.0a
Disallow:
User-agent: NHSEWalker/3.0
Disallow:
User-agent: Nomad-V2.x
Disallow:
User-agent: NorthStar
Disallow:
User-agent: Occam/1.0
Disallow:
User-agent: HKU WWW Robot,
Disallow:
User-agent: Orbsearch/1.0
Disallow:
User-agent: PackRat/1.0
Disallow:
User-agent: Patric/0.01a
Disallow:
User-agent: Peregrinator-Mathematics/0.7
Disallow:
User-agent: Duppies
Disallow:
User-agent: Pioneer
Disallow:
User-agent: PGP-KA/1.2
Disallow:
User-agent: Resume Robot
Disallow:
User-agent: Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl)
Disallow:
User-agent: Robbie/0.1
Disallow:
User-agent: ComputingSite Robi/1.0 (robi@computingsite.com)
Disallow:
User-agent: Roverbot
Disallow:
User-agent: SafetyNet Robot 0.1,
Disallow:
User-agent: Scooter/1.0
Disallow:
User-agent: Senrigan/#*$!xxx
Disallow:
User-agent: SG-Scout
Disallow:
User-agent: Shai'Hulud
Disallow:
User-agent: SimBot/1.0
Disallow:
User-agent: Open Text Site Crawler V1.0
Disallow:
User-agent: SiteTech-Rover
Disallow:
User-agent: Slurp/2.0
Disallow:
User-agent: ESISmartSpider/2.0
Disallow:
User-agent: Snooper/b97_01
Disallow:
User-agent: Solbot/1.0 LWP/5.07
Disallow:
User-agent: Spanner/1.0 (Linux 2.0.27 i586)
Disallow:
User-agent: Mozilla/3.0 (Black Widow v1.1.0; Linux 2.0.27; Dec 31 1997 12:25:00)
Disallow:
User-agent: Tarantula/1.0
Disallow:
User-agent: tarspider
Disallow:
User-agent: dlw3robot/x.y (in TclX by [hplyot.obspm.fr...]
Disallow:
User-agent: Templeton/
Disallow:
User-agent: TitIn/0.2
Disallow:
User-agent: TITAN/0.1
Disallow:
User-agent: UCSD-Crawler
Disallow:
User-agent: urlck/1.2.3
Disallow:
User-agent: Valkyrie/1.0 libwww-perl/0.40
Disallow:
User-agent: Victoria/1.0
Disallow:
User-agent: vision-search/3.0'
Disallow:
User-agent: VWbot_K/4.2
Disallow:
User-agent: w3index
Disallow:
User-agent: W3M2/x.xxx
Disallow:
User-agent: WWWWanderer v3.0
Disallow:
User-agent: WebCopy/
Disallow:
User-agent: WebCrawler/3.0 Robot libwww/5.0a
Disallow:
User-agent: WebFetcher/0.8,
Disallow:
User-agent: weblayers/0.0
Disallow:
User-agent: WebLinker/0.0 libwww-perl/0.1
Disallow:
User-agent: WebMoose/0.0.0000
Disallow:
User-agent: Digimarc WebReader/1.2
Disallow:
User-agent: webs@recruit.co.jp
Disallow:
User-agent: webvac/1.0
Disallow:
User-agent: webwalk
Disallow:
User-agent: WebWalker/1.10
Disallow:
User-agent: WebWatch
Disallow:
User-agent: Wget/1.4.0
Disallow:
User-agent: w3mir
Disallow:
User-agent: WWWC/0.25 (Win95)
Disallow:
User-agent: XGET/0.7
Disallow:
User-agent: Nederland.zoek
Disallow:
User-agent: BizBot04 kirk.overleaf.com
Disallow:
User-agent: HappyBot (gserver.kw.net)
Disallow:
User-agent: CaliforniaBrownSpider
Disallow:
User-agent: EI*Net/0.1 libwww/0.1
Disallow:
User-agent: Ibot/1.0 libwww-perl/0.40
Disallow:
User-agent: Merritt/1.0
Disallow:
User-agent: StatFetcher/1.0
Disallow:
User-agent: TeacherSoft/1.0 libwww/2.17
Disallow:
User-agent: WWW Collector
Disallow:
User-agent: processor/0.0ALPHA libwww-perl/0.20
Disallow:
User-agent: wobot/1.0 from 206.214.202.45
Disallow:
User-agent: Libertech-Rover www.libertech.com?
Disallow:
User-agent: WhoWhere Robot
Disallow:
User-agent: ITI Spider
Disallow:
User-agent: MyCNNSpider
Disallow:
User-agent: SummyCrawler
Disallow:
User-agent: OGspider
Disallow:
User-agent: linklooker
Disallow:
User-agent: CyberSpyder (amant@www.cyberspyder.com)
Disallow:
User-agent: SlowBot
Disallow:
User-agent: heraSpider
Disallow:
User-agent: Surfbot
Disallow:
User-agent: Bizbot003
Disallow:
User-agent: WebWalker
Disallow:
User-agent: SandBot
Disallow:
User-agent: EnigmaBot
Disallow:
User-agent: spyder3.microsys.com
Disallow:
User-agent: www.freeloader.com.
Disallow:
User-agent: Googlebot
Disallow:
User-agent: METAGOPHER
Disallow:
User-agent: *
Disallow: /
*************************
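One quick way to sanity-check the whitelist behaviour of a file like this is Python's standard urllib.robotparser; the agent names below are illustrative:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
with open("robots.txt") as f:        # the file above, saved locally
    rp.parse(f.read().splitlines())

# A listed robot with an empty Disallow may fetch anything...
print(rp.can_fetch("Googlebot", "/any/page.html"))       # True
# ...while an unlisted robot falls through to "User-agent: *" / "Disallow: /"
print(rp.can_fetch("SomeUnknownBot", "/any/page.html"))  # False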