Forum Moderators: DixonJones

Webalizer Enhancements

adding cleaner user-agents and search results

         

tnlnyc

1:59 pm on Aug 26, 2003 (gmt 0)

10+ Year Member



I've been hacking around in my webalizer.conf file and have produced two lists that may be of interest:

GroupAgent List:
[tnl.net...]

SearchEngine List:
[tnl.net...]
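For readers new to the SearchEngine keyword: each entry in a list like this pairs a substring to look for in the referrer URL with the query variable that carries the search phrase, in the form `SearchEngine google.com q=`. Below is a minimal Python sketch of that idea. The two rules are illustrative assumptions, not entries copied from the linked list, and "only the first matching rule is consulted" is an assumption, not a claim about Webalizer's internals.

```python
from urllib.parse import urlsplit, parse_qsl

# Illustrative rules shaped like "SearchEngine <referrer substring> <query variable>".
# Assumed for this example only; not taken from the posted list.
SEARCH_RULES = [
    ("google.", "q="),
    ("yahoo.com", "p="),
]


def search_string(referrer, rules=SEARCH_RULES):
    """Return the decoded search phrase from a referrer URL, or None."""
    for engine, variable in rules:
        if engine not in referrer:
            continue
        name = variable.rstrip("=")  # "q=" -> "q"
        for key, value in parse_qsl(urlsplit(referrer).query):
            if key == name:
                return value  # parse_qsl already decodes '+' and %XX escapes
        return None  # assumption: only the first matching rule is consulted
    return None


print(search_string("http://www.google.com/search?hl=en&q=webalizer+conf"))
# prints: webalizer conf
```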

Hope this will be helpful.

TNL

wkitty42

7:44 pm on Aug 26, 2003 (gmt 0)

10+ Year Member



FWIW:

Xenu is not a spambot... it is a link checker...

wget is an actual program, though it may also have a code library component...

All in all, it is a good list/job... I've done something similar with the domains and locations... a huge list of those... I'll be looking at your other list in a few minutes and may very well be adding your additions to my own webalizer configs... Thanks, and job well done!

tnlnyc

8:05 pm on Aug 26, 2003 (gmt 0)

10+ Year Member



Thanks for the info on Xenu... I've updated it in my own conf file. I plan to actually stay on top of the list and post updates, probably on a quarterly basis.

TNL

wkitty42

10:03 pm on Aug 26, 2003 (gmt 0)

10+ Year Member



Excellent... I merged all of them, plus the search engine strings, into my own configs... now I'm just waiting on webalizer to run and update the pages :)

I'm also grouping sites on the last four dots and then manually coding groupings for them after looking at the actual domain names in the logs... this even lets me break down the geographical locations of some domains...

For example:

GroupSite *.wc.optusnet.com.au
HideSite *.wc.optusnet.com.au
GroupSite *.nsw.bigpond.net.au
HideSite *.nsw.bigpond.net.au
GroupSite *.sa.bigpond.net.au
HideSite *.sa.bigpond.net.au
GroupSite *.qld.bigpond.net.au
HideSite *.qld.bigpond.net.au
GroupSite *.wa.bigpond.net.au
HideSite *.wa.bigpond.net.au

GroupSite *.carolina.rr.com
HideSite *.carolina.rr.com
GroupSite *.cfl.rr.com
HideSite *.cfl.rr.com
GroupSite *.cinci.rr.com
HideSite *.cinci.rr.com
GroupSite *.columbus.rr.com
HideSite *.columbus.rr.com
GroupSite *.cox.rr.com
HideSite *.cox.rr.com
GroupSite *.elp.rr.com
HideSite *.elp.rr.com
GroupSite *.hawaii.rr.com
HideSite *.hawaii.rr.com

The above is a /very/ small example ;)
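The "last four dots" approach lends itself to a quick survey of the logs before hand-coding GroupSite lines. This sketch assumes Common Log Format lines whose first field is a resolved hostname, and it skips bare IPv4 addresses. It counts how often each suffix made of the last N labels appears. Since the examples above mix four-part (wc.optusnet.com.au) and three-part (carolina.rr.com) suffixes, N is left as a parameter. The log file name in the usage note is hypothetical.

```python
from collections import Counter
from typing import Iterable


def host_suffix(host: str, labels: int) -> str:
    """Keep the last `labels` dot-separated parts of a hostname."""
    parts = host.lower().rstrip(".").split(".")
    return ".".join(parts[-labels:])


def suffix_counts(log_lines: Iterable[str], labels: int) -> Counter:
    """Count hostname suffixes from CLF lines (first field = client host)."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        host = fields[0]
        if host.replace(".", "").isdigit():
            continue  # unresolved IPv4 address: no domain to group on
        counts[host_suffix(host, labels)] += 1
    return counts
```

Usage, with a hypothetical log path: `for sfx, n in suffix_counts(open("access_log"), 4).most_common(20): print(n, sfx)`. The most frequent suffixes are the candidates worth a GroupSite/HideSite pair.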

You'll also note that I use the HideSite option, too... it was a bit of work adding HideAgent lines for all the entries I merged in from your list... hehehe... oh well... good things take some work, eh? :):)

tnlnyc

10:41 pm on Aug 26, 2003 (gmt 0)

10+ Year Member



Ask and you shall receive... :)

Here's the HideAgent list:

HideAgent rv:1.4
HideAgent 3.01
HideAgent 3.02
HideAgent 4.01
HideAgent 5.0
HideAgent 5.01
HideAgent 5.12
HideAgent 5.13
HideAgent 5.14
HideAgent 5.15
HideAgent 5.16
HideAgent 5.17
HideAgent 5.21
HideAgent 5.22
HideAgent 5.23
HideAgent 5.5
HideAgent 6.0
HideAgent 348NorthNews
HideAgent Alcatel-
HideAgent almaden.ibm.com/cs/crawler
HideAgent AmphetaDesk
HideAgent antibot
HideAgent AppleWebKit
HideAgent [Ask.24x.Info...]
HideAgent ASPseek
HideAgent aspseek
HideAgent augurfind
HideAgent AvantGo
HideAgent Awasu
HideAgent Baiduspider
HideAgent BarraHomeCrawler
HideAgent BBot
HideAgent BFS_method
HideAgent Bilbo
HideAgent Bison
HideAgent Blazer
HideAgent blo.gs
HideAgent BlogBot
HideAgent Blogdigger
HideAgent Blogosphere
HideAgent BlogPulse
HideAgent BlogShares
HideAgent Blogwise
HideAgent boitho.com
HideAgent bookwatch@onfocus.com
HideAgent books@onfocus.com
HideAgent BorderManager
HideAgent brainoff.com/geoblog/
HideAgent www.business-socket.com
HideAgent Camino
HideAgent CE-Preload
HideAgent Check&Get
HideAgent china
HideAgent China
HideAgent CJNetworkQuality
HideAgent cloakBrowser
HideAgent combine
HideAgent COMBINE
HideAgent compatible)
HideAgent CoolBot
HideAgent CoologFeedSpider
HideAgent CopyHunter
HideAgent curl
HideAgent DA
HideAgent danux
HideAgent Dattatec.com-Sitios-Top
HideAgent daypopbot
HideAgent DoCoMo
HideAgent DTS
HideAgent Ecosystem/development
HideAgent EgotoBot
HideAgent Elaine
HideAgent EmailSiphon
HideAgent Ericsson
HideAgent ETS
HideAgent eXactSite
HideAgent Exalead
HideAgent exactseek.com
HideAgent EyeOnSite
HideAgent fantomBrowser
HideAgent fantomCrew
HideAgent FAST
HideAgent Fast
HideAgent FavOrg
HideAgent FeedDemon
HideAgent Feedreader
HideAgent FeedOnFeeds
HideAgent Feedster
HideAgent FeedValidator
HideAgent Fetch
HideAgent Finder
HideAgent FlickBot
HideAgent Franklin
HideAgent Frontier
HideAgent Gaisbot
HideAgent GalaxyBot
HideAgent Genome
HideAgent GetRight
HideAgent Gigabot
HideAgent grub-client
HideAgent Google*
HideAgent gossamer-threads.com
HideAgent htdig
HideAgent HTTrack
HideAgent ia_archiver
HideAgent iaea.org
HideAgent iCab
HideAgent Industry
HideAgent Indy
HideAgent INGRID/3.0
HideAgent InternetSeer
HideAgent internetseer
HideAgent IUFW
HideAgent IUPUI
HideAgent IXE
HideAgent Jakarta
HideAgent janes-blogosphere
HideAgent Java
HideAgent jBrowser
HideAgent jiffe
HideAgent junkbuster
HideAgent k2spider
HideAgent Lachesis
HideAgent lachesis
HideAgent larbin
HideAgent Leknor.com
HideAgent Liberate
HideAgent libwww-perl
HideAgent Lincoln
HideAgent Linkbot
HideAgent LinkHype
HideAgent Links
HideAgent LinksManager.com
HideAgent LinkSweeper
HideAgent LinkWalker
HideAgent LNSpiderguy
HideAgent Lynx*
HideAgent MagpieRSS
HideAgent Microcomputers
HideAgent Missauga
HideAgent Missigua
HideAgent Mitsu
HideAgent mogimogi
HideAgent MOT-
HideAgent Mozilla/3.04
HideAgent Mozilla/3.04Gold
HideAgent Mozilla/4.04
HideAgent Mozilla/4.05
HideAgent Mozilla/4.06
HideAgent Mozilla/4.08
HideAgent Mozilla/4.5
HideAgent Mozilla/4.51
HideAgent Mozilla/4.6
HideAgent Mozilla/4.61
HideAgent Mozilla/4.7
HideAgent Mozilla/4.8
HideAgent MSFrontPage
HideAgent MSNBOT
HideAgent MyHeadlines
HideAgent MyWireServiceBot
HideAgent NationalDirectory
HideAgent NaverRobot
HideAgent NCBrowser
HideAgent Netcraft
HideAgent NetNewsWire
HideAgent NetResearchServer
HideAgent NewsGator
HideAgent Newz
HideAgent NG/1.0
HideAgent NIF
HideAgent NITLE
HideAgent nntp//rss
HideAgent Nokia
HideAgent NPBot
HideAgent NRK-bruker
HideAgent Openbot
HideAgent Opera
HideAgent Oddbot
HideAgent Offline
HideAgent OPWV-SDK
HideAgent Oracle
HideAgent Panasonic
HideAgent PEAR
HideAgent PHILIPS-
HideAgent PHP
HideAgent Pix
HideAgent PocketFeed
HideAgent Pompos
HideAgent Popdexter
HideAgent PostNuke
HideAgent Powermarks
HideAgent psbot
HideAgent Python-urllib
HideAgent QuepasaCreep
HideAgent Radio*
HideAgent Rainbow
HideAgent rdflib
HideAgent Robozilla
HideAgent RPT-HTTPClient
HideAgent SAGEM-
HideAgent SAMSUNG
HideAgent Scrubby
HideAgent SHARP-
HideAgent SideWinder
HideAgent slurp@inktomi.com
HideAgent Scooter
HideAgent searchspider.com
HideAgent SearchSpider.com
HideAgent SEC-
HideAgent semanticdiscovery
HideAgent SIE-
HideAgent SharpReader
HideAgent Shareware
HideAgent SlimBrowser
HideAgent Snoopy
HideAgent SOFTWING_TEAR_AGENT
HideAgent SonyEricsson
HideAgent spider@spider.ilab.sztaki.hu
HideAgent SpiderKU
HideAgent Spinne
HideAgent SmartDownload
HideAgent stealthBrowser
HideAgent Steeler
HideAgent SuperBot
HideAgent SurveyBot
HideAgent Sweeper
HideAgent Syndic8
HideAgent Syndirella
HideAgent Syndigator
HideAgent Tagword
HideAgent Technoratibot
HideAgent Teleport
HideAgent Teoma
HideAgent Teradex
HideAgent Terrar
HideAgent T-H-U-N-D-E-R-S-T-O-N-E
HideAgent timboBot
HideAgent TurnitinBot
HideAgent [tutorgig.com...]
HideAgent UltraLiberalFeedParser
HideAgent Vagabondo
HideAgent verzamelgids
HideAgent VoilaBot
HideAgent W3C_Validator
HideAgent w3m
HideAgent www.walhello.com
HideAgent www.wapsilon.com
HideAgent WebCapture
HideAgent Webclipping
HideAgent WebFilter
HideAgent WebGather
HideAgent WebGo
HideAgent WebRACE
HideAgent websitealert.net
HideAgent WebStripper
HideAgent WebTV
HideAgent WebZIP
HideAgent WEP
HideAgent Wget
HideAgent Wildgrape
HideAgent WinHttp.WinHttpRequest
HideAgent Xenu
HideAgent Zealbot
HideAgent ZyBorg
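The list contains case variants (ASPseek/aspseek, FAST/Fast) and one trailing wildcard (Google*), which hints at how matching works. My understanding of the Webalizer documentation is that Hide*/Group* values are matched case-sensitively as substrings, and that a `*` may appear at the start or the end of the value. The Python sketch below models that assumed rule only. It is not Webalizer's C code, and it does not handle patterns with a wildcard at both ends.

```python
def webalizer_match(value: str, pattern: str) -> bool:
    """Assumed Hide*/Group* matching rule (case-sensitive)."""
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])    # leading '*': match the end
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])  # trailing '*': match the start
    return pattern in value                    # no wildcard: plain substring


def hidden_by(agent: str, patterns):
    """Return every HideAgent-style pattern that would catch this agent."""
    return [p for p in patterns if webalizer_match(agent, p)]
```

Under this assumed rule, a short entry such as `HideAgent 5.0` hides every agent string that merely contains "5.0", so version-number patterns cast a wide net.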

wkitty42

12:42 am on Aug 27, 2003 (gmt 0)

10+ Year Member



Thanks, tnlnyc...

I actually already have them in place... mainly because I put a GroupWhatever line and then a HideWhatever line right after it...

i.e.:

GroupWhatever
HideWhatever

That's the way I originally saw it done, and I've just maintained that format...
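Because every Group line is followed by a matching Hide line, the pairing can be scripted instead of typed by hand. This is a sketch that assumes the Group/Hide directive pairs listed in the code. It also assumes that the Hide line reuses only the pattern, not the optional group label (so `GroupAgent MSIE Micro$oft Internet Exploder` yields `HideAgent MSIE`). Commented-out lines are left alone.

```python
# Assumed Group*/Hide* directive pairs in webalizer.conf.
HIDE_FOR = {
    "GroupSite": "HideSite",
    "GroupURL": "HideURL",
    "GroupReferrer": "HideReferrer",
    "GroupAgent": "HideAgent",
    "GroupUser": "HideUser",
}


def add_hide_lines(conf_lines):
    """Insert a Hide line right after each Group line that lacks one."""
    # Whitespace-normalized set of lines already present, to avoid duplicates.
    existing = {" ".join(line.split()) for line in conf_lines}
    out = []
    for line in conf_lines:
        out.append(line)
        parts = line.split(None, 2)  # directive, pattern, optional label
        if len(parts) >= 2 and parts[0] in HIDE_FOR:
            hide = f"{HIDE_FOR[parts[0]]} {parts[1]}"
            if hide not in existing:
                out.append(hide)
                existing.add(hide)
    return out
```

Pass it `open("webalizer.conf").read().splitlines()` so no line carries a stray newline. A second run adds nothing, because Hide lines that already exist are skipped.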

FWIW: my webalizer config is now sitting at 85K in size... I even had to drop back to using the defaults for the list entries because webalizer was running out of memory for them and not giving me the link to see all the items in the lists ;) It still runs fast as all getout on my machines...

amznVibe

2:11 am on Aug 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I take it these configurations are only available if you have installed webalizer yourself? The servers we use that have cPanel don't seem to have user-editable configurations for webalizer. However, I tend to like AWStats' details and interface a bit more than webalizer's, and we have full control over that.

[edited by: amznVibe at 2:23 am (utc) on Aug. 27, 2003]

wkitty42

2:22 am on Aug 27, 2003 (gmt 0)

10+ Year Member



Installed it yourself? I suppose so... dunno, really... I've never used a hosted site or had hosted access... At one time, though, I did have a client who had a hosted site... That site was running a flavor of *nix, and we were able to download numerous apps and tools for that version of *nix, install them in their home/~user/bin directory, and run them when necessary... We could even schedule cron jobs and such... Yes, it was a prompt account... I don't think many of those are sold any more... then again, I haven't been in the market for one in a long time...

tnlnyc

1:04 pm on Aug 27, 2003 (gmt 0)

10+ Year Member



Not sure what the cPanel configuration looks like, as I use my own system, but you might want to ask your ISP to add this stuff, as it will make things better for everyone. I intend to get a clean version of the file up on my own site soon so people can just download it for their own use.

TNL

tnlnyc

1:44 pm on Aug 27, 2003 (gmt 0)

10+ Year Member



I've posted the modified file online for anyone who wants to add this stuff without really working hard at it:
See

[tnl.net...]

for more info.

TNL