I've essentially been running a customized version of one of the big.cfg files, but I realize there's a lot more power in this software. How do you tweak yours?
Don't you use that extensive SearchQuery.txt from [science.co.il...] to get the latest list of search engines? It seems to be updated fairly regularly.
FILEALIAS /pageone/ /pageone/index.htm
Now all requests for /pageone/ are counted under /pageone/index.htm.
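If you have a lot of directories, a wildcard version might save some typing. This is only a sketch: it assumes Analog's alias commands accept * in the first argument and $1 in the second to stand for whatever the * matched, which is how I read the alias documentation. Try it on a copy of your config before relying on it.

```
# Assumed wildcard form (not tested here):
# count every /something/ request under /something/index.htm
FILEALIAS /*/ /$1/index.htm
```

The explicit one-directory line above is the form reported to work in this thread.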
Another one I find handy is BROWALIAS. I have added the following to get these bots to show up in the Browser Summary.
BROWALIAS *slurp@inktomi.com* Slurp
BROWALIAS *Jeeves* Ask_Jeeves
BROWALIAS *T-H-U-N-D-E-R-S-T-O-N-E* Thunderstone
BROWALIAS *ZyBorg@WISEnutbot.com* ZyBorg
This works and is nice, but it lumps all the different Slurp bots together as Slurp in the Browser Summary. I am now doing something I like better. First, put a # in front of the line BROWALIAS *slurp@inktomi.com* Slurp to comment it out, or just delete that line altogether.
Then you can add this:
BROWALIAS *Slurp/cat* Slurp/cat
BROWALIAS *Slurp/si* Slurp/si
BROWALIAS *Slurp.so/1.0* Slurp/so
BROWALIAS *Slurp/2.0-KiteWeekly* Slurp/KiteWeekly
BROWALIAS *Slurp/2.0-MakoCrawl* Slurp/MakoCrawl
I have tested the five above and know they work. The next three have not been tested but should work:
BROWALIAS *Slurp/2.0-KiteHourly* Slurp/KiteHourly
BROWALIAS *Slurp/2.0-Boobook* Slurp/Boobook
BROWALIAS *Slurp/2.0-GreatWhiteCrawl* Slurp/GreatWhiteCrawl
Now the next time you run Analog you will know which Slurp bot hit the site, since they will no longer all be listed simply as Slurp.
We have some .dat files on the site, and in the File Type Report they had no description, so I added the following:
TYPEALIAS .dat ".dat [Data]"
We can also change the description shown for a file type by changing the text [in these brackets]. The descriptions can be lined up in the report by lining up the [ ] in analog.cfg.
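As a sketch of the lining-up trick: my understanding is that the padding goes inside the quotes, between the extension and the opening bracket, so the descriptions start in the same column in the report. The .html and .htm lines are how I remember them from the stock analog.cfg, so check them against your own copy.

```
# Spaces inside the quotes pad the shorter extensions
# so that every [description] starts in the same column.
TYPEALIAS .html ".html [Hypertext Markup Language]"
TYPEALIAS .htm  ".htm  [Hypertext Markup Language]"
TYPEALIAS .dat  ".dat  [Data]"
```

The extra spaces between the extension and the opening quote only tidy up the config file itself.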
REFSITEALIAS [translate.google.*...] [google.com...]
REFSITEALIAS [images.google.*...] [google.com...]
REFSITEALIAS [directory.google.*...] [google.com...]
These all successfully change to [google.com...], but the report now shows three separate lines. I want just one combined total.