| 11:26 am on Dec 24, 2003 (gmt 0)|
If you use that method then you'll be mis-classifying quite a few user agents. Most browsers (including MSIE) have "Mozilla" in the UA, but so do many spiders, including: Ask Jeeves/Teoma; grub-client; ZyBorg; and Slurp.
A more effective approach might be to look for the platform (e.g. "Linux", "Mac", "Win", etc.) to indicate a 'Human'.
I think for Windows there is a public "BROWSCAP.INI" file that people are using to filter traffic on their sites - try searching for references to it here or on Google.
| 2:07 pm on Dec 24, 2003 (gmt 0)|
So are you saying that spiders don't identify a platform?
Or do they identify platforms other than the standard "Linux", "Mac", "Win", etc.?
| 2:29 pm on Dec 24, 2003 (gmt 0)|
A user agent (UA) is just a string of text - how you interpret it is up to you. Most web browsers, however, will identify one of those platforms in the UA. A quick inspection of our logs shows that you might want to add the following to that list: "FreeBSD", "WebTV" and maybe "Lynx", as it doesn't appear to list a platform.
Others might want to double-check this but I think you could get 95% or even 99% accuracy using such a system.
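For anyone who wants to try the platform-token idea, here is a minimal sketch in Python. The function name `looks_human` and the sample UA strings are only illustrative assumptions; the token list is the one mentioned in this thread and is certainly not exhaustive. Matching is a plain case-sensitive substring test, so it can misfire on UAs that happen to contain one of these tokens.

```python
# Platform tokens suggested in this thread; extend after checking your own logs.
PLATFORM_TOKENS = ("Linux", "Mac", "Win", "FreeBSD", "WebTV", "Lynx")


def looks_human(user_agent: str) -> bool:
    """Return True if the UA string contains any of the platform tokens.

    This is a heuristic only: a spider can send any UA it likes.
    """
    return any(token in user_agent for token in PLATFORM_TOKENS)


if __name__ == "__main__":
    # Sample UA strings for illustration (assumed, not taken from real logs).
    samples = [
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
        "Lynx/2.8.4rel.1 libwww-FM/2.14",
    ]
    for ua in samples:
        print(looks_human(ua), ua)
```

Anything that returns False would then be treated as a probable spider, which is why it is worth checking the misses against your own logs before relying on it.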
| 4:26 pm on Dec 24, 2003 (gmt 0)|
O.k. dcrombie, I did a quick search of my server logs and was not able to find any quality indexing bots
(Google, AltaVista, Inktomi, etc.) that identified a platform, so I would agree that your approach is the better one.
I also got "BROWSCAP.INI" working. It looks like an easy solution for detection, but it is a VERY large file, so I have concerns about its speed.
I'll keep working on it and hopefully end up with a quality function.
Just a little background info:
The site being built is part of an industry that is very dishonest and there's a lot of copycat behavior.
So this function will load a "Real" set of MetaTags for search engine bots, while other non-bot visitors will see a generic set of tags.
The only critical design goal is that 100% of all major indexing spiders must be detected and fed the "Real" MetaTags.
I would consider the major spiders to be:
All the Web
| 5:39 pm on Dec 24, 2003 (gmt 0)|
Wow, are you still seeing these in your logs? I have not seen them for, well, nearly 2 years.
| 6:42 pm on Dec 24, 2003 (gmt 0)|
My search of the log files shows (Googlebot, Slurp, Fast-Webcrawler, ia_archiver, Infoseek)
as repeat visitors.
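Since the goal is to catch every major spider, a name check can run alongside any platform test. Below is a rough Python sketch, assuming the five names from the log search above appear somewhere in each spider's UA string; the function name `is_known_spider` is made up for illustration, and the list would need to be refined against real logs.

```python
# Spider names seen as repeat visitors in the log search above.
KNOWN_SPIDERS = ("Googlebot", "Slurp", "Fast-Webcrawler", "ia_archiver", "Infoseek")


def is_known_spider(user_agent: str) -> bool:
    """Return True if the UA contains one of the known spider names.

    Comparison is case-insensitive, on the assumption that spiders are not
    always consistent about capitalisation.
    """
    ua = user_agent.lower()
    return any(name.lower() in ua for name in KNOWN_SPIDERS)


if __name__ == "__main__":
    print(is_known_spider("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
    print(is_known_spider("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

A fixed list like this only catches spiders you already know about, so it needs updating whenever a new name shows up in the logs.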
My list of major spiders was off the top of my head and definitely needs to be refined.
I've never cared much about search page rankings or indexing bots, and I'm finding
it to be a very complicated and controversial subject to learn.
As always, input is most welcome.
| 7:19 pm on Dec 24, 2003 (gmt 0)|
Robots will typically not stay on a page very long.
A human needs to read the page and digest its contents before moving to a new page.
When robots visit my website they stay less than 5 seconds per page.
Humans average between 5 and 90 seconds per page.
Robots are, well, "robotic"; they can "visit" the same page several times in one second.
Also a human will not click on invisible links.
I typically pepper my page with invisible links.
Many of them are counters and other tools.
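The timing and invisible-link ideas above could be combined roughly like this. It is only a sketch in Python: the 5-second cutoff comes from the figures in this post, while the function names, the `hidden_paths` parameter and the assumption that you already have per-visitor request timestamps (in seconds) and paths are all hypothetical.

```python
from statistics import mean

# Cutoff taken from the figures above: robots stay under 5 seconds per page.
ROBOT_MAX_SECONDS = 5.0


def average_dwell(timestamps):
    """Mean gap in seconds between consecutive requests, or None if fewer than two."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps) if gaps else None


def looks_robotic(timestamps, paths, hidden_paths=frozenset()):
    """Flag a visitor that followed an invisible link or moves too fast."""
    # A human cannot click a link they cannot see.
    if any(path in hidden_paths for path in paths):
        return True
    dwell = average_dwell(timestamps)
    return dwell is not None and dwell < ROBOT_MAX_SECONDS


if __name__ == "__main__":
    print(looks_robotic([0, 1, 2], ["/a", "/b", "/c"]))
    print(looks_robotic([0, 30, 75], ["/a", "/b", "/c"]))
    print(looks_robotic([0, 60], ["/a", "/trap"], hidden_paths={"/trap"}))
```

A visitor with a single request cannot be judged on timing at all, so the sketch leaves it unflagged unless it hit a hidden link.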