If anyone knows a good thread on how to implement UA cloaking and IP cloaking, especially for SID removal, I would really appreciate it. Please point me to the right thread... I couldn't find what I wanted from WebmasterWorld searches...
Thanks,
I have a database of spider IPs and a PHP script at the top of every page that checks the IP of every request and only starts a session if it's not a spider.
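For anyone who wants to see the shape of that check, here is a minimal sketch in Python rather than PHP. The network list, `is_spider_ip` and `should_start_session` are hypothetical names for illustration, and the addresses are documentation ranges, not real crawler IPs. A real setup would load the list from the database mentioned above.

```python
import ipaddress

# Hypothetical spider networks for illustration only (documentation ranges,
# not real crawler addresses). A real list would come from your own database.
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_spider_ip(remote_addr, networks=SPIDER_NETWORKS):
    """Return True if remote_addr falls inside any known spider network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed address: treat it as an ordinary visitor.
        return False
    return any(addr in net for net in networks)


def should_start_session(remote_addr):
    """Only start a session (and so emit a SID) for non-spider requests."""
    return not is_spider_ip(remote_addr)
```

The page code would call `should_start_session()` with the request's remote address before touching any session API, so spiders never see a SID in the URL.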
I've been monitoring user agents for over a year now, and I'm also storing every IP that has made a request with a given UserAgent.
So I know exactly which IPs GoogleBot requests pages from. Same for the Yahoo crawler.
Thing is, I'm STILL seeing *new* IPs from the major search engines (Google, Yahoo, MSN).
So filtering based on IP is not entirely foolproof, whereas filtering based on UserAgent is.
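A rough sketch of that approach in Python: match the UserAgent against a few crawler tokens and remember which IPs each crawler has used, so that new ones stand out. The token list and the names `crawler_token`, `record_request` and `seen_ips` are assumptions for illustration, the list is nowhere near exhaustive, and the in-memory dict stands in for a real database table. Bear in mind that a UA string can be sent by anyone.

```python
from collections import defaultdict

# Substrings assumed to identify the crawlers discussed here; illustrative,
# not an exhaustive or authoritative list.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")

# crawler token -> set of IPs already seen with that token
# (in-memory stand-in for a database table)
seen_ips = defaultdict(set)


def crawler_token(user_agent):
    """Return the first matching crawler token, or None for other visitors."""
    ua = (user_agent or "").lower()
    for token in CRAWLER_TOKENS:
        if token in ua:
            return token
    return None


def record_request(user_agent, remote_addr):
    """Return (token, is_new_ip); token is None when the UA is not a crawler."""
    token = crawler_token(user_agent)
    if token is None:
        return None, False
    is_new = remote_addr not in seen_ips[token]
    seen_ips[token].add(remote_addr)
    return token, is_new
```

Anything that comes back with `is_new` set to true is a candidate for adding to the IP list after a manual look.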
One of the ways I guess whether a visitor is a spider is to check for a referring URL; if there is none, I assume it is a spider. I think this is still right rather than wrong most of the time, but I'm interested in other opinions!
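In code the heuristic is tiny. This is only a sketch in Python, assuming the request headers arrive as a plain dict keyed by the usual header name; `guess_is_spider` is a made-up name.

```python
def guess_is_spider(headers):
    """Guess 'spider' when the request carries no (or an empty) Referer.

    This is a deliberately crude heuristic: it only looks at one header.
    """
    referer = headers.get("Referer", "").strip()
    return referer == ""
```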
Also, in my JSP pages I carefully avoid forcing the creation of a session unless one is actually needed, eg because the user has selected a display language (i18n) that I could not deduce from their browser settings, etc, and thus need to carry in a session (I don't use permanent cookies).
Rgds
Damon
A lot of users will come in without a referer (from their bookmarks, or by typing the URL in from memory).
A lot of crawlers will come in WITH a referer telling you where the crawler found the URL. This will not always be the case, but I do see crawlers come by with referers all the time.
OK, thanks for that... What is a better *simple* way that does not rely on checking against long hand-crafted lists of (forgeable) UA strings (or IP addresses)?
As I say, I don't need 100% accuracy; better than 50% is probably good enough and an ordinary user's experience is not hurt if their browser does not provide a referring URL (eg for security reasons or because the URL was typed in, etc, etc).
Rgds
Damon
You will need a list of those, yes, and you will have to manage that list to some extent.
You can create such a list simply by logging which UAs make requests to your server.
It should be fairly simple to pick the crawlers out of that list :).
I think you will have all the major crawlers identified within a month. I did when I started with this.
So it's a bit of work at the start to log the stuff;
after that it's an occasional check for new crawlers, but that doesn't take much time :).
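To give an idea of how little work the logging side is, here is a sketch in Python that tallies UAs from access log lines and shortlists the ones that look like crawlers. It assumes the combined log format, where the UA is the last quoted field on each line; the function names and the hint words are my own picks for illustration, and the shortlist still needs checking by eye.

```python
import re
from collections import Counter

# Assumes combined log format: the user agent is the last double-quoted
# field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(log_lines):
    """Count how many requests each user agent string made."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def likely_crawlers(counts, hints=("bot", "crawl", "spider", "slurp")):
    """Shortlist UAs containing a hint word; review the result manually."""
    return sorted(ua for ua in counts if any(h in ua.lower() for h in hints))
```

Run `count_user_agents()` over the log every so often, pass the result to `likely_crawlers()`, and compare against the list you already have.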
Are there any other UAs that we need to take care of? We sell mainly to the US market, but also internationally. Any suggestions?