User Agent or IP Cloaking? Which one to use?

We use session id on the URL

itisgene

8:24 pm on Nov 11, 2004 (gmt 0)

OK, our site puts session IDs in the URL, and I recommended removing them after detecting the user agent. Is it a good idea to rely on the User Agent alone for this, or do we need IP cloaking? We don't want to present different content to users and bots. We just want to drop the SIDs when the visitor is a bot so that SEs don't list our product pages 200-300 times with different SIDs.

If anyone knows a good thread on how to implement UA cloaking and IP cloaking, especially for SID removal, I would really appreciate a pointer. I couldn't find what I wanted from WebmasterWorld searches...

Thanks,

phpmaven

11:54 pm on Nov 11, 2004 (gmt 0)

I actually just recently changed my setup to check the IP address. I was only looking at the user agent and ended up getting a bunch of URLs with PHPSESSIDs on them into some of the engines.

I have a database of spider IPs and a PHP script at the top of every page that checks the IP of every request and only starts a session if it's not a spider.
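
The check described above could look roughly like the following. This is a minimal sketch in Python rather than the PHP script mentioned in the post; the names `SPIDER_NETWORKS`, `is_spider_ip`, and `should_start_session` are hypothetical, and the address ranges are documentation-only placeholders, not real crawler IPs.

```python
import ipaddress

# Hypothetical "spider database". These are reserved documentation ranges,
# used here only as placeholders for whatever list you maintain.
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_spider_ip(remote_addr: str) -> bool:
    """Return True if the request IP falls inside a known spider range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: treat it as an ordinary visitor.
        return False
    return any(addr in net for net in SPIDER_NETWORKS)


def should_start_session(remote_addr: str) -> bool:
    """Only start a session (and emit a session ID) for non-spiders."""
    return not is_spider_ip(remote_addr)
```

The page would call `should_start_session()` with the request's remote address before any session is created, so spiders never see a session ID in the URL.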

willis1480

2:29 am on Nov 12, 2004 (gmt 0)

That's a good idea. I never thought of spiders eating up my bandwidth going to product pages and such.

rfung

8:04 pm on Dec 19, 2004 (gmt 0)

phpmaven,

Care to share the script/database, or tell me where I can find them?

thanks

volatilegx

6:05 pm on Dec 20, 2004 (gmt 0)

To answer the first question, I believe User-Agent cloaking is sufficient for this purpose. I'm not aware of any existing threads for session ID removal based on User Agent. Any members doing this are invited to start a thread :)

DoppyNL

2:49 pm on Dec 31, 2004 (gmt 0)

I would do it based on User Agent.

I've been monitoring user agents for over a year now, and I'm also storing every IP that has made a request with a given User Agent.
So I know exactly which IPs GoogleBot is requesting pages from. Same for the Yahoo crawler.

Thing is, I'm STILL seeing *new* IPs from the major search engines (Google, Yahoo, MSN).
So filtering based on IP is not entirely fool-proof, whereas filtering based on User Agent is.
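
A minimal sketch of that kind of bookkeeping, in Python, assuming an in-memory store; `seen_ips` and `record_request` are hypothetical names, and a real setup would persist the data in a database or log file.

```python
from collections import defaultdict

# Maps each User-Agent string to the set of IPs that have used it.
seen_ips = defaultdict(set)


def record_request(user_agent: str, remote_addr: str) -> bool:
    """Store the IP under its User-Agent.

    Returns True when this IP has not been seen before for that User-Agent,
    which is the "new IP from a known crawler" case described above.
    """
    known = seen_ips[user_agent]
    is_new = remote_addr not in known
    known.add(remote_addr)
    return is_new
```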

TheVisitor

1:16 am on Jan 2, 2005 (gmt 0)

What if they're faking the Agent?

bakedjake

2:08 am on Jan 2, 2005 (gmt 0)

What if they're faking the Agent?

Who cares if you're just trying to hide session IDs?

Agree - UA cloaking is fine.

DamonHD

12:34 pm on Jan 4, 2005 (gmt 0)

Hi,

One of the ways I guess whether a visitor is a spider is to check for a referring URL; if there is none, I assume it is a spider. I think this is still right rather than wrong most of the time, but I'm interested in other opinions!

Also, in my JSP pages I carefully avoid forcing the creation of a session unless one is actually needed, eg because the user has selected a display language (i18n) that I could not deduce from their browser settings, etc, and thus need to carry in a session (I don't use permanent cookies).

Rgds

Damon

DoppyNL

12:50 pm on Jan 4, 2005 (gmt 0)

Checking for a referer isn't very smart:

A lot of users will come in without a referer (from their bookmarks or by typing the URL in from memory).

A lot of crawlers will come in WITH a referer telling you where the crawler found the URL. This will not always be the case, but I do see crawlers come by with referers all the time.

DamonHD

1:27 pm on Jan 4, 2005 (gmt 0)

Hi,

OK, thanks for that... What is a better *simple* way that does not rely on checking against long hand-crafted lists of (forgeable) UA strings (or IP addresses)?

As I say, I don't need 100% accuracy; better than 50% is probably good enough and an ordinary user's experience is not hurt if their browser does not provide a referring URL (eg for security reasons or because the URL was typed in, etc, etc).

Rgds

Damon

DoppyNL

1:41 pm on Jan 4, 2005 (gmt 0)

Best way would be UA string, as that is what it's there for :).

You will need a list of those, yes, and you will have to manage that list to some extent.
You can create such a list simply by logging which UAs make requests to your server.
Should be fairly simple to pick the crawlers out of that list :).

I think you will have all the major crawlers identified within a month. I did when I started with this.

So it's a bit of work at the start to log the stuff;
after that it's an occasional check for new crawlers, but that doesn't take much time :).
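
The logging step above might be sketched like this in Python. It assumes you already have the User-Agent strings extracted from your access log; `count_user_agents`, `likely_crawlers`, and the default hint words are assumptions for illustration, and the result is only a shortlist to review by hand.

```python
from collections import Counter


def count_user_agents(user_agents):
    """Count how often each non-empty User-Agent string appears."""
    counts = Counter()
    for ua in user_agents:
        ua = ua.strip()
        if ua:
            counts[ua] += 1
    return counts


def likely_crawlers(counts, hints=("bot", "crawl", "spider", "slurp")):
    """Return a sorted shortlist of User-Agents containing a hint word.

    The hint words are a guess, not a complete rule; anything the
    shortlist misses still has to be spotted manually in the full counts.
    """
    return sorted(ua for ua in counts if any(h in ua.lower() for h in hints))
```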

DamonHD

2:09 pm on Jan 4, 2005 (gmt 0)

Thanks again.

I shall adapt my algorithm and collect some UAs...

Rgds

Damon

itisgene

6:32 pm on Jan 25, 2005 (gmt 0)

OK, I started this thread last November and haven't posted since. We decided to implement UA detection to get rid of the session IDs. For now we plan to detect only Googlebot, (Yahoo) Slurp, and MSNbot, matching the keywords "Googlebot", "Slurp", and "MSNbot" anywhere in the UA string to capture any variation. Do you see any problem with doing it this way?

Are there any other UAs that we need to take care of? We sell mainly to the US market but also internationally. Any suggestions?
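
The keyword match described above could be sketched as follows, in Python. The function names are hypothetical, the match is made case-insensitive as an assumption so that variations like "msnbot" are caught, and the session parameter name `PHPSESSID` is taken from an earlier post; your site's parameter may be named differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Keywords from the post, lowercased for a case-insensitive substring match.
BOT_KEYWORDS = ("googlebot", "slurp", "msnbot")


def is_known_bot(user_agent):
    """Return True if the User-Agent contains one of the bot keywords."""
    ua = (user_agent or "").lower()
    return any(keyword in ua for keyword in BOT_KEYWORDS)


def strip_session_id(url, param="PHPSESSID"):
    """Return the URL with the session ID query parameter removed."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def url_for_visitor(url, user_agent):
    """Bots get the clean URL; everyone else keeps the session ID."""
    return strip_session_id(url) if is_known_bot(user_agent) else url
```

Because the page content stays the same and only the session parameter is dropped, a visitor faking one of these User-Agents simply browses without a URL-based session.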