AOL spider identification

         

Tonearm

10:06 pm on Dec 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello, my shopping cart lets me specify a list of strings; when any of them is contained in a visiting User Agent, session data is kept out of the URL. Does AOL's spider have anything distinctive in its UA that I could use for this? Is there anything distinctive in the hostname?

jdMorgan

1:15 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



tonearm,

Welcome to WebmasterWorld [webmasterworld.com]!

I get a few requests that contain the REMOTE_HOST "spider-aannn.proxy.aol.com" and similar, but I don't see anything unique about the USER_AGENT.

spider-th061.proxy.aol.com - - [14/May/2002:21:15:40 -0400] "GET /images/icra_sw.gif HTTP/1.0" 200 844 "http://www.mydomain.com/" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000 6.0; Windows NT 5.0)"
spider-th061.proxy.aol.com - - [14/May/2002:21:15:40 -0400] "GET /images/my_logo.gif HTTP/1.0" 200 2815 "http://www.mydomain.com/" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000 6.0; Windows NT 5.0)"
cache-rq06.proxy.aol.com - - [14/May/2002:21:15:28 -0400] "GET / HTTP/1.0" 200 - "-" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000 6.0; Windows NT 5.0)"
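As a rough illustration of telling these hosts apart, here is a minimal Python sketch. It assumes the remote host is the first field of each log line, as in the entries above, and that spider hosts always look like "spider-" followed by letters and digits under proxy.aol.com. That pattern is inferred from the one hostname shown and may not cover every variant; the function names are made up for the example.

```python
import re

# Assumed pattern, inferred from "spider-th061.proxy.aol.com" above.
# Other AOL spider hostnames may not follow this exact shape.
AOL_SPIDER_HOST = re.compile(r"^spider-[a-z0-9]+\.proxy\.aol\.com$", re.IGNORECASE)


def remote_host(log_line: str) -> str:
    """Return the first space-separated field of a log line (the remote host)."""
    return log_line.split(" ", 1)[0]


def is_aol_spider(host: str) -> bool:
    """True if the hostname matches the assumed AOL spider pattern."""
    return AOL_SPIDER_HOST.match(host) is not None


if __name__ == "__main__":
    lines = [
        'spider-th061.proxy.aol.com - - [14/May/2002:21:15:40 -0400] "GET /images/icra_sw.gif HTTP/1.0" 200 844',
        'cache-rq06.proxy.aol.com - - [14/May/2002:21:15:28 -0400] "GET / HTTP/1.0" 200 -',
    ]
    for line in lines:
        host = remote_host(line)
        print(host, is_aol_spider(host))
```

With this pattern the spider-th061 host is flagged and the cache-rq06 host is not, which matches the distinction in the log excerpt.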

If you don't want spiders in your shopping cart, you should block them with your robots.txt file.

Jim

[edited by: littleman at 5:07 am (utc) on Dec. 30, 2002]

Tonearm

3:10 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, I need to identify them so I can keep session information out of their URLs, which lets them traverse the links properly. Thanks for the info, jdMorgan. Maybe I'll go with something like "spider*aol" for identification via the host? What do you think about that?

jdMorgan

3:43 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



tonearm,

I don't know the syntax of your "filter", but spider(something)aol.com would be a good candidate.
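Since the cart's filter syntax isn't known here, the following Python sketch only shows the general idea, assuming a shell-style wildcard filter applied to the remote hostname. The pattern list, the `sid` parameter name, and the `cart_url` function are hypothetical and not taken from any particular shopping cart.

```python
from fnmatch import fnmatchcase

# Hypothetical wildcard patterns for hosts that should get session-free URLs.
SPIDER_HOST_PATTERNS = ["spider*aol.com"]


def is_spider_host(host: str) -> bool:
    """True if the lowercased hostname matches any wildcard pattern."""
    host = host.lower()
    return any(fnmatchcase(host, pattern) for pattern in SPIDER_HOST_PATTERNS)


def cart_url(path: str, session_id: str, host: str) -> str:
    """Append the session ID to the URL unless the visitor looks like a spider."""
    if is_spider_host(host):
        return path  # spiders get clean URLs so they can follow links
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}sid={session_id}"


if __name__ == "__main__":
    print(cart_url("/cart/view", "abc123", "spider-th061.proxy.aol.com"))
    print(cart_url("/cart/view", "abc123", "cache-rq06.proxy.aol.com"))
```

Note that the wildcard has to span the whole hostname here, so a pattern like "spider*aol" on its own would not match under these rules; whether your filter matches whole strings or substrings is worth checking.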

Jim