|Looksmart using spider with no user-agent for MSN?|
A Looksmart spider came that had an IE user-agent...
| 12:25 am on Oct 17, 2003 (gmt 0)|
One of our sites started getting referrals from MSN with session-IDs attached to the URL (i.e. [xyz.com...])
(I changed the actual ID for publication here)
Well, that seemed odd, because we filter requests by user-agent (Googlebot, Slurp, bot, etc.) and don't generate session-IDs for spiders. So I went through the log to find the first time that this session ID had shown up, as this would be the time it was generated. To my surprise, I found:
sv-fw.looksmart.com - - [08/Oct/2003:13:53:28 -0400] "GET /index.php?session_id=2106002191ad2fee7a94178dbb33deac HTTP/1.1" 200 31238 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; YComp 126.96.36.199)"
So our script had generated a sessionID for Looksmart because it didn't have a user-agent that looked like a spider. It looks like a user.
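For anyone who wants to repeat that log search, here is a minimal sketch in Python (not the site's actual PHP script, which isn't shown here). It assumes an Apache combined-format access log; the token list and the function names are made up for illustration. It finds the first request carrying a given session ID and checks whether its user-agent would have been caught by a substring-based spider filter like the one described above.

```python
import re

# Hypothetical token list; the real filter's list is not shown in this thread.
SPIDER_TOKENS = ("googlebot", "slurp", "bot")

# Apache combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def looks_like_spider(user_agent):
    """Case-insensitive substring match against the known spider tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in SPIDER_TOKENS)

def first_hit(lines, session_id):
    """Return the parsed fields of the first request containing session_id."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and session_id in match.group("request"):
            return match.groupdict()
    return None

# The log line quoted above (with the already-altered session ID).
sample = [
    'sv-fw.looksmart.com - - [08/Oct/2003:13:53:28 -0400] '
    '"GET /index.php?session_id=2106002191ad2fee7a94178dbb33deac HTTP/1.1" '
    '200 31238 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; '
    'T312461; YComp 126.96.36.199)"'
]

hit = first_hit(sample, "2106002191ad2fee7a94178dbb33deac")
print(hit["host"], hit["time"])
print("spider?", looks_like_spider(hit["agent"]))
```

Against a real log you would pass the open file object instead of `sample`; because the file is read in order, the first match is the moment the ID was generated.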
1. Has anyone else noticed this?
2. I suppose this could be to avoid cloaking, but there are legitimate reasons to be detecting spiders (such as mine).
3. Is the only alternative to try and filter by IP addresses of known spiders... which will be quite a headache to keep up with...?
(in case anyone is curious, I found that T312461 is an IE security update)
| 1:23 am on Oct 17, 2003 (gmt 0)|
I think this is a real person looking at your web page and then entering it into the directory (wrongly) with the session ID still attached. I don't know if that session ID was the exact one or if it had been edited, but searches for that string on search.msn.com and looksmart.com don't resolve, so I assume the problem's been fixed (probably through editorial review).
| 4:41 am on Oct 17, 2003 (gmt 0)|
As I mentioned, I changed the actual session id before posting here on WebmasterWorld... although it makes little difference. If I search for the actual session ID string, it returns nothing. If I search for some actual keywords, the resulting link DOES still include the session ID.
Are they really finding sites by person and inserting into the directory? Seems terribly inefficient.
| 4:54 am on Oct 17, 2003 (gmt 0)|
|Are they really finding sites by person and inserting into the directory? Seems terribly inefficient. |
Yes they are, and yes it may seem inefficient.
A page of mine was added into the directory that way. I didn't submit the location or anything. A LookSmart editor added it. No complaints here.
| 7:18 am on Oct 17, 2003 (gmt 0)|
Looksmart has both an algorithmic search platform (Wisenut) and a search tuned by people (Zeal/Directory). The directory is where folks would have been adding links by hand.
| 9:37 pm on Oct 17, 2003 (gmt 0)|
The plot thickens.
Today, Inktomi spidered that same session_id.
Could Inktomi have followed a result from the MSN results page?
Also, I'm not all too happy about engines putting a session_id in their SERPs...
| 3:31 am on Oct 18, 2003 (gmt 0)|
I guess the thing to do is to find out where the source of the link is.
Put the title of the page you're looking for in MSN Search. What heading do you get in the results? Do you get "SPONSORED SITES", "WEB DIRECTORY SITES", or "WEB PAGES"?
| 4:03 am on Oct 18, 2003 (gmt 0)|
Web Directory Sites.
| 5:20 am on Oct 18, 2003 (gmt 0)|
I'd do three things: First, contact the Looksmart directory and request that the link be corrected. Second, temporarily patch your script to recognize and ignore that particular session ID, so you can assign a new one to real visitors that follow one of these messed-up links. Third, temporarily patch your script to check the requestor's IP address against Looksmart's IP block, and don't assign sessions for those requests. You could check the sv-fw.looksmart.com hostname instead, but that is slower and may not work all the time.
You can remove the first patch when they fix the link, and the second patch when you are sure this was a human editor error, and not some new project robot.
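The two temporary patches could look roughly like this. It's a sketch in Python rather than the site's PHP, and every specific in it is a placeholder: 192.0.2.0/24 is a documentation-only range standing in for Looksmart's real IP block (which isn't given in this thread), and the function and variable names are hypothetical.

```python
import ipaddress
import secrets

# Placeholder network: replace with the real block once it is known.
NO_SESSION_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Session IDs that have leaked into published links and must not be reused.
LEAKED_SESSION_IDS = {"2106002191ad2fee7a94178dbb33deac"}

def from_no_session_network(remote_addr):
    """True if the requester's IP falls inside a network that gets no session."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in NO_SESSION_NETWORKS)

def choose_session_id(remote_addr, requested_id):
    """Return the session ID to use for this request, or None for no session."""
    if from_no_session_network(remote_addr):
        return None  # patch 2: no sessions for the suspected editor/robot block
    if requested_id is None or requested_id in LEAKED_SESSION_IDS:
        return secrets.token_hex(16)  # patch 1: fresh ID instead of the leaked one
    return requested_id

print(choose_session_id("192.0.2.10", None))
print(choose_session_id("198.51.100.7", "2106002191ad2fee7a94178dbb33deac"))
```

The hostname variant would need a reverse DNS lookup on each request (for example with `socket.gethostbyaddr`), which is why the IP comparison is the cheaper of the two checks.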
Really messy problem... This is a good example of why cloaking is so tough!
| 5:33 am on Oct 18, 2003 (gmt 0)|
I'm guessing it's in the Zeal directory. All you need to do is edit the page's profile. To do that, you'll either have to be a Zeal Contributor or ask someone else who is a Contributor or higher to edit it for you.
I'd recommend becoming a Zeal Contributor. It's not really that hard. All you have to do is register, then pass the Member Quiz. If you do it, then you get the points for the edit, plus you get some experience with the Zeal/LookSmart way of doing things.
I could do the edit for you, but I think it would be better in the long run for you to be a member so that if the same thing happens again, you'll already know what to do.