The site I'm working on is a UK-focused car sales site. The vast majority of the traffic comes from the UK, and a sizeable chunk from search engines.
We noticed that over the past week a large volume of 'direct traffic' started coming to the site. Looking through the stats, it all originated from one source: Limelight Networks in Tempe, Arizona. Until a week ago, they'd never visited the site before.
In Google Analytics, it's showing each of the visits as a separate visitor (rather than the same visitor viewing multiple pages). They aren't focusing on any particular pages, but are acting in the way I'd expect a search-engine spider to act (possibly). They aren't visiting the same page multiple times, they're going to new pages each time, and spending only a second on these pages.
It's not caused us any issues in terms of load on the server, so we're not unduly worried. However, I'd be interested to hear if anyone else has had them suddenly come to your site and act in the same way? I've done a bit of looking around on Google, and did find one other mention of them doing the same thing on another site back in 2006.
Hi Kinboshi, is there a specific User Agent associated with these requests?
Are visitors coming from the same IP Address?
Do these visitors download the images as well as JavaScript/CSS files?
I'm just getting access to our log files now. Unfortunately, Google Analytics doesn't provide IP addresses of visitors or details of exactly what they are doing.
Hopefully the log files will tell me a bit more.
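The three questions above (user agent, source IPs, and whether static assets are fetched) can usually be answered straight from the access log. Below is a minimal Python sketch, assuming the server writes the Apache "combined" log format. The `summarise` helper, the `SAMPLE` lines, and the file paths in them are hypothetical and only illustrate the idea; they are not taken from anyone's real logs.

```python
import re
from collections import defaultdict

# Apache "combined" log format is assumed; adjust the pattern for other formats.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# File extensions treated as static assets (an illustrative choice).
STATIC_SUFFIXES = (".js", ".css", ".png", ".gif", ".jpg", ".jpeg", ".ico")


def summarise(lines, ip_prefix):
    """Summarise requests from addresses that start with ip_prefix."""
    ips, agents = set(), set()
    counts = defaultdict(int)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or not match.group("ip").startswith(ip_prefix):
            continue
        ips.add(match.group("ip"))
        agents.add(match.group("agent"))
        # Ignore any query string before checking the extension.
        path = match.group("path").split("?", 1)[0].lower()
        kind = "static" if path.endswith(STATIC_SUFFIXES) else "page"
        counts[kind] += 1
    return {"ips": ips, "agents": agents, "counts": dict(counts)}


# Hypothetical log lines, for illustration only.
UA = "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.11) Gecko/20080109"
SAMPLE = [
    f'208.111.154.16 - - [03/Feb/2008:12:00:01 +0000] "GET /cars/ford HTTP/1.1" 200 5120 "-" "{UA}"',
    f'208.111.154.189 - - [03/Feb/2008:12:00:02 +0000] "GET /js/site.js HTTP/1.1" 200 900 "-" "{UA}"',
    '192.0.2.7 - - [03/Feb/2008:12:00:03 +0000] "GET / HTTP/1.1" 200 100 "-" "ExampleBrowser/1.0"',
]

result = summarise(SAMPLE, "208.111.154.")
print(result)
```

A visitor that requests pages but never any static assets looks like a conventional spider; one that pulls scripts, stylesheets, and images as well is behaving like a browser.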
Domain Name: llnw.net
IP Address: 208.111.154.# (Limelight Networks, LLC)
ISP: Limelight Networks, LLC
Location:
  Continent: North America
  Country: United States
  State: Arizona
  City: Tempe
  Lat/Long: 33.4357, -111.9171
Language: English (U.S.), en-us
Operating System: Linux UNIX
Browser: Mozilla 1.8.1.11
  Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.11) Gecko/20080109
Javascript version: 1.5
Monitor:
  Resolution: 1300 x 1300
  Color Depth: 16 bits
I have written to Limelight asking for an explanation, although I'm not holding my breath for a response.
Anyone know anything about them?
One said it could be related to the banner ads on my site that might be delivered by their CDN (which I guess is Content Delivery Network).
Someone else from Limelight also responded, and said that the IPs relate to Searchme, who are a client of Limelight, and that it could be their spider.
They did say that they don't want to cause any problems with anyone's server, so if we wanted we could be added to their no-crawl list. If it's a new search engine that's going to be launched then we're quite happy to be spidered, indexed and ranked. It is interesting though that the spider doesn't appear to be acting like other search-engine spiders.
Very weird spider behaviour... it requests JavaScript, images, CSS, the lot. It behaves like a browser and inflates impressions on a number of ad networks currently running on these sites. No increase in click activity though, impressions only.
Seems to be all over the place and has been showing an increase in activity over the last few days.
Limelight Networks Inc.
Reverse dns:
v21.nat.svl.kavam.net
v18.nat.svl.kavam.net
v20.nat.svl.kavam.net ......etc. all over my logs
does not smell too good.
[edited by: Web_speed at 12:45 pm (utc) on Feb. 3, 2008]
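For anyone who wants to repeat the reverse DNS check on their own logs, here is a minimal Python sketch. It assumes the standard library's `socket.gethostbyaddr` is good enough for a PTR lookup; the helper names `reverse_dns` and `host_in_domain` are made up for illustration. The lookup needs network access and may return nothing for addresses without a PTR record.

```python
import socket


def reverse_dns(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def host_in_domain(hostname, domain):
    """True if hostname is the domain itself or one of its subdomains."""
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    domain = domain.lower()
    # Require a dot boundary so "notkavam.net" does not match "kavam.net".
    return hostname == domain or hostname.endswith("." + domain)


# Hostname copied from the log excerpt above.
print(host_in_domain("v21.nat.svl.kavam.net", "kavam.net"))
```

In practice you would call `reverse_dns` on each suspicious address from the log and pass the result to `host_in_domain`. A PTR record is controlled by whoever owns the address block, so treat it as a hint about the operator and not as proof.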
<snip>
The traffic is from a Searchme robot...these are not human visitors.
For some odd reason, they are triggering javascript web traffic tracking code, which normal spiders don't do. Please do read my article and let me know if this is what you all are seeing too.
[edited by: engine at 9:37 am (utc) on Feb. 5, 2008]
[edit reason] No urls, thanks. See TOS [webmasterworld.com] [/edit]
Yes, exactly what I am experiencing.
Update:
This bot continues to hammer my sites heavily. This is no normal crawler...it acts like a browser and executes javascript code, not just URLs in the code but the entire code just like a browser would.
I'm itching to completely block this thing on all fronts...thoughts anyone?
For the foreseeable future you'll probably get more traffic from the spider itself than what you get from the search engine (which is not even live yet). I'd block it if I was in your shoes.
Thanks. Makes a lot of sense. I decided to just block the darn thing.
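In case anyone is interested, here is the .htaccess code:
<Limit GET HEAD POST>
# With "allow,deny", Allow rules are evaluated first and a matching Deny wins.
order allow,deny
allow from all
##--> Bye bye Limelight Networks. You are not welcome here.
# A partial address matches every host in 208.111.154.x.
deny from 208.111.154
</Limit>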
[edited by: Web_speed at 9:40 am (utc) on Feb. 5, 2008]
208.111.154.16
208.111.154.189
208.111.154.15
208.111.154.67
208.111.154.193
208.111.154.66
208.111.154.182
208.111.154.21
208.111.154.183
208.111.154.184
208.111.154.65
208.111.154.68
208.111.154.199
208.111.154.188
208.111.154.186
208.111.154.195
208.111.154.197
208.111.154.200
208.111.154.62
208.111.154.69
208.111.154.63
208.111.154.64
[edited by: Bewenched at 2:23 am (utc) on Feb. 7, 2008]
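As a quick sanity check, every address in the list above falls inside the 208.111.154.x range that the .htaccess rule posted earlier denies. Here is a minimal Python sketch of that check, assuming the partial address `208.111.154` is equivalent to the /24 network `208.111.154.0/24`. The `is_blocked` helper is hypothetical and only a handful of the listed addresses are included.

```python
import ipaddress

# The 208.111.154.x range written as a /24 network.
BLOCKED = ipaddress.ip_network("208.111.154.0/24")

# A few of the addresses from the list above.
SEEN = ["208.111.154.16", "208.111.154.189", "208.111.154.200", "208.111.154.62"]


def is_blocked(ip, network=BLOCKED):
    """True if ip falls inside the blocked network."""
    return ipaddress.ip_address(ip) in network


# Any address printed here would slip past the deny rule.
uncovered = [ip for ip in SEEN if not is_blocked(ip)]
print(uncovered)
```

If the crawler later moves to a neighbouring range, addresses will start to show up in `uncovered` and the deny rule would need widening.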
I found an explanation of what they're up to.
[searchme.com...]
They're still hammering me like I'm a nail that won't go in. Grrr.
Unfortunately the UA that is blitzing my site does not show Charlotte. It appears to be spoofing a normal browser UA, so their advice is pretty much useless.
I've added the lines shown in that earlier post to my htaccess and it worked a treat. Don't know what these guys are up to but they're about to get a stinking email - nobody needs to spider any of my sites so completely in such a short space of time...
I have been getting hit by the same bots, such as kavam.net, that Web_speed has mentioned. Maybe we can get an admin to merge these threads so we can all try to figure this out together.
Brad
"Yes, the activity you are seeing is coming from one of our crawlers. We are busily refreshing our index in preparation for the public launch of our search engine. We hope that inclusion in our index will prove beneficial to you, but understand if you would prefer that we exclude you. We will add you to our “do not crawl” list today. You should see all activity from our spiders cease within an hour or two, so you should not need to modify your robots.txt configuration. We will stay off your site until such time as you explicitly request that we index you again.
Sorry for any inconvenience."
Hmmmmm.
I'm also seeing the kavam.net activity.
NB: today they seem to have changed from kavam to searchme, with the same UA: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.11) Gecko/20080109
Why they aren't using a meaningful UA like a normal bot, I can't say.
The link you mention is from another source trying to do something malicious with URL redirects - as discussed in the thread.
Unless... everyone seeing the kavam/searchme activity has also started seeing the strange outgoing links. Then I guess they could be connected.
But to me kavam/searchme just looks like an amateur bot when the other activity looks like a malicious attack.
Just checked my logs and kavam came visiting on one site and the dodgy outgoing link activity (see NMH's link) started on the same day on one site and 3 days later on another. Does all seem a bit coincidental now you mention it!
The tip is to write in a friendly manner, make it a bit funny and non-threatening. Works like a charm.
Too many males (sorry, guys) tend to be formal and at times confrontational and it backfires. I'm tired of the phrase "outside the box" but that's where I go and it hasn't failed me yet.
I'm going to monitor the site and wait for it to become operational and then perhaps send another email inviting them back.