Website Analytics - Tracking and Logging Forum

    
How to deal with illegitimate traffic?
Arkanoid1984




msg:4419328
 3:33 pm on Feb 19, 2012 (gmt 0)

Hi All,

For the past 3 weeks I have had an abnormal increase in traffic.
Thousands of visitors hit the same 3 pages of my website.
94% of them are first-timers, and they stay less than 11 seconds.
They come from all over the USA, through various ISPs there.

They are direct traffic with no referrers.
They consume between 2 GB and 11 GB of bandwidth every day. I blew through my monthly transfer allowance in just 10 days.

Running my site has suddenly become very costly, considering that this traffic is basically pure junk.
They do not buy any products.
They do not sign up for the newsletter.
And they do not increase ad revenue.


I asked my hosting provider to investigate this traffic. They say that the traffic seems legitimate to them.

I would really like to know where this direct traffic is coming from. How did they get the address of an obscure sale page on my site, and why are they going there en masse?


Is there something I can do?
Thanks for the help,

Arkan

 

creative craig




msg:4419330
 4:05 pm on Feb 19, 2012 (gmt 0)

Do you have access to your raw log files? If so, you should take a close look through the data to see if anything jumps out.

Arkanoid1984




msg:4419354
 5:30 pm on Feb 19, 2012 (gmt 0)

OK, I went through the raw logs for yesterday's data.

Referring sites: 6338 of 9194 requests are unresolved.
1600 requests come from my own site.
The rest come from search engines and other websites.

Visitor countries: 9810 of 10026 requests are unknown.

Referring URLs: 6341 of 9198 are unknown.
Top visitor IPs: 340 requests come from host#*$!.atsomehostingcompany (no website is configured at this address).
154 come from another host like this where no site is configured.

I'm not sure how to interpret these data. Any help?

Thanks
Arkan

Staffa




msg:4419366
 6:15 pm on Feb 19, 2012 (gmt 0)

In your log files you need to look at the IP numbers and UAs of those accessing your 3 obscure pages. Without a referrer, it seems obvious that those requests are coming from bots, not humans.

Block the ranges of these IPs, and the user agents if they contain something that distinguishes them from the average UA. As a start, that will slow down your bandwidth loss.
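Pulling those IPs and UAs out of a raw log can be scripted. A minimal Python sketch, assuming the log is in Apache's "combined" format and using hypothetical paths for the three targeted pages; it counts only no-referer requests for those pages and totals the bytes they cost:

```python
import re
from collections import Counter

# One line of Apache's "combined" log format:
# ip ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# Hypothetical paths; replace them with the three pages being hit.
TARGET_PATHS = {"/products", "/sale-page", "/third-page"}

def tally_suspects(lines, targets=TARGET_PATHS):
    """Count IPs and UAs of no-referer requests for the target pages, plus bytes sent."""
    ips, agents = Counter(), Counter()
    total_bytes = 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue                                # not a combined-format line
        path = m["path"].split("?", 1)[0]           # drop any query string
        if path not in targets or m["referer"] not in ("", "-"):
            continue                                # other page, or has a referer
        ips[m["ip"]] += 1
        agents[m["ua"]] += 1
        if m["bytes"].isdigit():                    # "-" means no body was sent
            total_bytes += int(m["bytes"])
    return ips, agents, total_bytes
```

For example, pass the open access log file to `tally_suspects`, then print `ips.most_common(20)` and `agents.most_common(10)`; an address or UA that dominates the list is a candidate for blocking.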

Arkanoid1984




msg:4419370
 6:58 pm on Feb 19, 2012 (gmt 0)

I wish I could just do that, but I don't see how.
There are no specific IP ranges: this traffic is coming from all over the USA and a bit from Europe, from many different IPs.
The most common UAs are:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Very normal UAs.

lucy24




msg:4419394
 8:44 pm on Feb 19, 2012 (gmt 0)

You said almost all of them are first-timers. Is that based on something objective, like cookies? You could redirect any suspicious visitors to an "I don't like your face" type of page containing an obfuscated e-mail or something that only humans can read.

The conditions would have to be:
-- Request for certain specific pages
-- No referer
-- No existing cookie
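Taken together, the three conditions are a single AND test. A minimal Python sketch of that test, assuming hypothetical page names and that the referer and cookie arrive as raw header strings (or None when absent):

```python
from typing import Optional

# Hypothetical names of the three pages the robots go straight to.
SUSPECT_PAGES = {"thispage.html", "otherpage.html", "thirdpage.html"}

def should_divert(path: str, referer: Optional[str], cookie: Optional[str]) -> bool:
    """True only when all three conditions hold."""
    page = path.rsplit("/", 1)[-1]           # "/dir/thispage.html" -> "thispage.html"
    targeted = page in SUSPECT_PAGES         # request for certain specific pages
    no_referer = referer in (None, "", "-")  # no referer ("-" is how logs show it)
    no_cookie = not cookie                   # no existing cookie of any kind
    return targeted and no_referer and no_cookie
```

A visitor who arrives from another page, or who already carries any cookie, fails the test and gets the real page.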

No matter what they do they'll be taking up bandwidth. But you can redirect them to a page that takes up only 1 or 2K instead of the full size of the page they're aiming for. (Don't know about you, but I have a steady stream of robots heading straight for my fattest page. I finally threw in the towel and reconfigured it so nobody can get there except via a shorter gateway page-- which has the name of the old long page. So far it's working.)

Arkanoid1984




msg:4419398
 9:22 pm on Feb 19, 2012 (gmt 0)

Yeah, my hosting provider told me that I will have to pay a projected 262 dollars in over-usage charges for this month.

I will ask them how to create such a script, but in case they cannot help me: how do you create such a blocking script?

The first-timer information comes from Google Analytics.

lucy24




msg:4419426
 2:34 am on Feb 20, 2012 (gmt 0)

I don't know from scripts; I just do it in htaccess.

How does GA tell if someone is a first-time visitor? Any information available to them is also available to you; they just process it differently.

:: detour to the usual place ::

Holy ###! I had no idea you could use mod_rewrite to set cookies. It's not a Swiss Army Knife, it's a whole bleedin toolbox!

Ahem.

:: making this up off the top of my head ::

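RewriteEngine On
# Both conditions must hold: no Cookie header at all, and an empty (or "-") referer
RewriteCond %{HTTP_COOKIE} !.
RewriteCond %{HTTP_REFERER} ^-?$
# Internally serve the small page instead of any of the three targeted ones
RewriteRule ^(thispage|otherpage|thirdpage)\.html$ go_away.html [L]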

or possibly

... http://www.example.com/goaway.html [R=301,L]

A redirect might even work better than a rewrite, because robots, unlike humans, have the option of not following redirects. 19 times out of 20 when they meet a 301 they say "Well, to heck with that" and go away without ever investigating. And that costs you less bandwidth than even the smallest custom page. (My logs put a redirect at around 500 bytes, give or take 20. Do not ask me why they aren't all exactly the same size.)

Superfluous warning: Do not simply cut & paste the above.
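For sites served by a Python app instead of Apache, the redirect variant might look like the WSGI middleware below. This is a sketch, not a drop-in: the three page paths are hypothetical, and the goaway URL is the placeholder from the rule above. The 301 carries an empty body, so each diverted hit costs little more than the response headers.

```python
# Placeholder target from the rule above; the page paths are hypothetical.
GO_AWAY_URL = "http://www.example.com/goaway.html"
SUSPECT_PATHS = {"/thispage.html", "/otherpage.html", "/thirdpage.html"}

def divert_robots(app):
    """WSGI middleware: 301 cookieless, referer-less requests for the suspect pages."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        referer = environ.get("HTTP_REFERER", "")
        cookie = environ.get("HTTP_COOKIE", "")
        if path in SUSPECT_PATHS and referer in ("", "-") and not cookie:
            # Headers only, empty body: the response is little more than its headers.
            start_response("301 Moved Permanently",
                           [("Location", GO_AWAY_URL), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)  # everyone else gets the real page
    return wrapper
```

Usage: wrap the existing application object, e.g. `application = divert_robots(application)`.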

Does your host give the option of simply shutting down the site for the rest of the month if you go over a certain usage level? Something tells me you're not making $262/month off all those unwanted robots.

:: off to investigate this cookie business ::

Arkanoid1984




msg:4419545
 2:16 pm on Feb 20, 2012 (gmt 0)

It is getting worse: my AdSense revenue has been down to 0 dollars since yesterday.
Is it possible that Google considers this traffic junk and stopped counting clicks? (Strange, because I do not use AdSense on sales pages.) When I saw this in the morning I deactivated AdSense on that site. I hope it is not too late, and I hope that I will not be banned.

I submitted your idea to my host; I will see what they say. I cannot close my site. It has 1500 legitimate visitors who are involved in the many discussions going on there. Maybe I can close it for a couple of hours during the night to sort this out. I have to check with the host.

lucy24




msg:4419590
 5:14 pm on Feb 20, 2012 (gmt 0)

This is an extreme remedy but...

If your host won't let you pull the switch, you can cut back on your own. Every night before you go to bed, upload an htaccess file that contains some kind of ungrammatical garbage. While you sleep, everyone will get a 500 error. This is not a pretty solution. But you can make a custom 500 page for your human visitors, something like "Closed for maintenance-- check back in a few hours."

Or put in a "Deny from all" directive and comment it out every morning. There are lots of other ways to lock people out if you are willing to be brutal and unkind about it :(

This is assuming you are allowed to make your own htaccess and look at your own raw logs. If not, you may as well change hosts anyway ;)

jpch




msg:4419668
 7:17 pm on Feb 20, 2012 (gmt 0)

Perhaps you could try CloudFlare, as they might automatically recognize and block that bad traffic. If nothing else, it'll cut back on your bandwidth usage.

Arkanoid1984




msg:4419672
 7:34 pm on Feb 20, 2012 (gmt 0)

@Lucy24 Yes, I did what you said. I used a WordPress plugin to put the site in maintenance mode. The WordPress part of the site is about 400 pages and usually gets 200-300 legitimate visitors a day. That's the part of the site under attack.
The static HTML part of the site is working fine and gets about 1000 legitimate visitors a day. Even today this part of the site is working great.

This is the code my host gave me:
RewriteEngine on
RewriteCond %{HTTP_COOKIE} !.
RewriteCond %{HTTP_REFERER} ^-?$
rewriterule ^(.*)$ http://www.yoursite.com/$1 [r=301,nc]

That's different from what you wrote.

@jpch I will check out this CloudFlare. It looks good so far for WordPress blogs.

lucy24




msg:4419696
 8:43 pm on Feb 20, 2012 (gmt 0)

What does your host's redirect do? If "yoursite.com" is, uhm, your site, then it seems as if it would redirect to the same file the user requested in the first place. This is fine if it's a robot, because they probably won't follow the redirect. Easy way to get rid of them. But if it's a genuinely new human, they'll never get there. Their own browser-- not your server-- will look at what's happening and slam the door.

Use www.example.com to keep examples from turning into clickable links. Oh, and the [NC] flag isn't needed, since no exact text is specified in the rule. But the Conditions are the important part, and you've got those.
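The loop can be shown with a small simulation. This sketch assumes, as in the host's rule, that the 301 sets no cookie and that the browser sends the same (empty) referer on each retry; the 20-hop cap stands in for a browser's redirect limit and is an assumption:

```python
from typing import Tuple

def host_rule(url: str, cookie: str, referer: str) -> Tuple[int, str]:
    """Mimic the host's rule: no cookie and an empty referer -> 301 to the same URL."""
    if not cookie and referer in ("", "-"):
        return 301, url  # Location is the URL that was just requested
    return 200, ""

def follow(url: str, cookie: str = "", referer: str = "", max_hops: int = 20):
    """Follow redirects the way a browser would, giving up after max_hops."""
    for hop in range(max_hops):
        status, location = host_rule(url, cookie, referer)
        if status != 301:
            return status, hop
        url = location  # nothing changes: still no cookie, still no referer
    return "too many redirects", max_hops
```

A first-time visitor who typed the address ends in "too many redirects"; anyone arriving with a cookie or a referer gets the page on the first try.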

Arkanoid1984




msg:4419832
 2:27 am on Feb 21, 2012 (gmt 0)

In another discussion they say that a 301 redirect can cause a lot of duplicate content and other problems?
I will see if I can go with an Error 410 Gone response.
I renamed the sales page /fr-products, but I still have the suspicious traffic going to the page that no longer exists, called /products.

There is a thread about this here [webmasterworld.com...]
I have to learn how to do this.

I also signed up for CloudFlare, but it doesn't seem terribly effective. I will test the service for 1 or 2 days, then leave if it doesn't work well for me...

jpch




msg:4419840
 2:51 am on Feb 21, 2012 (gmt 0)

I also signed up for Cloudfare but it doesnt seem terribly effective. I will test the service 1 or 2 days then leave it doesnt work well for me


Cloudflare won't do anything until your name servers have been updated. Depending on DNS providers this can take 24-48 hours assuming they update every 24 hours. Before you give up on it at least make sure it's actually working.

Andem




msg:4420257
 12:55 am on Feb 22, 2012 (gmt 0)

It makes zero sense to have that much random zombie traffic from such a random sample without it being a DDoS attack. I know lucy24 is trying to help, but there are better suggestions to deal with this.

Have you taken a sample of maybe 25-50 of the IP addresses, run some WHOIS lookups and figured out whether they have any connections? If there is a connection, try denying the subnets.
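Before running WHOIS on each address, grouping the sample by subnet can show whether any cluster exists. A minimal Python sketch using the standard `ipaddress` module; the /24 prefix and the two-hit threshold are arbitrary choices:

```python
import ipaddress
from collections import Counter

def shared_subnets(ips, prefix=24, min_hits=2):
    """Group IPv4 addresses into /prefix networks; return those seen at least min_hits times."""
    nets = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips
    )
    return [(str(net), n) for net, n in nets.most_common() if n >= min_hits]
```

A /24 that keeps showing up is worth a WHOIS lookup before you deny it; on consumer ISPs, a whole /24 may also contain legitimate visitors.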

Are you sure that this traffic is really coming from the United States? Are they coming from IP addresses assigned to consumer-based ISPs?

I'm sure if you look at the traffic with a keen eye, you will be able to spot some consistencies.

An idea might be to try registering sessions with PHP (or whatever language you use) and then denying access if the script is unable to register a session.
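The session idea boils down to a cookie round trip: real browsers store a cookie and send it back, while many bots do not. A minimal sketch of that check as Python WSGI middleware, standing in for a PHP session; the cookie name `gate` and the `gate_retry` query marker are hypothetical:

```python
import secrets
from http.cookies import SimpleCookie

def cookie_gate(app, cookie_name="gate", marker="gate_retry=1"):
    """WSGI middleware standing in for a server-side session check."""
    def wrapper(environ, start_response):
        jar = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        if cookie_name in jar:
            return app(environ, start_response)  # cookie came back: let it through
        query = environ.get("QUERY_STRING", "")
        if marker in query.split("&"):
            # A cookie was already offered once and did not come back: refuse.
            start_response("403 Forbidden", [("Content-Length", "0")])
            return [b""]
        # First contact: set a cookie and bounce back to the same URL.
        token = secrets.token_hex(8)
        location = environ.get("PATH_INFO", "/") + "?" + (query + "&" if query else "") + marker
        start_response("302 Found", [
            ("Set-Cookie", f"{cookie_name}={token}; Path=/; HttpOnly"),
            ("Location", location),
            ("Content-Length", "0"),
        ])
        return [b""]
    return wrapper
```

It uses a 302 rather than a 301 so that browsers do not cache the bounce. A client that never follows the redirect costs only the headers; one that follows it without the cookie gets a bodyless 403.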

Another idea builds on lucy24's. If you use a scripting language like PHP, then during certain times of the day, try allowing access to the pages using too much bandwidth only after the visitor passes a reCAPTCHA. That service is extremely easy to use and provides plenty of examples; if the 'user' doesn't get through it, you might be able to ward them off until they move on to their next target. Feel free to PM me if you need assistance setting that up.
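The time-of-day part is just a gate in front of the heavy pages. A minimal Python sketch, assuming a hypothetical overnight window and page list; it only decides whether to challenge, and the reCAPTCHA widget and its server-side verification are not shown:

```python
from datetime import datetime, time

# Hypothetical settings: when to challenge, and which pages are heavy.
WINDOW_START = time(22, 0)
WINDOW_END = time(6, 0)
HEAVY_PAGES = {"/products"}

def needs_challenge(path: str, now: datetime) -> bool:
    """True if this request should pass a CAPTCHA before the page is served."""
    if path not in HEAVY_PAGES:
        return False
    t = now.time()
    if WINDOW_START <= WINDOW_END:              # same-day window, e.g. 09:00-17:00
        return WINDOW_START <= t < WINDOW_END
    return t >= WINDOW_START or t < WINDOW_END  # window wraps past midnight
```

Handling the wrap past midnight separately keeps an overnight window like 22:00-06:00 from silently matching nothing.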

lucy24




msg:4420345
 5:27 am on Feb 22, 2012 (gmt 0)

On another discussion they say that a 301 redirect can cause a lot of duplicate contend some more problem ?
I will see if I can go with an Error 410-Gone response.

Duplicate content is when the actual pages are the same. If you're redirecting, then only the new page counts.

If you do mass-redirects, like hundreds of former pages to a single new one, that can lead to a whole different type of problem. But if the number of pages involved is so small that you can identify them by name, there should not be anything to worry about.

The choice between 301 and 410 is up to you, based on your own knowledge of the pages. If there's a surviving page that's a good match for the content of a former page, redirect-- even if you're redirecting two or three old pages to the same new one. If there's no good substitute, return a 410.
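That rule of thumb fits in a lookup table: a retired URL with a good substitute gets a 301, one without gets a 410. A minimal Python sketch with hypothetical paths:

```python
from typing import List, Optional, Tuple

# Hypothetical retired paths. A value is the best surviving substitute;
# None means there is no good substitute.
RETIRED = {
    "/old-widget-page": "/widgets",
    "/discontinued-offer": None,
}

def retired_response(path: str) -> Optional[Tuple[str, List[Tuple[str, str]]]]:
    """Status line and headers for a retired path, or None if the path is live."""
    if path not in RETIRED:
        return None
    target = RETIRED[path]
    if target is not None:
        return "301 Moved Permanently", [("Location", target)]  # good match: redirect
    return "410 Gone", []                                        # no match: gone
```

In .htaccess, mod_alias expresses the same split with `Redirect 301 /old-path /new-path` and `Redirect gone /old-path`.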

Arkanoid1984




msg:4420532
 3:56 pm on Feb 22, 2012 (gmt 0)

Thanks Lucy24. With my limited knowledge of coding I implemented a solution similar to what you recommended. I renamed all 3 pages, so humans can still follow the links when they browse the site. All the zombie traffic seems to be programmed to go to the pages under their old names. So I use a 301 redirect to dump all this zombie traffic onto a txt file of 4 octets or so that says "Oops you arrived at the wrong place".

So it is helping me save bandwidth. Yesterday's consumption was 650 MB (a normal day is around 615 MB, while a full day under non-stop attack can cost me as much as 15.47 GB, as on Feb. 7). I think it will go down further, since I implemented this solution yesterday at 7 AM, after several hours of zombie visits.

I also signed up for CloudFlare. I'm not sure how this service works or how effective it is, but it says that it saved me 97 MB.
According to CloudFlare, all my zombie traffic is USA-based. I checked the raw log files: they are from the US and mostly from consumer-based ISPs, with the exception of two IPs which I blocked through .htaccess, as they were hitting me 600 times a day. These 2 are probably spammers.

I will go through the raw files again today to check for patterns in the IPs.

rogerd




msg:4420544
 4:25 pm on Feb 22, 2012 (gmt 0)

More zombie discussion here:
[webmasterworld.com...]

Glad you found a solution, Arkanoid1984. This seems to be a botnet of infected Windows machines, which would explain the diverse IPs, UAs, screen resolutions, etc.

Other sites experiencing this seem to be WordPress sites as well, but the sample size is too small to say whether it's coincidence or WP sites are being targeted. It's not really clear what's going on. The volume on one site I watch isn't high enough to be a DDoS attack (it's been as high as 30K pageviews in a day, but the site can handle the load), nor does the site seem like a logical DDoS target.

netmeg




msg:4420545
 4:31 pm on Feb 22, 2012 (gmt 0)

I'm hit by this as well (maybe we should splice those two posts together). The FIRST thing I did was turn off AdSense, and I strongly advise anyone else hit by it who's running AdSense to turn it off as well. It won't matter that it's not your fault. If you're getting CPM ads (and I do), then attacks like this will make your site a significant risk to advertisers.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved