So getting invalid traffic on my Apache server

born2run

9:36 am on Mar 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My Apache web server is now getting invalid traffic. Any advice on how to block it?

Also, Cloudflare just got this new firewall feature: "Create a rule to block or challenge a specific User Agent from accessing your site".

Is there any particular malicious user agent that I can block using this rule? Any recommendations please? Thanks!

keyplyr

10:08 am on Mar 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We've had this discussion before. Same answer... check your server logs to determine who/what is the cause.

Blocking Methods [webmasterworld.com]

Search Engine Spider & User Agent ID Forum [webmasterworld.com]

Server Farm IP Ranges [webmasterworld.com]
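
One way to start on the log check suggested above is a small script that tallies requests per user agent. This is only a sketch: it assumes Apache's "combined" log format, and the name `count_user_agents` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumes the Apache "combined" log format, where the request line, the
# referer and the user agent are the three double-quoted fields.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def count_user_agents(lines):
    """Return a Counter mapping each user agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        ua = match.group("ua")
        # Apache logs a missing header as "-"; treat it like an empty one.
        counts["(blank)" if ua in ("", "-") else ua] += 1
    return counts


# Hypothetical sample lines (documentation-range IPs), not real traffic.
SAMPLE = [
    '203.0.113.5 - - [06/Mar/2018:09:36:00 +0000] "GET / HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.5 - - [06/Mar/2018:09:36:01 +0000] "GET /a HTTP/1.1" 200 512 "-" ""',
    '198.51.100.7 - - [06/Mar/2018:09:37:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for ua, hits in count_user_agents(SAMPLE).most_common():
        print(hits, ua)
```

To run it against a real log, pass an open file object instead of `SAMPLE`. Lines that do not match the assumed format are skipped silently.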

TravisDGarrett

11:30 am on Mar 6, 2018 (gmt 0)



What do you call "invalid traffic"?

keyplyr

11:41 am on Mar 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



He means what Adsense is calling invalid... bots.

TravisDGarrett

12:30 pm on Mar 6, 2018 (gmt 0)



"He means what Adsense is calling invalid... bots."

Ok, I see.

I wonder if we (publishers) can really do something helpful in blocking invalid traffic. Yes, we can block "known" user agents, "known" IP ranges, things like that, but since they are "known", I would assume that Adsense already knows them and is already blocking them before an ad is served, and therefore before it is clicked.

I am talking about invalid traffic regarding Adsense, not about scrapers and things like that, which are, of course, something we have to address ourselves.

keyplyr

12:43 pm on Mar 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Adsense doesn't block any requests. Adsense doesn't control our server. Once the Adsense code is on our pages, it is our responsibility.

Adsense does however invalidate some clicks it determines to not be human... and even some that are human.

If Adsense sees this to be a significant number of invalid clicks, it then comes after publishers to fix the problem.

TravisDGarrett

12:47 pm on Mar 6, 2018 (gmt 0)



"Adsense doesn't block any requests"

Are you sure about that? I can't believe that Adsense servers are answering all ad requests without a minimum of filtering, otherwise it would be easy to DDoS Adsense all the time.

keyplyr

12:56 pm on Mar 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Adsense doesn't block requests on *our* server. The requests are made to fulfill the files linked to the webpage, including the Adsense code.

Whether the ad is served to what they consider to be invalid or not is represented by the clawbacks. So yes, ads are shown to invalid UAs. Maybe some are not shown. That would likely remain an unknown number.

lucy24

7:37 pm on Mar 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Maybe some are not shown."
Aren't most ads--whether through AdSense or another source--ultimately JavaScript-based? The vast majority of robots don't even request, let alone act on, scripts.

keyplyr

8:40 pm on Mar 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@lucy24 - Clickbots, those creatures that intentionally cause false positives for Adsense publishers, are purposed to do just that... follow JS pretending they are human.

tangor

10:06 pm on Mar 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Manage the bots. Always. As for which UAs to manage, that can change minute by minute. A never-ending case of whack-a-mole. Or, you can whitelist and poke holes as needed. Both methods work.
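
The two approaches can be contrasted in a few lines. This is a rough sketch only: the function names and the UA fragments in both lists are placeholders chosen for illustration, not a recommended list.

```python
# Illustrative placeholders only; build real lists from your own logs.
BLOCKED_FRAGMENTS = ("python-requests", "curl")
ALLOWED_FRAGMENTS = ("Mozilla/", "Googlebot", "bingbot")


def allowed_by_blocklist(user_agent):
    """Whack-a-mole: let everything in unless it matches a known-bad fragment."""
    ua = user_agent.lower()
    return not any(fragment.lower() in ua for fragment in BLOCKED_FRAGMENTS)


def allowed_by_allowlist(user_agent):
    """Whitelist: refuse everything unless it matches a hole poked on purpose."""
    ua = user_agent.lower()
    return any(fragment.lower() in ua for fragment in ALLOWED_FRAGMENTS)
```

The difference shows up with agents nobody has seen before: an unknown UA passes the blocklist until someone adds it, but is refused by the allowlist until someone pokes a hole for it.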

born2run

3:45 am on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Should I block blank user agents?

keyplyr

3:52 am on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Should I block blank user agents?"
I always have. Be careful though. You may need to allow some IP ranges to use a blank UA.

Example: Facebook
If you post promotional material at FB with images, FB will periodically use a blank referrer to update its cache of those files.

born2run

4:03 am on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh what the heck, this fake traffic issue is a mess and I need to dive in. Secondly, there is a huge market for any new company to help with this farking mess!

lucy24

5:54 am on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"FB will periodically use a blank referrer to update its cache of those files"
Blank referer or blank UA?

keyplyr

6:03 am on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Blank UA... but actually both.

Thanks for the heads-up :)

born2run

8:28 am on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks guys, I’ll be sitting with a microscope and blocking bots as a first step.

born2run

12:33 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So I have the option to block blank referrers (which I've decided I won't), but surely I can block blank user agents? Will FB have problems then? Kindly advise. Thanks!

born2run

12:44 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Also I checked and found this link on Facebook's guidelines for webmasters:

[developers.facebook.com...]

They seem to have removed the blank UA issue, no?

keyplyr

7:49 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As I said above, one of FBs image caching agents will not use a UA nor a referrer. I post a lot at FB and see this empty string in my logs every single day (I'm looking at it now.)

So if you block blank UAs and do not allow the various IP ranges used by FB, your images will start to disappear from your FB posts.

There are several ways to accomplish this. Here's one way using htaccess that allows the several FB UAs, including a blank UA, from FB ranges only:

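# Forbid a blank UA or a Facebook-style UA unless the request comes from an FB range.
# Assumes "RewriteEngine On" appears earlier in the htaccess.
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^(facebook|Facebot|visionuti)
RewriteCond %{REMOTE_ADDR} !^31\.13\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.
RewriteCond %{REMOTE_ADDR} !^66\.220\.1(4[4-9]|5[0-9])\.
RewriteCond %{REMOTE_ADDR} !^69\.63\.1(7[6-9]|8[0-9]|9[01])\.
RewriteCond %{REMOTE_ADDR} !^69\.171\.2(2[4-9]|[34][0-9]|5[0-5])\.
RewriteCond %{REMOTE_ADDR} !^173\.252\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])\.
# RewriteRule needs a pattern before the substitution; "^" matches every request.
RewriteRule ^ - [F]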
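
For checking log entries against the same ranges outside Apache, the five REMOTE_ADDR patterns above can be written as CIDR blocks. A sketch in Python: the CIDR values are derived from the regexes in the rule, not from any Facebook publication, the ranges may have changed since 2018, and `is_fb_range` is a hypothetical helper name.

```python
import ipaddress

# CIDR equivalents of the REMOTE_ADDR patterns in the rule above
# (derived from the regexes; Facebook's current ranges may differ).
FB_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "31.13.64.0/18",    # 31.13.64-127.x
        "66.220.144.0/20",  # 66.220.144-159.x
        "69.63.176.0/20",   # 69.63.176-191.x
        "69.171.224.0/19",  # 69.171.224-255.x
        "173.252.64.0/18",  # 173.252.64-127.x
    )
]


def is_fb_range(ip):
    """Return True if ip falls inside one of the ranges listed above."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in FB_NETWORKS)
```

A blank-UA request from an address where `is_fb_range` is true is one the rule lets through; any other blank-UA request gets the 403.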


There also may be other beneficial agents that use a blank referrer. This is why you need to do the research before you start blindly blocking access to your server.

Start by watching your server logs several times a day for a few months... to learn what agents access your server and who/what they are.

- - -

lucy24

9:39 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Clickbots"

Thanks, keyplyr, although I've got a nasty suspicion this is one of those things that get explained to me over and over again and it never sinks in.

"one of FBs image caching agents will not use a UA nor a referrer."
:: detour to raw logs ::

Well, ###, I thought this only applied if you were personally active on FB so they were following-up on your own posts. Note that if you've got an IPv6 address, they will most likely come in from
2a03:2880::/29
(amusingly, out of this vast range of IPs-- /29 is a lot bigger in 6 than it was in 4-- they always pick the ones containing the string :face: ) But so far they've always had a UA from this range.
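
Testing whether a logged address falls in that IPv6 range takes a few lines with Python's standard `ipaddress` module. A minimal sketch: the helper names and the sample address in the note below are invented for illustration, and only the range itself comes from the post above.

```python
import ipaddress

# The IPv6 range mentioned above.
FB_V6 = ipaddress.ip_network("2a03:2880::/29")


def in_fb_v6(ip):
    """Return True if ip lies inside 2a03:2880::/29."""
    return ipaddress.ip_address(ip) in FB_V6


def has_face_group(ip):
    """Return True if one 16-bit group of an IPv6 address is exactly 'face'."""
    address = ipaddress.ip_address(ip)
    return address.version == 6 and "face" in address.exploded.split(":")
```

For a made-up address such as `2a03:2880:f003:c07:face:b00c:0:2`, both helpers return True. As for the size comparison, a /29 holds 2**99 addresses in IPv6 against 8 in IPv4.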

[edited by: lucy24 at 10:02 pm (utc) on Mar 7, 2018]

wilderness

9:43 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



born2, I didn't allow FB traffic for the longest while. FB is the most invasive server/software on the www (worse than any virus or malware).

Eighteen months ago I changed my tune due to widget topics.
The blank UAs are their problem, not mine.
There's far worse than blank UAs.
Wait until a FB user embeds a URL and exposes all the thread users' IPs to your raw logs.

Personally, I place FB in a similar category as all the Wiki pages. There is no real benefit (at least over time) to your site(s) because 99.99% of users merely view the one page. Most FB users have zero knowledge of primary www search engines (nor do they care); rather, they are trapped in FB (like Wiki limitations for external links). Most don't even utilize the FB search options; rather, they simply ask another user (pure laziness of social interaction).

keyplyr

9:58 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FB can be a huge source of revenue-generating traffic if nurtured and used properly, especially FB Groups.

[fix typo]

[edited by: keyplyr at 10:00 pm (utc) on Mar 7, 2018]

tangor

9:59 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Personally I haven't allowed blank referers (sic) or UAs for the last 12 years. FB traffic still comes, from that link in SM. FB is not really a search engine and doesn't want to be, so I accommodate in that regard.

seoskunk

10:24 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Personally I haven't allowed blank referers (sic) or UAs for the last 12 years"


Jeez remind me not to type your url into the search bar

wilderness

10:33 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



tangor, my denial of blank UAs goes back at least fifteen years (possibly longer).
It's almost step I in the htaccess bible <BG>

TravisDGarrett

10:36 pm on Mar 7, 2018 (gmt 0)



There are also malformed user agents, which can be an indication of a bot.

Also, I don't know what to think about requests from IPs that do not have a hostname. I always find this suspect. I assume (I may be wrong) that every legitimate ISP should set up a hostname for each of its IPs.
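
Whether an address has a hostname can be checked with a reverse DNS (PTR) lookup. A small sketch using Python's standard `socket` module: `reverse_hostname` is a hypothetical helper, and its `lookup` parameter exists only so the function can be exercised without network access.

```python
import socket


def reverse_hostname(ip, lookup=socket.gethostbyaddr):
    """Return the reverse-DNS hostname for ip, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname
```

A `None` result only says that no PTR record was found at that moment. Whether that makes the request suspect is the open question raised above, not something the lookup settles.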

wilderness

10:58 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Travis, are you referring to 'Private Customer'?

lucy24

12:37 am on Mar 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"remind me not to type your url into the search bar"
And whatever you do, don't bookmark a page, no matter how often you visit.

:: noting sadly that the conceptual merging of “search bar” and “address bar” proceeds apace ::