Prudent to block empty user agent visitors?

         

born2run

3:06 pm on Apr 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi, I'm currently challenging visitors that send an empty user agent, but judging by the URLs they request, most of them seem to be useless.

What percentage of empty user agent visitors are fake? Should I just block empty user agents? Please advise. Thanks!

TorontoBoy

3:18 pm on Apr 18, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I block all empty UAs. They are bots. Usually humans don't remove their UA, and the reputable bots show their UA, some UA, but rarely nothing. I have not detected any valid bot or human with an empty UA.

tangor

4:54 pm on Apr 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Been doing this for years. Recommended. :)

born2run

5:13 pm on Apr 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, so I see most of the experienced folks here are blocking blank UAs. Anyone else wanna join in this discussion? Thanks!

not2easy

5:23 pm on Apr 18, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It could be a very long wait for someone to oppose blocking blank UAs. They're not valid visitors.

Travis

5:40 pm on Apr 18, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



"anyone else wanna join in this discussion?"

I do block empty UAs. I also block IPs without reverse DNS, which I know is not good practice, but I studied it a lot and weighed the risks before making this decision. (I also block buggy rDNS.)
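
A rough Python sketch of that kind of reverse-DNS check, for illustration only. It assumes "buggy" rDNS means a reverse name that does not resolve back to the visiting IP; the function name `rdns_status` and its return labels are made up for this example, and the lookups are IPv4-only.

```python
import socket


def rdns_status(ip, reverse=None, forward=None):
    """Classify an IPv4 address as 'none', 'mismatch' or 'ok'.

    reverse/forward are injectable lookups so the logic can be tried
    without touching the network; the defaults use the system resolver.
    """
    if reverse is None:
        reverse = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward is None:
        forward = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse(ip)           # reverse (PTR) lookup
    except OSError:                  # herror/gaierror: no reverse DNS at all
        return "none"
    try:
        addresses = forward(host)    # forward lookup of the reverse name
    except OSError:
        return "mismatch"
    # "ok" only when the reverse name points back at the same IP
    return "ok" if ip in addresses else "mismatch"
```

Whether "none" and "mismatch" should both be blocked is a site-by-site decision; the sketch only classifies.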

lucy24

7:23 pm on Apr 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I have not detected any valid bot or human with an empty UA."
Up until a few years ago, Google's faviconbot sent no UA at all. Then, for another year or two, they claimed to be Firefox/6. As it happens, I let everyone have the favicon (it's one way to flag wrongly blocked humans) so it made no difference.

But, yeah, a null user-agent is one of the very first things I blocked. I think for most sites it's a no-brainer.

If a human chooses to omit their UA string, that's a conscious choice on their part, and any consequences are on them.

born2run

8:17 pm on Apr 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So if bots have empty UA what’s the advantage for them having this?

Travis

9:46 pm on Apr 18, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



"So if bots have empty UA what’s the advantage for them having this?"

They might think they'll not catch your attention, and that you may not dare to block them.

Usually, webmasters pay more attention to populated UAs than to empty ones.

Also, an empty UA can be the result of some kind of privacy-protection option, so it "might" be a real human, and bot runners may count on you not daring to block something you can't identify for sure.

lucy24

10:10 pm on Apr 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"what’s the advantage for them"
Most of the time it's just laziness. It's no different from omitting all the usual headers a human browser sends. The person writing the robot's script has to spend an extra few seconds* getting it to send the User-Agent header--or any other header--so why bother.

Considering that everybody, and their brother, and their cat, has a website: What proportion of all websites in the world have any access controls at all, whatsoever?


* That is: a few seconds, one time, out of your entire life, ever. That's three seconds the botrunner could have spent playing GTA, or whatever it is botrunners do with the rest of their waking hours.

born2run

2:08 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmm, so I've set my firewall to block empty UAs, and it's blocking one empty-UA visitor every 15 minutes! Is that a normal rate, or have I just been sitting on a mess of bots?

keyplyr

2:34 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you post links to FB, a related image is scraped from the respective page. One of the recurring image caching agents has an empty UA field, so you'll need to allow this from FB ranges. I see it validating files every day. If blocked, your image(s) will disappear.

There are a dozen more agents that send an empty UA field and, depending on your site's interests, it may be important for you to give them preferential treatment.

Be careful. Many who are quick to give advice may have a completely different web schema than you.

Note: It is sloppy webmastering and basically unwise to block anything unless you keep a diligent daily watch on your server logs to verify exactly who/what is getting blocked.

The internet is in constant change. New agents are present each day. Consistent evaluation is needed to stay relevant.

keyplyr

3:40 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"...whatever it is botrunners do with the rest of their waking hours"
I like surfing.

wilderness

3:40 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"What proportion of all websites in the world have any access controls at all, whatsoever? "

lucy, I'd wager it's less than 5% (and that's modest) of all websites.
Worse still: how many folks have you seen appear in this forum over the years who are neither aware that raw logs exist nor have them turned ON?

born2run

4:08 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep, I recall the FB bot being mentioned here.

Can I selectively allow fb bot?

tangor

4:19 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As with any hard and fast rule there are no hard and fast rules. Poke holes in your blank uas by ip/range. Let in what you like, keep all others out. If you don't do SM, or don't care, blocking fb and ilk is not a terrible thing. The web is the way it is because it is open to amateurs, simply meaning that they have no experience---and as such keep the whole mess alive.

As for humans surfing with a blank ua, it's my sandbox and I want to know who comes to play. Don't say? Can't play.

Dealing with expressed uas can be time consuming. Dealing with blank uas to find a few that might be beneficial is for the anal retentive.

keyplyr

4:44 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Can I selectively allow fb bot?"
Allow or block anything you like. Only you know what benefits your interests.

The point is: asking others what they block yields useless information unless their site is exactly the same as yours, and even then they may be doing things detrimental to their own interests, so taking their advice is dangerous.

You need to do the research. You need to do it yourself.

Look up the UAs discussed here and do web searches to find out who owns the bots and what their purpose is. Look up IP ranges listed here and use WHOIS to find out what IPs to allow or block.

Read your raw logs several times a day and learn about all the agents that target your web property. You learn all this by doing it yourself.
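
As a starting point for that kind of log reading, here is a minimal Python sketch that counts blank-UA requests per IP. It assumes Apache's "combined" log format, where the User-Agent is the last quoted field; the function name `blank_ua_hits` is made up, and the sample lines are synthetic (documentation IP ranges, not real traffic).

```python
import re
from collections import Counter

# Apache "combined" format; the last quoted field is the User-Agent.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)


def blank_ua_hits(lines):
    """Count requests per IP whose logged User-Agent is "" or "-"."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group("ua") in ("", "-"):
            hits[m.group("ip")] += 1
    return hits


# Synthetic sample lines, for illustration only.
SAMPLE = [
    '203.0.113.7 - - [18/Apr/2018:15:06:00 +0000] "GET / HTTP/1.1" 403 199 "-" "-"',
    '203.0.113.7 - - [18/Apr/2018:15:21:00 +0000] "GET /a HTTP/1.1" 403 199 "-" "-"',
    '198.51.100.9 - - [18/Apr/2018:15:22:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [18/Apr/2018:15:23:00 +0000] "GET / HTTP/1.0" 403 199 "-" ""',
]

for ip, count in blank_ua_hits(SAMPLE).most_common():
    print(ip, count)
```

With a real log you would pass an open file object instead of `SAMPLE`, then look up the top IPs with WHOIS before deciding what to allow.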

lucy24

6:38 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Can I selectively allow fb bot?"
Do you mean, is it physically possible? Sure. Exact methods are a question for whichever subforum is concerned with your server type. Unless you're using one of the extremely uncommon ones, in which case you're probably on your own.

I do most of my access controls using mod_setenvif* (Apache): Deny from {long list of environment variables}. A blank UA gets flagged as env=noagent, which will eventually be blocked. But before it gets to that point, I check the IP, and if it's one of the listed FB ranges, I unset "noagent". FB seems to be wholly arbitrary & capricious about their IPs, even on a single visit. I think there are five ranges altogether.


* The actual denying is done with mod_authzthingummy in 2.2, or mod_whatever-it-is in 2.4. But either way, it relies on things that have already happened in mod_setenvif.
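
The flag, unset, then deny order described above can be sketched in Python. This is not Apache configuration, only the same decision flow under stated assumptions: the function name `should_block` is made up, and the allow list holds just the one Facebook range quoted in this thread rather than the full set.

```python
import ipaddress
import re

# Assumption: only the single Facebook range quoted in this thread;
# a real list would come from your own WHOIS research.
ALLOWED_BLANK_UA_RANGES = [ipaddress.ip_network("173.252.64.0/18")]

BLANK_UA = re.compile(r"^-?$")  # empty string or a lone hyphen


def should_block(remote_addr, user_agent):
    """Return True if the request ends up with a blocking flag set."""
    flags = set()

    # Step 1: flag a missing or blank User-Agent.
    if user_agent is None or BLANK_UA.match(user_agent):
        flags.add("noagent")

    # Step 2: unset the flag again for allow-listed ranges.
    if "noagent" in flags:
        ip = ipaddress.ip_address(remote_addr)
        if any(ip in net for net in ALLOWED_BLANK_UA_RANGES):
            flags.discard("noagent")

    # Step 3: deny if any flag survived.
    return bool(flags)


print(should_block("173.252.98.22", ""))    # allow-listed range
print(should_block("203.0.113.5", None))    # blank UA from elsewhere
```

The point of the ordering is that the exception is applied to the flag, not to the final rule, so other flags could still block the same visitor.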

born2run

7:04 am on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another way is to manually add your own facebook image for the article being posted on FB. I use a scheduler to post articles (kinda like hootsuite).

lucy24

5:51 pm on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think keyplyr was talking about Facebook posts originating with other people, not self-promotion. I don't even do FB, but I'll occasionally meet the facebookbot.

I checked back in raw logs. I think FB started the no-UA variant around the middle of 2017; I don't find it earlier.

born2run

9:18 pm on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I post my site’s urls only on Twitter and fb

lucy24

10:42 pm on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You mean the site is noindexed, so only Twitter or FB users know about it, and then only if you tell them? In that case what’s with all the posts asking about GSC?

keyplyr

10:48 pm on Apr 19, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's stay on the topic of blank user agents. This is the User Agent forum.

Please take Facebook and Twitter discussion to the appropriate Social Media forums so that others can benefit by this informative dialogue.

TorontoBoy

1:08 pm on Apr 24, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



173.252.98.22-29 comes in with a UA of "-"

173.252.64.0 - 173.252.127.255 CIDR: 173.252.64.0/18
NetName: FACEBOOK-INC

I did block it.
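
For anyone wanting to double-check a WHOIS range against its CIDR notation, a short Python sketch using the standard `ipaddress` module; the variable names are only illustrative.

```python
import ipaddress

first = ipaddress.ip_address("173.252.64.0")
last = ipaddress.ip_address("173.252.127.255")

# Collapse the start/end pair into the smallest set of CIDR blocks.
networks = list(ipaddress.summarize_address_range(first, last))
print(networks)

# Check that the addresses seen in the logs fall inside that block.
fb_range = networks[0]
seen = [ipaddress.ip_address(f"173.252.98.{n}") for n in range(22, 30)]
print(all(ip in fb_range for ip in seen))
```

The same membership test works whether the range ends up on a block list or an allow list.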

lucy24

6:26 pm on Apr 24, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



a UA of "-"
The - is Apache's logging designation for a header that wasn't sent at all. If logs instead said "" (quotation marks with nothing inside them) it would mean the header was sent, but it is empty. To cover all bases, my access-control rules say
^-?$
although really either
^$
or
!.
(depending on which mod you're in) would do the same job.
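
A quick way to compare those three patterns is to run them against a few values in Python. Two assumptions here: Python's `re` behaves like Apache's regex engine for patterns this simple, and the server hands the rule an empty string when the header is absent. The helper names are made up.

```python
import re


def ua_is_blank(pattern, value):
    """True when the pattern matches the User-Agent value."""
    return re.search(pattern, value) is not None


def no_character(value):
    """Equivalent of the negated pattern "!.": true when "." finds nothing."""
    return re.search(r".", value) is None


for value in ("", "-", "Mozilla/5.0"):
    print(
        repr(value),
        ua_is_blank(r"^-?$", value),
        ua_is_blank(r"^$", value),
        no_character(value),
    )
```

Only the first pattern also catches a client that literally sends a hyphen as its UA; the other two catch the empty value alone.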

keyplyr

3:11 am on Apr 25, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



173.252.98.22-29 comes in with a UA of "-"

173.252.64.0 - 173.252.127.255 CIDR: 173.252.64.0/18
NetName: FACEBOOK-INC

I did block it.
As I mentioned above, one of FB's image caching agents uses a blank UA. If you do not allow access from FB IP ranges, you get those embarrassing "Image Not Found" defaults next to your FB posts (or next to posts linking to your site that other people make).

IMO a nice-looking image next to the link to your site will attract a lot more traffic than that stupid-looking "Image Not Found".