
More X11 / Ubuntu old-Firefox Activity

         

Bubalo

11:52 am on Jul 8, 2023 (gmt 0)

Top Contributors Of The Month




Hi all.

I am a new member and this is my first post.

I am "web mastering" (a steep learning curve for me) my own personal web site, which features some of my artworks and photographs.

Before joining, I visited Webmaster World a few times for helpful guidance, particularly about abuse from the user agent Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0.

I am pretty sure from its behavior on my site (but I could be wrong) that a scraper is behind this U/A (possibly human, but probably an unnamed/unknown bot). It ignores my robots.txt block request. It switches IPs frequently. Most of the IPs show up on the AbuseIPDB website as known dodgy IPs, but some of the IPs it uses are alarming - the latest being the French Atomic Energy Agency! Others belong to a lot of universities/schools, cloud proxies, and Amazon AWS.

The reason I think this is a scraper is from log reports - here is an example:

8 Jul 2023, 01:34:47 | 104.219.213.35 | GET | 1.1 | 200 | 162,241 | 425 | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
8 Jul 2023, 01:31:58 | 44.229.15.165 | GET | 1.1 | 403 | 16,369 | 0 | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
8 Jul 2023, 01:30:28 | 44.229.15.165 | GET | 1.1 | 403 | 16,369 | 0 | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0

When it encounters a 403 Response Code (my IP block), it switches to a new IP, and when it gets a 200 Response Code it takes anything from a few hundred to over two thousand Time Taken (ms) to GET what it wants. It usually does this in blocks of 3 attempts, with maybe only 5 or 6 attempts in one 24-hour period, before moving on to a different target on my website. It seems to be concentrating on GETting individual images (.jpeg); there are hundreds of them on the site. There are only 13 different HTML pages on the site. There is no advertising on the site and it is NOT a commercial site, as nothing is offered for sale.
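The pattern described above (a 403 followed by the same U/A turning up on a new IP) can be checked with a few lines of Python. This is only a rough sketch: the `Hit` record and the `summarize` helper are made-up names for illustration, the three sample hits are copied from the log excerpt above, and a real log export would need its own parsing step first.

```python
from collections import defaultdict
from dataclasses import dataclass

TARGET_UA = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) "
             "Gecko/20100101 Firefox/72.0")


@dataclass
class Hit:
    """One log entry (hypothetical record type, not a host log format)."""
    time: str        # "HH:MM:SS"; sorting on this only works within one day
    ip: str
    status: int
    user_agent: str


def summarize(hits, user_agent=TARGET_UA):
    """Count status codes per IP for one U/A and list IP switches after a 403."""
    per_ip = defaultdict(lambda: defaultdict(int))
    switches = []
    previous = None
    for hit in sorted(hits, key=lambda h: h.time):
        if hit.user_agent != user_agent:
            continue
        per_ip[hit.ip][hit.status] += 1
        # A switch: the last hit from this U/A was refused, and the next one
        # arrives from a different address.
        if previous and previous.status == 403 and hit.ip != previous.ip:
            switches.append((previous.ip, hit.ip))
        previous = hit
    return {ip: dict(counts) for ip, counts in per_ip.items()}, switches


# Sample hits taken from the log excerpt in this post.
hits = [
    Hit("01:34:47", "104.219.213.35", 200, TARGET_UA),
    Hit("01:31:58", "44.229.15.165", 403, TARGET_UA),
    Hit("01:30:28", "44.229.15.165", 403, TARGET_UA),
]

per_ip, switches = summarize(hits)
print(per_ip)
print(switches)
```

Run over a full day of entries, a long `switches` list would support the idea that one client is rotating addresses rather than many unrelated visitors sharing a common U/A string.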

Of course, soon after this U/A appeared a few months ago I wondered whether it was legitimate, so when I noticed this forum's message about the botnet coming back I became more suspicious. As I blocked the IPs, it just seemed to switch to new ones as fast as I blocked them.

I also noticed many of the IPs were associated with China, North Korea and Hong Kong, but as I blocked these the IP switching went worldwide - USA, UK, etc. So I tried blocking the countries China and HK, and then there was a marked increase in the U/A string using international IPs.

So far I have blocked probably a hundred different IPs, and incidents now seem to be slowing down - most now come out of the USA.

I have not used the .htaccess file to attempt a block, as I am pretty sure the X11 U/A will ignore that too.

A few days ago I decided, as an experiment, to lift the country block for China - and I got over 30 hits from X11 in 24 hours, so I blocked China again. I don't get any audience traffic from China other than hosting companies like 10 cent, so I thought it would be no great loss of traffic and worth a shot to see what happened.

So, X11 seems to originate from China, but what is behind it?

I notice on GitHub that A LOT of people learning or using scraping use the X11 user agent string - and there is advice there for them to switch it often to another UA!

Legitimate traffic to my site does not seem to be down much, and it usually fluctuates up and down anyway, but I do fear the X11 trouble could get much worse, as others posting on WM World have indicated has occurred on their web sites. I don't want this to happen to me. My host does not have an anti-scrape tool yet, and *loudflare has other problems I don't want to touch.

I thought to post my experience here and welcome all comments and suggestions from you guys who are more experienced.
Thanks.




[edited by: not2easy at 1:55 pm (utc) on Jul 8, 2023]
[edit reason] split thread cleanup [/edit]

not2easy

11:48 am on Aug 27, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The UA is not some sinister, powerful bot; it is simply not being handled by whatever you are using. If the UA is bypassing your blocking, it is because the blocking is not done correctly. UAs don't get to bypass blocks unless the blocks are ineffective. To discuss the right way to block it, you would need to know which version of Apache you are on so that others can assist. Do you have access to cPanel? It is available on most hosts but not all. You can usually see the server information - such as the Apache version - from within the cPanel interface.

To discuss blocking, it does need to go to the Apache forum, and this discussion is not 'lost'; it is linked in your profile. Once you know the UA you want to block, that is the discussion - which is different from determining the UA.
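For readers wondering why a server-side block cannot simply be "ignored" the way robots.txt can: the server itself decides the response for each request from the User-Agent header it receives, so the client has no say. The Python below is only an illustration of that matching logic, not Apache configuration; `BLOCKED_UA` and `status_for` are hypothetical names, and the exact rule syntax for Apache depends on the server version, as noted above.

```python
import re

# Pattern for the specific U/A string discussed in this thread.
# Note: this would also refuse a genuine visitor still running Firefox 72 on Ubuntu.
BLOCKED_UA = re.compile(
    r"\(X11; Ubuntu; Linux x86_64; rv:72\.0\) Gecko/20100101 Firefox/72\.0$"
)


def status_for(user_agent: str) -> int:
    """Return the status code a server-side U/A rule would send: 403 or 200."""
    return 403 if BLOCKED_UA.search(user_agent) else 200


old_firefox = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) "
               "Gecko/20100101 Firefox/72.0")
newer_firefox = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:115.0) "
                 "Gecko/20100101 Firefox/115.0")

print(status_for(old_firefox))    # refused
print(status_for(newer_firefox))  # served
```

A rule like this holds no matter which IP the request comes from, which is why it is usually a better fit for this kind of visitor than chasing individual addresses. It only stops working if the client changes its U/A string.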

Bubalo

12:25 pm on Aug 27, 2023 (gmt 0)

Top Contributors Of The Month



As I said, it was blocked, but then it got itself "unblocked", which I have just been advised was due to a host-side "cache glitch" error that has now been fixed.

As a new user experiencing such an unknown and troublesome thing as this particular U/A, it was a worry I was seeking help to get to grips with. If you don't have such a thing constantly attacking your website(s), then you are lucky, or you have just got the right methods in place to ensure it doesn't affect your site(s) as badly as it did mine.

I remember the post I first replied to said something like this U/A was "back again" - this was when I first came across your WW forum - and then you created this new discussion. What caught my eye was the words "X11 is back..."

I won't be back.
Thanks again, all, for the help.

tangor

7:45 am on Aug 28, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry to hear that. The web constantly evolves and what we do evolves with it. Take the tools, work 'em, then work 'em again until it gets right.

Best anyone can do!

(BTW, there is no singular lock and key solution to anything. Always more than one way to skin the proverbial cat!)

In the Apache forum!