
Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Servers using open proxies
dstiles
msg:4404648
10:59 pm on Jan 6, 2012 (gmt 0)

Just a warning: make sure you block server farms (for example) by the X_FORWARDED_FOR header as well as by the REMOTE_ADDR IP.

I'm seeing a lot of multiple hits from a single forwarded IP (usually server-based) hitting my server through (usually) compromised or deliberately opened broadband connections. I've just checked a dozen such and found all proxies to have multiple open ports and to be broadband, which would normally be let through. Proxy countries are often (but not exclusively) the usual "bad" countries.
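A minimal sketch of the double check dstiles describes, assuming a Python front end; the CIDR range is just an illustration drawn from the post, not a real blocklist:

```python
# Sketch only: check both REMOTE_ADDR and every X-Forwarded-For hop
# against blocked networks. Ranges below are placeholders.
from ipaddress import ip_address, ip_network

BLOCKED_NETS = [ip_network("88.208.193.0/24")]  # example range only

def is_blocked(remote_addr, x_forwarded_for=None):
    """True if the connecting IP *or* any forwarded IP is in a blocked net."""
    candidates = [remote_addr]
    if x_forwarded_for:
        # X-Forwarded-For may hold a comma-separated chain of proxy hops.
        candidates += [p.strip() for p in x_forwarded_for.split(",")]
    for raw in candidates:
        try:
            ip = ip_address(raw)
        except ValueError:
            continue  # malformed or spoofed entry; skip (or block outright)
        if any(ip in net for net in BLOCKED_NETS):
            return True
    return False
```

Note that X-Forwarded-For is trivially forged, so it is only useful for *blocking* (a bad forwarded IP is damning); never use it to whitelist.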

Today's batch came from an IP belonging to UK's FastHosts on 88.208.193.2nn (which might itself be a compromised server) but I've seen these attacks come from US servers as well.

It's also a good idea to block certain common proxies. One of the most persistent baddies at present is Mikrotik HttpProxy - well worth blocking! Again, it uses open broadband connections for its dastardly work.
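As a hedged sketch of how one might catch such proxies: many forwarding proxies announce themselves in the Via request header, and the "Mikrotik HttpProxy" token mentioned above has been reported there. Whether any given proxy actually sends it is not guaranteed, so treat this as one signal among several:

```python
import re

# Sketch: flag requests whose Via header carries a known proxy signature.
# The token list is illustrative, not exhaustive.
PROXY_SIGNATURES = re.compile(r"mikrotik|httpproxy", re.IGNORECASE)

def looks_like_open_proxy(headers):
    """headers: dict of request headers with lowercase keys (an assumption)."""
    via = headers.get("via", "")
    return bool(PROXY_SIGNATURES.search(via))
```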

 

DeeCee
msg:4404713
6:44 am on Jan 7, 2012 (gmt 0)

You are correct. A lot of them use the MicroTik software.
Some of the connections hiding behind proxies are somewhat funny.

I am building a new spam-blocker plugin, running against an API service, and my pre-releases are catching a LOT of these guys, causing both the originator and the proxy server to get banned.

This morning the system caught one that was somewhat unusual.

  • A company in Long Island (a local advertising web site) is promoting its pages and local (innocent) Long Island companies through blog/forum Spam.

  • The crawler actually running around posting the Spam is hosted on a server in Los Angeles, CA.

  • BUT. They found a hole in the US Armed Forces servers in Afghanistan, which are apparently running multiple open HTTP proxy servers (using MicroTik). :-) So the Spam came from/through the military proxy servers overseas.

    Of course, most of the others are dumb spammer bots hosted in China or Russia.
    But I had never before noticed the US Forces relaying spam. So much for the value of their security folks.

    Umbra
    msg:4404732
    10:28 am on Jan 7, 2012 (gmt 0)

    Will this also block VPNs? I assume that many people use a VPN for legitimate reasons, but I haven't found a way to distinguish between open proxies and VPNs, as both are found on server farms.

    DeeCee
    msg:4404764
    3:21 pm on Jan 7, 2012 (gmt 0)

    To the moderators of this forum: feel welcome to drop this post if you feel it necessary. While the response discusses something I am developing, it is not intended as Spam (a concept I hate).

    To Umbra: if you are asking about the Spam blocking, it does not really care where they come from. VPN, server farms, home networks... no matter.

    The plugin checks blog comment spam (and, in a future plugin, forum site posts) against my API service/algorithm, which does not really distinguish by origin. In various proportions it uses information from the spam itself, the links submitted, the originating IP, the proxies used, the combined knowledge from my DNSBLs (gathered from various traps and honeypot-type setups I run), the posters (whether robotic postings or people in India paid to post manually), site-owner opinions, and other information.

    One factor is learning from the blog administrators. If some new Spam is not recognized, it stays in the WordPress moderation queue with its history shown as undetermined; if the admin then marks it as Spam manually, the algorithm learns from that for any future Spams (on that site or others). Similarly for potential false positives: when an admin un-Spams, it learns from that. If a Spam stays in the moderation queue unrecognized (with no admin intervention), it is automatically re-checked later, as it might by then be recognized/categorized and flipped either way.
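DeeCee's service is proprietary, but the queue-and-learn loop described above might be sketched roughly like this; all names and threshold values here are hypothetical, not the actual API:

```python
# Rough sketch of a moderation-queue feedback loop: undetermined posts wait
# in the queue, and admin spam/unspam actions teach future classifications.
class SpamScorer:
    def __init__(self, spam_threshold=0.8, ham_threshold=0.2):
        self.spam_threshold = spam_threshold
        self.ham_threshold = ham_threshold
        self.known = {}  # fingerprint -> score learned from admin actions

    def classify(self, fingerprint, base_score):
        # Learned knowledge overrides the initial score.
        score = self.known.get(fingerprint, base_score)
        if score >= self.spam_threshold:
            return "spam"
        if score <= self.ham_threshold:
            return "ham"
        return "moderation-queue"  # undetermined: re-check automatically later

    def admin_marks_spam(self, fingerprint):
        self.known[fingerprint] = 1.0  # future matches flip to spam

    def admin_unspams(self, fingerprint):
        self.known[fingerprint] = 0.0  # learn from the false positive
```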

    The intent is threefold:

    a) let spammers dig their own holes (read: graves). The more Spam they try to post (across sites), the worse off they get and the more money and effort they have to spend on buying new domains, finding new proxies, ... Assuming of course that the same spam-blocker is used across the sites. The Slime essentially sticks to everything they involve in its submission, rather than sticking to your site. (At least that is my plan. :))
    The worse they behave, the higher their Slime score gets. I watch a lot of them in Beta runs right now, and they keep buying new domains with slightly different names (advertising the same products or phishing sites), changing proxies, changing format, ... Even changing the bots they use. But rarely do the Slimers manage to make themselves unrecognizable when all factors are taken into account together. Plus, the effectiveness of these evasion actions is limited, as the new factors quickly get tainted as well.

    b) Stop annoying site users with CAPTCHA type fronts, lengthy manual moderation queuing, ... Good posters automatically build good karma (across sites) and can move more freely, bad posters build bad karma (across sites) and get stomped.

    c) Keep site admins from wasting time on manual Spam intervention. When running correctly, washing away the Slime should be automatic. In my current tests, only one piece of Spam has stayed undetermined across sites and remained in the moderation queue, and it was subsequently auto-Spammed on the automatic re-check.

    Since I also track other types of offenders (hacking, info trackers, mark scanners, ... around 50 types in total), the same plugin also has a security-type front: it blocks certain URL patterns (SQL injections, PHP injections, ...), and the blog owner can select which levels/types of DNSBL entries to block IPs by as well. By default, known hacker types are blocked (known SSH, PHP, MySQL, phpMyAdmin probes, ...), but the admin can also choose to block known mark scanners, info trackers, known bad content scrapers, and so on. The site admin can choose to report these catches back to the API (the default setting), so essentially the site can help build future blocks across the net. I have some other stuff in the plans as well.
    Essentially this saves some server/network bandwidth from the thousands out there who think they should be tracking and lifting our sites (one of my pet peeves). The more we block them, the more we dilute the value of the databases they sell.
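The URL-pattern blocking could look something like the following sketch; the patterns are purely illustrative, not the plugin's actual rules or anything close to a complete web application firewall:

```python
import re

# Sketch: flag request URLs matching a few common injection probes.
# Patterns are illustrative only; real rule sets are far larger.
INJECTION_PATTERNS = re.compile(
    r"union[\s+]+select"   # SQL-injection probe ("+" covers URL-encoded spaces)
    r"|\.\./\.\./"          # path traversal
    r"|=https?://",         # remote-file-inclusion-style PHP injection
    re.IGNORECASE,
)

def url_is_hostile(url):
    return bool(INJECTION_PATTERNS.search(url))
```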

    When caught, they will see only either a code 403: Forbidden, or a 418: I'm a Teapot (as in, they violated the coffee-pot protocol, RFC 2324, which I am in love with right now :) ).
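As a toy illustration (not the plugin's code), a tiny WSGI gate that answers blocked IPs with either status; the blocklist IP is a documentation-range placeholder:

```python
# Sketch: answer blocked clients with 403 Forbidden or RFC 2324's 418.
BLOCKED = {"203.0.113.7"}  # documentation-range placeholder IP

def gate(environ, start_response, teapot=True):
    ip = environ.get("REMOTE_ADDR", "")
    if ip in BLOCKED:
        status = "418 I'm a Teapot" if teapot else "403 Forbidden"
        start_response(status, [("Content-Type", "text/plain")])
        return [status.encode()]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]
```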

    I see it as Crud and Slime. Crud: security/scanner issues. Slime: the unwanted Spam that clings and sticks to us all.

    The same service also tracks certain known email spam, so if they use email spam for promotion as well, it will taint and block their blog spam in a combined effort.

    I am finishing up the generics and the WordPress-specific code right now. Later I'll start a phpBB plugin, I think.

    BTW, I have it running in test across some sites right now, but they are all similarly configured, since I control them. Before I let the WordPress plugin out into the wild (as in, upload it to the WordPress.org plugin site), I would like to have some other test sites install it. So if anyone is interested in the final Beta test, or maybe just in more information, please drop me a note.

    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved