|Why does this PHP code block XML Sitemaps Generators?|
<?php if(@fsockopen($_SERVER['REMOTE_ADDR'], 80, $errno, $errstr, 1))
die("Proxy access not allowed"); ?>
Having this on a page stops an XML sitemap generator script from working. Does anyone know why? I don't want to remove it, as it is doing a great job.
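Why do you think it's doing such a great job if you don't understand how it works? Your server has an IP address that responds to port 80, I presume? If the generator crawls from that address, fsockopen() connects and the page dies for it, just as it would for a proxy.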
A lot of people have port 80 open on their IP, whether they operate a family web server, a media server of some type (gamers), or a home security system. Maybe their coffee machine is connected to the net. Even if they don't operate a home server, their router might be configured to respond to port 80.
That piece of code will block a lot of innocent visitors and the payout for blocking proxies is very small. Most proxies don't even have port 80 open, especially the ones you need to be most worried about.
|That piece of code will block a lot of innocent visitors and the payout for blocking proxies is very small. |
Lots of scrapers use a wide list of proxy IPs to avoid detection, and blocking proxies is the only way to disable their operations. Proxies also trick Google and other spiders into crawling through them, which is how 302 hijacking occurs, but the solution there was to validate the spider IP properly instead of blocking the proxy.
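For reference, validating a spider IP is usually done with a reverse DNS lookup followed by a forward lookup that must return the same address. The sketch below assumes Googlebot and its googlebot.com / google.com host names; the function name and the injectable lookup parameters are hypothetical and exist only so the logic can be exercised without network access.

```php
<?php
// Reverse-then-forward DNS check for an address claiming to be Googlebot.
// $reverse and $forward default to PHP's own resolvers; they are parameters
// only so that stub lookups can be passed in (an assumption of this sketch).
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // gethostbyaddr() returns the unmodified IP on failure, false on bad input.
    $host = $reverse($ip);
    if ($host === false || $host === $ip) {
        return false;
    }

    // The host name must end in a Google crawler domain.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // The forward lookup of that host name must include the original IP.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

A request whose User-Agent claims to be a spider but fails this check is coming from somewhere else, for example through a proxy.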
However, the solution here is the same as with any other firewall code: you have to make exceptions, such as whitelisting the IP of the XML generator. If you want to block proxies you can either go the hard route and try to maintain a blacklist of proxy IPs, a never-ending and time-consuming battle, or simply whitelist this XML generator and anything else that gets snared by the proxy block, which is an infinitely shorter list.
To solve the problem with the code posted above, change it to check if the IP is the XML generator, else test for proxy and then die violently.
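A minimal sketch of that change, assuming the generator's address is known. The whitelist entry 203.0.113.10 is a documentation placeholder, not the real generator IP, and the function names are made up for illustration.

```php
<?php
// Hypothetical whitelist: replace with the real address(es) of the sitemap
// generator. 203.0.113.10 is a placeholder from the documentation range.
const TRUSTED_IPS = ['203.0.113.10'];

// True when the address is on the whitelist and should skip the port test.
function isWhitelisted(string $ip, array $trusted = TRUSTED_IPS): bool
{
    return in_array($ip, $trusted, true);
}

// Same port-80 probe as the original snippet, with a 1-second timeout.
function respondsOnPort80(string $ip): bool
{
    $socket = @fsockopen($ip, 80, $errno, $errstr, 1);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Whitelisted addresses are never probed; everything else gets the port test.
// $probe is injectable only so the decision can be checked without a network.
function shouldBlock(string $ip, ?callable $probe = null): bool
{
    if (isWhitelisted($ip)) {
        return false;
    }
    $probe = $probe ?? 'respondsOnPort80';
    return (bool) $probe($ip);
}

// Only act when a client address is available (i.e. under a web server).
if (isset($_SERVER['REMOTE_ADDR']) && shouldBlock($_SERVER['REMOTE_ADDR'])) {
    die("Proxy access not allowed");
}
```

Checking the whitelist first also spares trusted visitors the one-second connection attempt.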
A more elegant approach would be to test for a proxy and, if the test is true, offer a captcha to see if a human is at the keyboard; don't just die().
Open port 80 doesn't equal proxy. Most proxy configurations don't even allow inbound connections on port 80. Try to connect to proxy 126.96.36.199 on port 80. You can't, even though it's notorious for being used by spammers.
On the other hand, I would be blocked by this PHP code because my home security system and satellite box are connected to the net on port 80. I'm not using a proxy, and any message to that effect would come across to me as amateurish. Forget about captcha or other authentication hassles; that's almost worse than the proxy message. At that point, I've already lost trust in your site. I'll spend my time and money elsewhere.
|Open port 80 doesn't equal proxy. |
While that may be true, close inbound port 80 and problem solved.
Why do you have an open port 80 in the first place?
|I've already lost trust in your site. I'll spend my time and money elsewhere. |
To webmasters being abused into oblivion, losing a couple of port 80 proxy customers won't matter that much as it doesn't happen often.
Might as well set your email server to allow relay while you're at it and see how many emails you ever send again.
There are some rules about how to configure ports and services that are simply becoming too hard to ignore.
An E-mail relay is not even in the same ball park as http traffic through port 80, although it might surprise you to learn that a lot of people do use personal E-mail servers for their home and businesses. I know I have in the past. I have a good ISP.
As for why I use port 80, that's what it is intended to be used for. Most home server devices are configured to use port 80. In the future, all homes will be connected in some way. Even appliances.
Just out of curiosity, I checked out 10 trusted Comcast IPs and nearly half responded on port 80. Most of the responses were from routers. Test it yourself. The results may surprise you. Open port 80 has nothing at all to do with proxies. Ban them if you want, but you will be blocking a lot of visitors who aren't using a proxy, far more than those who are. In fact, I can't quite understand why somebody would think that port 80 was somehow synonymous with proxies. It's not.
|An E-mail relay is not even in the same ball park as http traffic through port 80 |
If I can get access to any site I want via a port 80 proxy, I can spam forms, spam blogs, spam forums, scrape or DDoS websites, and make the person with the open port look responsible, so it's just as bad as or worse than spam.
Shouldn't be wide open for anyone outside your network.
That could be exactly how some of the attacks on sites are currently happening. Perhaps it's not a botnet; perhaps it's just someone using every open port 80 on the planet to appear to be a botnet.
You obviously don't know what a proxy is. It's also clear that you don't understand how home routers or servers operate on port 80. You can access them, but without the login name and password, you aren't getting in. That is unless you are running a public web server. In which case, it's no different than any other server on the net.
Even if you had the login details, what would you do? Most routers can't be configured to act as a proxy. They're just dumb pieces of hardware with the software burned to firmware. At best, you could change the dns entries in the router and make life miserable for its owner. Or if you were really mean, shut it down.
Also, a botnet doesn't need open port 80 to operate any more than you need one to surf the net.
[edited by: Key_Master at 12:28 am (utc) on Mar 20, 2012]
|You obviously don't know what a proxy is |
I know exactly what a proxy is. I use them, have set them up, have even written a small one, and routinely block them. Perhaps these port examples you're giving aren't true proxies, and you're confusing apples and oranges, then claiming I don't know apples because I'm talking about oranges instead.
All I know is there are a lot of sites out there blocking open port 80s and maybe, just maybe, you should address people posting the code that the OP took such as this site:
That's how that stuff propagates all over the place.
Don't shoot the messenger.
|Even if you had the login details |
What login details?
I've been discussing OPEN PROXIES and there are a bunch!
Secured proxies? Who cares! Why would anyone block those until they get caught doing something bad? Just like you don't block secured email connections with relay disabled until they get caught being used for nefarious purposes like spam.
Real proxy checking code actually attempts to make a complete connection to verify the proxy is open and available for random usage, which is not what the OP's code does. But that's what people are passing off as proxy checkers these days, per the link I provided above and code I've seen in many so-called proxy blocking scripts.
A real proxy testing script would actually attempt to read/write content via the open socket and some target site because the fsockopen() function often returns false positives which can't be confirmed or denied until an actual data transfer attempt is made through the socket.
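As a rough illustration of that idea: connect, send a proxy-style request for a page you control, and only treat the host as an open proxy if the page's known content comes back. This is a sketch under assumptions, not a complete proxy checker. The function names, the target URL and the token are all hypothetical, and it assumes a plain HTTP proxy listening on the probed port.

```php
<?php
// Build an absolute-URI request, the form a forward HTTP proxy expects.
function buildProbeRequest(string $targetUrl): string
{
    $host = parse_url($targetUrl, PHP_URL_HOST);
    return "GET {$targetUrl} HTTP/1.0\r\nHost: {$host}\r\nConnection: close\r\n\r\n";
}

// The remote host relayed the request only if it answered 200 and the body
// carries the token that the target page is known to serve. A router login
// page or a 401/407 challenge fails this test.
function responseShowsRelay(string $response, string $token): bool
{
    $parts = explode("\r\n\r\n", $response, 2);
    if (count($parts) < 2) {
        return false;
    }
    [$head, $body] = $parts;
    if (!preg_match('#^HTTP/\d\.\d 200\b#', $head)) {
        return false;
    }
    return strpos($body, $token) !== false;
}

// Open the socket, attempt a real transfer, and judge by what comes back.
function isOpenProxy(string $ip, string $targetUrl, string $token, int $port = 80, float $timeout = 2.0): bool
{
    $socket = @fsockopen($ip, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false; // nothing listening, so certainly not a proxy on this port
    }
    stream_set_timeout($socket, (int) ceil($timeout));
    fwrite($socket, buildProbeRequest($targetUrl));
    $response = (string) stream_get_contents($socket, 65536); // cap the read
    fclose($socket);
    return responseShowsRelay($response, $token);
}
```

A call might look like isOpenProxy($_SERVER['REMOTE_ADDR'], 'http://example.com/proxy-check.txt', 'proxy-probe-token'), where the URL is a page you host that returns the token. A home router or ordinary web server that merely answers on port 80 would not return that token, so it would not be flagged.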
Now, unless you want to quiz me on proxies and PHP socket code further, assuming we're on the same page of what constitutes an open proxy, which I didn't know was a variable that needed to be expressly explained, let's continue.
When you say those ports are open, you mean they get a valid response to fsockopen(), which also returns false positives, and may be password protected, which can only be determined by an attempted data transfer, correct?
To that end, the OP's code is wholly inadequate, but the short answer to his simple question and problem is still to make a simple exception for the IP address of the service he wants.
Solving the rest of the open proxy predicament is way more complicated, and even THAT still needs exceptions to those rules. He needs an open source or commercial program that properly enables proxy blocking, and to stop playing with simplistic code that causes more problems than it fixes.
[edited by: incrediBILL at 12:47 am (utc) on Mar 20, 2012]
I'm addressing it here because it's being discussed here. It's bad code. It doesn't block proxies. It should warn, "Users with open port 80 are not allowed".
The bot they use can also be blocked via the poorly implemented http headers it uses. They look nothing like the headers a regular browser sends.
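The thread does not say which headers that bot gets wrong, so the following is only a sketch of the general idea: list the request headers that mainstream browsers normally send and see which ones are absent. The header list and the function name are assumptions for illustration.

```php
<?php
// Headers that mainstream browsers normally send with a page request, as
// they appear in $_SERVER. Treating their absence as suspicious is an
// assumption of this sketch, not a rule stated in the thread.
const EXPECTED_HEADERS = [
    'HTTP_USER_AGENT',
    'HTTP_ACCEPT',
    'HTTP_ACCEPT_LANGUAGE',
    'HTTP_ACCEPT_ENCODING',
];

// Return the expected headers that are missing or empty in a $_SERVER-style array.
function missingBrowserHeaders(array $server): array
{
    $missing = [];
    foreach (EXPECTED_HEADERS as $key) {
        if (!isset($server[$key]) || trim((string) $server[$key]) === '') {
            $missing[] = $key;
        }
    }
    return $missing;
}
```

Calling missingBrowserHeaders($_SERVER) on a live request gives a list that can be logged or scored; legitimate tools may also omit some of these headers, so the whitelist approach still applies.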
NoScript, huh? You sure it works? :):):)
Umm ... hello ... OP here. Quick question:
My problem is that "proxies" like HMA actually save my content and serve it to Google as their own. Since adding this PHP script, all the hundreds of pages of my site that HMA had duplicated in the search results now all just say 'proxy access not allowed'. Yay!