


EC2LinkFinder

New kid on the Amazon block

     

iamzippy

11:49 am on Mar 10, 2012 (gmt 0)

5+ Year Member



107.20.nn.nn - - [10/Mar/2012:12:21:12 +0100] "GET /robots.txt HTTP/1.1" 200 ... "" "EC2LinkFinder" "-"
107.20.nn.nn - - [10/Mar/2012:12:21:12 +0100] "GET / HTTP/1.1" ... ... "" "EC2LinkFinder" "-"

Amazon EC2 - US East - Northern Virginia [ ARIN ]

Hitherto unknown to me, hence no entry in robots. Home page hit went into a black hole, since it's from EC2.

Notice the truly empty referrer.

Is this a new star being born?

wilderness

8:38 pm on Mar 10, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Amazon AWS Hosts Bad Bots [webmasterworld.com]

A thread pfui devoted to Amazon.

keyplyr

8:56 pm on Mar 10, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




iamzippy, as wilderness referenced, Amazon EC2 (Amazon AWS) is a haven for bad agents. Many of us ban all these ranges, thus anything coming from there is no longer an issue.
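For example, a few lines like these in htaccess will do it. Apache 2.2 syntax; the two ranges shown are just the allocations behind the IPs posted in this thread, not Amazon's complete list, so pull the rest from ARIN:

# Amazon EC2 allocations (partial: covers 107.20-107.23.* and 174.129.*)
Order Allow,Deny
Deny from 107.20.0.0/14
Deny from 174.129.0.0/16
Allow from all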

frontpage

5:52 pm on Mar 11, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A bad robot hit on 2012-03-09 (Fri) 20:33:39
address is 174.129.NN.NNN, agent is EC2LinkFinder


Resolves back to: NetName: AMAZON-EC2-5

iamzippy

6:22 pm on Mar 11, 2012 (gmt 0)

5+ Year Member



Hmmm. Another one came by. Different IP, but I had by now updated robots.
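The new entry is nothing exotic, just a sketch along these lines:

User-agent: EC2LinkFinder
Disallow: /

And here's what it did: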

107.22.nnn.nnn - - [11/Mar/2012:02:38:19 +0100] "GET /robots.txt HTTP/1.1" 301 251 ..."" "EC2LinkFinder" "-"
107.22.nnn.nnn - - [11/Mar/2012:02:38:19 +0100] "GET /robots.txt HTTP/1.1" 200 2353 www.... "" "EC2LinkFinder" "-"

That's all she wrote. So it's clear that it follows canonical 301s and it obeys robots.txt.

Just as well. My EC2 policy is er... robust.
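For the record, the canonical 301 it followed is just the usual host-name rewrite; a sketch, with example.com standing in for my real domain:

RewriteEngine On
# send bare-domain requests to the www host with a permanent redirect
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]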

Sapo

9:57 pm on Mar 23, 2012 (gmt 0)

5+ Year Member



It doesn't obey robots.txt in that it ignores the crawl-delay directive.

I've black-holed them as well.
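For reference, the directive it blows past looks like this in robots.txt (the value is whatever delay you asked for; 10 seconds here is just an example):

User-agent: *
Crawl-delay: 10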

Cybl00

12:26 am on Mar 29, 2012 (gmt 0)



How do you black-hole them? They are showing up in my logs and I just want them blocked.

My hosting is on a Linux box and the site is mainly PHP. So far nothing I've tried in my .htaccess file seems to work.

I've got a PHP script in the head of my pages to block proxies and redirect one guy who abuses the site (same guy, now using proxies), but the proxies still get through. Now this "ec2linkfinder" shows up in the logs.

So any help anyone can provide, including just pointing me in the right direction would be awesome.

lucy24

4:09 am on Mar 29, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



It doesn't obey robots.txt in that it ignores the crawl-delay directive.

Does anyone follow "crawl-delay"? I'd got a vague idea it was one of those g### proprietary things, but obviously I was mistaken since they themselves say they ignore it.

And whatever happened to Pfui? She's not still going mano a mano with her server is she? :(

So far anything I've tried in my .htaccess file doesn't seem to work.

Can you be more specific about what you've tried and in what way it doesn't work?

wilderness

4:42 am on Mar 29, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



How do you black-hole them? They are showing up in my logs and I just want them blocked.

My hosting is on a linux box, the site is mainly php. So far anything I've tried in my .htaccess file doesn't seem to work.


Denying access to a visitor (regardless of criteria) will not prevent the denied line from appearing in your raw access logs.

Is that possibly what you're referring to?

Denying access to the Amazon server farms is one of the simplest additions to htaccess.
The ranges and methods have existed in old threads of this forum for perhaps a decade.

thetrasher

4:18 pm on Apr 2, 2012 (gmt 0)

10+ Year Member



On 2012-04-01 EC2LinkFinder changed its name to GSLFbot

keyplyr

10:56 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Still coming from the same range :)
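So the range block keeps working. If you'd rather catch it by name as well, one pattern covers both handles (Apache 2.2 again; extend the alternation as it grows new names):

# flag either user-agent string, then deny anything carrying the flag
SetEnvIfNoCase User-Agent (EC2LinkFinder|GSLFbot) bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot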

Cybl00

8:29 pm on Apr 5, 2012 (gmt 0)



Sorry for the late response.

In answer to the questions above, some of the things I've tried in the htaccess file are redirecting a specific IP to a specific page, blocking proxy sites, and most recently blocking access to a specific folder. For the redirect and the folder block I got a 500 Internal Server Error; the proxy blocking simply didn't work.

Not sure why it doesn't work, but I end up doing most stuff in PHP instead. The proxy blocking I've tried in PHP so far hasn't worked either, but I haven't really put much time into that.

My site is on shared hosting, but from what I can tell htaccess should work in my site directories, although every attempt to use it has either done nothing or thrown a 500 error.

As for the bad bots, I took a shortcut and found a blackhole PHP script at [perishablepress.com...] With a few minor edits it's working great so far and has already caught bad bots. I am using it without the htaccess file that's included, as that was throwing a 500 error and stopping it from working. I also changed the PHP version from 4.2 to 5.2 in the PHP files.

wilderness

9:06 pm on Apr 5, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I am using it without the htaccess file that is included as it was throwing a 500 error and causing it not to work.


That htaccess is virtually empty.
It contains Options -Indexes, which some hosts already have in place, and which then results in a 500 when added again in an htaccess.

The only other thing the file contains is four lines to make the htaccess itself inaccessible to outsiders; however, that is redundant as well, since most shared hosts already have that in place above your root.

Your PHP file may later add lines containing badly formatted syntax that leaves the resulting htaccess throwing a 500.

Your error documents may also be the cause of the 500 (i.e., a server loop and/or bad syntax).
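In other words, the whole file boils down to something like this (my reconstruction, not a verbatim copy):

Options -Indexes

<Files .htaccess>
Order Allow,Deny
Deny from all
</Files>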

wilderness

9:09 pm on Apr 5, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



In answer to questions above, some of the things I've tried with the htaccess file is redirecting a specific ip to a specific page, blocking proxy sites and most recently blocking access to a specific folder. For the redirect and the folder access I got 500 Internal Server Error, the proxy blocking simply didn't work.


Many folks here have proxies, server farms, colos, and various other parties denied access, and those denials function as intended. Thus you either need more practice, or some references to help you better comprehend the procedures.

Cybl00

10:39 pm on Apr 5, 2012 (gmt 0)



Probably a little of both. I haven't done much with htaccess, but I have read quite a bit and plan to read much more. Any suggestions on what to read?

I think I had taken the Options -Indexes line out and still had the error, but I will try again just to be sure.

keyplyr

12:19 am on Apr 6, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Any suggestions on what to read?


[webmasterworld.com...]

[webmasterworld.com...]

lucy24

12:30 am on Apr 6, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



It contains options -indexes, which some hosts already have in place and results in a 500 when added to an htaccess.

Huh?! Do you know this by direct personal experience? Saying the same thing over again in consecutive directories should have no effect at all. Especially something as common as +Indexes or -Indexes, which people will use even if they are otherwise deathly afraid of htaccess.

Apache does warn about mixing options with and without plus/minus signs, but even there, all they say is "unexpected results". I do not know whether this term includes, say, a complete server meltdown.
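To make the merging concrete, this pair of files is the harmless repetition I mean (paths hypothetical):

# /site/.htaccess
Options -Indexes

# /site/subdir/.htaccess (repeating the parent's setting changes nothing)
Options -Indexes

The documented danger is mixing forms: a bare Options Indexes in one file and a signed Options -Indexes in another, because the bare form replaces the whole option list instead of merging with it.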

:: peering into crystal ball ::

Cybl00, your redirects are ending up as 500 errors because you made no provision for requests that have already been redirected, leading to a chain of infinite redirects. Matter of fact it should lead to a browser error, unless by "redirect" you mean "rewrite". I went through the same thing for a while until I pinpointed the problem. Same goes for blocking files: make sure they're not blocked from the custom error document, if any.

But really, why are you redirecting unwanted guests to anywhere on your site at all? Just 403 'em at the gate. Redirects are for the borderline cases that might be human so you need to give them a chance to convince you. Some especially pesky visitors may require a redirect to 127.0.0.1 but most of the time a simple lockout is all you can, or should, do.
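For the crystal-ball diagnosis above, the loop-proof shape is to exempt the destination before redirecting, and the lockout is even shorter. A sketch with placeholder names (192.0.2.* is a documentation range, banned.html is hypothetical; exempt your custom error page the same way if you use one):

RewriteEngine On

# identify the unwanted visitor however you like; this IP range is a placeholder
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
# loop-proofing: never redirect a request that's already at the destination
RewriteCond %{REQUEST_URI} !^/banned\.html$
RewriteRule .* /banned.html [R=301,L]

# or, simpler, the 403 at the gate (same placeholder condition):
# RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
# RewriteRule .* - [F]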

wilderness

1:26 am on Apr 6, 2012 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Huh?! Do you know this by direct personal experience?


Of course I do, and if you start attempting to edit or contradict every word I compose, then you and I are certainly going to have words, AGAIN!

Your rambling on your own submissions seems to be tolerated; however, I do not wish your rambling introduced, for clarification (which you certainly don't need) or any other reason (merely to pat yourself on the back), to any words I submit.
 
