
Search Engine Spider and User Agent Identification Forum

EC2LinkFinder
New kid on the Amazon block
iamzippy
msg:4427501 - 11:49 am on Mar 10, 2012 (gmt 0)

107.20.nn.nn - - [10/Mar/2012:12:21:12 +0100] "GET /robots.txt HTTP/1.1" 200 ... "" "EC2LinkFinder" "-"
107.20.nn.nn - - [10/Mar/2012:12:21:12 +0100] "GET / HTTP/1.1" ... ... "" "EC2LinkFinder" "-"

Amazon EC2 - US East - Northern Virginia [ ARIN ]

Hitherto unknown to me, hence no entry in robots.txt. The home page hit went into a black hole, since it's from EC2.

Notice the truly empty referrer.

Is this a new star being born?

 

wilderness
msg:4427611 - 8:38 pm on Mar 10, 2012 (gmt 0)

Amazon AWS Hosts Bad Bots [webmasterworld.com]

a thread pfui devoted to Amazon

keyplyr
msg:4427617 - 8:56 pm on Mar 10, 2012 (gmt 0)

iamzippy, as wilderness referenced, Amazon EC2 (Amazon AWS) is a haven for bad agents. Many of us ban all these ranges, thus anything coming from there is no longer an issue.
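
A minimal sketch of that kind of range ban, in Apache 2.2 htaccess syntax. The CIDR ranges below are illustrative only (they cover the 107.20.x/107.22.x and 174.129.x addresses seen in this thread); check ARIN or Amazon's published allocations for the current list:

# Illustrative EC2 range ban - verify the ranges before use
Order Allow,Deny
Allow from all
Deny from 107.20.0.0/14
Deny from 174.129.0.0/16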

frontpage
msg:4427885 - 5:52 pm on Mar 11, 2012 (gmt 0)

A bad robot hit on 2012-03-09 (Fri) 20:33:39
address is 174.129.NN.NNN, agent is EC2LinkFinder


Resolves back to: NetName: AMAZON-EC2-5

iamzippy
msg:4427890 - 6:22 pm on Mar 11, 2012 (gmt 0)

Hmmm. Another one came by. Different IP, but I had by now updated robots.

107.22.nnn.nnn - - [11/Mar/2012:02:38:19 +0100] "GET /robots.txt HTTP/1.1" 301 251 ..."" "EC2LinkFinder" "-"
107.22.nnn.nnn - - [11/Mar/2012:02:38:19 +0100] "GET /robots.txt HTTP/1.1" 200 2353 www.... "" "EC2LinkFinder" "-"

That's all she wrote. So it's clear that it follows canonical 301s and it obeys robots.txt.

Just as well. My EC2 policy is er... robust.

Sapo
msg:4432749 - 9:57 pm on Mar 23, 2012 (gmt 0)

It doesn't obey robots.txt in that it ignores the crawl-delay directive.

I've black-holed them as well.
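
For reference, Crawl-delay is a non-standard robots.txt extension that some crawlers honor and others ignore. A robots.txt using it might look like this (i.e., wait roughly ten seconds between requests):

User-agent: *
Crawl-delay: 10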

Cybl00
msg:4434629 - 12:26 am on Mar 29, 2012 (gmt 0)

How do you black-hole them? They are showing up in my logs and I just want them blocked.

My hosting is on a linux box, the site is mainly php. So far anything I've tried in my .htaccess file doesn't seem to work.

I've got a PHP script in the head of my pages to block proxies and redirect one guy who abuses the site (the same guy is now using proxies), but the proxies still get through. Now this "ec2linkfinder" shows up in the logs.

So any help anyone can provide, including just pointing me in the right direction would be awesome.

lucy24
msg:4434666 - 4:09 am on Mar 29, 2012 (gmt 0)

It doesn't obey robots.txt in that it ignores the crawl-delay directive.

Does anyone follow "crawl-delay"? I'd got a vague idea it was one of those g### proprietary things, but obviously I was mistaken since they themselves say they ignore it.

And whatever happened to Pfui? She's not still going mano a mano with her server, is she? :(

So far anything I've tried in my .htaccess file doesn't seem to work.

Can you be more specific about what you've tried and in what way it doesn't work?

wilderness
msg:4434671 - 4:42 am on Mar 29, 2012 (gmt 0)

How do you black-hole them? They are showing up in my logs and I just want them blocked.

My hosting is on a linux box, the site is mainly php. So far anything I've tried in my .htaccess file doesn't seem to work.


Denying access to a visitor (regardless of criteria) will not prevent the denied line from appearing in your raw access logs.

Is that possibly what you're referring to?

Denying access to the Amazon server farms is one of the simplest additions to htaccess.
The ranges and methods have existed in old threads of this forum for perhaps a decade.

thetrasher
msg:4436259 - 4:18 pm on Apr 2, 2012 (gmt 0)

On 2012-04-01, EC2LinkFinder changed its name to GSLFbot.
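
For anyone blocking by user agent rather than by IP range, the rename means a rule needs to catch both names. A hedged sketch in Apache 2.2 syntax (mod_setenvif):

# Flag requests whose UA contains either name, then deny flagged requests
SetEnvIfNoCase User-Agent "EC2LinkFinder" bad_bot
SetEnvIfNoCase User-Agent "GSLFbot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot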

keyplyr
msg:4436445 - 10:56 pm on Apr 2, 2012 (gmt 0)

Still coming from the same range :)

Cybl00
msg:4437697 - 8:29 pm on Apr 5, 2012 (gmt 0)

Sorry for the late response.

In answer to the questions above, some of the things I've tried with the htaccess file are redirecting a specific IP to a specific page, blocking proxy sites, and most recently blocking access to a specific folder. For the redirect and the folder access I got a 500 Internal Server Error; the proxy blocking simply didn't work.

Not sure why it doesn't work, but I end up doing most stuff in PHP instead. The proxy blocking I've tried in PHP so far hasn't worked, but I haven't really put much time into that.

My site is on shared hosting, and from what I can tell htaccess should work in my site directories. Even so, all my attempts to use htaccess have either not worked or thrown a 500 error.

As for the bad bots, I took a shortcut and found a blackhole PHP script at [perishablepress.com...]. With a few minor edits it's working great so far and has already caught bad bots. I am using it without the htaccess file that is included, as it was throwing a 500 error and causing it not to work. I also changed the PHP version from 4.2 to 5.2 in the PHP files.

wilderness
msg:4437723 - 9:06 pm on Apr 5, 2012 (gmt 0)

I am using it without the htaccess file that is included, as it was throwing a 500 error and causing it not to work.


That htaccess is virtually empty.
It contains Options -Indexes, which some hosts already have in place, and which then results in a 500 when added again in an htaccess.

The only other thing the file contains is four lines to make the htaccess inaccessible to outsiders; however, that is redundant as well, since most shared hosts already have that in place above your root.
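
The file described presumably amounts to something like this, assuming the classic four-line self-protection block plus the Options line:

Options -Indexes
# the four (redundant) lines that hide the htaccess itself
<Files .htaccess>
Order Allow,Deny
Deny from all
</Files>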

Your PHP file may later append lines whose badly formatted syntax leaves the resulting htaccess throwing a 500.

Your error documents may also be the cause of the 500 (i.e., a server loop and/or bad syntax).

wilderness
msg:4437725 - 9:09 pm on Apr 5, 2012 (gmt 0)

In answer to the questions above, some of the things I've tried with the htaccess file are redirecting a specific IP to a specific page, blocking proxy sites, and most recently blocking access to a specific folder. For the redirect and the folder access I got a 500 Internal Server Error; the proxy blocking simply didn't work.


Many folks here have proxies, server farms, colos, and various other parties denied access, and those rules function as intended. So you either need more practice, or some references to help you better comprehend the procedures.

Cybl00
msg:4437764 - 10:39 pm on Apr 5, 2012 (gmt 0)

Probably a little of both. I haven't done much with htaccess but have read quite a bit and plan to read much more. Any suggestions on what to read?

I think I had taken the Options -Indexes out and still had the error, but I will try again just to be sure.

keyplyr
msg:4437785 - 12:19 am on Apr 6, 2012 (gmt 0)

Any suggestions on what to read?


[webmasterworld.com...]

[webmasterworld.com...]

lucy24
msg:4437787 - 12:30 am on Apr 6, 2012 (gmt 0)

It contains Options -Indexes, which some hosts already have in place, and which then results in a 500 when added again in an htaccess.

Huh?! Do you know this by direct personal experience? Saying the same thing over again in consecutive directories should have no effect at all. Especially something as common as +Indexes or -Indexes, which people will use even if they are otherwise deathly afraid of htaccess.

Apache does warn about mixing options with and without plus/minus signs, but even there, all they say is "unexpected results". I do not know whether this term includes, say, a complete server meltdown.

:: peering into crystal ball ::

Cybl00, your redirects are ending up as 500 errors because you made no provision for requests that have already been redirected, leading to a chain of infinite redirects. Matter of fact, it should lead to a browser error, unless by "redirect" you mean "rewrite". I went through the same thing for a while until I pinpointed the problem. The same goes for blocking files: make sure the custom error document, if any, isn't itself among the blocked files.
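
A hedged illustration of that loop and one way to guard against it: exclude the destination page from the rule, so an already-redirected request isn't redirected again (mod_rewrite; the IP and page name here are hypothetical):

RewriteEngine On
# Without the second condition, the request for /blocked.html
# would itself match the rule and redirect forever
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.1$
RewriteCond %{REQUEST_URI} !^/blocked\.html$
RewriteRule .* /blocked.html [R=301,L]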

But really, why are you redirecting unwanted guests anywhere on your site at all? Just 403 'em at the gate. Redirects are for the borderline cases that might be human, so you need to give them a chance to convince you. Some especially pesky visitors may require a redirect to 127.0.0.1, but most of the time a simple lockout is all you can, or should, do.
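
The "403 'em at the gate" version is simpler still, since a forbidden response can't loop the way a redirect can (same hypothetical IP; if you use a custom 403 page, make sure it isn't itself blocked):

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.1$
RewriteRule .* - [F]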

wilderness
msg:4437810 - 1:26 am on Apr 6, 2012 (gmt 0)

Huh?! Do you know this by direct personal experience?


Of course I do, and if you start attempting to edit or contradict every word I compose, then you and I are certainly going to have words, AGAIN!

Your rambling in your own submissions seems to be tolerated; however, I do not wish your rambling introduced into any words I submit, whether for clarification (which you certainly don't need) or for any other reason (merely to pat yourself on the back).
