Python & Ruby user agents - bad?

         

roshaoar

12:04 pm on Jan 7, 2015 (gmt 0)

10+ Year Member



Hello,

I currently block these, but I'm wondering if I'm being overly cautious. As far as I know, these are typically used by scrapers. Is this still correct?

Thanks

wilderness

12:33 pm on Jan 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The python UA is included in just about every example of htaccess deny rules.

There are a dozen or two common terms that are abused by harvesters and should be part of every blacklist. 'spider' and 'crawler' are the most abused. Other common terms are synonyms of 'download'.
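To make the idea of a term blacklist concrete, here is a minimal sketch in Python of a case-insensitive substring check on the User-Agent header. The function name and the exact term list are assumptions for illustration: 'python', 'ruby', 'spider' and 'crawler' come from this thread, while 'download', 'fetch' and 'grab' are only guesses at what "synonyms of download" might cover.

```python
import re

# Hypothetical term list. 'python', 'ruby', 'spider' and 'crawler' are named
# in the thread; 'download', 'fetch' and 'grab' are illustrative guesses.
BLOCKED_TERMS = ["python", "ruby", "spider", "crawler", "download", "fetch", "grab"]

# One case-insensitive pattern that matches any of the terms as a substring.
BLOCKED_UA = re.compile("|".join(re.escape(t) for t in BLOCKED_TERMS), re.IGNORECASE)


def is_blocked_user_agent(user_agent):
    """Return True if the User-Agent string contains any blocked term."""
    return BLOCKED_UA.search(user_agent or "") is not None
```

A substring match this broad will also catch any legitimate bot whose name happens to contain one of the terms, so anything you want to let through needs to be checked before the blacklist is applied.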

FWIW, in more than 15 years I've seen a mere three references to 'Ruby', and they were from IPs that wouldn't gain access anyway (one of which was Amazon).

roshaoar

12:42 pm on Jan 7, 2015 (gmt 0)

10+ Year Member



Every time I tweet out a link, I get visits from a ruby user agent requesting graphics. Always amazonAWS IPs (many different ones).

topr8

1:00 pm on Jan 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



personally i block all amazonAWS, ditto python UAs

i'd not seen ruby before, but i'd block that too - i can't see it being of any help.
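For what it's worth, blocking a hosting provider by IP range can be sketched like this in Python, using the standard `ipaddress` module. The two ranges below are reserved documentation networks used purely as placeholders, not real AWS ranges; the actual list would have to come from the ranges Amazon publishes, and the names here are made up for the example.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real AWS ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked_ip(ip):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The same check can sit alongside a user-agent test, so a request is refused if either the IP range or the UA is on a blacklist.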

roshaoar

1:17 pm on Jan 7, 2015 (gmt 0)

10+ Year Member



Amazon AWS is just weird. Almost every day, on a 25hr cycle, I get 4x as much traffic to one site from fake googlebot "gocrawl" crawlers on AWS as from everything else put together. 100s and 100s of IPs. I block 'em, they keep coming. They request robots.txt first but don't take a blind bit of notice of it.