Search Engine Spider and User Agent Identification Forum

    
Punk Spider
caught this punk snooping from the clouds
incrediBILL

Msg#: 4473572 posted 1:22 am on Jul 8, 2012 (gmt 0)

184.106.80.222

"Punk Spider/PunkSPIDER-v0.1"

robots.txt: YES

 

wilderness

Msg#: 4473572 posted 2:27 am on Jul 8, 2012 (gmt 0)

"Punk Spider"

Aren't they all ;)

lucy24

Msg#: 4473572 posted 3:01 am on Jul 8, 2012 (gmt 0)

v0.1

Oi! Save your alpha testing for your friends' sites! Come back when you're ready with v. ... well, OK, at least 0.8.

dstiles

Msg#: 4473572 posted 6:29 pm on Jul 8, 2012 (gmt 0)

Server range (Rackspace). Already blocked.

incrediBILL

Msg#: 4473572 posted 6:42 pm on Jul 8, 2012 (gmt 0)

"Server range (Rackspace). Already blocked."


That too.

I try to be polite: robots.txt is set aside as a special case in the firewall, so the dynamic robots.txt code will serve up permissions to anyone who asks. However, if you ask for robots.txt and are denied, any other page you then request from that user agent or IP address is denied too, since the script enforces the robots.txt rules.

The reason I track the IP is that some smart ass started asking for robots.txt with one user agent to test the waters, then switched user agents when asking for pages. So now I track the IPs that make robots.txt requests if they're denied :)

The data center blocking, which includes Rackspace, applies to all other files :)

It's complicated yet so simple.
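
For readers who want to see the flow spelled out, here is a minimal sketch of the logic described above. It is not incrediBILL's actual script: the function names, the whitelist in ALLOWED_AGENTS, the in-memory sets and the placeholder range 203.0.113.0/24 (a documentation-only block) are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical policy data, invented for this sketch.
ALLOWED_AGENTS = ("Googlebot", "bingbot")  # crawlers that get permission
DATA_CENTER_RANGES = [ipaddress.ip_network("203.0.113.0/24")]  # placeholder range

# Requesters that asked for robots.txt and were denied.
denied_ips = set()
denied_agents = set()


def is_allowed_agent(user_agent):
    """True if the user agent matches the (assumed) crawler whitelist."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in ALLOWED_AGENTS)


def in_data_center(ip):
    """True if the IP falls inside any blocked data center range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATA_CENTER_RANGES)


def handle_request(ip, user_agent, path):
    """Return (status, body) for one request."""
    if path == "/robots.txt":
        # Special case: robots.txt is answered for everyone,
        # even for IPs inside a blocked data center range.
        if is_allowed_agent(user_agent):
            return 200, "User-agent: *\nDisallow:\n"
        # Denied: remember both the user agent and the IP, so that
        # switching user agents afterwards does not help.
        denied_ips.add(ip)
        denied_agents.add(user_agent)
        return 200, "User-agent: *\nDisallow: /\n"

    # Enforce the robots.txt answer on every other request.
    if ip in denied_ips or user_agent in denied_agents:
        return 403, "Forbidden"

    # Data center blocking applies to all files except robots.txt.
    if in_data_center(ip):
        return 403, "Forbidden"

    return 200, "page content"
```

In this sketch, a bot that is denied at robots.txt and then comes back from the same IP claiming to be a browser still gets a 403, which is the "test the waters" trick described above. A real setup would need persistent storage, expiry of old entries, and a decision about what to do with ordinary browsers that request robots.txt (here they are treated like any non-whitelisted agent, which is an assumption).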

dstiles

Msg#: 4473572 posted 9:21 pm on Jul 9, 2012 (gmt 0)

Your system is far more sophisticated than mine. :)

My IIS system (sans htaccess) can only intercept IPs and headers on ASP page access. Far too late for me to change it now. :(

incrediBILL

Msg#: 4473572 posted 7:23 am on Jul 21, 2012 (gmt 0)

That PUNK came from a new IP today: 198.101.170.228

Wondering if they're in the cloud.

dstiles

Msg#: 4473572 posted 10:14 pm on Jul 21, 2012 (gmt 0)

Don't care. Rackspace: blocked 198.101.128.0/17 :)

Does Rackspace operate a cloud? If so, where? What IPs?
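
As a quick check of the range arithmetic, the /17 block in the post above is the network 198.101.128.0/17. The sketch below uses Python's standard ipaddress module to show which of the two addresses reported in this thread it covers. That the range belongs to Rackspace is taken from the thread, not verified here, and the variable names are made up for the example.

```python
import ipaddress

# The /17 block from the thread, written out in full CIDR notation.
blocked = ipaddress.ip_network("198.101.128.0/17")

# The two addresses reported for this spider in the thread.
first_ip = ipaddress.ip_address("184.106.80.222")
new_ip = ipaddress.ip_address("198.101.170.228")

# First and last address of the block.
print(blocked[0], "-", blocked[-1])  # 198.101.128.0 - 198.101.255.255

# Membership tests.
print(new_ip in blocked)    # True
print(first_ip in blocked)  # False
```

So this one rule catches the new address, but the first address (184.106.80.222) lies outside it and would need its own range.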


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved