Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

new bot?

 4:50 pm on Feb 28, 2001 (gmt 0)

Comes from (shows up as: analysis.he.net)
ua: Mozilla/4.0 (compatible;MSIE 5.5;Windows NT 5.0)

Does not seem to honor robots.txt; deep crawls;

up to 5 requests per second!!
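A rate like the one reported above can be checked against the access log. The following is only a rough sketch, assuming the server writes Common Log Format lines with hostname lookups enabled; the function name `peak_requests_per_second` and the sample lines are made up for illustration and are not taken from anyone's real logs.

```python
import re
from collections import Counter

# Common Log Format: host ident user [timestamp] "request" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)"')

def peak_requests_per_second(lines):
    """Return {host: highest number of requests seen within any one second}."""
    per_second = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        host, stamp, _request = match.groups()
        # The log timestamp has one-second resolution, so it serves as a bucket key.
        per_second[(host, stamp)] += 1
    peaks = {}
    for (host, _stamp), count in per_second.items():
        peaks[host] = max(peaks.get(host, 0), count)
    return peaks

# Hypothetical sample lines, invented for this example.
sample = [
    'analysis.he.net - - [28/Feb/2001:16:50:01 +0000] "GET /a.html HTTP/1.0" 200 512',
    'analysis.he.net - - [28/Feb/2001:16:50:01 +0000] "GET /b.html HTTP/1.0" 200 512',
    'analysis.he.net - - [28/Feb/2001:16:50:01 +0000] "GET /c.html HTTP/1.0" 200 512',
    'analysis.he.net - - [28/Feb/2001:16:50:02 +0000] "GET /d.html HTTP/1.0" 200 512',
    'visitor.example.com - - [28/Feb/2001:16:50:02 +0000] "GET / HTTP/1.0" 200 512',
]
peaks = peak_requests_per_second(sample)
```

Sorting the result by count would put the most aggressive hosts at the top; what counts as "too fast" is a judgment call for each site.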





 4:53 pm on Feb 28, 2001 (gmt 0)


Also does NOT honor the robots meta tag (it deep-crawled a page I had set to "noindex,nofollow").


 4:52 pm on Mar 17, 2001 (gmt 0)

This one hit my site twice yesterday, and looking back in my logs, it had also been around at about the time you posted your message.

I do have pages that require authorization on this site. The funny thing is, it never got robots.txt, but it stays away from the directory that requires authorization. It "acts" like it has seen the robots.txt file, because it gets everything else on the site.

he.net is Hurricane Electric in Fremont, CA.

Anybody else seen this one?
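One way to test whether a visitor "acts" like it has read robots.txt is to compare the paths it requested with the site's own rules. This is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `check_visitor`, the robots.txt text, and the path list are all hypothetical and would come from your own site and logs.

```python
from urllib.robotparser import RobotFileParser

def check_visitor(robots_txt, user_agent, requested_paths):
    """Report whether robots.txt was fetched and which requests it disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    fetched_robots = "/robots.txt" in requested_paths
    violations = [
        path for path in requested_paths
        if not parser.can_fetch(user_agent, path)
    ]
    return fetched_robots, violations

# Hypothetical rules and requests, invented for this example.
rules = "User-agent: *\nDisallow: /private/\n"
agent = "Mozilla/4.0 (compatible;MSIE 5.5;Windows NT 5.0)"
paths = ["/index.html", "/products/list.html", "/private/report.html"]

fetched_robots, violations = check_visitor(rules, agent, paths)
```

An empty violations list with no robots.txt fetch would match what is described above, though it cannot show why the crawler stayed away; a directory that requires authorization may simply have no public links pointing into it.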


 4:59 pm on Mar 17, 2001 (gmt 0)

I was getting ready to nuke 'em in .htaccess. I thought they were messin' with me. Sorta glad to see it's not just me.

I wonder if it's one of the mods here at WmW checking up to see if we're all doing our part in applying the techniques learned here. ;)


 2:16 am on Apr 4, 2001 (gmt 0)

Well, h.e.'s back again. Nobody knows?

I just continue to let it rape my site without knowing whether to .htdisallow it or not.

Comes around about once every 2 months, just like Google. Grabs everything.


 4:40 pm on Apr 5, 2001 (gmt 0)

I had a visit from cypress.he.net with the user agent Pizilla++ ver 2.45



 9:06 pm on Apr 28, 2001 (gmt 0)

Can't one block this IP?
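Since the visits in this thread show up under more than one he.net hostname, one option is to match on the resolved hostname rather than a single address. Below is a small sketch of that check; the function name `host_matches` and the blocked list are assumptions for illustration, and it only makes the decision, it does not block anything by itself.

```python
def host_matches(hostname, blocked_domains):
    """Return True if hostname is one of blocked_domains or a subdomain of one."""
    name = hostname.lower().rstrip(".")
    return any(
        name == domain or name.endswith("." + domain)
        for domain in blocked_domains
    )

# Hypothetical block list; the hostnames are the ones reported in this thread.
blocked = ["he.net"]
results = {
    host: host_matches(host, blocked)
    for host in ["analysis.he.net", "cypress.he.net", "she.net", "example.com"]
}
```

Matching on the leading dot keeps an unrelated name such as she.net from being caught. Because he.net appears to be a hosting provider's domain, blocking all of it may also shut out unrelated machines hosted there, so a narrower list of hostnames may be the safer choice.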


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved