
Forum Moderators: Ocean10000 & incrediBILL

Palo Alto Networks bot

   
10:13 am on Jun 24, 2014 (gmt 0)



A new bot for me, from a company that Jim Cramer has been pumping the stock on his cable show.

64.74.215.27 - - [23/Jun/2014:20:42:45 -0400] "GET /robots.txt HTTP/1.0" 403 237 "-" "spyder/Nutch-2.1 (just another internet crawler; http://www.paloaltonetworks.com/products/features/url-filtering.html; ghalevy@paloaltonetworks.com)"


I let all bots in through robots.txt, but ban them from going further with various .htaccess tests. This one failed for HTTP/1.0 and UA words spyder, crawler, and Nutch.
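A minimal mod_rewrite sketch of that kind of test (the rule set below is illustrative, not the poster's actual file; it exempts robots.txt per the opening sentence):

```apache
# Allow robots.txt through, but refuse HTTP/1.0 requests
# and UAs containing common crawler keywords
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{THE_REQUEST} HTTP/1\.0$ [OR]
RewriteCond %{HTTP_USER_AGENT} (spyder|crawler|nutch) [NC]
RewriteRule .* - [F]
```

The [F] flag returns 403 Forbidden, matching the status seen in the log line above.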

I have a suspicion of what the bot is up to on an obscure web site like my own, but no doubt experienced webmasters can tell precisely what its goal is, based on the company's services.

[edited by: incrediBILL at 4:31 pm (utc) on Jun 24, 2014]
[edit reason] formatting [/edit]

5:26 pm on Jun 24, 2014 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I've yet to see it but apparently it's been around for almost a year. [projecthoneypot.org...]

FWIW, even obscure sites are not immune from bots that crawl by IP address (akin to auto-dialer spam phone callers). Our small, private server gets bots hitting all the active sites within seconds, presumably after having tried all 256 addresses in our CIDR block.

And many bots start by crawling their own server farm mothership, which may include tens of thousands of private sites, obscure or otherwise.

Last but not least, long-time bot-spotters like m'self often have no clue what many of these bots are up to, or for whom. But their why is easy -- like Bill said the other day, there's money in it.
6:51 pm on Jun 24, 2014 (gmt 0)



Thanks for the insight.

Based on what the company does, I suspected the bot was looking for malware infected websites as a continuing test of their systems for securing networks. I also suspected that they are very interested in identifying malware infected botnets that have yet to execute a zero day attack.

Those are my best guesses.
7:54 pm on Jun 24, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





Many of us block "spyder" "spider" "nutch" "crawler" and other categorical names found in the User Agent.
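One way to do that with the same Apache 2.2-style directives used later in this thread (a sketch, not anyone's actual config) is SetEnvIfNoCase, which matches case-insensitively so "Spyder" and "spyder" are both caught:

```apache
# Flag categorical crawler keywords in the User-Agent,
# then deny any flagged request
SetEnvIfNoCase User-Agent "spyder|spider|nutch|crawler" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```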
8:01 pm on Jun 24, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



Palo Alto Techops is a server, listed as part of PNAP, all blocked. I first spotted them last June coming in on a malformed UA: "'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'" (note the extra single-quote), but the IP you listed is a smaller range inside Palo Alto, simply registered as:
Private Customer INAP-SJE-PALOALTOTECHOPS-64-74-215-0 (NET-64-74-215-0-1)
64.74.215.0 - 64.74.215.255
64.74.215.0/24
9:14 pm on Jun 24, 2014 (gmt 0)



Thanks for confirming the range. Before I could put in a block, the bot came back and grabbed 10 pages using a different UA.

"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11"

It looks like my guess of what the bot was up to was wrong.
11:00 pm on Jun 24, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



Yes, that's why a CIDR IP block is sometimes the best way to keep them out. They can switch UAs all day.
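In the Apache 2.2 syntax used elsewhere in this thread, a whole-range block is a single Deny line; the /24 below is the range quoted earlier and is shown for illustration:

```apache
# Block the entire registered range, regardless of User-Agent
Order Allow,Deny
Allow from all
Deny from 64.74.215.0/24
```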
12:09 am on Jun 25, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




Internap
64.74.0.0 - 64.74.255.255
64.74.0.0/16
11:15 am on Jun 25, 2014 (gmt 0)

5+ Year Member



"GET /robots.txt HTTP/1.0" 403
Why 403?
3:15 pm on Jun 25, 2014 (gmt 0)



I refer you back to my original post.

Here are the opening lines of my .htaccess file.

# Allow all bots to fetch robots.txt
SetEnvIf Request_URI "^/(robots\.txt)$" allow_all

Order Deny,Allow

<Limit GET>
Allow from env=allow_all
</Limit>

The robot is allowed through by the rules above, but rewrite rules later in the file that ban those UAs still deny it, as I said in the OP. I presume that is why even the robots.txt request returned a 403.
 
