
spbot

         

Pfui

9:07 pm on Feb 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



ec2-204-236-242-36.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )

robots.txt? Yes

Note slash-space-close-paren pattern.

Related: amazonaws.com plays host to wide variety of bad bots [webmasterworld.com]

Pfui

4:04 am on Feb 3, 2010 (gmt 0)

Y'know, it's this kind of activity -- 23 robots.txt hits to one site in eight hours & still going -- that gives even robots.txt-requesting bots a bad name.* Partial listing:

ec2-174-129-106-91.compute-1.amazonaws.com [11:22:29]
ec2-67-202-0-31.compute-1.amazonaws.com [11:32:07]
ec2-174-129-155-12.compute-1.amazonaws.com [11:50:05]
ec2-204-236-242-36.compute-1.amazonaws.com [12:34:46]
ec2-75-101-254-111.compute-1.amazonaws.com [13:40:02]
ec2-174-129-136-94.compute-1.amazonaws.com [13:45:00]
ec2-174-129-61-74.compute-1.amazonaws.com [13:49:43]
ec2-72-44-48-77.compute-1.amazonaws.com [13:49:45]
ec2-67-202-41-44.compute-1.amazonaws.com [14:07:58]
ec2-174-129-136-47.compute-1.amazonaws.com [14:30:32]
ec2-75-101-204-87.compute-1.amazonaws.com [14:31:51]
ec2-174-129-61-74.compute-1.amazonaws.com [14:40:58]
ec2-67-202-0-47.compute-1.amazonaws.com [14:52:07]
ec2-174-129-84-116.compute-1.amazonaws.com [15:04:29]
ec2-72-44-42-173.compute-1.amazonaws.com [16:00:23]
ec2-204-236-211-119.compute-1.amazonaws.com [16:02:04]
ec2-204-236-197-86.compute-1.amazonaws.com [16:26:54]
ec2-75-101-219-131.compute-1.amazonaws.com [16:43:38]
ec2-75-101-179-97.compute-1.amazonaws.com [17:09:40]
ec2-67-202-0-47.compute-1.amazonaws.com [17:31:57]
ec2-72-44-56-37.compute-1.amazonaws.com [17:51:37]
ec2-67-202-2-164.compute-1.amazonaws.com [19:02:15]
ec2-75-101-204-87.compute-1.amazonaws.com [19:32:51]

*And the bad name that comes to my mind is -- "Jerks."

jdMorgan

7:26 pm on Feb 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



spbot claims to respect robots.txt, but it apparently doesn't understand "disallow all except" and/or multiple-user-agent record constructs, such as:

User-agent: Googlebot/
User-agent: Slurp
User-agent: msnbot
User-agent: Teoma
Disallow: /cgi-bin

User-agent: *
Disallow: /


They fetch that, and then come right back with page requests - sadly, 403'ed until they get this sorted.
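For anyone wanting the same backstop, the 403 can be issued from .htaccess. A minimal sketch, assuming mod_rewrite is available in .htaccess context and keying on the spbot User-agent string quoted earlier in the thread:

```apache
# Return 403 Forbidden to any request identifying as spbot.
# Assumes mod_rewrite is enabled for .htaccess context.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} spbot [NC]
RewriteRule .* - [F]
```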

Jim

incrediBILL

7:44 pm on Feb 4, 2010 (gmt 0)

Just noticed this one bouncing off the firewall.

There are already enough SEO spiders; enough is enough.

Whitelisting, the other blacklist. ;)

jdMorgan

3:42 am on Feb 5, 2010 (gmt 0)

OK, so I also tried
User-agent: spbot
Disallow: /

User-agent: Googlebot/
User-agent: Slurp
User-agent: msnbot
User-agent: Teoma
Disallow: /cgi-bin

User-agent: *
Disallow: /


just to make it real simple, but spbot fetched that and then still tried to fetch pages. So it's a 403 with a zero-byte content-body for them, I'm afraid. I have firewall envy here, BTW... shared hosting, no firewall control :(

With no robots.txt compliance, the no-rDNS-available Amazon Cloud as a host, and no crawler-ident support, their challenge to Majestic 12 isn't likely to go well...

Jim

incrediBILL

3:51 am on Feb 5, 2010 (gmt 0)

Jim, that's why my robots.txt compliance is reinforced: I don't care if they comply or not; if they step off the path, >SLAM!< goes the bear trap.

Robots.txt is just a suggestion, one most bots ignore, so bouncing off a firewall is the no-trespassing sign for those that can't read or won't honor robots.txt.

jdMorgan

4:19 pm on Feb 5, 2010 (gmt 0)

Yeah, all variants of the robots.txt examples above were (are, and always have been) 'backed up' with 403 rules in .htaccess. But it would've been nice to be able to bounce the unwelcome requests at the door, before they even bothered Apache. Unfortunately, that's not an option on this site.

Jim

Pfui

5:50 pm on Feb 5, 2010 (gmt 0)

Report Abuse of AWS Services
http://aws.amazon.com/contact-us/report-abuse/

Well, it might work... :)

incrediBILL

11:40 pm on Feb 5, 2010 (gmt 0)

Just put AWS's IPs in the server firewall and sleep better at night.
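That can be a short list of iptables rules. Here's a sketch that only prints the rules for review rather than running them; the CIDR blocks are illustrative guesses inferred from the ec2-* hostnames in this thread, not Amazon's authoritative list, so look up the currently published ranges first:

```shell
# Emit (not execute) iptables rules dropping some EC2 address ranges.
# The CIDRs below are illustrations based on the hostnames above --
# verify against Amazon's current ranges before applying anything.
for cidr in 67.202.0.0/18 72.44.32.0/19 75.101.128.0/17 \
            174.129.0.0/16 204.236.128.0/17; do
  echo "iptables -A INPUT -s $cidr -j DROP"
done
```

Echoing first also makes it easy to redirect the output to a file and audit it before running it as root.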

Pfui

7:31 pm on Feb 9, 2010 (gmt 0)

"spbot" continues to relentlessly hit multiple sites 24/7, but at least it had always only asked for robots.txt and heeded its full Disallow. No longer.

Regular* Pestilence:

ec2-75-101-172-174.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )

09:18:44 /robots.txt

New Twist: <15 min. later

ec2-72-44-48-77.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/1.0; +http://www.seoprofiler.com/bot/ )

09:30:20 /robots.txt
09:30:21 /

*429 robots.txt requests in nine days.
Googlebot's total for the same period? 440.

From the bot-runner site's Code of Ethics: "Some search engine optimization companies and software developers use unethical tricks and techniques..."

Couldn't have said it better myself.

incrediBILL

8:13 pm on Feb 9, 2010 (gmt 0)

Now that everyone is attempting to offer cloud computing options, I'm expecting to see a lot more multiple-IP bots running around, similar to the mess at AWS.

Pfui

4:04 am on Feb 13, 2010 (gmt 0)

Note the new version number. (I only saw the jump from 1.0 to 1.2.) No apparent conduct changes -- still hitting way too often (e.g., 26 robots.txt hits in three hours). Now hitting similarly on two disparate sites.

ec2-67-202-0-47.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/1.2; +http://www.seoprofiler.com/bot/ )

robots.txt? Yes (...this time)

Pfui

7:03 am on Mar 3, 2010 (gmt 0)

Looks like someone keeps tweaking the bot, version-wise. Too bad they don't just make it stop ;)

ec2-67-202-10-125.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/2.0.1; +http://www.seoprofiler.com/bot/ )

robots.txt? Yes

tangor

7:16 am on Mar 3, 2010 (gmt 0)

Without revealing our content (remembering the TOS), I'm curious which of the following site types are getting hit most by these bots AND how many requests per week one considers too many:

commerce
authority
hobby

I have all three of the above. My 4 commerce sites are hit 3x as often as the other two categories (2 authority sites and 1 hobby site).

As long as robots.txt is the only file abused (honored), I don't have a problem, since it's a very small whitelist file. On the other hand, I do enjoy this forum, which reveals bots' bandwidth abuses. I'm merely seeking perspective on the subject.

jmccormac

9:29 am on Mar 20, 2010 (gmt 0)

My main site is getting hit every few hours by this despite originally having had its UA blocked in the robots.txt file. I had to 403 it and it still keeps hammering away. Also 403 blocked on another of my sites but again it still keeps hammering away. At this stage an IP ban is probably the only way to deal with it. My main site is an authority site and the other is a country level web directory.

Regards...jmcc

jdMorgan

6:01 pm on Mar 20, 2010 (gmt 0)

Authority site mainly, but commerce and hobby sites as well.

After fooling around with robots.txt and 403 responses, I found that the only way to make spbot go away is to feed it *something* with a 200-OK response. So, I gave it a custom page outlining seoprofiler's various technical shortcomings, and perhaps they'll publish *that* as the "profile" of my site... :)

Oh, and this "custom page" is indeed disallowed by robots.txt, so it's their own fault if they take it...
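In .htaccess terms, that approach might look like the sketch below. The file name is made up for illustration, and mod_rewrite is assumed; pair it with a matching Disallow line in robots.txt as described:

```apache
# Sketch: hand spbot one static page with a 200 OK instead of a 403.
# "bot-note.html" is a hypothetical name; add "Disallow: /bot-note.html"
# to robots.txt so fetching it is the bot's own fault.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} spbot [NC]
RewriteRule !^bot-note\.html$ /bot-note.html [L]
```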

Jim

Pfui

10:12 pm on Mar 22, 2010 (gmt 0)

@tangor: Similar to Jim: Authority site mainly, but commerce and hobby sites as well.

@all:

1.) Another incremental version change:

ec2-174-129-146-92.compute-1.amazonaws.com
Mozilla/5.0 (compatible; spbot/2.0.2; +http://www.seoprofiler.com/bot/ )

robots.txt? Yes

2.) In the last three weeks, for just one site:

16 hits: spbot/2.0
217 hits: spbot/2.0.1
70 hits: spbot/2.0.2
----
303 hits total (70 more than next-closest bot, AskJeeves)
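A per-version tally like this can be pulled from an access log with a short pipeline. A sketch: the file name and Apache combined-log format are assumptions, and the two heredoc lines merely stand in for a real log:

```shell
# Tally spbot hits per version from an access log.
# "access.log" is an assumed name; the heredoc lines are sample data
# standing in for a real combined-format log.
cat > access.log <<'EOF'
ec2-67-202-10-125.compute-1.amazonaws.com - - [03/Mar/2010:07:03:12 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; spbot/2.0.1; +http://www.seoprofiler.com/bot/ )"
ec2-174-129-146-92.compute-1.amazonaws.com - - [22/Mar/2010:10:12:40 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; spbot/2.0.2; +http://www.seoprofiler.com/bot/ )"
EOF

# Extract the version token, then count occurrences of each.
grep -o 'spbot/[0-9.]*' access.log | sort | uniq -c | sort -rn
```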

I'll try Jim's 'feeding thing' in hopes this bot will go away for good. But spending time custom-coding to deflect a cloaked bot-runner's irresponsibility irks me. Kind of like having to change your phone number to stop a crank caller.