
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


Yahoo! Slurp/3.0 used for iffy purposes?

26 hits to "list.php" files -- on non-PHP site

     
11:52 am on Apr 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


The following never-seen-before-today hits to non-existent files in existing directories look more like rapid-fire PHP exploits, except that the host and UA are Yahoo. Anyone else notice Slurp/3.0 running this amok lately? -Annie

Notes: In addition to robots.txt, the only accurate, real-file hits are marked [okay]. The site is Yahoo-authenticated and Site Explorer's allowed URL list is accurate.
-----
llf320021.crawl.yahoo.net
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]

02:27:44 [okay]
02:28:55 [okay]
02:47:13 /robots.txt
02:49:26 /dir/list.php?show=4
02:51:09 /dir/list.php?show=38
02:52:14 /dir/list.php?show=5885
02:52:44 /dir/list.php?search=LAFUMA
02:53:13 /dir/list.php?show=30
02:54:01 [okay]
02:54:51 /dir/contact.php
02:55:21 /dir/list.php?show=9
02:55:51 /dir/list.php?show=5
02:56:21 /dir/list.php?show=3902
02:56:51 /dir/list.php?expand=10
02:57:21 /dir/list.php?show=28
02:57:51 /dir/list.php?search=Flow
02:58:21 /dir/list.php?search=Atomic
02:58:51 /dir/list.php?show=36
02:59:21 /dir/list.php?show=853
02:59:51 /dir/list.php?expand=2
03:00:51 /dir/list.php?expand=3309
03:01:21 /dir/list.php?show=22
03:02:21 /dir/list.php?search=HIGH%20PEAK
03:02:51 /dir/list.php?search=ELAN
03:03:51 /dir/register.php
03:04:21 /dir/list.php?show=4851
03:04:51 /dir/list.php?search=Tyrolia
03:05:21 /dir/list.php?expand=20
03:05:51 /dir/list.php?show=51
03:06:21 /dir/forum
03:06:51 /dir/list.php?show=5866
03:07:21 /dir/list.php?show=27
03:15:50 [okay]
03:20:12 [okay]
03:31:24 /dir/list.php?show=3903
-----
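For anyone who wants to tally a log like the one above, here is a minimal Python sketch. It assumes the simplified "time path" layout used in this post (real access logs carry more fields), and the function name tally is only illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# The log lines exactly as posted above ("[okay]" marks real-file hits).
LOG = """\
02:27:44 [okay]
02:28:55 [okay]
02:47:13 /robots.txt
02:49:26 /dir/list.php?show=4
02:51:09 /dir/list.php?show=38
02:52:14 /dir/list.php?show=5885
02:52:44 /dir/list.php?search=LAFUMA
02:53:13 /dir/list.php?show=30
02:54:01 [okay]
02:54:51 /dir/contact.php
02:55:21 /dir/list.php?show=9
02:55:51 /dir/list.php?show=5
02:56:21 /dir/list.php?show=3902
02:56:51 /dir/list.php?expand=10
02:57:21 /dir/list.php?show=28
02:57:51 /dir/list.php?search=Flow
02:58:21 /dir/list.php?search=Atomic
02:58:51 /dir/list.php?show=36
02:59:21 /dir/list.php?show=853
02:59:51 /dir/list.php?expand=2
03:00:51 /dir/list.php?expand=3309
03:01:21 /dir/list.php?show=22
03:02:21 /dir/list.php?search=HIGH%20PEAK
03:02:51 /dir/list.php?search=ELAN
03:03:51 /dir/register.php
03:04:21 /dir/list.php?show=4851
03:04:51 /dir/list.php?search=Tyrolia
03:05:21 /dir/list.php?expand=20
03:05:51 /dir/list.php?show=51
03:06:21 /dir/forum
03:06:51 /dir/list.php?show=5866
03:07:21 /dir/list.php?show=27
03:15:50 [okay]
03:20:12 [okay]
03:31:24 /dir/list.php?show=3903
"""


def tally(log_text):
    """Count hits per path and, for list.php, per query parameter name."""
    paths = Counter()
    params = Counter()
    for line in log_text.splitlines():
        parts = line.split(None, 1)  # "HH:MM:SS", then the rest of the line
        if len(parts) != 2 or not parts[1].startswith("/"):
            continue  # skip blank lines and the "[okay]" placeholders
        url = urlsplit(parts[1].strip())
        paths[url.path] += 1
        if url.path.endswith("/list.php"):
            for name, _value in parse_qsl(url.query):
                params[name] += 1
    return paths, params


paths, params = tally(LOG)
print(paths["/dir/list.php"])  # 26
print(dict(params))
```

On this log the list.php count comes to 26, matching the thread subtitle, and the hits split into show, search and expand requests.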

P.S. Posted here rather than in "Yahoo Search Engine and Directory" because those convos seemed more about SERPs than Slurp activity. Apologies if in wrong place.

P.P.S. Congrats, IncrediModBILL! :)

5:44 pm on Apr 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5461
votes: 3


Here's a thread from a few months ago:
[webmasterworld.com...]
6:58 pm on Apr 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member hobbs is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 19, 2004
posts:3056
votes: 5


Personally I am [ this ] close to banning Yahoo altogether.
7:00 pm on Apr 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5461
votes: 3


Personally I am [ this ] close to banning Yahoo altogether.

You should be ashamed Hobbs ;)

Where's your sense of history, tradition and dedication? ;)

Don

7:18 pm on Apr 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member hobbs is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 19, 2004
posts:3056
votes: 5


I will be the first in line to visit them in a fossil museum Don ;-)
9:28 pm on Apr 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 24, 2002
posts:894
votes: 0


I am [ this ] close

I'm even closer: on 5 out of 6 sites, one Y! range is banned for excessive crawling. One site with 300+ pages was showing only 13 in the index, even though Y! had been a permanent feature on the server for months. On 2 out of 6 sites, a second range is banned for repeatedly tripping the bot trap.

Don, I'm all for history and tradition but I also expect the same from them. ;o)

9:46 pm on Apr 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Hey, Don :) I'd read over that thread (thanks) and some others, and it looks like people were, and still are, iffy about banning Slurp 3.0 entirely, and/or no one's sure of the eventual effect(s).

FWIW: 10-plus hours in, Slurp 3 is still hitting away at nonexistent "list.php" files -- in between legit files -- even after I smacked it with mod_rewrite four hours ago. (sighs) Then again, Slurp China drops by umpteen times/day and it's been rewritten for years...

FWIW Redux: I've already blocked all Y crawlers but Slurp (regular) and Slurp DE. And I'm this close to blocking the latter because .de is right up there on my Countries Spawning Bad Bots list.

Is anyone else blocking Slurp 3.0, and/or Slurp DE? Any noticeable drop in SERPs or good traffic?

10:31 pm on Apr 12, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


no one's sure of the eventual effect(s).

We're sure of the eventual effects: ZERO YAHOO TRAFFIC!

I would check my traffic stats before making such a rash decision.

11:00 pm on Apr 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


How would you handle relentless hits by Slurp 3.0 to nonexistent files, Bill? How long until you pulled its plug one way or another? Or would you ignore it?

(Since this is new (mis)behavior, I figured I'd block Slurp 3.0 for a while, then open it back up and watch. We shall see.)

4:35 am on Apr 13, 2008 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10103
votes: 550


Slurp (Yahoo IP range) and its brother Slurp (Inktomi IP range) crawl hundreds of my main site's pages almost every day, whether I've updated or not.

I have valid expiry headers. I have a valid and accurate sitemap.xml. No other bots screw up like this. For a major player, they are extremely rude and egocentric when it comes to respecting our properties.

4:49 am on Apr 13, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 26, 2006
posts:1623
votes: 0


I've considered blocking "Slurp DE"
1:15 pm on Apr 13, 2008 (gmt 0)

Junior Member from DE 

10+ Year Member

joined:June 25, 2005
posts:182
votes: 1


[webmasterworld.com...]
Apparently Yahoo! Slurp DE is the crawler for a (D)irectory (E)ngine service that crawls preferred content explicitly listed by Yahoo! Search content service partners.

Slurp DE will respect robots.txt rules for User-Agent: Slurp DE or User-Agent: Yahoo! Slurp DE. If those user agents are not listed Slurp DE will obey User-Agent: Slurp.
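The fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not Yahoo's actual matching code: the function name group_for_slurp_de is hypothetical, case-insensitive matching is an assumption, and so is the order in which the two specific names are tried.

```python
def group_for_slurp_de(listed_agents):
    """Return the robots.txt User-agent name Slurp DE would obey.

    listed_agents: the User-agent names that appear in a robots.txt file.
    Follows the rule quoted in the post: a specific "Slurp DE" or
    "Yahoo! Slurp DE" entry wins, otherwise "Slurp" is used.
    """
    # Assumption: names are compared case-insensitively.
    listed = {name.strip().lower(): name for name in listed_agents}
    for candidate in ("slurp de", "yahoo! slurp de", "slurp"):
        if candidate in listed:
            return listed[candidate]
    # The post does not say what happens when none of these are listed.
    return None


print(group_for_slurp_de(["*", "Slurp", "Slurp DE"]))
print(group_for_slurp_de(["*", "Slurp"]))
```

So a robots.txt that only has a User-agent: Slurp section would, per the post, also govern Slurp DE, while adding a Slurp DE section lets the two be treated differently.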

11:49 am on Apr 29, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member hobbs is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 19, 2004
posts:3056
votes: 5



68.180.139.zzz (Yahoo)

"REAP-crawler Nutch/Nutch-1.0-dev (Reap Project; [reap.cs.cmu.edu...] Reap Project)"

Nice choice of name: 5 "reap" and 2 "nutch" in one UA?

On the page it says:

The REAP crawler is a web robot that sifts the web looking for documents that can be used by the REAP project, a research project at Carnegie Mellon University that develops software to help people that are learning English to improve their vocabulary skills

I just got it: this is not Yahoo, but a site hosted at Yahoo. This could get confusing. Does anyone have the Yahoo client hosting IP ranges?
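One way to tell the real crawler from a hosted site in the same IP space, without a list of ranges, is a reverse DNS lookup followed by a confirming forward lookup. Here is a rough Python sketch. The ".crawl.yahoo.net" suffix is taken from the host name in the first post's log; treating it as the mark of a genuine Slurp host is an assumption, and the function name is hypothetical. The two resolver arguments exist only so the check can be tried without touching the network.

```python
import socket

# Suffix seen on the crawler host in the first post (llf320021.crawl.yahoo.net).
# Assumption: genuine Slurp hosts resolve to names ending in this suffix.
CRAWLER_SUFFIX = ".crawl.yahoo.net"


def looks_like_yahoo_crawler(ip,
                             reverse=socket.gethostbyaddr,
                             forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the host suffix, then confirm forward DNS."""
    try:
        host = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:
        return False                   # no PTR record or lookup failure
    if not host.lower().endswith(CRAWLER_SUFFIX):
        return False                   # e.g. a customer site hosted at Yahoo
    try:
        addresses = forward(host)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses             # host name must map back to the same IP
```

A hosted site such as the REAP crawler above would be expected to fail the suffix check even though its IP sits in Yahoo-owned space, but that depends on how its reverse DNS is set up.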

 
