Notes: In addition to robots.txt, the only accurate, real-file hits are marked [okay]. The site is Yahoo-authenticated and Site Explorer's allowed URL list is accurate.
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]
P.S. Posted here rather than in "Yahoo Search Engine and Directory" because those convos seemed more about SERPs than Slurp activity. Apologies if in wrong place.
P.P.S. Congrats, IncrediModBILL! :)
I am [ this ] close
Don, I'm all for history and tradition but I also expect the same from them. ;o)
FWIW: 10-plus hours in, Slurp 3 is still hitting away at nonexistent "list.php" files -- in between legit files -- even after I smacked it with mod_rewrite four hours ago. (sighs) Then again, Slurp China drops by umpteen times/day and it's been rewritten for years...
FWIW Redux: I've already blocked all Y crawlers but Slurp (regular) and Slurp DE. And I'm this close to blocking the latter because .de is right up there on my Countries Spawning Bad Bots list.
Is anyone else blocking Slurp 3.0, and/or Slurp DE? Any noticeable drop in SERPs or good traffic?
(Since this is new (mis)behavior, I figured I'd block Slurp 3.0 for a while, then open it back up and watch. We shall see.)
I have valid expiry headers. I have a valid and accurate sitemap.xml. No other bots screw up like this. For a major player, they are extremely rude and egocentric when it comes to respecting our properties.
Apparently Yahoo! Slurp DE is the crawler for a (D)irectory (E)ngine service that crawls preferred content explicitly listed by Yahoo! Search content service partners.
Slurp DE will respect robots.txt rules for User-Agent: Slurp DE or User-Agent: Yahoo! Slurp DE. If those user agents are not listed Slurp DE will obey User-Agent: Slurp.
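For anyone who wants to check which robots.txt group Slurp DE would end up using, here is a rough Python sketch of the fallback as described above. It is only my reading of that rule, not Yahoo's actual matching code. The function names (parse_groups, rules_for_slurp_de, is_blocked) are made up for illustration, the parser only looks at User-agent and Disallow lines, and the "nothing matches, so nothing is disallowed" default is an assumption.

```python
def parse_groups(robots_txt):
    """Map each lower-cased User-agent value to its list of Disallow prefixes."""
    groups = {}
    current = []        # agents named by the group being read
    in_rules = False    # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:                         # an empty Disallow allows everything
                for agent in current:
                    groups[agent].append(value)
    return groups


def rules_for_slurp_de(groups):
    """Fallback as described in the post: specific names first, then plain Slurp."""
    for name in ("slurp de", "yahoo! slurp de", "slurp"):
        if name in groups:
            return groups[name]
    return []   # assumption: with no matching group, nothing is disallowed


def is_blocked(path, rules):
    """Simple prefix match; wildcards are not handled in this sketch."""
    return any(path.startswith(prefix) for prefix in rules)


if __name__ == "__main__":
    robots = "User-agent: Slurp\nDisallow: /private/\n\nUser-agent: Slurp DE\nDisallow: /\n"
    rules = rules_for_slurp_de(parse_groups(robots))
    print(is_blocked("/list.php", rules))
```

With a file like the one in the `__main__` block, Slurp DE gets its own group and regular Slurp keeps its separate rules. Remove the Slurp DE group and the same lookup falls back to the Slurp rules.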
"REAP-crawler Nutch/Nutch-1.0-dev (Reap Project; [reap.cs.cmu.edu...] Reap Project)"
Nice choice of name: 5 "reap" and 2 "nutch" in one UA?
On the page it says:
The REAP crawler is a web robot that sifts the web looking for documents that can be used by the REAP project, a research project at Carnegie Mellon University that develops software to help people that are learning English to improve their vocabulary skills.
I just got it: this is not Yahoo, but a site hosted at Yahoo. This could get confusing. Does anyone have the Yahoo client hosting IP ranges?