Forum Moderators: open

Slurp is turning bad.


waynne

10:03 am on Oct 9, 2008 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm sure I'm not alone here, but it would be interesting to hear your experiences.

Slurp is ignoring robots.txt directives. I blocked a directory in robots.txt six months ago. Then I put a spider trap in that directory to catch bad spiders, and every few days Slurp drops right into it!
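For anyone unfamiliar with the technique: a spider trap is simply a URL that robots.txt disallows, so any client that requests it has ignored the file. A minimal sketch of the trap's handler as a Python CGI script follows; the log path and the idea of logging to a flat file are hypothetical examples, not waynne's actual setup.

#!/usr/bin/env python3
# Minimal spider-trap CGI: robots.txt disallows this URL, so any
# client that requests it has ignored robots.txt. Log it for review.
# The log path below is a hypothetical example.
import os
import time

LOG = "/var/log/spider-trap.log"  # hypothetical log file

with open(LOG, "a") as log:
    log.write("%s %s %s\n" % (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        os.environ.get("REMOTE_ADDR", "-"),      # client IP
        os.environ.get("HTTP_USER_AGENT", "-"),  # claimed user-agent
    ))

print("Content-Type: text/plain")
print()
print("Nothing to see here.")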

Crawling speed is also very quick and is causing load problems on a massive, MySQL-heavy site. I have used the

User-agent: Slurp
Crawl-delay: 135

directive for a 135-second delay, and one month on I am still getting requests from Slurp on the 74.6.x.x IP blocks!
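For reference, Crawl-delay is a non-standard extension that Yahoo documented for Slurp; the value is read as the number of seconds to wait between successive fetches. A complete record combining the delay with a disallow might look like this (the directory name is a hypothetical example):

User-agent: Slurp
Crawl-delay: 135
Disallow: /trap/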

Given the low volume of traffic I get from Yahoo, which is only around 200 visitors a day, and considering they are the second-largest load on my bandwidth, I am considering banning them from my server.
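One way to impose such a ban at the web-server level, assuming Apache with mod_setenvif loaded (these are standard Apache 2.2-era directives, offered as a sketch rather than a drop-in config):

# Refuse any request whose User-Agent contains "Slurp"
SetEnvIfNoCase User-Agent "Slurp" block_bot
Order Allow,Deny
Allow from all
Deny from env=block_bot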

I filed a report with Yahoo; so far I have had no response, and Slurp is still behaving badly.

jdMorgan

1:10 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, if you don't say, we gotta ask:

  • robots.txt is in the site root folder?
  • No other policy records in that robots.txt file that might be interpreted as applying to Slurp?
  • robots.txt syntax is correct? -- A blank line after each & every policy record, including the last one?

I have seen Slurp misbehave before, but not recently. It has never ever fetched my spider-bait, though.

Jim

Tastatura

1:25 am on Oct 11, 2008 (gmt 0)

10+ Year Member



I haven't seen Slurp misbehaving recently either (nor has it got into my spider trap via the robots.txt file).

waynne

11:06 am on Oct 13, 2008 (gmt 0)

10+ Year Member Top Contributors Of The Month



I was hoping User-agent: * was read by Slurp. I have duplicated the rules from * into a User-agent: Slurp record and will report back on my findings.

I didn't have a trailing slash either.

waynne

11:07 am on Oct 13, 2008 (gmt 0)

10+ Year Member Top Contributors Of The Month



Oops! I meant trailing blank line - not slash - sorry!

g1smd

10:12 pm on Oct 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To clarify, you do need a blank line after each record, including the last one.

In fact, I always end robots.txt with at least 2 or 3 blank lines.
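As a concrete illustration of that convention, a file containing both of the records discussed earlier in the thread would be laid out like this, with a blank line between records and another after the final one (the directory name is a hypothetical example):

User-agent: Slurp
Crawl-delay: 135
Disallow: /trap/

User-agent: *
Disallow: /trap/
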

jdMorgan

11:19 pm on Oct 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was hoping User-agent: * was read by Slurp

If by that you mean:

# Disallow all robots from fetching /cgi-bin files
User-agent: *
Disallow: /cgi-bin


then Slurp should accept that, without requiring a policy record addressed specifically to Slurp.

Jim