Search Engine Spider and User Agent Identification Forum

    
Come on Yahoo. Really?
bobothecat2




msg:4632277
 10:41 pm on Dec 19, 2013 (gmt 0)

98.138.240.153 - - [19/Dec/2013:15:39:22 -0700] "GET /robots.txt HTTP/1.1" 403 306 "-" "User-Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp"

Been seeing a lot of these over the past few days. Guess it shouldn't be too hard to see why the requests have been denied.

[edited by: Ocean10000 at 3:31 am (utc) on Dec 20, 2013]
[edit reason] Broke autolink [/edit]

 

dstiles




msg:4633438
 4:59 pm on Dec 24, 2013 (gmt 0)

I've seen a lot of weird Yahoo hits overnight in the 68.180.224.0/24 range. They carry the proper Slurp UA below but use a proxy WITHIN the same /24 range. For example...

Actual IP: 68.180.224.168
Proxy IP: 68.180.224.168
Proxy: stolesole.corp.gq1.yahoo.com[44B4E0E5] (ApacheTrafficServer/4.0.1)
UA: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Proxying within the same range? There would appear to be some testing going on. rDNS does not admit it's a crawler, so it got banned anyway, but I have now 403'd the whole /24.
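
For anyone wanting to script the two checks mentioned above (is the IP inside the /24, and does rDNS claim to be a crawler), here is a rough Python sketch. It is not taken from anyone's actual setup. The function names are made up for illustration, and the `.crawl.yahoo.net` suffix is an assumption about how genuine Slurp hosts are named, so check Yahoo's current documentation before relying on it. The resolver functions are passed in as parameters so the logic can be tried without touching the network.

```python
import ipaddress
import socket

# The range named in the post above.
BLOCKED_RANGE = ipaddress.ip_network("68.180.224.0/24")

# ASSUMPTION: hostname suffix expected for genuine Slurp crawlers.
CRAWLER_SUFFIXES = (".crawl.yahoo.net",)


def in_blocked_range(ip):
    """True if the address falls inside the blocked /24."""
    return ipaddress.ip_address(ip) in BLOCKED_RANGE


def _reverse(ip):
    # Reverse lookup: IP -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # Forward lookup: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(ip, reverse=_reverse, forward=_forward):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    The hostname must end in an expected crawler suffix AND resolve
    back to the same IP. Any lookup failure counts as unverified.
    """
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(CRAWLER_SUFFIXES):
            return False
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

A host that reverse-resolves to something under corp.gq1.yahoo.com would fail the suffix test here, which matches the "rDNS does not admit it's a crawler" outcome described above.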

[edited by: phranque at 9:07 pm (utc) on Dec 24, 2013]
[edit reason] fix url [/edit]

wilderness




msg:4633521
 5:11 am on Dec 25, 2013 (gmt 0)

A recent and similar thread [webmasterworld.com]

lucy24




msg:4633527
 6:45 am on Dec 25, 2013 (gmt 0)

:: delayed reaction ::

Are you saying that the UA string begins with the literal text "User-Agent: "? Or was that just an artifact of typing the post?

Do you really want to lock people out from robots.txt? It just gives them an excuse to say "Well, I wanted to obey robots.txt ::whine:: but they wouldn't let me see it!"
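
To illustrate the robots.txt point, here is a minimal Python sketch of the policy: robots.txt is always served, and the block applies to everything else. The function names and the blocking rule are hypothetical, chosen only for the example, and a real server would normally do this in its own config language rather than in Python.

```python
def looks_malformed(user_agent):
    """Hypothetical block rule: the UA value starts with the header name."""
    return user_agent.strip().lower().startswith("user-agent:")


def decide_status(path, user_agent, is_blocked=looks_malformed):
    """Return the HTTP status to serve for a request.

    robots.txt is exempt from blocking so that no robot can claim
    it was unable to read the rules.
    """
    if path == "/robots.txt":
        return 200
    return 403 if is_blocked(user_agent) else 200


bad_ua = "User-Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; ...)"
print(decide_status("/robots.txt", bad_ua))  # robots.txt stays readable
print(decide_status("/index.html", bad_ua))  # everything else is denied
```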

bobothecat2




msg:4633541
 11:36 am on Dec 25, 2013 (gmt 0)

Are you saying that the UA string begins with the literal text "User-Agent: "?


Yes. "User-Agent" is part of their UA string, which is what caused them to get blocked in the first place.
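
For spotting these in the logs, a rough Python sketch follows. It assumes the Apache combined log format, as in the line quoted in the first post, and the function names are made up for illustration. It pulls out the user-agent field and flags values that begin with the literal header name.

```python
import re

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def has_header_name_prefix(user_agent):
    """True if the UA value itself starts with 'User-Agent:'."""
    return user_agent.strip().lower().startswith("user-agent:")


def check_line(line):
    """Return (ip, flagged) for a log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("ip"), has_header_name_prefix(match.group("ua"))


# The log line from the first post, split only for readability.
sample = ('98.138.240.153 - - [19/Dec/2013:15:39:22 -0700] '
          '"GET /robots.txt HTTP/1.1" 403 306 "-" '
          '"User-Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; '
          'http://help.yahoo.com/help/us/ysearch/slurp"')

print(check_line(sample))  # ('98.138.240.153', True)
```

The pattern does not handle escaped quotes inside fields, so treat it as a starting point rather than a full log parser.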

keyplyr




msg:4633578
 11:49 pm on Dec 25, 2013 (gmt 0)




Yes. "User-Agent" is part of their UA string, which is what caused them to get blocked in the first place.

ditto

dstiles




msg:4633846
 8:38 pm on Dec 27, 2013 (gmt 0)

My Christmas Eve posting was obviously a result of too much hurry (certainly not booze!). :(

Apart from mucking up a URL (? thanks, phranque) I duplicated the IPs - they should differ in the final octet - sorry, I forget what it should be now. :)

Apologies all round!


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved