

Personifi heads up

HEAD request with Wget/1.10.2 and then GET with Mozilla/5.0 X11

         

caribguy

10:37 pm on Sep 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Incoming from 67.228.167.nnn

Any info on who these folks are and what they do, other than the disingenuous bla bla on their site?

"provides powerful, easy to implement solutions that allow web properties, ad networks, and mobile providers to optimize the relevancy of ads, content, and recommendations"

A couple of oddities:

  • it's looking at multiple domains hosted on this IP address
  • it sends a HEAD with Wget/1.10.2, then a GET with a Mozilla/5.0 X11 UA:
      www.example.com 67.228.167.nnn - - [29/Aug/2008:13:20:36 -0500] "HEAD / HTTP/1.0" 200 - "-" "Wget/1.10.2"
      www.example.com 67.228.167.nnn - - [29/Aug/2008:13:20:36 -0500] "GET / HTTP/1.1" 200 123456 "-" "Mozilla/5.0 (X11; U; Linux x86_64; de; rv:1.8.1.1) Gecko/20061223 BonEcho/2.0.0."
  • any time the server response is not 200, it immediately follows with a GET
  • it tries to HEAD development.example.com for a management page that is unique to our implementation; that domain name does not exist in public IP space and is redirected to www.example.com

It crawls slowly, with requests anywhere from 40 seconds to 16 minutes apart. Since it's asking for our dev server, I can only presume that we have a data leak somewhere that I need to look into (the dev server, one of the machines on our local network, the DNS provider, the ISP, Carnivore, ?).

Could this be a Phorm-like traffic-intercepting appliance?

Don't like this at all!
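The HEAD-then-GET pattern in the log excerpts above can be picked out of an access log with a short script. This is a minimal Python sketch, assuming the log uses Apache's combined format with the virtual host name prepended (as in the excerpts). The function names and the 5-second window are illustrative choices, not anything this crawler is known to use.

```python
import re
from datetime import datetime

# Assumed layout: vhost ip ident user [timestamp] "request" status bytes "referer" "user-agent"
# (Apache combined format with the virtual host prepended, as in the excerpts above).
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def parse(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def head_then_get(lines, window_seconds=5):
    """Flag a GET that follows a HEAD from the same IP, for the same vhost and path,
    within window_seconds, but sent with a different User-Agent."""
    last_head = {}  # (ip, vhost, path) -> most recent HEAD record
    hits = []
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        key = (rec["ip"], rec["vhost"], rec["path"])
        if rec["method"] == "HEAD":
            last_head[key] = rec
        elif rec["method"] == "GET" and key in last_head:
            head = last_head.pop(key)
            gap = (rec["ts"] - head["ts"]).total_seconds()
            if 0 <= gap <= window_seconds and head["ua"] != rec["ua"]:
                hits.append((rec["ip"], rec["vhost"], rec["path"], head["ua"], rec["ua"]))
    return hits
```

Keying on vhost as well as IP keeps the "multiple domains on one IP address" case visible: each domain the crawler touches shows up as its own hit.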

Mokita

12:32 am on Sep 2, 2008 (gmt 0)

10+ Year Member

That IP belongs to SoftLayer. I've had the range blocked for a long time:

deny from 67.228.

That'll stop all their activities on your server or site no matter what UA they try to employ ;)

[edited by: Mokita at 12:34 am (utc) on Sep. 2, 2008]
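Mokita's rule uses Apache's partial-IP form, which matches on the first 1 to 3 whole octets of the address rather than on a raw string prefix. As a sketch of what that covers, assuming the rule is read that way (so 67.228. means 67.228.0.0 through 67.228.255.255), here is a small Python helper that turns such a prefix into CIDR form. The function names are hypothetical.

```python
import ipaddress


def partial_ip_to_network(partial):
    """Convert a dotted prefix like '67.228.' (1 to 3 whole octets) into a CIDR network."""
    octets = [o for o in partial.strip(".").split(".") if o]
    if not 1 <= len(octets) <= 3:
        raise ValueError("expected 1 to 3 octets")
    padded = octets + ["0"] * (4 - len(octets))
    # Each whole octet fixes 8 bits of the prefix length.
    return ipaddress.ip_network(".".join(padded) + "/" + str(8 * len(octets)))


def is_blocked(ip, partial):
    """True if the address falls inside the network the partial prefix describes."""
    return ipaddress.ip_address(ip) in partial_ip_to_network(partial)
```

For example, 67.228.167.10 is blocked by "67.228." while 67.22.1.1 is not, even though "67.22" is a string prefix of "67.228". On Apache 2.4 the rough equivalent is Require not ip 67.228 inside a RequireAll block alongside Require all granted.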

caribguy

12:46 am on Sep 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Thank you! I'm also seeing:

67.228.207.nnn - - [27/Aug/2008:20:51:16 -0500] "GET /file.html HTTP/1.1" 200 123456 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

nslookup 67.228.207.nnn
Non-authoritative answer:
nnn.207.228.67.in-addr.arpa name = adsoft-development.nnn.

...and blocked now.
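The nslookup output above is a reverse (PTR) lookup: the address's octets are reversed under in-addr.arpa, which is why 67.228.207.nnn shows up as nnn.207.228.67.in-addr.arpa. A minimal Python sketch of the same thing with the standard library's ipaddress and socket modules; the helper names are illustrative.

```python
import ipaddress
import socket


def ptr_query_name(ip):
    """Name queried for a reverse (PTR) lookup: octets reversed under in-addr.arpa."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the PTR hostname via the system resolver, or None if there isn't one."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
```

A PTR record is set by whoever controls the address block, so it only tells you what the owner chose to call the host, not who is really behind it.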

keyplyr

6:20 am on Sep 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

"That IP belongs to SoftLayer. I've had the range blocked for a long time" - Mokita

I agree. Lots of bad stuff comes from all their IP ranges.

And caribguy, as you've found out, that Googlebot UA is spoofed. I've had good results using a whitelist of IP addresses for all things Google, since they get spoofed the most.
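The whitelist approach keyplyr describes boils down to: if the UA claims to be Googlebot, the source IP must fall in a range you trust. A minimal Python sketch, assuming a hand-maintained list of networks. The single range below (66.249.64.0/19) is only an example and should be checked against current data before use, not trusted as-is.

```python
import ipaddress

# Example whitelist for illustration only; build the real one from ranges
# you have verified yourself.
GOOGLE_WHITELIST = [ipaddress.ip_network("66.249.64.0/19")]


def claims_googlebot(user_agent):
    """True if the User-Agent string says it is Googlebot."""
    return "googlebot" in user_agent.lower()


def is_spoofed_googlebot(ip, user_agent, whitelist=GOOGLE_WHITELIST):
    """True when the UA says Googlebot but the IP is outside every whitelisted network."""
    if not claims_googlebot(user_agent):
        return False
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in whitelist)
```

Requests that don't claim to be Googlebot pass straight through here; they're left to whatever other rules you run.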

caribguy

7:56 pm on Nov 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Thanks keyplyr, looks like that's the way I'll have to deal with it as well...

More spoofing from adsoft-development.com today: 74.86.176.nnn. Goodbye and farewell to:

NetRange: 74.86.0.0 - 74.86.255.255
CIDR: 74.86.0.0/16
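The whois NetRange and CIDR lines above describe the same block: 74.86.0.0 - 74.86.255.255 is exactly 74.86.0.0/16. When a NetRange doesn't line up on a single prefix, it can be split into several CIDR blocks. A minimal Python sketch using ipaddress.summarize_address_range; the helper names are hypothetical.

```python
import ipaddress


def netrange_to_cidrs(first, last):
    """Collapse a whois NetRange into the smallest list of CIDR blocks covering it."""
    return list(ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))


def deny_lines(networks):
    """Render Apache 2.2-style 'deny from' lines, one per CIDR block."""
    return ["deny from " + str(net) for net in networks]
```

Apache's deny from accepts the network/nnn CIDR form directly, so this range becomes a single line: deny from 74.86.0.0/16. An unaligned range such as 10.0.0.0 - 10.0.2.255 would come out as two blocks, 10.0.0.0/23 and 10.0.2.0/24.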

dstiles

11:37 pm on Nov 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

Re: data leakage: we discovered one of our private URLs in Google. It didn't get there directly; it had been ripped from a public computer and added to a load of compromised #*$! forums.

We think it was ripped using a keyboard monitor, but we also discovered that the person whose private access it was used a Google search to go to the URL instead of the browser's Location/Address bar. It seems a lot of people do this. As far as I know Google do not add such searches to their bot scan lists. :)

caribguy

8:35 pm on Nov 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

"As far as I know Google do not add such searches to their bot scan lists. :)"

Are you suggesting that third-party browser extensions/toolbars could intercept searches and send the data to their developers? /sarcasm

That could have happened in my case too. While I try to keep my development environment as clean as possible, I did pull up the test/internal site from other machines on our network. I wouldn't be surprised if those machines were full of junkware...

dstiles

8:52 pm on Nov 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

Oh, I know they do! :)

I'm merely saying I don't think Google use that info to follow up on URLs. Or maybe they do. I do know there is no actual occurrence of that private URL in Google's index.