Tide

Is this a new msn spider?

         

quiet

3:21 am on Oct 1, 2003 (gmt 0)

10+ Year Member



One of my sites just got completely spidered by tide###.microsoft.com. I'm not used to seeing this... so I'm assuming it's a good thing.

Could it have anything to do with them moving away from LookSmart?

q

coyote

3:30 am on Oct 1, 2003 (gmt 0)

10+ Year Member



Can you post an example from your log?

quiet

3:39 am on Oct 1, 2003 (gmt 0)

10+ Year Member



The hostnames weren't in the logs, which show only the IPs, but the IPs traced back to the tide names. They were 207.46.225.nnn (.251 and .243).

207.46.225.243 - - [30/Sep/2003:22:50:28 -0400] "GET **page** HTTP/1.0" 200 8457 "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )"

I was surprised since I hadn't seen it before, and they zipped in while I was going through the logs.

And they were pretty thorough.

-q

martinibuster

3:43 am on Oct 1, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The only references I could find to this are here [macedition.com], where it's listed under "Major service providers that have unique names for each user connection"

And a WebmasterWorld post [webmasterworld.com] from August 2002 where Brett guesses it may be an MSN user.

quiet

4:01 am on Oct 1, 2003 (gmt 0)

10+ Year Member



The pattern was more spiderish though. Could have been someone opening new windows and digging down and closing them right away or continuing down the tree... but it was like 5-10 seconds between hits and I don't see how anyone could have even found the links.

Would have had to be a site grabber prog then. That could be. Maybe I just have the msn/looksmart thang on my brain.

;)

q

(and thanks for the replies!)
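
For anyone wanting to check the "5-10 seconds between hits" pattern in their own logs, here is a minimal sketch that measures the gaps between consecutive requests from one IP. It assumes Apache-style timestamps like the one in the log line above. The names `hit_gaps` and `looks_paced` and the thresholds are hypothetical, and only the first sample timestamp comes from the thread; the other two are made up for illustration.

```python
from datetime import datetime

# Apache access-log timestamp, e.g. "30/Sep/2003:22:50:28 -0400"
APACHE_TS = "%d/%b/%Y:%H:%M:%S %z"

def hit_gaps(timestamps):
    """Return the seconds between consecutive hits, oldest first."""
    times = sorted(datetime.strptime(t, APACHE_TS) for t in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

def looks_paced(gaps, min_hits=5, max_gap=10.0):
    """Rough guess: several hits in a row, none more than max_gap apart.

    Both thresholds are assumptions; tune them for your own traffic.
    """
    return len(gaps) + 1 >= min_hits and all(g <= max_gap for g in gaps)

# First timestamp is from the log line above; the rest are invented.
stamps = [
    "30/Sep/2003:22:50:28 -0400",
    "30/Sep/2003:22:50:35 -0400",
    "30/Sep/2003:22:50:43 -0400",
]
print(hit_gaps(stamps))
```

A steady pace alone does not prove a bot, since a fast human clicker or a prefetching browser can look similar, so treat it as one signal among several.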

coyote

4:19 am on Oct 1, 2003 (gmt 0)

10+ Year Member



Quiet, it sounds like someone running a bot. Were there any requests for images, externally linked JS and CSS files (if you use those)? If not, then it's a bot or site ripper. Ban by IP since it uses a common UA.

Also, MSN's search spider identifies itself. It only visited me once and, if I remember correctly, is called MSNbot (someone correct me if I'm wrong).
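
The check described above (did the visitor fetch images, CSS and JS, or only pages?) can be roughed out in a few lines. This is only a sketch: it assumes a common/combined-format access log, the function name `fetch_profile` and the extension lists are hypothetical, and the sample lines use a documentation-range IP and invented paths rather than real traffic.

```python
import re
from collections import defaultdict

# Captures the client IP and the requested path from a common-format line.
REQUEST_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[A-Z]+ (\S+)')

IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png")
STYLE_SCRIPT_EXTS = (".css", ".js")

def fetch_profile(lines):
    """Count, per IP, how many pages, images and CSS/JS files were fetched,
    and whether robots.txt was ever requested."""
    profile = defaultdict(
        lambda: {"pages": 0, "images": 0, "css_js": 0, "robots_txt": False}
    )
    for line in lines:
        m = REQUEST_RE.match(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        ip, target = m.groups()
        path = target.split("?", 1)[0].lower()  # ignore any query string
        entry = profile[ip]
        if path == "/robots.txt":
            entry["robots_txt"] = True
        elif path.endswith(IMAGE_EXTS):
            entry["images"] += 1
        elif path.endswith(STYLE_SCRIPT_EXTS):
            entry["css_js"] += 1
        else:
            entry["pages"] += 1
    return dict(profile)

# Invented sample lines (192.0.2.0/24 is reserved for documentation).
sample = [
    '192.0.2.10 - - [01/Oct/2003:04:00:00 +0000] "GET /index.html HTTP/1.0" 200 1000 "-" "-"',
    '192.0.2.10 - - [01/Oct/2003:04:00:06 +0000] "GET /logo.gif HTTP/1.0" 200 84 "-" "-"',
    '192.0.2.10 - - [01/Oct/2003:04:00:13 +0000] "GET /about.html HTTP/1.0" 200 900 "-" "-"',
]
print(fetch_profile(sample))
```

An IP that racks up many pages with zero CSS/JS fetches and no robots.txt request is a candidate for a closer look, though cached assets in a real browser can produce the same picture.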

quiet

5:44 am on Oct 1, 2003 (gmt 0)

10+ Year Member



I agree at this point. Too bad.

It grabbed GIFs but not the CSS files, and didn't look for robots.txt either.

Guess I was hoping too hard that I was seeing something new.

Thanks tons!

q

added - I picture everyone saying "new isn't always good" ;)

wilderness

12:04 pm on Oct 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



there are some very extensive threads on these ms bots.
131.107 and 207.46.

the 131.107 thread is over 100 mails.

bull

5:10 am on Oct 5, 2003 (gmt 0)

10+ Year Member



getting the same thing now, distributed between 207.46.* and 131.107.* , again a spoofed UA. Did I mention I hate these things?

207.46.225.251 - - [05/Oct/2003:04:15:36 +0200] "GET /deep/oldfile.htm HTTP/1.0" 301 261 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
131.107.170.65 - - [05/Oct/2003:04:15:36 +0200] "GET /deep/newfile.htm HTTP/1.0" 200 4061 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
207.46.225.251 - - [05/Oct/2003:04:15:43 +0200] "GET /deep/olfile_d.htm HTTP/1.0" 200 3171 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
131.107.170.65 - - [05/Oct/2003:04:15:49 +0200] "GET /deep/apic.gif HTTP/1.0" 200 84 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"

Critter gets pix! but then:

131.107.170.65 - - [05/Oct/2003:04:19:08 +0200] "GET /deep/forcemetoframeset.html?/deep/orphanedpage_e.htm HTTP/1.0" 200 506 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
207.46.225.243 - - [05/Oct/2003:04:19:21 +0200] "GET /deep/indexd.html HTTP/1.0" 200 1185 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
207.46.225.251 - - [05/Oct/2003:04:19:29 +0200] "GET /deep/indexe.html HTTP/1.0" 200 1186 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"

and seems to be JS capable.
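
To see how a single spoofed UA is spread across address blocks, as in the lines above, a sketch like the following groups the first two octets of each IP by user-agent string. It assumes the custom log layout shown in this post (status, size, virtual host, referer, user agent, in that order); `LINE_RE` and `networks_by_agent` are hypothetical names, and the two sample lines are copied from the log excerpt above.

```python
import re
from collections import defaultdict

# Matches the layout used in the excerpt above: the virtual host sits
# between the byte count and the quoted referer.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) (?P<vhost>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def networks_by_agent(lines):
    """Map each user-agent string to the set of a.b prefixes it came from."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not in the expected layout
        prefix = ".".join(m.group("ip").split(".")[:2])
        seen[m.group("agent")].add(prefix)
    return dict(seen)

sample = [
    '207.46.225.251 - - [05/Oct/2003:04:15:36 +0200] "GET /deep/oldfile.htm HTTP/1.0" 301 261 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"',
    '131.107.170.65 - - [05/Oct/2003:04:15:36 +0200] "GET /deep/newfile.htm HTTP/1.0" 200 4061 www.-.net "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"',
]
for agent, prefixes in networks_by_agent(sample).items():
    print(agent, sorted(prefixes))
```

One UA showing up from several unrelated-looking prefixes within seconds is what makes banning a single IP ineffective here; the prefix list tells you which ranges you would have to cover.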