Home / Forums Index / The Search Engine World / Search Engine Spider Identification

Search Engine Spider Identification

  
Possible Bot or Spammer?
Or is this Live bot?
EarleyGirl


#:3265933
 10:09 pm on Feb. 27, 2007 (utc 0)

I see something strange in my access log. At first glance, it appears someone came in from MSN Live on a search for "airlines" to my site, which has nothing to do with airlines. That's the first red flag. The log file shows this:

tide526.microsoft.com - - [11/Feb/2007:02:10:22 -0500] "GET / HTTP/1.1" 200 3612 "http://search.live.com/result.aspx?q=airlines&mrt=en-us&FORM=LVSP" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; WOW64; SV1)"

Yet, doing a search for this on Live brings an error page. A true search would look like this:
search.live.com/results.aspx?q=airlines&mkt=en-us&FORM=LIVSOP
Notice the differences:
results.aspx (not result.aspx), mkt (not mrt) and FORM=LIVSOP (not FORM=LVSP)
Looks awfully suspect to me. Can someone shed some light? Is this really a Microsoft bot? If it really is from microsoft.com, why would they be performing fake searches?
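
For anyone who wants to automate this kind of check, here is a minimal Python sketch. It parses one line in the Apache combined log format and lists how the referrer differs from the working URL quoted above. The expected path, parameter names and FORM value are assumptions taken only from that one working URL (real Live searches may well use other FORM codes), and referer_anomalies is a made-up helper name.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumptions: taken from the one working search URL in the post.
EXPECTED_PATH = "/results.aspx"
EXPECTED_PARAMS = {"q", "mkt", "FORM"}
EXPECTED_FORM = "LIVSOP"


def referer_anomalies(line):
    """Return a list of ways the referrer deviates from the expected URL."""
    m = LOG_RE.match(line.strip())
    if m is None:
        raise ValueError("not a combined-format log line")

    ref = urlsplit(m.group("referer"))
    params = parse_qs(ref.query)
    problems = []

    if ref.path != EXPECTED_PATH:
        problems.append(f"path is {ref.path}, expected {EXPECTED_PATH}")
    for name in sorted(EXPECTED_PARAMS - params.keys()):
        problems.append(f"missing parameter {name}")
    for name in sorted(params.keys() - EXPECTED_PARAMS):
        problems.append(f"unexpected parameter {name}")

    form = params.get("FORM", [None])[0]
    if form is not None and form != EXPECTED_FORM:
        problems.append(f"FORM is {form}, expected {EXPECTED_FORM}")
    return problems
```

Run against the log line above, this reports the wrong path, the missing mkt, the unexpected mrt and the odd FORM code. A mismatch only says the referrer was not produced by the public search page; it does not say who produced it.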

Edit: I just checked January's log file. They used keyword "hydrocodone" last month (again, nothing to do with my site). What is up?

[edited by: EarleyGirl at 10:28 pm (utc) on Feb. 27, 2007]

Brett_Tabke


#:3265966
 10:41 pm on Feb. 27, 2007 (utc 0)

First, that user and host is ms corporate. I would say you are being visited by a human checking qc on selected keywords.

wilderness


#:3266065
 12:30 am on Feb. 28, 2007 (utc 0)

tide526.microsoft.com equals 207.46.18.30
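
A hostname in an access log comes from a reverse lookup, which is controlled by whoever owns the IP block, so it is worth confirming it in both directions. Below is a rough Python sketch of a forward-confirmed reverse DNS check. The function name and the injectable reverse/forward parameters are my own invention (they are there so the logic can be tried without a network); by default it uses the standard socket calls.

```python
import socket


def forward_confirmed(ip, expected_suffix,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """True if ip reverse-resolves to a host under expected_suffix
    and that host resolves back to the same ip.

    expected_suffix should start with a dot, e.g. ".microsoft.com".
    """
    try:
        host = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:                    # covers socket.herror / gaierror
        return False

    host = host.lower().rstrip(".")
    if not host.endswith(expected_suffix.lower()):
        return False

    try:
        addresses = forward(host)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses
```

Usage would be something like forward_confirmed("207.46.18.30", ".microsoft.com"). What it returns today depends on current DNS, so no result is claimed here.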

Dan's page used to stay current on the IP ranges of all the major bots.
His site, however, has changed.
Here is an old link from archive.org:
http://web.archive.org/web/20060603044348/http://joseluis.pellicer.org/ua/

MS and all the major SEs are offering a variety of tools, toolbars, plug-ins and such, which operate from a variety of IP ranges not formerly utilized.

EarleyGirl


#:3266985
 7:30 pm on Feb. 28, 2007 (utc 0)

Odd that hydrocodone (a drug) would be used as a keyword for checking qc. My site has nothing to do with airlines or drugs nor would it appear in a search using those keywords.

Also, is the search being done from a desktop or a different application? It doesn't work from a browser when I try it. It brings up an error; the URL isn't right. It might be from Microsoft, but I don't think they were coming in from a search. It just appears that way. I wonder why?

First, that user and host is ms corporate. I would say you are being visited by a human checking qc on selected keywords.

Brett_Tabke


#:3266988
 7:33 pm on Feb. 28, 2007 (utc 0)

That doesn't mean your site wasn't seen in those kw's - and hence the qc check behind the scenes...

volatilegx


#:3267046
 8:04 pm on Feb. 28, 2007 (utc 0)

My site specifically tracks search engine spiders. I've never seen 207.46.18.30 displaying spider-like behavior.

EarleyGirl


#:3267075
 8:36 pm on Feb. 28, 2007 (utc 0)

That doesn't mean your site wasn't seen in those kw's - and hence the qc check behind the scenes...

You've lost me on that one. I don't see how that's possible. I don't have a site search (or a wiki or comments or a forum for that matter) so that wasn't used by a spammer. Up until this month, it was a Flash site, nothing but an swf - no words to pick up.

Also, for the months of November and December, "www" was used as the keyword from tide525.microsoft.com. Why would that need a qc check?

[edited by: EarleyGirl at 8:38 pm (utc) on Feb. 28, 2007]

Brett_Tabke


#:3267099
 9:06 pm on Feb. 28, 2007 (utc 0)

They were checking THEIR index behind the scenes before pushing it live.

EarleyGirl


#:3267106
 9:17 pm on Feb. 28, 2007 (utc 0)

Thanks Brett. My apologies if I seem slow to catch on. I was just trying to understand it.

Brett_Tabke


#:3267121
 9:35 pm on Feb. 28, 2007 (utc 0)

I know it is all complicated.

What happens is:

- they build an index.
- it has errors in it.
- they run a util on high value kws that flags possible problems for a hand check.
- they do the hand check and delete obvious mistakes.

EarleyGirl


#:3267129
 9:49 pm on Feb. 28, 2007 (utc 0)

Thanks Brett. That clarifies things.

 

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld ® and PubCon ® are registered trademarks of WebmasterWorld Inc.
© WebmasterWorld Inc. / SearchEngineWorld 1996-2008 all rights reserved