Just saw this guy, fell into a spider trap:
131.107.137.47 - - [11/Apr/2003:01:31:08 -0600] "GET /a/deep/link.html HTTP/1.1" 200 12589 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
No referer, came in on a deep link (like from a SE), and d/l pages but no images. After about 5 hits, he tried to grab a trap, and got banned. Grabbed a page every 5 secs or so...
IP resolves to Redmond.... did Bill just get himself banned?
dave
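For anyone who wants to spot this pattern in their own logs, here is a rough sketch of the checks described above: no referer, pages fetched but no images, and at least a handful of hits. It assumes the Apache "combined" log format shown in the quoted line. The names `parse_line`, `looks_like_bot`, `IMAGE_EXT` and the `min_hits` threshold are made up for illustration and are not part of any real trap script.

```python
import re

# Apache "combined" log format, matching the log line quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed list of image extensions; adjust for your own site.
IMAGE_EXT = (".gif", ".jpg", ".jpeg", ".png", ".ico")


def parse_line(line):
    """Return the named log fields as a dict, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def looks_like_bot(hits, min_hits=5):
    """Heuristic only: enough hits, never a referer, never an image request."""
    if len(hits) < min_hits:
        return False
    no_referer = all(h["referer"] in ("-", "") for h in hits)
    no_images = not any(h["path"].lower().endswith(IMAGE_EXT) for h in hits)
    return no_referer and no_images


sample = ('131.107.137.47 - - [11/Apr/2003:01:31:08 -0600] '
          '"GET /a/deep/link.html HTTP/1.1" 200 12589 "-" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"')
hit = parse_line(sample)
```

You would group parsed hits by `ip` first and pass each group to `looks_like_bot`. A real browser behind a proxy that strips referers and caches images could trip this too, so treat it as a hint, not proof.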
It's always there in my log files, throughout the day.
Can somebody clarify what this is? Does it pose any potential problem?
Aravind
I found these forums whilst trying to find out more about the 'MicrosoftPrototypeCrawler'. We've had a slight grazing from this bot and I wanted to know whether it was legitimate or not - the lack of crawler-info URL was highly suspicious.
Much as I hate to diss a good conspiracy theory, our website sells archaeology/history books and has never mentioned MS. The pages it has requested are all listed in Google [never see an http_referer though] so maybe it's working off that.
I wait with bated breath to find out what's really goin' down with this thing. Haven't banned it yet as it's only requested 11 pages so far (since 18th April).
Keep up the good work, I've learned a heck of a lot already this afternoon here.
Best
Tom
I can buy that there is/will be a MS SE bot and that the bot’s IP will be 131.107.163.46 through 131.107.163.50 [webmasterworld.com], but I don’t think 131.107.137.47 necessarily has anything to do with the MS SE bot.
Only time will tell.
[edit]
I’m sorry. I didn’t mean to question your integrity. I said in another thread that I sometimes leave things out. I left the message and then had to step out. While I was out I thought, oh crap, that could be taken the wrong way. And I was going to change it as soon as I got back, which I didn’t get a chance to because you were quicker than I.
I hope you understand my meaning and I truly apologize for any misunderstanding. Please accept it.
[/edit]
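To check whether a logged address falls inside the 131.107.163.46 through 131.107.163.50 range mentioned above, something like this should do. It is only a sketch using Python's standard `ipaddress` module; the function name `in_reported_range` is made up, and the range itself is just the one reported in this thread, not a confirmed list of bot addresses.

```python
import ipaddress

# Range as reported in this thread (an unconfirmed assumption about the bot).
RANGE_START = ipaddress.ip_address("131.107.163.46")
RANGE_END = ipaddress.ip_address("131.107.163.50")


def in_reported_range(ip_text):
    """True if ip_text lies between RANGE_START and RANGE_END inclusive."""
    addr = ipaddress.ip_address(ip_text)
    return RANGE_START <= addr <= RANGE_END


print(in_reported_range("131.107.137.47"))  # the address from the first post
print(in_reported_range("131.107.163.48"))
```

The address from the first post is outside that range, which is the point being made: same owner, but not necessarily the same bot.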
Prerequisites to project Code Name: Tahoe
Tracking and Viewing Changes on the Web
Jan 1996
[research.microsoft.com...]
Information Retrieval & Analysis
[research.microsoft.com...]
Code Name: Tahoe
Tahoe Graduates to SharePoint Portal Server
March 12, 2001
Informationweek.com
[informationweek.com...]
One of the most recent is a product code-named Tahoe. It's a search engine that will debut later this year as part of Microsoft's SharePoint Portal Server
And this is out. It only does intranets. But they have some experience in developing SE’s. So we should not see them crashing systems from day 1.
Now, after thinking as long and as hard as I could about this, and comparing it with M$ history, what about the following? I may be really off in left field here.
M$ did not necessarily want to kill NN; they just didn’t want it to be a predominant player in the browser world. This is because M$ has a vision of every computer sharing information easily. And you have to admit, with IE I can view .doc, .ppt, .xls, etc., which sells more Office products.
Now, it could be that the major SE’s hold enough power that, by changing their algorithms, or if one becomes more prevalent than another, they could make or break a .com. This is due to the fact that, according to my research, about 50% of the people doing searches never go to page 2. If a change in an SE’s algorithm, or one SE getting more users than another, causes too many high-stakes players to fall to page 2 or lower, it can crush a company. I think Mr. Gates feels that the SE’s have too much power. But I don’t think he wants to get into purchasing over 100,000 PCs, plus the labor, etc. of setting them up, to become a public SE. Too much investment in what is already a saturated market. Keep in mind that Mr. Bill lost something like $1b in the dot-com crash.
It is known that .NET has the capability and examples to build a SE. It is missing the filtering algorithms. There are already several 3rd party SE’s based on .NET one can buy.
What if they don’t want Google’s traffic, they just don’t want Google to have it either? By supplying companies and end users with the tools to create their own search engines based on just that person’s or company’s interests, and allowing that database to share its information with secure servers, it could become a distributed computing SE.
For example, Motorola in Chicago could set up their own SE bot to look for information/data that is important for their employees to get their jobs done more efficiently. Motorola in Florida does the same thing, and these 2 SE bots can share information between themselves. Then all the other Motorola facilities do the same. You get a company now with several internal SE bots just getting very specific information/data, they now save time weeding out information that is not important to the work their company is trying to do, and in a search you keep employees from surfing off to sites not work related. If you connect your suppliers into certain portions of the SE database, now they become smarter and can give you better service and products. If all the Fortune 1000 companies did this, what kind of impact would that have on the major SE’s?
It won’t make SE’s obsolete, but it could cut down considerably on their amount of traffic, thus making them less valuable and much less powerful. It also takes out a bunch of sites that have better marketing than products and removes them from traffic flow to major corporations.
Now, misinformation is a game that the CIA as well as big businesses use all the time. Nothing better than a rumor that isn’t true to get your competitors to build a business defense. But it is the wrong defense, because it is based on rumors, and once they find out what is really going on, they cannot recover fast enough to fend off the real attempt. This has happened numerous times in the business world.
While this is just a concept, I don’t think it is that far-fetched. Especially for anyone who has read Bill Gates’ book. And of course, M$ sells even more .NET and IIS in the process.
Any opinions? Or have I totally lost it?
How much would it cost them to just get the hardware and ppl in place to start, let alone to catch up and surpass Google? Then add benefits for all those ppl. This way, they distribute the cost across many different companies and they would not have any investment. Only new sales.
Even if they threw as much money as is imaginable at it, how many years before they would start to see a return on that investment? I just don’t think the stockholders would wait that long. And it could cause the price of the stock to go down. A lot about this entire MS SE deal just doesn’t make much sense to me. As a matter of fact, I can’t see anything that makes sense about it.
carfac
I think they expand the SE to look outside of the firewall at filtered sites or based on where people already have bookmarks.
But what the heck do I know?
[edited by: jim_w at 5:18 pm (utc) on May 6, 2003]
A unified Search interface... Longhorn can instantly search from a variety of locales, including local files, contacts and the Internet. "Filter by" options can also be used to narrow down results
[betanews.com]
From ZDNet
[zdnet.com.com...]
Of course, they’re raising every single flag regarding Windows .NET Server and pushing everyone’s attention that way, but for Longhorn (the codename of the next desktop version of Windows) there has been little announced or confirmed. Looking for confirmed facts about Longhorn, and possibly more importantly the versions of Windows AFTER Longhorn is like the proverbial search for the needle
Bill Gates likes to be in control. And if he can't be, then he does whatever it takes to take the control out of someone else's hands. At least that is how it appears to me. The sad part here is I’m actually a pro-MS person.
martinibuster
It says search, not crawl. I don't know. Same thing?
One aim of Longhorn seems to be to integrate search into the desktop environment.
I hope not every Tom, Dick and Harry and of course Tina, Diana, and Hilary, (to be politically correct), will be able to create an indexing system on their desktop. It is already too hard to keep out email harvesters, etc. That would make it impossible. Not to mention, people selling bandwidth are going to get rich pretty darn quick and a lot of ppl that can’t afford to pay for any more bandwidth could be out of business.
Microsoft, as I said, has a product that could, if they expand on it, compete with me. It deals with a business culture that was invented by Motorola in the 80’s called Six Sigma. Six Sigma has been adopted by such companies as GE and Honeywell, to name just a couple. For it to work, Six Sigma requires a return on any investment in a very short time. This is based on the fact that a lot of companies spend too much on R&D and tooling up before they can get a return on the investment. Then the technology is obsolete and they never get any return. It also deals with measurements in the PPM (parts per million) range.
korkus2000
I also think microsoft is savvy enough to take on a SE and see a return.
Yes I agree, they are savvy enough to do it, but, to me and based on my Six Sigma training, it is a question of after tooling up for such a project, could they go head-to-head with not just google, but at least yahoo also, and make a profit in a reasonable amount of time.
Didn’t I read here somewhere that Google has over 100,000, or was it over 50,000, machines involved in the operation? (Hell, I could have dreamed it, literally.) The cost of the hardware, and the time to put it together, debug it, etc. would be years. Google didn’t start with that many; I’m sure they grew into that many a few at a time. So MS would have to start with that many and would still have to play catch-up. Everything would have to go just right for MS if that’s the case. There is no margin for error, and that goes against the Six Sigma philosophy. Remember, what, just six months ago ad revenues were down, not only on the internet but in print ads as well? And if ad revenues could fall for no apparent reason, what else could? Traffic in general?
Look at MSN. It was supposed to be the end of AOL, as I recall. Now, while I realize that AOL/Time Warner shot themselves in the foot somewhat, MSN has not ended them. I don’t see Microsoft making the same kind of mistake again. I think Gates is too smart for that. But maybe I’m giving him more credit than he deserves.
If they do build a crawler, which it looks like they have
I’ve seen so much stuff published on the internet that was wrong, I don’t believe anything anymore until I have lots-o-proof.
I would suspect to see it in all programs and out on the web in every shape and form you can think of.
Curses! Maybe I’m just wishing that only larger companies will be sending out bots. Maybe if we all wish together really, really hard, only large companies will have the resources to do it. Of course, like my mother used to say, ‘wish in one hand and pee in the other, …’
I’m probably wrong, but it is something to think about. I have a feeling that one way or another our jobs are going to get harder.
Or are you talking about filtering results from a MS SE?
Speculating, yes.
This is something that is integrated into Longhorn, "a refined search interface that lets users dig through local files, contacts, and the Internet."
Longhorn will also feature a brand new file system dubbed WINFS (Windows Future Storage), that intends to give users greater access to their information.
Integrating the web into the desktop environment has been a longtime aim of Microsoft. My speculative point is, if they are going to give users access to internet search, doesn't it make sense to give them a Microsoft crawled and controlled internet database?
and make a profit in a reasonable amount of time.
One word: X-Box.
Speculating, yes.
One word: X-Box.
But isn’t that like comparing apples to oranges?
No.
My statement is a response to your questioning if MS would embark on a project if there were a question of their ability to "make a profit in a reasonable amount of time."
MS has a history of gritting their teeth and losing massive amounts of money for the sake of the long term goal. X-box is a perfect example of them knowingly losing money for the long term.
Red Herring article [redherring.com] (from last year)
Microsoft expects to lose $750 million in the current fiscal year ending June 30 and another $1.1 billion in the next fiscal year, according to a source familiar with the matter.
Windows 98 had an inbuilt browser (IE 4) as a part of its interface. The real reason was the browser war (with Netscape) and the push to make their product dominant.
Windows XP has an inbuilt compression utility that infuriated third-party vendors like WinZip.
Longhorn has "a refined search interface that lets users dig through local files, contacts, and the Internet."
This just fits into the Microsoft Pattern and goes with their aim of competing with google.
Their desired result is Microsoft everywhere for Everything.