MSN Tech Preview Stays Fresh

Pages created in last 48 hours showing up.

         

mark aardsma

3:06 am on Jul 10, 2004 (gmt 0)

10+ Year Member



Just noticed that my pages were not showing up in MSN, apparently because the MSN spider/index prefers certain URL formats. I created new URLs, and the pages appeared in the tech preview after just a couple of days. I was going to post and ask whether the tech preview is a one-time snapshot or will be updated, but now I know. I hope the quick inclusion of new pages will be a feature of the live MSN search as well. -Mark

Receptional

12:13 pm on Jul 10, 2004 (gmt 0)



due to an apparent URL format preference of the MSN spider/index

Care to elaborate for the forum? :)

mark aardsma

3:02 pm on Jul 10, 2004 (gmt 0)




Sure, not trying to be secretive, just not certain which change made the difference. I had mod_rewritten URLs like:

[my-domain-name.com...]

Which did not show up in MSN at all, although the home page on the site was ranking really well. I figured that the number of dashes and/or the lack of a file extension were the problem. I couldn't find any results on MSN with more than three dashes, and I don't think I saw any without a file extension except home pages. I changed to:

[my-domain-name.com...]

Which showed up in MSN soon after the spider's next visit, and surprised me with the quick update of the MSN preview index. The only link to the new page was on my home page.
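For anyone who wants to check their own URLs against the two features suspected above (number of dashes and a missing file extension), here is a minimal Python sketch. The function name `url_features` and the example.com URLs are made up for illustration, and nothing here reflects how MSN actually scores pages; it only measures the two properties being guessed at.

```python
import posixpath
from urllib.parse import urlparse


def url_features(url):
    """Report the two URL properties suspected in this thread.

    Hypothetical helper: it only counts dashes in the path and checks
    whether the last path segment has a file extension.
    """
    path = urlparse(url).path
    last_segment = posixpath.basename(path)
    _, extension = posixpath.splitext(last_segment)
    return {
        "path_dashes": path.count("-"),
        "has_extension": bool(extension),
    }


# Illustrative URLs only, not the ones from the original post.
print(url_features("http://example.com/red-widget-buying-guide-2004"))
print(url_features("http://example.com/red_widget.html"))
```

Running this over a list of URLs that do and do not appear in the preview index would at least show which of the two features lines up with inclusion.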

Now to see what Google thinks of the new format. I don't want to sabotage Google rankings (the old format was doing great in Google) for a search engine that might not be live for a year, but just trying to learn by experiment.

My hope is that since the new URL format looks just like your average everyday web page, no search engine will decide it contains spammy clues.

I have to say that if MSN search is really using "clues" like the number of dashes to decide which pages are good quality, that is not impressive, and of short-term usefulness at best. In reality the number of dashes has no direct relationship to the quality of the page (although it may have a statistical relationship). My site was a good example of quality pages that happened to have several dashes in the URLs. And if I should decide to post spammy pages on my site now, it would be a good example of spammy pages with good "clean" URLs. A successful search engine will need to determine the actual quality of the page through intelligent methods, not by guessing at clues. -Mark

Scarecrow

5:01 pm on Jul 10, 2004 (gmt 0)




Google will index name_here.html, but you won't get any Google juice out of name_here.html as opposed to name-here.html. You should have tried the hyphen with the extension, not the underscore with the extension.

Google considers the hyphen the equivalent of a space, so it sees name-here as two words. The underscore makes name_here one word.

Try searching in Google for john-smith and then for john_smith and you'll see what I mean.

You are much better off with name-here, even if MS doesn't like it. But I really doubt MS would be bothered by this. If they are, it's their problem and they had better get it fixed. I think it was your lack of a file extension that made the difference.
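To make the hyphen/underscore point concrete, here is a small Python sketch of the behaviour described above. It is not Google's tokenizer; it just uses the regex `\w+`, which happens to treat the underscore as a word character and the hyphen as a separator, so it mirrors the claim. The function name `words` is made up for the example.

```python
import re


def words(text):
    """Split text into words the way the post describes:
    a hyphen separates words, an underscore joins them.
    (Illustration only, not any search engine's real tokenizer.)
    """
    return re.findall(r"\w+", text)


print(words("john-smith"))  # ['john', 'smith']
print(words("john_smith"))  # ['john_smith']
```

Under this model a query for "smith" can match name-here style URLs word by word, while name_here is only ever one long token.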

mark aardsma

6:08 pm on Jul 10, 2004 (gmt 0)




Hi Scarecrow,

Thanks for the input. I'll do some experimenting and post back if I learn anything useful.

Mark

Receptional

6:44 pm on Jul 11, 2004 (gmt 0)



The original Microsoft research looked like it was going to set the "dash" spam filter to five, not three (see section 3 of the research paper here [research.microsoft.com]). So it looks like they decided the filter should be more extreme.

Dixon.

mark aardsma

9:04 pm on Jul 12, 2004 (gmt 0)




Thanks for posting that link. I read the paper and found it interesting, but not very impressive. I think this sort of statistical guessing at what is spam is entirely the wrong approach.

The paper refers to dashes in the hostname only, not in the other portions of the URL. However, based on the results I am seeing in the tech preview MSN search, I didn't find any results with more than one dash in the filename either.
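Since the distinction between hostname dashes and filename dashes matters here, a rough Python sketch for counting them separately may help anyone repeating the experiment. The function name `dash_counts` and the page name in the example are hypothetical; only the placeholder domain comes from this thread.

```python
import posixpath
from urllib.parse import urlparse


def dash_counts(url):
    """Return (dashes in hostname, dashes in filename) for a URL.

    Hypothetical helper for comparing the two places a dash filter
    could apply; it says nothing about what MSN actually does.
    """
    parts = urlparse(url)
    hostname = parts.hostname or ""
    filename = posixpath.basename(parts.path)
    return hostname.count("-"), filename.count("-")


# Placeholder domain from the thread, made-up page name.
print(dash_counts("http://my-domain-name.com/some-page.html"))  # (2, 1)
```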

Mark