Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
What I like better about MSN compared to Google so far
sdani

10+ Year Member



 
Msg#: 117 posted 10:47 pm on Jul 4, 2004 (gmt 0)

My robots.txt restricts crawling of anything in the /legal directory. My privacy page, which is in the /legal directory, also restricts crawling and indexing via meta tags.
1. On site:mydomain.com, Google still lists my privacy page as one of the web pages on my site, because it found the link on my home page (although it does not show any content from that page and obeys the noindex tags).
2. MSN shows all the pages of my site if I do a site:domain.com, EXCEPT this privacy page link and another one where I have specified NoIndex.

So what I like is that MSN fully obeys the robots.txt and NoIndex tags, while Google shows those links, maybe just to boost the number of pages in its index.
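For readers who want to check what a robots.txt rule like the one described here actually blocks, below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, the `ExampleBot` user agent, and the `may_fetch` helper are all assumptions made for illustration; the poster's real file was not shown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt resembling the setup described in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /legal/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent retrieve url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The privacy page sits under /legal/, so a compliant bot may not fetch it.
print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/legal/privacy.html"))  # False
# Pages outside /legal/ are unaffected by the rule.
print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))  # True
```

Note that this only answers whether a bot may retrieve the page. It says nothing about whether a search engine may list the URL.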

 

Hagstrom

10+ Year Member



 
Msg#: 117 posted 8:30 am on Jul 9, 2004 (gmt 0)

My robots.txt restricts crawling of anything in the /legal directory. My privacy page, which is in the /legal directory, also restricts crawling and indexing via meta tags.

If your robots.txt prohibits crawling of the directory, then how is Google supposed to see your meta tags?

sdani

10+ Year Member



 
Msg#: 117 posted 10:10 am on Jul 9, 2004 (gmt 0)

That's just an additional precaution: if it ever gets to the page, I tell it not to index it.

py9jmas

10+ Year Member



 
Msg#: 117 posted 10:45 am on Jul 9, 2004 (gmt 0)

what I like is that MSN fully obeys the robots.txt and NoIndex tags, while Google shows those links..

Did Googlebot try and retrieve URLs forbidden in robots.txt? That is what the Standard for Robots Exclusion is all about - retrieval. The main reason it was introduced was to stop robots getting 'lost' in infinite URL spaces generated by CGI programs - not to stop a search engine linking to a page.

[robotstxt.org...]

sdani

10+ Year Member



 
Msg#: 117 posted 11:26 am on Jul 9, 2004 (gmt 0)

py9jmas, okay, so what is the way to tell a search engine not to link to a page? And what is the use of listing (linking) a page and just increasing the page count if the page should not be indexed?

rfgdxm1

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 117 posted 12:34 pm on Jul 9, 2004 (gmt 0)

>Did Googlebot try and retrieve URLs forbidden in robots.txt? That is what the Standard for Robots Exclusion is all about - retrieval. The main reason it was introduced was to stop robots getting 'lost' in infinite URL spaces generated by CGI programs - not to stop a search engine linking to a page.

Right. Look at the name of the file: robots.txt. Basically it is how a site tells a spider "I don't want your bot wasting the bandwidth *I* pay for". The idea wasn't privacy. If someone wants privacy, then don't put the content on the WWW without password protection. Anything less has the problem that it is nothing but an attempt at security by obscurity.

Leosghost

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 117 posted 12:49 pm on Jul 9, 2004 (gmt 0)

what I like is that MSN fully obeys the robots.txt

Errhhhmmm! Are you talking about the same MSN and the same internet as the rest of us? 'Cos some of their bots have been totally ignoring robots.txt whenever they feel like it for a long time now, and are currently doing so again ...

Maybe the name of the game is to eventually be able to put up a bigger "indexed pages" number on the "Search page" than Google, but if they keep this up there are gonna be some very specific robot bans going in all over ...

On the other hand Redmond could send out checks for all the bandwidth they are costing us while they do their market research ....< only in my dreams >

mack

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 117 posted 12:53 pm on Jul 9, 2004 (gmt 0)

That's just an additional precaution: if it ever gets to the page, I tell it not to index it.

It's very possible for pages within prohibited areas to be displayed in the SERPs. This can happen when Google knows the page exists because there are links pointing to it. Very often the page will appear in the results as a title with no description. I assume the title is based on the anchor text.

Mack.

GoogleGuy

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 117 posted 9:54 pm on Jul 9, 2004 (gmt 0)

What mack said. If a page is in robots.txt, we won't crawl it, but we can still return it as a search result if we have good evidence that the page is relevant to a query. In this case, we'll return just the url (no title and no cached page because we didn't fetch the page itself).

Here's a good example of why that can help users. For a long time, the California Department of Motor Vehicles (DMV) had a robots.txt that didn't let search engines crawl their site. But for a query like "california dmv" we could still return the proper url, even though we weren't able to fetch the page.

sdani, if you don't want the page to show up at all, you can guarantee that by letting Google see the noindex meta tag by fetching that page.

For the curious readers: we were eventually able to convince the DMV to let search engines crawl the site, but we did have to make an appointment and then wait in line for a while. ;)
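The interaction described in this post (a bot can only honor a noindex meta tag on a page it is allowed to fetch) can be sketched roughly as follows. This is a simplified model and not Google's actual logic: the `listing_outcome` function, its three outcome labels, the `ExampleBot` user agent, and the sample robots.txt and HTML are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        directives = [d.strip().lower() for d in (attr.get("content") or "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def listing_outcome(robots_txt, url, html, user_agent="ExampleBot"):
    """Hypothetical model of how a compliant engine might treat a known URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # The page is never fetched, so any meta tag inside it is never seen.
        return "url-only listing possible"
    meta = RobotsMetaParser()
    meta.feed(html)
    return "not listed" if meta.noindex else "indexed normally"


# Illustrative inputs, not taken from the poster's site.
BLOCKED = "User-agent: *\nDisallow: /legal/\n"
OPEN = "User-agent: *\nDisallow:\n"
URL = "http://example.com/legal/privacy.html"
PAGE = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'

print(listing_outcome(BLOCKED, URL, PAGE))  # url-only listing possible
print(listing_outcome(OPEN, URL, PAGE))     # not listed
```

In this model, removing the Disallow rule is what lets the noindex tag take effect, which matches the advice given above.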

sdani

10+ Year Member



 
Msg#: 117 posted 10:02 pm on Jul 9, 2004 (gmt 0)

Thanks GoogleGuy. I did not know that if I allow the page in robots.txt and specify a noindex meta tag, then the URL will not show up at all.

I think this works (for me at least).
SD

