Forum Moderators: mademetop
to the site as an experiment in order to reduce the number of copies of the site around. It seems to me that the SERPS cache has very little value to the general user, and several disadvantages to the site owner - and not just the fact that it could be considered a copyright infringement in itself. But I have a few questions:
<meta name="robots" content="noarchive">
1. I am aware that Google and Yahoo claim that using the noarchive value will not affect spidering or ranking. Does anyone have any hard evidence to the contrary? The site in question does not use cloaking in any form.
2. I believe both Yahoo and Google respect the above meta tag, but the last I heard MSN does not. Is that still the case? I may ban msnbot completely if they continue to show a cached version.
3. If there is no disadvantage to using noarchive, why isn't everyone doing it? The only large site I know of that uses noarchive is WebmasterWorld.
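For anyone who wants to confirm that a page is really serving the tag before watching for the Cached link to disappear, here is a minimal Python sketch using only the standard library. The class and function names (`RobotsMetaParser`, `has_noarchive`) are made up for illustration. Note that the attribute has to be spelled `content`; in this sketch a misspelled attribute is simply not picked up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html):
    """Return True if the HTML carries a robots meta tag with noarchive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```

Feed it the saved source of a page, e.g. `has_noarchive(page_source)`, where `page_source` is whatever HTML string you have fetched yourself.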
The good thing about archiving is that it gives you proof that you had the content first. Check archive.org and you can usually find the approximate time that someone added your content to their site, if it has been crawled before.
Banned them, too! ;) I can easily prove the ownership of the content. However, I'm not really asking about methods to stop the theft, rather whether using "noarchive" is accompanied by particular problems when the site is not cloaked.
I'm using "noarchive" and MSN is respecting it.
Google will only add a "freshdate" if they are allowed to cache the page; MSN adds a freshdate regardless; Yahoo doesn't have freshdates.
Do you think the noarchive tag will solve the hijack problem?
No. The noarchive is far from a panacea, and the particular situation is quite complex (not just the current wave of scraping, this is more of a plagiarism problem).
With the noarchive I simply want to reduce the number of places where the site content is available. I can keep an eye on visits to the site itself, but I am surrendering control when the content appears on a third-party site such as in a SE cache.
> I'm using "noarchive" and MSN is respecting it.
That's good news.
I do know that using it has not interfered with the PR or SERP placement of (at least one of) my pages.
Even better news!
So, is it simply inertia which is stopping people from using noarchive, or is it continued concern that it will cause problems with ranking? Is there any advantage for the site owner to allow the search engines to offer a cached version of a site?
> I can keep an eye on visits to the site itself
Sounds interesting...but how many sites do you own or manage? Can you really keep an eye on the visits to the site and know the intention of visitors? Some of them may be genuine visitors/customers and some may be looking to steal something from your site. Some of those thieves may be using proxies and programs that change proxies for a browser in just a few mouse clicks. Some of the more expert thieves may be using automated bots to extract content from your site and so on.
> I may ban msnbot completely if they continue to show a cached version.
I don't know what your site(s) are about, but I do know that MSN traffic is the best converting traffic for B2C sites. So banning msnbot would be like throwing the baby out with the bathwater, IMHO of course.
For the MSN question, balam has confirmed that the noarchive tag is respected, so I've got no problem in allowing their bot in. It would be a shame to ban them, as you have said.
So your effort will be like closing the stable door after the horse has bolted.
The only 100% sure solution to this problem looks like the use of cloaking, but then that opens another Pandora's box.
> Is there any advantage for the site owner to allow the search engines to offer a cached version of a site?
I can think of two:
1) when your site is slow or down, some visitors will still be able to see your content
2) easy detection of which version of a page has been indexed (old, new, newest, etc.)
Assuming the site contains internal links, I think having the site cached can be an advantage. As a search engine user, I use the cached versions to take advantage of the keyword highlight feature. Often, when I find what I am looking for on the initial cached page, I subsequently visit other pages in the site.
On the front page description to this thread, there is the comment:
> the NoArchive tag has become a requirement for most commercial sites
Are there many here who use noarchive systematically on some or all sites in particular sectors? Do you use it site-wide or just on select pages?
If you sell a service or widget, it is a requirement, at least on your "buy it now" pages. There's been discussion here before about clients using the cached page to make purchases, or using it as a bargaining tool to receive services for less money. If you also carry inventory information, buyers will try to buy, from the cached page, a widget that is now out of stock.
<meta name="MSSmartTagsPreventParsing" content="TRUE">
they seem to be behaving themselves a bit more. Maybe it says "I know what you're up to and I don't like it" to the bot and he pays attention to the other tags ...
... p.s. if you don't hear from me again, check Bill Gates' alibi for the night I was murdered ...
This meta tag is useless. M$ changed their minds and reversed their plans about implementing the extended features of SmartTags when the web community voiced a very loud objection.
Although the SmartTag framework still exists for those using Windows/IE and there is a possibility that some level of this technology may be put into use in the future, currently the tag does absolutely nothing, and everyone I know who initially installed it across their website has since removed it.
However, if you have evidence that the tag actually does something, I think we'd all be interested.
Now I haven't gone through all my logs, but I'm not aware of MSNbot having done anything untoward (on my site - YMMV!). I also just checked their index and found nothing that shouldn't be there. How recently did you catch them misbehaving, internetheaven?
> they seem to be behaving themselves a bit more.
I highly suspect that this is a coincidence, given that...
> This meta tag is useless.
...and it was/is a tag that works against Microsoft technology. It flies in the face of logic, but I do realize we're talking about Microsoft.
> Do you use it site-wide or just on select pages?
Myself, I do not use it on select pages. That is, almost all pages have the attribute, but a select few are cacheable by Google. This is for whatever psychological effect a freshdate may have on a searcher.
SEO means playing tech games with the engines and mind games with searchers.
It's obviously very early days, but things are looking good so far.
> I have also found MSN disobey most of the regular robot rules
I've never had any problem with msnbot, either with robots.txt or robots meta tags. You might want to validate your robots.txt to make sure there's no problem.
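If you want a quick sanity check of how a given set of rules reads for msnbot, here is a rough Python sketch built on the standard library's `urllib.robotparser`. The helper name `is_allowed`, the sample rules in `EXAMPLE_ROBOTS_TXT`, and the `www.example.com` URLs are all invented for illustration; paste in your own file instead. It only shows how Python's parser interprets the rules, which is not guaranteed to match how any particular search engine's bot reads them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; substitute your own robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `is_allowed(EXAMPLE_ROBOTS_TXT, "msnbot", "http://www.example.com/private/page.html")` tells you whether the sample rules block msnbot from that path.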
For my site, Google are now omitting the Cached link next to some pages. Again, just like MSN, there has been no perceptible change whatsoever in ranking.
> have you seen any negative effects yet?
No problems: I've dropped a couple of slots on my primary keyword, but I've got a bit of a 302 problem from a couple of directories and an authority site has jumped ahead of me with two pages on my subject (I'm still in position 3 and 4). Other keywords are fine if not better, and traffic from Google and MSN has increased. The site has always done badly in Yahoo, no change there.
Of course, the only search engines I know of that support this tag are Google, MSN and Yahoo - it won't block caching by other sites.
> how do you do this on the server side
Cloaking ;) Either add it for known bots, or do what I'm doing for a new forum I'm launching soon and make it appear only when a user is not logged in.
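The two approaches just mentioned could be sketched roughly like this in Python. Treat it as an illustration under assumptions rather than anyone's actual setup: the function names, the `KNOWN_BOT_TOKENS` list, and the plain substring match on the user agent are all hypothetical, and a real site would plug the result into whatever templating it already uses.

```python
NOARCHIVE_TAG = '<meta name="robots" content="noarchive">'

# Assumed user-agent substrings for the three engines discussed in this
# thread; this list is illustrative, not exhaustive or authoritative.
KNOWN_BOT_TOKENS = ("googlebot", "msnbot", "slurp")


def is_known_bot(user_agent):
    """Crude check: does the User-Agent string contain a known bot token?"""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def meta_for_known_bots(user_agent):
    """Option 1: emit the tag only when the visitor looks like a known bot."""
    return NOARCHIVE_TAG if is_known_bot(user_agent) else ""


def meta_for_guests(logged_in):
    """Option 2: emit the tag for anyone who is not logged in."""
    return "" if logged_in else NOARCHIVE_TAG
```

The second option avoids user-agent sniffing altogether: spiders never log in, so they always see the tag, while logged-in members get a page without it.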