The meta tags are:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
The robots.txt file is:
User-agent: Googlebot
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: *
Disallow: /
My questions are:
Could a third party re-insert it into the index with some kind of link bombing?
Why does Google ignore the request not to index?
[edited by: Robert_Charlton at 9:47 am (utc) on Jan. 27, 2008]
Why does Google ignore the request not to index?
The robots.txt disallow is fighting the robots meta tag noindex. By disallowing all (well-behaved) bots, you're preventing the engines from fetching the page, so they never see the noindex robots meta tag.
Unless Google sees the noindex robots meta tag, it will do its best to index "good references" it can find to a page, even if it hasn't crawled the page. So, if there are active links out there still pointing to your page, Google will index the URL in the link, and it will sometimes rank it.
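To make the conflict concrete, here is a minimal sketch using Python's standard urllib.robotparser, fed with the robots.txt from the original post. The helper name may_fetch and the example.com URL are made up for illustration. It only shows what a well-behaved crawler is allowed to fetch; it says nothing about how Google treats links it finds elsewhere.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in the original post.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every agent is disallowed, so no compliant bot can fetch the page
    # and read the noindex meta tag inside it.
    for agent in ("Googlebot", "ia_archiver", "SomeOtherBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, "http://example.com/page.html"))
```

If a bot may not fetch the page, it has no way to read the meta tags in it, which is why the disallow has to go before the noindex can take effect.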
Also, the page title has changed to a series of words I've never used.
Does this look like it might be text in any way related to your page, as in a link anchor that might have been linking to you?
[edited by: Robert_Charlton at 10:06 am (utc) on Jan. 27, 2008]
You could change
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
to
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW, NOARCHIVE">
or
<META NAME="ROBOTS" CONTENT="NONE">
You could also include a PHP header on the page...
header('X-Robots-Tag: noindex, nofollow, noarchive', TRUE);
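If you want to compare what the different meta tag variants actually say, a rough sketch like the one below can help. It uses Python's standard html.parser, collects every robots meta tag on a page, and merges the directives into one set. The class and function names are made up for illustration, and merging by union is an assumption about how a crawler combines multiple tags. NONE is expanded to noindex, nofollow here, which is how Google documents it.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collects directives from all <meta name="robots"> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token == "none":
                # Assumption: "none" is shorthand for noindex, nofollow.
                self.directives.update({"noindex", "nofollow"})
            elif token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the merged set of robots directives found in html."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    collector.close()
    return collector.directives


if __name__ == "__main__":
    two_tags = ('<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
                '<META NAME="ROBOTS" CONTENT="NOARCHIVE">')
    one_tag = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW, NOARCHIVE">'
    print(sorted(robots_directives(two_tags)))
    print(sorted(robots_directives(one_tag)))
```

Under that merging assumption the two-tag and one-tag forms come out the same, so combining them is mostly tidiness. The NONE form, as expanded in this sketch, does not carry noarchive.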
RewriteEngine on
RewriteRule !^no-index\.html http://example.com/no-index.html [R=301,L]
Then make no-index.html the following:
<html>
<head>
<title> </title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="robots" content="noindex,nofollow,noarchive" />
</head>
<body>
</body>
</html>
The RewriteRule will redirect any request that is not for example.com/no-index.html to example.com/no-index.html. The meta tag will prevent no-index.html from being indexed, followed, or archived, and the site will disappear from the SERPs.
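To see which requests the negated pattern catches, here is a small Python sketch that mimics the match. It assumes the rule lives in the site root's .htaccess, where mod_rewrite tests the pattern against the path without its leading slash; in the main server config the path keeps the slash, the pattern would never match, and every request (including the one for no-index.html) would be redirected. The function name is_redirected is made up for illustration.

```python
import re

# Same pattern as in the RewriteRule, without the leading "!" negation.
PATTERN = re.compile(r"^no-index\.html")


def is_redirected(path: str) -> bool:
    """Return True if the rule would redirect a request for path.

    Assumes per-directory (.htaccess) context in the document root,
    so the leading slash is stripped before matching.
    """
    relative = path.lstrip("/")
    # The "!" in the rule means: apply the redirect when the pattern does NOT match.
    return PATTERN.search(relative) is None


if __name__ == "__main__":
    for path in ("/", "/index.html", "/no-index.html", "/sub/no-index.html"):
        print(path, is_redirected(path))
```

Note the pattern is not anchored at the end, so anything that merely starts with no-index.html is also left alone.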
You should be able to safely remove the robots.txt.
Justin
<added>
Another mod_rewrite alternative is:
RewriteEngine on
RewriteRule .? - [F]
The preceding will serve a 403 'Forbidden' error any time any page is accessed. Basically, it says to *everyone* 'You do not have permission to access the site', and will cause it to be dropped from the indexes. (This one might be the easiest / most effective.)
DO NOT use either of these suggestions if you need to allow access to the site, because they will not allow any visitor (SE or human) to see it.
</added>