Domain reappearing in results after I blocked it

Google appears to ignore meta tags and robots file

     

wintercornuk

9:18 am on Jan 27, 2008 (gmt 0)

10+ Year Member



I've got a domain which, for legal reasons, needs to be removed from all search engines. This was successfully done three months ago. Now it has suddenly reappeared in Google's index when searching for its single-word domain name. Also, the page title has changed to a series of words I've never used.

The meta tags are:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

The robots.txt file is:

User-agent: Googlebot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /

My questions are:

Could a third party re-insert it into the index with some kind of link bombing?

Why does Google ignore the request not to index?

[edited by: Robert_Charlton at 9:47 am (utc) on Jan. 27, 2008]

Robert Charlton

10:00 am on Jan 27, 2008 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Why does Google ignore the request not to index?

The robots.txt disallow is fighting the robots meta tag noindex. By disallowing all (well-behaved) bots, you're preventing the engines from crawling the page, so they never see the noindex robots meta.

Unless Google sees the noindex robots meta, it will do its best to index "good references" it can find to a page, even if it hasn't crawled the page. So, if there are active links out there still pointing to your page, Google will index the URL in the link, and it will sometimes rank it.
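
To illustrate the conflict, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It assumes a crawler that obeys robots.txt the way that module does, and `example.com` is only a placeholder for the real domain; it is not a model of Google's actual crawler.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted in the original post.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL (assumption): stands in for the blocked domain.
URL = "http://example.com/"

# With the disallow in place the page is never fetched,
# so the noindex meta tag on it is never read.
blocked = not may_fetch(ROBOTS_TXT, "Googlebot", URL)

# With an empty robots.txt the same page becomes fetchable,
# which is what lets the crawler see the noindex meta tag.
fetchable_after_removal = may_fetch("", "Googlebot", URL)

print(blocked, fetchable_after_removal)
```

The sketch only covers the crawl decision. Whether a URL that was never crawled still gets listed from inbound links is up to the search engine.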

Also, the page title has changed to a series of words I've never used.

Does this look like it might be text in any way related to your page, as in a link anchor that might have been linking to you?

[edited by: Robert_Charlton at 10:06 am (utc) on Jan. 27, 2008]

wintercornuk

12:41 pm on Jan 27, 2008 (gmt 0)

10+ Year Member



So the best way forward is to remove the robots.txt file and just let the meta tags work?

The page title is in lower case (something I've never used on this site) and I don't think anyone has linked to it using that specific text. Very odd.

Halfdeck

2:00 pm on Jan 27, 2008 (gmt 0)

5+ Year Member



Yeah, remove the robots.txt disallow and those pages should disappear from the SERPs.

londrum

2:07 pm on Jan 27, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Maybe the second meta tag is overriding the first one. I don't think it's supposed to do that, but it pays to be safe.
I'd change

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

to

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW, NOARCHIVE">

or

<META NAME="ROBOTS" CONTENT="NONE">

You could also include a PHP header on the page...

header('X-Robots-Tag: noindex, nofollow, noarchive', TRUE);
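
As a rough check that the variants above end up asking for the same thing, here is a Python sketch that merges the directives from every robots meta tag and from an `X-Robots-Tag` header. The names `robots_directives`, `split_directives`, and `RobotsMetaParser` are made up for this example, the merge-everything behaviour is an assumption about how a crawler might combine them, and user-agent prefixes in the header are not handled.

```python
from html.parser import HTMLParser


def split_directives(value: str) -> set:
    """Split 'NOINDEX, NOFOLLOW' into {'noindex', 'nofollow'}."""
    found = {part.strip().lower() for part in value.split(",") if part.strip()}
    # 'none' is shorthand for noindex plus nofollow.
    if "none" in found:
        found = (found - {"none"}) | {"noindex", "nofollow"}
    return found


class RobotsMetaParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= split_directives(attr.get("content", ""))


def robots_directives(html: str, headers: dict) -> set:
    """Merge directives from meta tags and an X-Robots-Tag header.

    Assumption: headers is a plain dict keyed by the exact header name.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives | split_directives(headers.get("X-Robots-Tag", ""))


two_tags = (
    '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">\n'
    '<META NAME="ROBOTS" CONTENT="NOARCHIVE">'
)
one_tag = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW, NOARCHIVE">'
header_only = {"X-Robots-Tag": "noindex, nofollow, noarchive"}

print(sorted(robots_directives(two_tags, {})))
print(sorted(robots_directives(one_tag, {})))
print(sorted(robots_directives("", header_only)))
```

Under that merging assumption, the two separate tags, the single combined tag, and the header all yield the same set of directives.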

jd01

5:21 pm on Jan 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want to make sure it does not get indexed and have access to mod_rewrite, try:

RewriteEngine on
RewriteRule !^no-index\.html http://example.com/no-index.html [R=301,L]

Then make no-index.html the following:

<html>
<head>
<title>&nbsp;</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="robots" content="noindex,nofollow,noarchive" />
</head>
<body>
</body>
</html>

The RewriteRule will redirect any request that is not for example.com/no-index.html to example.com/no-index.html. The meta tag will prevent no-index.html from being indexed, followed, or archived, and the site will disappear from the SERPs.
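
For anyone unsure how the negated pattern behaves, here is a small Python sketch of the matching logic only. It assumes the rule lives in a .htaccess file at the site root, where the leading slash is stripped before the pattern is tested; `redirect_for` is a made-up helper name, and this does not emulate mod_rewrite beyond that one rule.

```python
import re

# Same pattern as in the RewriteRule; the leading "!" in the rule negates it.
PATTERN = re.compile(r"^no-index\.html")
TARGET = "http://example.com/no-index.html"


def redirect_for(path: str):
    """Return the 301 target for a request path, or None if the rule does not fire.

    Assumption: per-directory (.htaccess) context at the site root,
    so one leading slash is removed before matching.
    """
    relative = path[1:] if path.startswith("/") else path
    if PATTERN.search(relative):
        return None  # no-index.html itself is served, not redirected
    return TARGET


for request_path in ("/", "/index.html", "/robots.txt", "/no-index.html"):
    print(request_path, "->", redirect_for(request_path))
```

Note that under this rule a request for /robots.txt is redirected as well, so the crawler no longer receives the old disallow rules.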

You should be able to safely remove the robots.txt.

Justin

<added>
Another mod_rewrite alternative is:
RewriteEngine on
RewriteRule .? - [F]

The preceding will serve a 'Forbidden' error any time any page is accessed. Basically, it says to *everyone* 'You do not have permission to access the site', and will cause it to be dropped from the indexes. (This one might be the easiest / most effective.)

DO NOT use either of these suggestions if you need to allow access to the site, because they will not allow any visitor (SE or human) to see it.
</added>

g1smd

7:23 pm on Jan 27, 2008 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I have found that when using "forbidden", the pages take a very long time to drop out of the index.
 
