

Google SEO News and Discussion Forum

This 70 message thread spans 3 pages; this is page 1.
Google ignores the meta robots noindex tag.
Thousands of pages show that tag in the Google cache!
g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 12:27 am on Jun 14, 2006 (gmt 0)

How many people have noticed the many thousands of Supplemental pages with a 2005 June or July cache date that have been indexed, show in the SERPs with a full title and description, rank, and have a <meta name="robots" content="noindex"> tag both on the live page and in the old cached copy linked from the SERPs?

Oh yeah! There are thousands of them. Now that is a programming bug.

.

Obviously when you place the <meta name="robots" content="noindex"> tag on a page, you expect that the page will be spidered, but not indexed. But if Google keeps no data about that URL, they will never "remember" anything about that URL, and will return every few minutes to "discover" the content, again and again.

However, when you think about it, Google must keep a copy of that page internally so that they can tell when the content has changed, and so that they know where that page links to, and so that they know the index/noindex status of the page.

Such a page should never appear in search results, ever. Well, now they do, and in very large numbers; all (so far) marked as Supplemental Results, and with cache dates from a year ago.

It appears that something in their system is forgetting to check the index/noindex status of the pages in their database and is showing them all in the SERPs whatever their status.

I first noticed this yesterday; but found it on some searches that I have not done for several months. I have no idea how long this bug has been showing up... it could be several months.

 

tedster

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 1:05 am on Jun 14, 2006 (gmt 0)

I see it too -- this certainly is an error/bug and not just some intentional change. And unfortunately, it has the effect of taking away a valuable tool for keeping duplicate content and urls out of the index.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 1:06 am on Jun 14, 2006 (gmt 0)

I see some pages that were last modified in 2003 May and which have had the <meta name="robots" content="noindex"> tag on each one since at least that time, that are listed in Google as Supplemental Results with a full title and snippet, and with a full cache from 2005 June. The Google cache shows the <meta name="robots" content="noindex"> tag.

Google has been ignoring the robots noindex meta tag. I know. I designed that site. I am the only one with FTP access and the files have not been altered since 2003, and have always had the robots noindex on them since that date. The whole site was and still is disallowed, as it is the development copy of the site. The live site is elsewhere, on some other server.

In looking further I have found many other pages on many other sites that have exactly the same problem. You can see the noindex tag on the cached copies!
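
For anyone who wants to check their own pages (or a saved copy of the cached source) for the tag, here is a minimal Python sketch. It is only an illustration and says nothing about how Google itself processes the tag. The class and function names are made up for this example. It treats content="none" as the documented shorthand for noindex,nofollow.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def asks_noindex(html):
    """True if the page asks not to be indexed ("noindex" or "none")."""
    found = robots_directives(html)
    return "noindex" in found or "none" in found
```

Running asks_noindex() over both the live page source and the cached copy would confirm whether the tag really was present in both, which is the situation described above.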

texasville

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 34757 posted 2:07 am on Jun 14, 2006 (gmt 0)

>>>>The whole site was and still is disallowed, as it is the development copy of the site. The live site is elsewhere, on some other server. <<<

I wonder if that will bring a dup content penalty. Man, things are a mess. G is deindexing pages I want indexed and sticking in pages you DON'T want indexed. Cheese!

trinorthlighting

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 34757 posted 2:29 am on Jun 14, 2006 (gmt 0)

There was an issue with the adsense bot indexing pages and not following meta tags. Are these adsense or adwords sites?

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 11:51 am on Jun 14, 2006 (gmt 0)

No adsense or adwords. Site has been out of the SERPs for three years - or at least it should have been. I do know that it was definitely not showing in the SERPs in 2003 and 2004. I haven't really checked since then. I had no reason to. I suspect that it has recently reappeared in the SERPs, perhaps just the last few weeks or months, even though the cache is from a year ago.

Key_Master

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34757 posted 12:18 pm on Jun 14, 2006 (gmt 0)

It's not just noindex. I have a site whose pages use noarchive,nofollow and they are showing up in the index also. Never had a problem using it before. I just noticed this issue a few days ago.

The cached dates listed are also from mid-2005.

John Carpenter

5+ Year Member



 
Msg#: 34757 posted 12:56 pm on Jun 14, 2006 (gmt 0)

That would be a nasty bug. Well, robots.txt has always worked for me. Anyone seen a violation of robots.txt by G?

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 1:18 pm on Jun 14, 2006 (gmt 0)

Sort of. For some other site (a forum) that has various pages (pages that are not threads) disallowed, and all pages are www pages, I see an error.

When I do a site:domain.com search I do see the pages that I should see, but when I do a site:domain.com -inurl:www search, I can see a load of www pages that are disallowed. These all have an old cache and are marked as Supplemental Results.

I wrote about that over at: [webmasterworld.com...] just a few days ago, specifically msg#75: [webmasterworld.com ]

zeus

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 1:32 pm on Jun 14, 2006 (gmt 0)

Seriously, I'm getting tired of all their bugs now.

boplicity

5+ Year Member



 
Msg#: 34757 posted 1:48 pm on Jun 14, 2006 (gmt 0)

Copyright infringement at its finest.

Of course, google wants to do good -- so we should let it steal our work.

soapystar

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34757 posted 1:49 pm on Jun 14, 2006 (gmt 0)

When I posted this last year, comments just suggested I had the tag wrong... now at least we can agree that Google ignores the noindex tag. If you don't want to be spidered you must exclude the directory in robots.txt.

lammert

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member



 
Msg#: 34757 posted 1:55 pm on Jun 14, 2006 (gmt 0)

Google has been ignoring the noindex tag for me for the last two or three weeks.

The strange thing is that not all of my websites seem to suffer from the ignored noindex tag problem. It only appears on one site which has a bunch of supplemental results. The other sites have loads of noindex pages which are still hidden from the index.

Just like the "-inurl:www bug" that g1smd discovered a week ago, this bug also seems to be a supplemental index problem. It seems Google rewrote the supplemental index software and forgot some basic functionality which has been there for years.

cmendla

10+ Year Member



 
Msg#: 34757 posted 2:18 pm on Jun 14, 2006 (gmt 0)

This is bad timing for me.

I was reading here about the advantages of blue-widget-parts over bluewidgetparts or blue_widget_parts as filenames. I'm starting to go through my sites and change the filenames over.

Unfortunately I'm using Frontpage (yeah I know..). FP doesn't play well with htaccess. The solution I'm using is to rename blue_widgets.htm to blue-widgets.htm etc. Then I take the original page, add a meta refresh to redirect to the new page, and strip out everything but a 'this page has moved' notice. The final part is to add a robots tag of noindex, follow.

I was hoping that the googlebot would heed the robots tag with noindex, follow. However, it now appears that it may not heed this directive.

I think I'll be OK since the blank pages probably won't be indexed anyway. As a safeguard I'll probably have to add the old pages to robots.txt which will be a little tedious. If I don't add them to robots.txt then I will have a lot of blank pages which might cause a penalty.

I want to thank everyone here for helping keep me up to date on things..It's invaluable! This thread saved me from creating a major headache for myself.
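
The 'this page has moved' stub described in this post could be generated with something like the following Python sketch, which saves hand-editing every old page. This is just one way to do it: the function name, the delay_seconds parameter and the exact wording of the notice are invented for the example, and whether the googlebot honours the noindex, follow tag is exactly what this thread is questioning.

```python
from html import escape


def moved_stub(new_url, delay_seconds=0):
    """Return the HTML for a stub page that meta-refreshes to new_url."""
    # Escape the URL so characters like & and " are safe inside attributes.
    target = escape(new_url, quote=True)
    return (
        "<html>\n<head>\n"
        "<title>This page has moved</title>\n"
        # Ask robots not to index the stub but still follow its link.
        '<meta name="robots" content="noindex, follow">\n'
        # Send visitors on to the renamed page after delay_seconds.
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={target}">\n'
        "</head>\n<body>\n"
        f'<p>This page has moved to <a href="{target}">{target}</a>.</p>\n'
        "</body>\n</html>\n"
    )
```

For example, moved_stub("blue-widgets.htm") gives the text to save over the old blue_widgets.htm.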

circuitspore

5+ Year Member



 
Msg#: 34757 posted 2:25 pm on Jun 14, 2006 (gmt 0)

>> Anyone seen a violation of robots.txt by G?

I had a site with a directory that had been blocked with robots.txt since February. The robots.txt was submitted to G via the URL removal tool in Feb.

Pages almost immediately fell out of the index & weren't requested by Googlebot AT ALL in March or April. Come May 8th, Gbot requested over 6000 of these pages that were (and still are) blocked w/ robots.txt.

Shortly after, the supp index for this site disappeared.

So yeah, I've had a lot of fun with Gbot COMPLETELY ignoring a robots.txt exclusion.

trillianjedi

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 2:34 pm on Jun 14, 2006 (gmt 0)

All of my NOINDEX pages are correctly not indexed, across 3 completely different sites where I use it (to prevent "printer friendly" pages getting into the index).

Any other common denominators that you can spot?

What makes NOINDEX work for some, but not for others?

TJ

trinorthlighting

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 34757 posted 2:41 pm on Jun 14, 2006 (gmt 0)

I have a directory that was supplemental and blocked by the robots.txt

My pages are not showing up.

I would venture to say this has something to do with the crawling of the supplemental index.

LJCoolB

5+ Year Member



 
Msg#: 34757 posted 2:54 pm on Jun 14, 2006 (gmt 0)

We had a comment that linked to multiple pOrn sites on one of our corporate blogs -- we deleted the comment months ago. In fact, the blog itself has a completely new URL. The comment is back in the supplementals with a cache date of July 2005. Found it in a sitesearch for a keyword related to the blog.

This is a cache issue for us. The page does not even exist anymore. But there it is, full of inappropriate links in the google results.

When you click on the link in the results (the description is all about p0rn) you are taken to the new version of the blog. Very frustrating.

trinorthlighting

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 34757 posted 2:56 pm on Jun 14, 2006 (gmt 0)

Use the google url removal tool and 404 the pages.

gdawg

5+ Year Member



 
Msg#: 34757 posted 2:59 pm on Jun 14, 2006 (gmt 0)

I have seen Google ignoring my robots.txt file as well. In my situation I have a page on my old site that I 301 redirected to the same page on my new site, and I excluded this page in the robots.txt file on the new site. For some reason Google is completely ignoring the robots.txt file and this page is indexed and ranking on the 1st page of results.

stuntdubl

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34757 posted 3:01 pm on Jun 14, 2006 (gmt 0)

Why should google have to obey standards? They ARE the web. :)

John Carpenter

5+ Year Member



 
Msg#: 34757 posted 3:28 pm on Jun 14, 2006 (gmt 0)

>> Why should google have to obey standards? They ARE the web. :)

That would be MS rather than G.

texasville

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 34757 posted 3:55 pm on Jun 14, 2006 (gmt 0)

>>>That would be MS rather than G. <<<

Nope..that's G.
MS thinks they are all things pc.

herb

10+ Year Member



 
Msg#: 34757 posted 5:13 pm on Jun 14, 2006 (gmt 0)

G is not alone. We have a site we have been developing for a couple of months. Every page has:
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

Yahoo has listed 66 of the pages. Every cached page source reveals the meta tag.

pageoneresults

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 7:10 pm on Jun 14, 2006 (gmt 0)

<meta name="robots" content="none">

Been using this for years and I can't find any of my pages that utilize it in the index. I just spot checked 10 of them from different domains and they are nowhere to be found so it appears to be working on my end.

oxbaker

5+ Year Member



 
Msg#: 34757 posted 8:20 pm on Jun 14, 2006 (gmt 0)

Did you implement a robots.txt file? I always pair them with the ROBOTS meta tag and have no issues so far!

hth,
mcm

Atomic

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34757 posted 8:40 pm on Jun 14, 2006 (gmt 0)

>> did you implement a robots.txt file, i always pair them with the ROBOTS meta tag and have no issues so far!

I have several pages on several different sites where I did this when the pages were added and not afterwards. The pages still appear in the index. It's a great feeling when you see sale items from Nov, 2005 preserved forever despite doing everything you are supposed to do.

pageoneresults

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 9:12 pm on Jun 14, 2006 (gmt 0)

>> Did you implement a robots.txt file, i always pair them with the ROBOTS meta tag and have no issues so far!

The robots.txt file Disallow will produce URI only listings with Google.

Based on experience and discussions here at WebmasterWorld, you would remove the Disallow: line and put the Robots META Tag on the pages you don't want indexed.

When the bot requests the robots.txt file and there is a Disallow for a page you don't want indexed, Googlebot will index the URI only.

If you are disallowing the bot from visiting the page, it won't see the Robots META Tag to follow whatever directives you have in there.

So, the best method as described here on WebmasterWorld is to use the Robots META Tag to keep stuff out of the index and not the robots.txt file.

I've only seen the disallowed URI only listings when performing advanced search queries.
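
The point about a Disallow hiding the Robots META Tag can be illustrated with Python's standard urllib.robotparser module. This is a rough sketch under stated assumptions: the robots.txt text, the example.com URLs and the function name are hypothetical, and it only models a crawler that obeys robots.txt, which is not a statement about what Googlebot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a development directory.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /dev/
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """True if a crawler that obeys robots_txt is allowed to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def meta_tag_can_be_seen(robots_txt, user_agent, url):
    """A robots meta tag is only readable if the page itself may be fetched."""
    return crawler_may_fetch(robots_txt, user_agent, url)
```

With the example rules, meta_tag_can_be_seen() is false for any URL under /dev/, so a noindex tag on those pages would never be read. That is why the advice above is to pick the meta tag or the Disallow, not both.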

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 9:28 pm on Jun 14, 2006 (gmt 0)

My original observation is that pages containing the <meta name="robots" content="noindex"> tag (for the last three years) are now showing in SERPs with a full title and snippet, and have a link to a cached page from 2005 July, and that the cached page clearly shows the meta noindex tag too.

I see many results like this all over the place. They are all (so far) pages that are being shown as Supplemental Results.

pageoneresults

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 34757 posted 9:54 pm on Jun 14, 2006 (gmt 0)

I think we've concluded that Google is broken right now for many, so whatever is occurring now is outside the norm.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved