
Google SEO News and Discussion Forum

This 70 message thread spans 3 pages. This is page 3.
Google ignores the meta robots noindex tag.
Thousands of pages show that tag in the Google cache!
g1smd




msg:708088
 12:27 am on Jun 14, 2006 (gmt 0)

How many people have noticed the many thousands of Supplemental pages with a June or July 2005 cache date that are indexed, show in the SERPs with a full title and description, rank, and yet have a <meta name="robots" content="noindex"> tag both on the live page and in the old cached copy linked from the SERPs?

Oh yeah! There are thousands of them. Now that is a programming bug.

.

Obviously when you place the <meta name="robots" content="noindex"> tag on a page, you expect that the page will be spidered, but not indexed. But if Google keeps no data about that URL, they will never "remember" anything about that URL, and will return every few minutes to "discover" the content, again and again.

However, when you think about it, Google must keep a copy of that page internally so that they can tell when the content has changed, and so that they know where that page links to, and so that they know the index/noindex status of the page.

Such a page should never appear in search results, ever. Well, now they do, and in very large numbers; all (so far) marked as Supplemental Results, and with cache dates from a year ago.

It appears that something in their system is forgetting to check the index/noindex status of the pages in their database and is showing them all in the SERPs whatever their status.

I first noticed this yesterday, but I found it on some searches that I had not run for several months. I have no idea how long this bug has been showing up... it could be several months.
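For anyone who wants to check their own pages, the index/noindex test described above can be sketched in a few lines of Python. This is only an illustration of reading the meta robots tag from a page's HTML, not a description of how Google does it; the names `RobotsMetaParser` and `is_indexable` are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def is_indexable(html):
    """Return False when the page asks not to be indexed (noindex or none)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)
```

Running `is_indexable` over a saved copy of the live page and of the cached copy would confirm whether the tag really is present in both.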

 

Tommy_Two




msg:708148
 6:39 am on Jun 20, 2006 (gmt 0)

<Added>
In the two hours since I started the note above, the number of my pages Google reported knowing dropped from "about 8640" to "about 297" which is about what it should be. Now if it will just STAY that way...

G* still delivers some annoying SUPPLEMENTAL RESULTS for some searches but that does seem to be getting better.
</Added>

Phil_Payne




msg:708149
 8:46 am on Jun 20, 2006 (gmt 0)

I had a set of results last week that implied Google was ignoring noindex and treating nocache as noindex.

I wonder if it's a parser bug.

moftary




msg:708150
 12:00 pm on Jun 20, 2006 (gmt 0)

Also, when I exclude a certain portion of my site from being indexed using robots.txt, it goes to supplemental.

--moftary

Tommy_Two




msg:708151
 12:11 pm on Jun 20, 2006 (gmt 0)

Here is what Google Sitemap Diagnostic says about ROBOTS.TXT

URLs restricted by robots.txt [?]

Below are URLs we tried to crawl (found either through links from your Sitemaps file or from other pages) that we didn't crawl because they are listed in your robots.txt file. You may have specifically set up a robots.txt file to prevent us from crawling this URL. If that is the case, there's no need to fix this; we will continue to respect robots.txt for this file. If you want us to crawl these pages, make sure that your robots.txt file doesn't restrict our access.


G* doesn't say it won't display them, only that it didn't crawl them. :-(
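The distinction can be seen with Python's standard `urllib.robotparser`, which answers only the question "may this agent fetch this URL?". The robots.txt content, the `/private/` path, and the `may_crawl` helper below are assumptions made up for the sketch. Nothing in the answer says whether a URL that is already known from links may be listed in results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent, url):
    """True if robots.txt allows this agent to fetch the URL.

    This covers crawling only; it says nothing about indexing or display.
    """
    return parser.can_fetch(user_agent, url)
```

A crawler that honours the file never fetches a disallowed page, so it also never sees a noindex meta tag placed on that page.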

wiseapple




msg:708152
 12:20 pm on Jun 20, 2006 (gmt 0)

Our page count is slowly moving down also... We have a couple of sections which are noindex, nofollow. Somehow, Google ended up with these in the index. They were all in the supplemental and dated back to around June-July 2005.

Our site has approximately 20,000 pages. Using the "site:" command, we have gone from around 260,000 pages down to about 145,000 in a week or so. Not sure what is up with this. Not sure what the other 120,000 pages are/were? It would be nice to see an accurate page count.

Thanks.

g1smd




msg:708153
 6:40 pm on Jun 20, 2006 (gmt 0)

Yes, Supplemental Results with a 2005 June or July cache date (maybe others too) and a noindex meta tag, are shown in the index, rank for search terms related to content on that page, and have a Google cache that shows the noindex tag in place on the page.

pageoneresults




msg:708154
 6:47 pm on Jun 20, 2006 (gmt 0)

"Our site has approximately 20,000 pages. Using the "site:" command, we have gone from around 260,000 pages down to about 145,000 in a week or so. Not sure what is up with this."

I'm watching one site now go from 12,000,000 to 32,000,000 to 18,000,000 to 34,000,000 day in, day out. The fluctuations in page counts are a sure sign that something is broken. ;)

"Not sure what the other 120,000 pages are/were? It would be nice to see an accurate page count."

This may not apply to you but many sites that are dynamic have URI paths to the same content through many different queries. At some point, Googlebot got smart and started generating and indexing those queries. So, you may have a time when your page count is 12 times that of what it should be and there may be a possibility that Googlebot has found 12 different ways to reach the same content. :(
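A rough way to see how many distinct pages hide behind a pile of query-string variants is to normalise each URL and count the unique results. This is a minimal sketch: the parameter names in `IGNORED_PARAMS` are hypothetical and would need to match whatever your own site uses for session IDs, sort order and the like.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change the URL but not the page content.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}


def canonical_url(url):
    """Drop content-neutral parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    params.sort()
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(params), "")
    )


def count_unique_pages(urls):
    """Number of distinct pages once duplicate query variants are collapsed."""
    return len({canonical_url(url) for url in urls})
```

Feeding it the URLs from your server logs that Googlebot requested gives a count to compare against the "site:" figure.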

M3Guy




msg:708155
 10:33 pm on Jun 20, 2006 (gmt 0)

I had a similar issue with robots commands recently.

After having disallowed G, MSN and Y access to a folder on the server, G promptly went and indexed every page within it.

I then changed the robots file, after advice received here, and put noindex or none etc. into the head of each page.

Lo and behold, the pages were recrawled and indexed with the various commands displayed in the cached pages.

Seem to have now sorted it, however, by using rewrite to show a 403 to any robot in the list that tries to access these pages.
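The post above does not show the actual rewrite rule, so the following is not that configuration. It is a Python WSGI sketch of the same idea: answer 403 when a listed robot requests a protected path. The agent tokens in `BLOCKED_AGENTS` and the `/private/` prefix are assumptions for illustration.

```python
# Assumed user-agent substrings for the robots to refuse (lowercase).
BLOCKED_AGENTS = ("googlebot", "msnbot", "slurp")
# Hypothetical folder that robots should not be served.
PROTECTED_PREFIX = "/private/"


def is_blocked(user_agent, path):
    """True when a listed robot requests a page under the protected folder."""
    agent = (user_agent or "").lower()
    return path.startswith(PROTECTED_PREFIX) and any(
        token in agent for token in BLOCKED_AGENTS
    )


def block_robots(app):
    """WSGI middleware: send 403 Forbidden to listed robots, pass others on."""

    def wrapper(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT"), environ.get("PATH_INFO", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return wrapper
```

Matching on the user-agent string only stops robots that identify themselves honestly, which is also true of a rewrite rule keyed on the same header.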

Tommy_Two




msg:708156
 10:47 pm on Jun 23, 2006 (gmt 0)

Being tired of GooGoo showing year old SUPPLEMENTAL RESULTS in some searches, I decided to limit the search to "Within the Last Three Months" in Advanced Search. Nope! Those pesky year old Cached pages still show up.

g1smd




msg:708157
 11:00 pm on Jun 23, 2006 (gmt 0)

The connection with the logic of the Supplemental Database is broken. It serves whatever it feels like when a supplemental result is available for the query.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved