Google indexed "noindex" pages

And cached meta refreshed pages

jimbeetle

3:18 am on Nov 29, 2002 (gmt 0)

Because some affiliate links are so long and complicated, I have a group of pages I use to meta refresh to the affiliate company's pages.

I slapped a simple...

...on them. Today when I used allinurl: to see what pages Google has indexed (kind of disheartening to see 1,500+ pages all PR0, but that's another story), I was quite surprised to see all of these "noindex" pages listed in Google.

What was more surprising is that the cached pages were not my pages but those of the company of which we are an affiliate, those pages to which my pages meta refreshed.

So I added "nofollow" to the meta as a possible quick fix and to see what would happen. Probably better to handle in robots.txt, though.

Question is, though, why would Google index and cache them in the first place?

Muddled in Manhattan,

Jim
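
As a rough way to check what a crawler would find on redirect pages like the ones Jim describes, the sketch below reads a page's robots meta directives and its meta refresh target using Python's standard `html.parser`. The sample HTML, the `affiliate.example` URL, and the names `MetaAudit` and `audit` are made up for illustration; the thread does not show the markup that was actually used.

```python
from html.parser import HTMLParser


class MetaAudit(HTMLParser):
    """Collect robots directives and any meta refresh target from a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()       # e.g. {"noindex", "nofollow"}
        self.refresh_url = None   # target of a meta refresh, if any

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        a = {name: (value or "") for name, value in attrs}
        content = a.get("content", "")
        if a.get("name", "").lower() == "robots":
            for part in content.split(","):
                if part.strip():
                    self.robots.add(part.strip().lower())
        elif a.get("http-equiv", "").lower() == "refresh":
            # content looks like "0; url=http://example.com/"
            _, _, rest = content.partition(";")
            key, _, value = rest.strip().partition("=")
            if key.strip().lower() == "url":
                self.refresh_url = value.strip().strip("'\"")


def audit(html):
    """Return (robots directives, refresh target) for an HTML string."""
    parser = MetaAudit()
    parser.feed(html)
    return parser.robots, parser.refresh_url


# Hypothetical redirect page, not the poster's real one.
SAMPLE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<meta http-equiv="refresh" content="0; url=http://affiliate.example/landing?id=123">
</head><body></body></html>"""

robots, target = audit(SAMPLE)
print(sorted(robots), target)
```

Running something like this over each redirect page would at least confirm that the tag is present and well formed before blaming the crawler.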

martinibuster

3:20 am on Nov 29, 2002 (gmt 0)

Pages in Google are "generally" from the previous month's crawl. This may be the case. You may have to wait until this update is finished, or for next month's update.

stevenha

3:45 am on Nov 29, 2002 (gmt 0)

Here's a somewhat off-topic comment, but I have a "noindexy page" on my site that's been noindex,follow for a couple of years. It's got a link on that page to a friend's page. (The only place the link occurs.)

When I used that cool animated Java tool (sorry, I can't remember its URL) that shows "related sites" based on the Google Sets results, it showed a linkage between my site's main page and this friend's page.

So, even though Google won't show my noindexy page in its SERPs, it's willing to indicate the linkage relationship to my friend's site. I thought that was kind of interesting.

[edited by: stevenha at 3:54 am (utc) on Nov. 29, 2002]

jimbeetle

3:50 am on Nov 29, 2002 (gmt 0)

martinibuster, these pages have been "noindex" since they were first put up in June, so Google should never have indexed them in the first place. Plus, caching the page that they are refreshing to is the most confusing part of it.

Maybe Google's been dancing a bit too hard?

martinibuster

5:33 am on Nov 29, 2002 (gmt 0)

It might be the "refresh" that Google's following.

On the other hand, it's important to put in the "nofollow" so that the bot stops at the door and doesn't follow links. How long has the "nofollow" been in there?

I put up a guestbook for a client, and dropped in the noindex,nofollow meta (from day one) to keep out the bots and spammers.

So far, this page has a PR zero, and no spammers. Worked fine for all bots.

GoogleGuy

5:47 am on Nov 29, 2002 (gmt 0)

Um, could it have been a partially indexed page where we only saw the url but didn't actually fetch the page? Did you see a snippet from the page in the search results?

jimbeetle

9:57 pm on Nov 29, 2002 (gmt 0)

martinibuster,

I just put the "nofollow" in there yesterday to see if it would have any effect.

GoogleGuy,

Some pages have true snippets, most use the meta description (they probably have a "noarchive" on them), but all are from the other site's pages.

The bot didn't fetch our page, it fetched the page ours is refreshing to.

The url is ours, the page title is theirs.

The cached page is theirs with all relative links pointing to our site, not theirs.

I figure the best way to handle this is robots.txt, but since this is very curious I decided to wait so I can answer any questions you have.

More muddled,

Jim
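
For the robots.txt route Jim mentions, here is a minimal sketch of how one might sanity-check a rule before relying on it, using Python's standard `urllib.robotparser`. The `/go/` directory and the file names are assumptions for illustration only; the thread does not say where the redirect pages live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all bots out of a /go/ directory
# assumed to hold the meta refresh pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path, agent="Googlebot"):
    """True if the rules above tell this user agent not to fetch the path."""
    return not parser.can_fetch(agent, path)


print(is_blocked("/go/partner1.html"))  # redirect page under /go/
print(is_blocked("/index.html"))        # ordinary page
```

This only models whether a well-behaved crawler may fetch a page. As GoogleGuy's question earlier in the thread suggests, a URL can still be listed without the page itself having been fetched.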

martinibuster

7:05 am on Nov 30, 2002 (gmt 0)

Yesterday is too late to deal with a crawl that happened last month. This month's update is the result of last month's crawl.

My opinion is that it will take another crawl (going on at this moment), and another month for this to work itself out.

You have to keep in mind that G updates its ENTIRE index once a month.

jimbeetle

9:22 pm on Nov 30, 2002 (gmt 0)

The question doesn't really have anything to do with when the pages were crawled and indexed, but that Google indexed them in the first place.

Google should not index pages with a valid "noindex" tag. That it sometimes happens is something we all -- and Google -- should be aware of.

That Google caches or has cached incorrect pages is troublesome. Might not the 60 incorrectly cached pages that Google thinks are on my site be considered exact duplicates of the pages on the site to which I'm linking? I see a lot of problems here.

Just trying to clear up an interesting question.

Jim

nancyb

9:42 pm on Nov 30, 2002 (gmt 0)

If the page is already in the Google index, putting a noindex, nofollow tag on it won't get it removed later. You should probably use the "remove" feature to get them removed [google.com...]

I had a whole section that was noindex, nofollow'ed from the day the pages went up. Google indexed them anyway and they were in the db for over two years. It wasn't a big deal, just a section I didn't think needed to be searchable. Just checked and those pages have finally been removed, but that may be because I moved my domain in Oct and have an .htaccess disallow on that directory now.