I slapped a simple...
<meta name="robots" content="noindex">
...on them. Today when I used allinurl: to see what pages Google has indexed (kind of disheartening to see 1,500+ pages all PR0, but that's another story), I was quite surprised to see all of these "noindex" pages listed in Google.
What was more surprising is that the cached pages were not my pages but those of the company of which we are an affiliate, those pages to which my pages meta refreshed.
So I added "nofollow" to the meta as a possible quick fix and to see what would happen. Probably better to handle in robots.txt, though.
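For anyone following along, the setup described above would look roughly like the head section below. This is only a sketch: the title, the zero-second delay, and the example.com target are placeholders, not the actual affiliate pages.

```html
<!-- Hypothetical affiliate redirect page; title, delay and URL are placeholders -->
<head>
  <title>Placeholder title</title>
  <!-- Ask compliant robots not to index this page or follow its links -->
  <meta name="robots" content="noindex,nofollow">
  <!-- Meta refresh: send the visitor on to the affiliate's page immediately -->
  <meta http-equiv="refresh" content="0; url=https://www.example.com/affiliate-page">
</head>
```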
Question is though, why would Google index and cache them in the first place?
Muddled in Manhattan,
Jim
When I used that cool animated Java tool (sorry, I can't remember its URL) that shows "related sites" based on the Google Sets results, it showed a linkage between my site's main page and this friend's page.
So, even though Google won't show my noindexy page in its SERPs, it's willing to indicate the linkage relationship to my friend's site. I thought that was kind of interesting.
[edited by: stevenha at 3:54 am (utc) on Nov. 29, 2002]
On the other hand, it's important to put in the "no follow" so that the bot stops at the door and doesn't follow links. How long has the "no follow" been in there?
I put up a guestbook for a client, and dropped the noindex,nofollow meta (from day one) to keep out the bots and spammers.
So far, this page has a PR zero, and no spammers. Worked fine for all bots.
I just put the "nofollow" in there yesterday to see if it would have any effect.
Googleguy,
Some pages have true snippets, most use the meta description (they probably have a "noarchive" on them), but all are from the other site's pages.
The bot didn't fetch our page, it fetched the page ours is refreshing to.
The url is ours, the page title is theirs.
The cached page is theirs with all relative links pointing to our site, not theirs.
I figure the best way to handle this is robots.txt, but since this is very curious I decided to wait so I can answer any questions you have.
More muddled,
Jim
My opinion is that it will take another crawl (going on at this moment), and another month for this to work itself out.
You have to keep in mind that G updates its ENTIRE index once a month.
Google should not index pages with a valid "noindex" tag. That it sometimes happens is something we all -- and Google -- should be aware of.
That Google caches or has cached incorrect pages is troublesome. Might not the 60 incorrectly cached pages that Google thinks are on my site be considered exact duplicates of the pages on the site to which I'm linking? I see a lot of problems here.
Just trying to clear up an interesting question.
Jim
I had a whole section that was noindex, nofollow'ed from the day the pages went up. Google indexed them anyway and they were in the db for over two years. It wasn't a big deal, just a section I didn't think needed to be searchable. Just checked and those pages have finally been removed, but that may be because I moved my domain in Oct and have an .htaccess disallow on that directory now.