Forum Moderators: open
A few months back, I found all of these HTML pages in Google's index, even though they are only linked via JavaScript links.
I also have a separate glossary section containing some of these same definitions, but on different pages (those pages use my site template). I was annoyed that Google indexed the popups, potentially exposing me to a duplicate content penalty.
I disallowed these popup HTML pages in robots.txt, but months later they are still in Google's index. Worse yet, I found that over 50 of them have a PR of 3.
What should I do? I don't want to leak PageRank to those pages, and I don't want a duplicate content penalty!
In any case, you can use Google's URL removal tool to get those pages out of the index: [google.com ]
As you have seen, just using a JavaScript link is not enough to stop Googlebot. Add this to each popup page: <meta name="robots" content="noindex,nofollow">
Putting the JavaScript in remote files will keep Googlebot from indexing those pages and leaking PageRank through them. It's not a question of whether Googlebot can or cannot read the script... in practice it does not follow those links when the code is moved into a remote file.
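For example (a minimal sketch; the file name popup.js and the function openDef are placeholder names, not anything from your site):

```html
<!-- Before: the popup URL sits right in the page source, where a crawler can find it -->
<a href="javascript:openDef('define-widget.html')">widget</a>

<!-- After: the URL-building logic lives in an external script instead -->
<script type="text/javascript" src="popup.js"></script>
<a href="#" onclick="return openDef('widget');">widget</a>
```

The markup itself no longer contains the popup page's URL; only the external popup.js does.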
Get rid of the robots.txt exclusion and add a meta robots noindex tag to the files instead. Googlebot has to be able to crawl a page before it can see the noindex, so blocking the pages in robots.txt actually prevents the tag from working.
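That is, delete the Disallow line for those pages from robots.txt, and put the tag in the head of each popup page:

```html
<head>
  <title>Glossary popup</title>
  <!-- tells compliant crawlers: don't index this page, don't follow its links -->
  <meta name="robots" content="noindex,nofollow">
</head>
```

With the robots.txt block removed, Googlebot will recrawl the pages, see the tag, and drop them from the index over time.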
If you are concerned about PR leaks, put a small link back to the home page in a copyright notice at the bottom of each of those pages.
I am not surprised that they found your links. They even find URLs in regular text and follow them. What they don't (or I'd better make that "didn't") do was pass PR through any of those links.
Try adding rel="nofollow" to the links at this point.
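Something like this on each link that points at a popup page (the file name here is just an illustration):

```html
<!-- rel="nofollow" asks Google not to pass PageRank through this link -->
<a href="define-widget.html" rel="nofollow">widget</a>
```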
Are you sure that no one else is linking to those pages?
Will Google see my site as less authoritative if I try to stop some pages from being indexed in this manner? I'm not hiding anything or doing anything devious, but you never know with Google.