|Maximizing Google Internal PR|
What to do with links to "print this article"
On pages of my web site, I have links to "print this article" pages and "email this article to a friend". Now, I block these pages in my robots.txt file to avoid duplicates, and also because I wouldn't want searchers to end up on these pages instead of the original article. But these pages cause me a few other problems:
1- For one, it seems some unethical people (to put it politely) have run bots on them to grab a simplified version of these articles automatically and publish them without my permission.
2- Some bots don't follow the robots.txt and still crawl those pages, which is a problem in itself because I actually record the traffic on those pages to compile a statistic page for "the most printed article", "the most emailed article", etc. It's just a fun stats page but they skew the results.
3- I also have a lot of supplemental pages, and many others that simply aren't indexed in Google, possibly because the amount of content has grown faster than the number of backlinks I'm getting (all original content, mind you, so no duplicate penalties).
With so many pages on the site, I think saving these 2-3 links per page could improve the PR distribution just enough to win back a few hundred supplementals...
You could just put a rel="nofollow" attribute in those anchor tags. See [webmasterworld.com...] for a clarification from Matt Cutts on this. "...nofollow'ed links are dropped out of our link graph; we don't even use such links for discovery."
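A minimal sketch of what that might look like on the article page. The URLs /print/123 and /email/123 are placeholders, not the poster's actual paths:

```html
<!-- Links to the utility versions of the article, marked so that
     engines honouring rel="nofollow" do not follow them.
     The href values are hypothetical. -->
<a href="/print/123" rel="nofollow">Print this article</a>
<a href="/email/123" rel="nofollow">Email this article to a friend</a>
```

This only covers links on your own pages; it has no effect on links that other sites point at those URLs.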
Appreciate your time.
Perhaps I'm being paranoid, but Matt Cutts's carefully worded clarification in that thread still gives me pause.
I personally would not use them on any site that had anything else going on that could be flagged. Filters and penalties seem to me to be triggered when a succession of flags is raised about a site, and I still think that use of nofollow could be one of them.
But if your approach is 100% 'ethical' and focused purely on content and not links then I'd use it for simplicity.
I'm with FF. In these days of pointers towards "intent", manipulating internal PR must be up there with the best....
|On pages of my web site, I have links to "print this article". |
Instead of having a separate page for the "print this article" version, why not just use a print stylesheet? That gets rid of the duplicate.
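As a rough sketch of the print stylesheet idea, assuming the page wraps its navigation, sidebar and footer in elements with the hypothetical ids nav, sidebar and footer, and that the screen styles live at a hypothetical /css/screen.css:

```html
<!-- These two elements go inside the page's <head> -->

<!-- Normal styles, used only on screen -->
<link rel="stylesheet" href="/css/screen.css" media="screen">

<!-- Rules applied only when the page is printed -->
<style media="print">
  #nav, #sidebar, #footer { display: none; }  /* hide page chrome */
  body { font: 12pt serif; color: #000; background: #fff; }
</style>
```

The article keeps a single URL, so there is no second page for a bot to crawl or index.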
You can also put this at the top of your "print this article" version...
<meta name="robots" content="none">
Place it right after the opening <head> tag, or right after the charset metadata if that is present.
P.S. If you use the above meta robots tag to keep those pages out of the indices, you'll need to remove the entries from your robots.txt file. If you block it via robots.txt, the bot is not going to see the actual page with the robots meta tag.
I like that approach, pageoneresults, because it addresses the chance that someone else links to your print version. You can only have the nofollow on your own links, so an offsite backlink might still cause some issues.
I had a similar problem with print pages being indexed even though they had a 'noindex,nofollow' on them. To get around this, the approach I now use is to include a form on the page with a hidden variable and a submit button labelled 'print format'. The target is the same page. The variable is detected server side (php) and the page is displayed with a different template and stylesheet.
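A minimal sketch of the form described above. The field name "format" and the use of POST are assumptions; the post does not say which were used. Leaving out the action attribute makes the form submit to the current page's URL:

```html
<!-- No action attribute: the form posts back to the same URL -->
<form method="post">
  <!-- Hidden variable the server-side script checks for
       (the name "format" is hypothetical) -->
  <input type="hidden" name="format" value="print">
  <input type="submit" value="Print format">
</form>
```

On the server, the script would check for that variable and, when it is present, render the page with the print template and stylesheet. With POST there is no separate print URL for a crawler to request.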
Does this actually work?
Would maybe using the URL Removal Request tool to get
it out of indexed pages for the site, and then
re-including the link on accordant pages, but instead
I see Gbot making get calls on some of my sites, for
an external js file I use to display DHTML menus, but
I'm not entirely convinced Gbot actually finds/follows
I Googled whether or not Gbot can "read" and follow JS
links, seems to be conflicting info.
Nonetheless, wouldn't it be worth a try?
|Does this actually work? |
Based on my day in and day out use of it, I'd say it works perfectly.
Remember, you cannot have a robots.txt entry for that page if you are using the meta robots element. If you are Disallowing via robots.txt, the bot can't get to that page to read the robots meta element, so it is negated. So, remove any robots.txt entries and then utilize the meta robots element. Some feel safer with...
<meta name="robots" content="noindex, nofollow">
Which is fine. The "none" is shorthand for noindex, nofollow and part of the spec. The major SEs all adhere to the protocol in this instance.
I hope this isn't straying from the thread: "Maximizing Google Internal PR". I have a site that has about 75 - 80 widgets. All of our "widgets" require a different size accessory.
We sell these widgets and the "widget accessories". For the good of our customers and ourselves, we include very comprehensive "fitting guides" on those "widget pages" that require an exact "fitted accessory" due to the size or brand name of the widget.
On these 75+ widget pages, the ones that sell a widget requiring a sized accessory, we have a single absolute link with the anchor text "See our Fitting Guides for an exact product match".
Is this bad internal linking? I can't think of a more natural way to assist our customers in selecting the proper accessory.
Thank you all for your suggestions. I also noticed that the following post about Matt Cutts's comment on the subject kind of removes the stigma of using rel="nofollow" on our own internal links.
So are we 100% certain that using this attribute in our links is exactly the same as blocking the target pages in our robots.txt file? I felt there was still a leak of PR with the robots.txt method, while rel="nofollow" seemed to nip it in the bud.