We would like to embed a tag inside the filename part of the URL (rather than adding a CGI name-value pair).
For example, the current page name might be "widgetA-details.htm". On a page that links to it more than once, the link in the left navigation bar might point to "widgetA-nav-details.htm", while the same link in the body of the page might point to "widgetA-body-details.htm".
We would implement this using Apache rewrite rules, not as a 301 redirect (because the redirect would result in a spurious hit in our logs). So to the outside world it looks like two URLs, but the resulting page will be identical.
How would Google handle this? Presumably duplicate detection would see that they are the same page, but which one gets picked as the "right" page? How about PR -- does it flow to the "right" page or to both pages, and will this cause some dilution of PR? Is this something Google would see as an attempt to spoof them?
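To make the mapping concrete, here is a minimal Python sketch of the filename logic such a rewrite rule would have to encode: strip the embedded tag to get the real page name and keep the tag for logging. The function name and the fixed tag list ("nav", "body") are assumptions for illustration only; this is not an Apache configuration.

```python
import re

# Hypothetical tag list: the thread only mentions "nav" and "body".
TAGGED_NAME = re.compile(r"^(?P<base>.+)-(?P<tag>nav|body)-details\.htm$")


def split_tagged_name(filename):
    """Return (canonical_filename, tag); tag is None for untagged names."""
    match = TAGGED_NAME.match(filename)
    if match is None:
        # No embedded tag: the name is already the canonical one.
        return filename, None
    return f"{match.group('base')}-details.htm", match.group("tag")


if __name__ == "__main__":
    for name in ("widgetA-nav-details.htm",
                 "widgetA-body-details.htm",
                 "widgetA-details.htm"):
        print(name, "->", split_tagged_name(name))
```

Both tagged names resolve to the same canonical file, which is exactly why a crawler would see two URLs with identical content.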
Thanks all for your thoughts on this situation.
kaled -- can you expand upon your concerns, or is this more of a gut feeling? Do you think a 301 redirect would be any better? Or adding CGI args (e.g. widget1-detail.htm?src=nav)?
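For comparison, the CGI-argument variant could be handled along these lines. This is only a sketch using Python's standard urllib.parse; the parameter name "src" comes from the example above, while the function name is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking_param(url, param="src"):
    """Return (url_without_param, tag); tag is None if the param is absent."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # First value of the tracking parameter, if present.
    tag = next((value for key, value in pairs if key == param), None)
    # Keep every other parameter so the page itself is unaffected.
    kept = [(key, value) for key, value in pairs if key != param]
    canonical = urlunsplit(parts._replace(query=urlencode(kept)))
    return canonical, tag


if __name__ == "__main__":
    print(strip_tracking_param("widget1-detail.htm?src=nav"))
```

As with the filename approach, every distinct value of the parameter is a distinct URL as far as a crawler is concerned.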
Thanks both for your replies. Not exactly what I want to hear, but better now than later :-)
Google as of today is allowing duplicate content.
widgetpage.com and pageofwidgets.com and widgetspage.com are all identical pages and they are all showing up in the SERPs, one right after the other.
They even appear to have turned off the ability to report such things to them, as another thread is discussing.
Unless this cannot be achieved for some technical reason, this approach is likely to cause the fewest problems in the long run.
You might consider placing parameters after a # char. This would have the advantage that spiders will not keep crawling the same page, etc. You would have to do some research on this.
Kaled.
The # anchor idea is a good idea. Is it certain that Google ignores anything after the # in a URL? If so, that would be a completely fine and simple solution.
Thanks again.
>> Is it certain that Google ignores anything after the # in a URL? If so, that would be a completely fine and simple solution. <<
Am I certain? No. However, I've never noticed duplicate content that appeared to result from such links, so I guess I'm 99.9% sure.
There was some discussion on this a day or two ago. I believe it was said that everything after the # is reserved for use by browsers and is ignored by search engines. When I read it, it seemed irrelevant to me so I did not take careful note.
You would have to do some experimentation to see how browsers behave; however, I would not anticipate a problem.
Kaled.
PS
If it's purely for internal use within a website, you could consider using the robots meta tag NOFOLLOW with your original plan. This would avoid duplicate content issues. Also, if you use JavaScript's document.write to create links dynamically, these will not be followed by robots.
The question is a different one:
Does the server _see_ the hash?
If it doesn't see it, you can't use it for your purpose.
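To illustrate the point: a browser keeps the fragment to itself and sends only the path and query in the HTTP request, so nothing after the # reaches the server or its logs. A small Python sketch of that split follows; the function names and the example.com URL are hypothetical.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path and query a browser would send in the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it stays in the browser.
    return target


def fragment(url):
    """Return the part after '#', which only client-side code can read."""
    return urlsplit(url).fragment


if __name__ == "__main__":
    url = "http://example.com/widgetA-details.htm#nav"
    print(request_target(url))
    print(fragment(url))
```

So a tag placed after the # can only be recorded if something on the client side reads it and reports it back.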
Not entirely sure what you mean. However, if necessary, JavaScript can be used to extract the # part of the URL and pass it to a Perl/PHP script on the server. It's not pretty, but it's not difficult either. In any case, JavaScript will be needed if targeted links (to a location within a page) need to be tracked (to separate the target string from the ID and jump to the target in the page).
I'm not recommending use of the # method, simply suggesting it for consideration.
Using the OnClick event might be more suitable for use within a single website.
Kaled.
Yes, it is, but it slowly filters it out. I am watching a page that was indexed 8 times, and appeared in results in 8 consecutive entries. There was a mix of www and non-www entries, with and without a virtual directory path, and with several dynamic variables. Over the last 5 weeks the 8 entries have been reduced to three.
>> If it's purely for internal use within a website, you could consider using the robots meta tag NOFOLLOW with your original plan. <<
On the other hand, if you do really have separate pages with the same content, use the noindex,follow tag on the ones that you do not want listed.
At this point in time, that page on the newer site has normal Page Rank for its location in the navigation, and the original page on the original site, instead of having the PR it always did, is now PR0.