
Two different URLs for same page?

How does Google handle this case?

     
5:13 pm on Nov 6, 2003 (gmt 0)

10+ Year Member



We would like to implement a mechanism for tracking which link a user clicked on a page when two or more links on that page point to the same destination page. Our purpose is purely to identify and track user behavior in order to improve the site experience.

We would like to embed a tag inside the filename part of the URL (rather than adding a CGI name-value pair).

For example, the current page name might be "widgetA-details.htm". On a page that links to it more than once, the link in the left navigation bar might use "widgetA-nav-details.htm", while the same link in the body of the page might use "widgetA-body-details.htm".

We would implement this using Apache rewrite rules, not as a 301 redirect (because the redirect would result in a spurious hit in our logs). So from the outside world it looks like two URLs, but the resulting page will be identical.
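
Roughly the sort of rule we have in mind -- just an untested sketch, using the example filenames above:

  RewriteEngine On
  # serve the real page for either tagged URL without redirecting,
  # so the tagged URL still shows up in the access log
  RewriteRule ^widgetA-(nav|body)-details\.htm$ widgetA-details.htm [L]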

How would Google handle this? Presumably duplicate detection would see that they are the same page, but which one gets picked as the "right" page? How about PR -- does it flow to the "right" page or to both, and will this cause some dilution of PR? Is this something Google would see as an attempt to spoof them?

Thanks all for your thoughts on this situation.

10:41 pm on Nov 6, 2003 (gmt 0)

10+ Year Member



Without a 301 it will be duplicate content.
We used to do the same, but we got worried about all the duplicate content and now we do it the proper way.
It also looked ugly in the SERPs.

If the logs are your only concern, just write a filter to ignore the 301s (and 302s), e.g.: egrep -v " 30[12] "
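
For instance, against a standard Apache access log where the status code sits between spaces:

  # drop the redirect hits before feeding the log to your stats package
  egrep -v " 30[12] " access_log > access_log.filtered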

1:07 am on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member



This sounds like a clever idea. It does not sound like a good one.

Kaled.

1:58 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



plasma -- thanks for your reply. Part of what I am curious about is the duplicate content. I was under the impression that Google detected and ignored duplicate content -- am I mistaken?

kaled -- can you expand upon your concerns, or is this more of a gut feeling? Do you think a 301 redirect would be any better? Or adding CGI args (e.g. widgetA-details.htm?src=nav)?

Thanks both for your replies. Not exactly what I want to hear, but better now than later :-)

2:11 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



Yes, you are mistaken.

Google as of today is allowing duplicate content.

widgetpage.com and pageofwidgets.com and widgetspage.com are all identical pages and they are all showing up in the SERPs, one right after the other.

They even appear to have turned off the ability to report such things to them, as another thread is discussing.

2:18 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



Wow. Thank you very much for correcting my false impression.

Plan B or C it is :-)

2:25 pm on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Others know far more than I do about this sort of stuff. However, I would ensure that different URLs that result in the same content are differentiated by use of the query part (i.e. the text following the "?").

Unless this cannot be achieved for some technical reason, this approach is likely to cause the fewest problems in the long run.

You might consider placing parameters after a # char. This would have the advantage that spiders will not keep crawling the same page, etc. You would have to do some research on this.
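
For example (parameter names purely illustrative):

  widgetA-details.htm?src=nav   (query string version)
  widgetA-details.htm#src=nav   (fragment version)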

Kaled.

2:57 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



kaled -- thanks again for your response. The query string approach seems simple, although if Google saw the two variants as different pages it would still leave me not knowing what would happen with PR and so forth.

The # anchor idea is a good idea. Is it certain that Google ignores anything after the # in a URL? If so, that would be a completely fine and simple solution.

Thanks again.

3:12 pm on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Is it certain that Google ignores anything after the # in a URL? If so, that would be a completely fine and simple solution.

Am I certain? No. However, I've never noticed duplicate content that appeared to result from such links, so I guess I'm 99.9% sure.

There was some discussion on this a day or two ago. I believe it was said that everything after the # is reserved for use by browsers and is ignored by search engines. When I read it, it seemed irrelevant to me so I did not take careful note.

You would have to do some experimentation to see how browsers behave; however, I would not anticipate a problem.

Kaled.

PS
If it's purely for internal use within a website, you could consider using the robots meta tag NOFOLLOW with your original plan. This would avoid duplicate content issues. Also, if you use JavaScript document.write to create links dynamically, these will not be followed by robots.
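
A rough sketch of both, untested, reusing the example filename from earlier in the thread:

  <meta name="robots" content="nofollow">

  <script type="text/javascript">
  // robots generally do not follow links that are written out by script
  document.write('<a href="widgetA-nav-details.htm">Widget A details<\/a>');
  </script>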

3:20 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



Depending on the complexity of the query string, Google might choose NOT to crawl these URLs at all.
mod_rewrite is OK for 'nice' URLs.
But as soon as duplicate content could appear, I strongly suggest a 301.

4:07 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



>> The # anchor idea is a good idea. Is it certain that Google ignores anything after the # in a URL? If so, that would be a completely fine and simple solution. <<

The question is a different one:
Does the server _see_ the hash?
If it doesn't see it, you can't use it for your purpose.

4:41 pm on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>> The question is a different one:
Does the server _see_ the hash?
If it doesn't see it, you can't use it for your purpose. <<

Not entirely sure what you mean. However, if necessary, JavaScript can be used to extract the # part of the URL and pass it to a Perl/PHP script on the server. It's not pretty, but it's not difficult either. In any case, JavaScript will be needed if targeted links (to a location within a page) need to be tracked (to separate the target string from the ID and jump to the target in the page).
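
Something along these lines, purely as a sketch (trackSource.pl is a made-up name for whatever logging script you would write):

  <script type="text/javascript">
  // read whatever follows the # and report it back to the server
  // via an image request (trackSource.pl is just a placeholder)
  var tag = location.hash ? location.hash.substring(1) : "";
  if (tag != "") {
    var beacon = new Image();
    beacon.src = "/cgi-bin/trackSource.pl?src=" + escape(tag);
  }
  </script>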

I'm not recommending use of the # method, simply suggesting it for consideration.

Using the OnClick event might be more suitable for use within a single website.
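
e.g. something like this (again, the script name is hypothetical):

  <a href="widgetA-details.htm"
     onclick="(new Image()).src='/cgi-bin/trackSource.pl?src=nav'; return true;">Widget A</a>

Bear in mind the browser may navigate away before the tracking request completes, so the odd click could go unrecorded.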

Kaled.

4:52 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



You could also hide the extra links on the page so that spiders can't see them, e.g. make the navbar link in the ordinary format and make the additional links in JavaScript.

6:13 pm on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>> Google as of today is allowing duplicate content. <<

Yes, it is, but it slowly filters it out. I am watching a page that was indexed 8 times and appeared in the results as 8 consecutive entries. There was a mix of www and non-www entries, with and without a virtual directory path, and with several dynamic variables. Over the last 5 weeks the 8 entries have been reduced to three.

>> If it's purely for internal use within a website, you could consider using the robots meta tag NOFOLLOW with your original plan. <<

On the other hand, if you do really have separate pages with the same content, use the noindex,follow tag on the ones that you do not want listed.
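
i.e. something like this in the <head> of each page you want kept out of the listings:

  <meta name="robots" content="noindex,follow">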

8:32 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



g1smd hit it right on -- just put a noindex tag on one of the pages.

9:14 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



Thanks to all for great suggestions and information.

1:20 am on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member



As a side note, I've got a page that has always had normal PageRank. I did another "specialty" site for that subject and put that page on the other site as well, because it just plain belongs there.

At this point in time, that page on the newer site has normal PageRank for its location in the navigation, while the original page on the original site, instead of having the PR it always did, is now PR0.

10:44 pm on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



As I said above, put the noindex,follow robots meta tag on all of the pages that you do NOT want to be indexed; otherwise Google will choose for you which pages to drop from the listings.
 
