
Google News Archive Forum

    
Two different URLs for same page?
How does Google handle this case
sublime1

10+ Year Member



 
Msg#: 18128 posted 5:13 pm on Nov 6, 2003 (gmt 0)

We would like to implement a mechanism for tracking which link a user clicked on a page when two or more links on that page go to the same other page. Our purpose is purely to identify and track user behavior in order to improve the site experience.

We would like to embed a tag inside the filename part of the URL (rather than adding a CGI name-value pair).

For example, the current page name might be "widgetA-details.htm". On a page that links to it more than once, the link in the left navigation bar might become "widgetA-nav-details.htm" while the same link in the body of the page might become "widgetA-body-details.htm".

We would implement this using Apache rewrite rules, not as a 301 redirect (because the redirect would result in a spurious extra hit in our logs). So from the outside world it looks like two URLs, but the resulting page will be identical.
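A minimal mod_rewrite sketch of the scheme described above (the filenames and the nav/body tag convention are our hypothetical example, not a tested rule set):

```apache
# Internally map the tagged filenames back to the real page with no
# redirect, so browsers and spiders see two URLs serving one page.
RewriteEngine On
RewriteRule ^widgetA-(nav|body)-details\.htm$ widgetA-details.htm [L]
```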

How would Google handle this? Presumably duplicate detection would see that they are the same page, but which one gets picked as the "right" page? How about PR: does it flow to the "right" page or to both, and will this cause some dilution of PR? Is this something Google would see as an attempt to spoof them?

Thanks all for your thoughts on this situation.

 

plasma

10+ Year Member



 
Msg#: 18128 posted 10:41 pm on Nov 6, 2003 (gmt 0)

Without a 301 it will be duplicate content.
We used to do the same thing, but then we got worried about all the duplicate content, and now we do it the proper way.
It also looked ugly in the SERPs.

If the logs are your only concern, just write a filter to ignore the redirects, e.g.: egrep -v " 30[12] " (this drops both 301 and 302 lines).

kaled

WebmasterWorld Senior Member kaled us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 18128 posted 1:07 am on Nov 7, 2003 (gmt 0)

This sounds like a clever idea. It does not sound like a good one.

Kaled.

sublime1

10+ Year Member



 
Msg#: 18128 posted 1:58 pm on Nov 7, 2003 (gmt 0)

plasma -- thanks for your reply. Part of what I am curious about is the duplicate content. I was under the impression that Google detected and ignored duplicate content -- am I mistaken?

kaled -- can you expand upon your concerns, or is this more of a gut feeling? Do you think a 301 redirect would be any better? Or adding CGI args (e.g. widget1-detail.htm?src=nav)?

Thanks both for your replies. Not exactly what I want to hear, but better now than later :-)

quotations

10+ Year Member



 
Msg#: 18128 posted 2:11 pm on Nov 7, 2003 (gmt 0)

Yes, you are mistaken.

Google as of today is allowing duplicate content.

widgetpage.com and pageofwidgets.com and widgetspage.com are all identical pages and they are all showing up in the SERPs, one right after the other.

They even appear to have turned off the ability to report such things to them, as another thread is discussing.

sublime1

10+ Year Member



 
Msg#: 18128 posted 2:18 pm on Nov 7, 2003 (gmt 0)

Wow. Thank you very much for correcting my false impression.

Plan B or C it is :-)

kaled

WebmasterWorld Senior Member kaled us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 18128 posted 2:25 pm on Nov 7, 2003 (gmt 0)

Others know far more than I do about this sort of stuff. However, I would ensure that different URLs that result in the same content are differentiated by use of the query part (i.e. the text following the ?).

Unless this cannot be achieved for some technical reason, this approach is likely to cause the fewest problems in the long run.

You might consider placing parameters after a # char. This would have the advantage that spiders will not keep crawling the same page, etc. You would have to do some research on this.

Kaled.

sublime1

10+ Year Member



 
Msg#: 18128 posted 2:57 pm on Nov 7, 2003 (gmt 0)

kaled -- thanks again for your response. The query string approach seems simple, although if Google saw the two variants as different pages it would still leave me not knowing what would happen with PR and so forth.

The # anchor is a good idea. Is it certain that Google ignores anything after the # in a URL? If so, that would be a completely fine and simple solution.

Thanks again.

kaled

WebmasterWorld Senior Member kaled us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 18128 posted 3:12 pm on Nov 7, 2003 (gmt 0)

Is it certain that Google ignores anything after the # in a URL? If so, that would be a completely fine and simple solution.

Am I certain? No. However, I've never noticed duplicate content that appeared to result from such links, so I guess I'm 99.9% sure.

There was some discussion on this a day or two ago. I believe it was said that everything after the # is reserved for use by browsers and is ignored by search engines. When I read it, it seemed irrelevant to me so I did not take careful note.

You would have to do some experimentation to see how browsers behave, however, I would not anticipate a problem.

Kaled.

PS
If it's purely for internal use within a website, you could consider using the robots meta tag NOFOLLOW with your original plan. This would avoid duplicate content issues. Also, if you use JavaScript document.write to create links dynamically, these will not be followed by robots.

plasma

10+ Year Member



 
Msg#: 18128 posted 3:20 pm on Nov 7, 2003 (gmt 0)

Depending on the complexity of the query string, Google might choose NOT to crawl these URLs at all.
mod_rewrite is fine for 'nice' URLs.
But as soon as duplicate content could appear, I strongly suggest a 301.

plasma

10+ Year Member



 
Msg#: 18128 posted 4:07 pm on Nov 7, 2003 (gmt 0)

The # anchor is a good idea. Is it certain that Google ignores anything after the # in a URL? If so, that would be a completely fine and simple solution.

The question is a different one:
Does the server _see_ the hash?
If it doesn't see it, you can't use it for your purpose.

kaled

WebmasterWorld Senior Member kaled us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 18128 posted 4:41 pm on Nov 7, 2003 (gmt 0)

The question is a different one:
Does the server _see_ the hash?
If it doesn't see it, you can't use it for your purpose.

Not entirely sure what you mean. However, if necessary, JavaScript can be used to extract the # part of the URL and pass it to a Perl/PHP script on the server. It's not pretty, but it's not difficult either. In any case, JavaScript will be needed if targeted links (to a location within a page) need to be tracked (to separate the target string from the ID and jump to the target in the page).

I'm not recommending use of the # method, simply suggesting it for consideration.

Using the onclick event might be more suitable for use within a single website.

Kaled.
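A rough JavaScript sketch of the fragment approach discussed above. The function names and the #src= convention are hypothetical, purely for illustration; spiders ignore everything after the #, so both links resolve to the same page while the browser still sees the tag:

```javascript
// Build a link URL that carries the click source in the fragment,
// e.g. '/widgetA-details.htm' + 'nav' -> '/widgetA-details.htm#src=nav'.
function buildTrackedUrl(baseUrl, source) {
  return baseUrl + '#src=' + encodeURIComponent(source);
}

// On the landing page, recover the click source (e.g. from
// window.location.hash) so it can be reported back to the server
// via an image beacon or similar. Returns null if no tag is present.
function parseSource(url) {
  var match = url.match(/#src=([^&]*)/);
  return match ? decodeURIComponent(match[1]) : null;
}
```

The server never sees the fragment, which is why kaled notes that script on the page must extract it and pass it back explicitly.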

PhilC

10+ Year Member



 
Msg#: 18128 posted 4:52 pm on Nov 7, 2003 (gmt 0)

You could also hide the extra links on the page so that spiders can't see them, e.g. make the navbar link in the ordinary format and create the additional links with JavaScript.

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 18128 posted 6:13 pm on Nov 7, 2003 (gmt 0)

>> Google as of today is allowing duplicate content. <<

Yes, it is, but it slowly filters it out. I am watching a page that was indexed 8 times, and appeared in results in 8 consecutive entries. There was a mix of www and non-www entries, with and without a virtual directory path, and with several dynamic variables. Over the last 5 weeks the 8 entries have been reduced to three.

>> If its purely for internal use within a website, you could consider using the robots meta tag NOFOLLOW with your orginal plan. <<

On the other hand, if you do really have separate pages with the same content, use the noindex,follow tag on the ones that you do not want listed.
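As a sketch, the tag g1smd refers to is the standard robots meta tag, placed in the head of each duplicate variant you do not want listed:

```html
<!-- Keep this page out of the index, but still let spiders follow
     its links so PR can flow through. -->
<meta name="robots" content="noindex,follow">
```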

Marval

10+ Year Member



 
Msg#: 18128 posted 8:32 pm on Nov 7, 2003 (gmt 0)

g1smd hit it right on - just put a noindex tag on one of the pages

sublime1

10+ Year Member



 
Msg#: 18128 posted 9:14 pm on Nov 7, 2003 (gmt 0)

Thanks to all for great suggestions and information.

Marcia

WebmasterWorld Senior Member marcia us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 18128 posted 1:20 am on Nov 8, 2003 (gmt 0)

As a side note, I've got a page that has always had normal PageRank. I did another "specialty" site for that subject and put that page on the other site as well, because it just plain belongs there.

At this point, that page on the newer site has normal PageRank for its location in the navigation, while the original page on the original site, instead of having the PR it always did, is now PR0.

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 18128 posted 10:44 pm on Nov 8, 2003 (gmt 0)

As I said above, put the noindex,follow robots meta tag on all of the pages that you do NOT want to be indexed; otherwise Google will choose for you which pages to drop from the listings.
