How does Google handle identical but diff-variable pages?

e.g., photos.php?var1=123&var2=555 and one with changed var2

         

ThatAdamGuy

2:24 am on Jan 11, 2004 (gmt 0)

10+ Year Member



Hi there,

I've posted this [webmasterworld.com] in the AdSense folder to get an idea of how the MediaBot treats a particular situation, and I hope I am not remiss for reposting a similar note to get feedback on how the general GoogleBot handles the same situation.

In a nutshell, I want to know how Google handles database-driven pages when two or more have identical content but differing second variables, e.g.:
somesite.com/photos.php?var1=123&var2=147
and
somesite.com/photos.php?var1=123&var2=623

Var2 contains user navigation info, such as which gallery is being browsed, in case you're curious.

I'm concerned that two problems may occur from this situation:
1) Google may see identical content on multiple pages with nearly-but-not-entirely matching URLs, and then penalize the site for duplicate content.
2) Various sites may all use different var2 values when linking to particular photo pages, thus diffusing those pages' PR.

Am I correct in fearing these situations? If so, I'll do my best to have the photo gallery author implement the var2 in a cookie.
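
To make the cookie idea concrete: if var2 only carries navigation state, every variant of a photo URL can be reduced to one address by dropping that parameter. Below is a minimal sketch in Python, assuming var2 really never changes the page content. The name `NAVIGATION_ONLY_PARAMS` and the function `canonical_url` are made up for illustration and are not part of any gallery software.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters only record how the visitor got here
# and never change what the page displays.
NAVIGATION_ONLY_PARAMS = {"var2"}


def canonical_url(url: str) -> str:
    """Return the URL with navigation-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in NAVIGATION_ONLY_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this, the var2=147 and var2=623 addresses from the question both reduce to the same photos.php?var1=123 URL, which is the one you would want linked and indexed.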

hutchins13

6:32 am on Jan 11, 2004 (gmt 0)

10+ Year Member



I have had no problems getting both pages indexed. The first variable is the item number and the second variable is the category the item is in. The two product pages are exactly the same except for the URL. Both have PR also.

Harwich

7:11 pm on Jan 12, 2004 (gmt 0)

10+ Year Member



I also have not had any issues with this. I use a multi-variable URL. The first variable is an action such as display, and the second variable provides a category, sub-category or specific product.

Good Luck
Harwich

allanp73

7:32 pm on Jan 12, 2004 (gmt 0)

10+ Year Member



I have hundreds of product pages which were indexed with no problem.

g1smd

1:12 am on Jan 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



... yeah but the original question asked about the specific situation where the second variable is NOT data that makes the page content change.

I would say that in the long term Google would drop the "duplicates"; however, in the short term it may very well index the page many times.

I am aware of a page indexed about 5 times with sml00B.asp, sml00B.asp?Tx=1, sml00B.asp?Pf=1, and many other variants. The content is identical. The Pf=1 variant is the Printer Friendly version for example. I don't expect it to last. The number has slowly reduced from about 8 or 9.

skuba

5:39 pm on Jan 14, 2004 (gmt 0)

10+ Year Member Top Contributors Of The Month



For a long time I believed that search engines, especially Google, couldn't read URLs past certain punctuation (e.g. ?, &, =) and spaces.
So, to make our sites SE-friendly, we always avoided having them in the URLs.
I am now on a project where a third-party company is developing the website. The site is mainly dynamic and has punctuation and, what I guess is worse, spaces in the URLs.
I am in charge of the project, and I was asking them to make some changes to the code and also use mod_rewrite to replace punctuation and spaces with dashes.
But I was doing some searches in Google and actually found some indexed pages that had punctuation, and even one page that had a space in the URL (%20):
http*//www.example.com/thumb.htm?dept_id=5&deptName=New%20Arrivals
I was wondering, is Google crawling dynamic pages now?
Even completely non-SE-friendly pages with spaces?

Or was that one just an exception, and would Google probably index A LOT more pages of the site if I got rid of punctuation and spaces?

What do you say?

Thanks a lot for the input

mikemcs

6:39 pm on Jan 14, 2004 (gmt 0)

10+ Year Member



I have this same setup and Google loves it... But at one time I had my vars 3 - 4 deep (Cat=4&Id=34&Something=else&another=thing) and that was a different story. I don't remember what the limit was, but 2 is OK if I remember correctly.

skuba

8:01 pm on Jan 14, 2004 (gmt 0)

10+ Year Member Top Contributors Of The Month



I read that Google doesn't like the word ID in URLs because it looks like a session ID (a trap).
But we have some pages with the parameter dept_id that got indexed.
Still, I think dynamic pages are not as friendly.
I guess Google would index a lot more pages if they looked like static pages.
I need to decide whether or not to pay the company to make the changes to the code and implement the mod_rewrite.

g1smd

1:07 am on Jan 15, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Avoid spaces in filenames (and in folder names too). Try to use hyphens or dots. Words separated with an underscore are NOT seen as separate words.

In dynamic URLs try to limit the query string to fewer than 3 variables.

Try to avoid using ID as part of the query, and avoid having any long number (more than about 6 digits I guess) in the query that might look like a session ID even if it isn't one.
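
These rules of thumb can be turned into a quick check to run over a list of URLs. The sketch below is in Python and only encodes the guidelines from this post (spaces, three or more variables, ID-like parameter names, numbers longer than about 6 digits). It is not a description of how Google actually behaves, the function name `url_warnings` is made up, and the "ends in id" test is deliberately crude.

```python
import re
from urllib.parse import urlsplit, parse_qsl


def url_warnings(url: str) -> list:
    """Return a list of warnings based on the rules of thumb above."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    # Spaces usually show up percent-encoded as %20.
    if " " in url or "%20" in url:
        warnings.append("contains a space")
    # "Fewer than 3 variables" means three or more is flagged.
    if len(params) >= 3:
        warnings.append("three or more query variables")
    # Crude heuristic: id, dept_id, productId, sessionid all end in "id".
    if any(name.lower().endswith("id") for name, _ in params):
        warnings.append("parameter name ends in 'id'")
    # A run of 7+ digits could be mistaken for a session ID.
    if any(re.search(r"\d{7,}", value) for _, value in params):
        warnings.append("long number that could look like a session ID")
    return warnings
```

An empty list means the URL passes all four checks. Anything else is worth a second look, nothing more.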

seomike2003

3:23 am on Jan 15, 2004 (gmt 0)



I have hundreds of pages that use a one variable format
mysite.com/widget.php?id=1111
mysite.com/widget.php?id=1112

The thing I see is that Google doesn't give the page any rank, even though all of these pages show up on page 1 in a lot of the SERPs they target. I know they can do better, so I'm implementing a mod_rewrite to change the URL to
mysite.com/widget/1111
mysite.com/widget/1112
so now the SE's will think that widget and 1111 are directories and then it will score them :)
I can make them as dynamic as I want and still make them look like a directory :)
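
The translation described here is just pattern matching on the requested path. The following Python sketch illustrates the mapping only. It is not mod_rewrite configuration, and the pattern and the function name `to_internal` are assumptions made for the example.

```python
import re

# Assumed pattern: /widget/<digits>, with an optional trailing slash.
PRETTY_WIDGET = re.compile(r"^/widget/(\d+)/?$")


def to_internal(path: str):
    """Map a directory-style path to the script URL, or None if it does not match."""
    match = PRETTY_WIDGET.match(path)
    if match is None:
        return None
    # The captured digits become the id query variable the script expects.
    return f"/widget.php?id={match.group(1)}"
```

The visitor and the spider only ever see /widget/1111. The script still receives id=1111 exactly as before.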

mikemcs

2:04 pm on Jan 15, 2004 (gmt 0)

10+ Year Member



I would have to agree with seomike2003 on this one. Not only does the directory URL function better for Google, the URL just looks better. The problem for me is that I have 100+ pages indexed with Google, and having to struggle with converting them over and getting them listed correctly, creating a redirect page and getting all my backlinks corrected seems a bit much. I know it's possible with mod_rewrite and a good custom 301 page, but I am too scared to mess things up and lose traffic even for a month or 2. I would love to hear how someone else managed to do this effectively.

danieljean

3:04 pm on Jan 15, 2004 (gmt 0)

10+ Year Member



I like the mod_rewrite because it just looks prettier to people surfing; I'm not at all concerned about Google indexing it.

In fact, I am running into a problem similar to that of ThatAdamGuy. On each product page I have a link to the same page with "&currency=CAD" or "&currency=USD", whatever the case may be. Problem is, that creates 3 pages for Google to index, and the GoogleBot is indexing nearly all of them.

I'm exploring various possibilities, including using JavaScript to place a cookie client-side that indicates either the currency in my case, or a navigation feature for ThatAdamGuy. Without a link, GoogleBot can't get confused, but this won't work for people without JavaScript enabled.

If I try to redirect from a page with a currency parameter to the very same page, the only thing that will change is the price- and if the GoogleBot goes back, say, to the homepage, the prices will have mysteriously changed from when they started crawling (and change every time they click on a currency change link!).

Another possibility would be to use mod_rewrite interpreting:

somesite.com/CAD/showProduct?productId=125
somesite.com/USD/showProduct?productId=125

as:
somesite.com/showProduct?productId=125&currency=CAD
somesite.com/showProduct?productId=125&currency=USD

Of course, I'm not sure I like what that would do to page rank, what with having to redirect people based on IP/locale from the very front page, something which would also confuse Google- although there are more pages which should offset the loss of PR. But then, they are mostly dupes... :(

GoogleGuy- If you're reading this, any insight would be appreciated.

seomike2003

5:19 pm on Jan 15, 2004 (gmt 0)



You won't have to change anything. All a mod rewrite does is translate /widget/1111 back to widget.php?id=1111 through the handler before it runs it through the php engine on the back end.

So old SERP links using widget.php?id=2222 will still work. Then the spiders come in, reindex the site, discover the new links and start ranking widget/2222. For a while you might have twice as many results :)
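
If having both addresses live at once is a worry, the usual alternative is to answer requests for the old query-string URL with a 301 to the new path, so only one address stays in circulation. Here is a rough Python sketch of that decision, assuming the old URLs look like /widget.php?id=NNNN. The function name `redirect_for` is hypothetical, and a real setup has to make sure the internal rewrite back to widget.php does not trigger the redirect again.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_for(request_uri: str):
    """Return (301, new_location) for an old-style widget URL, else None."""
    parts = urlsplit(request_uri)
    if parts.path != "/widget.php":
        return None
    ids = parse_qs(parts.query).get("id", [])
    # Only redirect when there is exactly one numeric id.
    if len(ids) != 1 or not ids[0].isdigit():
        return None
    return 301, f"/widget/{ids[0]}"
```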

danieljean

5:50 pm on Jan 15, 2004 (gmt 0)

10+ Year Member



seomike, I understand quite well what mod_rewrite does.

What happens though when there is an extra parameter that can have multiple values for the exact same page- is there a way to avoid getting GoogleBot confused, and getting duplicate pages in their index?

seomike2003

5:59 pm on Jan 15, 2004 (gmt 0)



DanielJ

All a mod_rewrite does is map one variable to another. Take this query right here:
somesite.com/USD/showProduct?productId=125
I would set up mod_rewrite variables to translate the incoming page request:

product/=showproduct.php
category/=productId
scented_widgets=125

so then I can change my linking to
somesite.com/USD/product/category/scented_widgets
mod rewrite changes it back to this when the link is clicked
somesite.com/USD/showProduct?productId=125
so the php engine can output the correct query

I can have 50 variables in a query, optimize every last one of them and turn them into directories instead of query variables. So imagine implementing that in a shopping cart. I can optimize every item to rank and so on.

The only downfall is the server load. It can get heavy with a lot of queries.

A good example of optimization is nextag.com
they have a mod rewrite going big time and they rank for items in their site down to the very names of the items they sell.
[google.com...]

that is good SEO.
sticky me if you have questions :)

skuba

7:08 pm on Jan 15, 2004 (gmt 0)

10+ Year Member Top Contributors Of The Month



I won't be able to get rid of all spaces. There are spaces on dept names and product names.
I can't replace the spaces by dashes "-" in those cases. The names come from the database, if I put dashes in there, the product names will be displayed with the dashes on the site.
So Product Name XYZ will be displayed as
Product-Name-XYZ...

Any ideas of how to work around that?

mikemcs

7:09 pm on Jan 15, 2004 (gmt 0)

10+ Year Member



seomike2003, from what I have read, having two pages listed in Google with the same content is bad news. I sound like I am just trying to get out of some work here lol, but I really think it's more that I am not sure of the best process to do this.

g1smd

8:52 pm on Jan 15, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




>> I can't replace the spaces by dashes "-" in those cases. The names come from the database, if I put dashes in there, the product names will be displayed with the dashes on the site. <<

Sure you can put dashes in the filenames. You're using a script to run the site, so you can do anything you like. You'll need to alter the script so that the information from the database is printed without dashes if it is just a description on screen, but the dashes are added if it is a filename.

skuba

9:18 pm on Jan 15, 2004 (gmt 0)

10+ Year Member Top Contributors Of The Month



Well, I don't know. The guy from the company that developed the site says that the variable reads the name directly from the database, so that's why it will always show with the spaces.
What do you think?
I guess he is just trying to talk me into not doing those changes to the site...

danieljean

10:01 pm on Jan 15, 2004 (gmt 0)

10+ Year Member



skuba- he's either trying to get out of work or not knowledgeable. If a dash is never (and I stress _never_) going to be used in the database, before making a call you could replace all dashes with spaces- and replace all spaces with dashes when you create links.

seomike- I'm sorry, I don't think I was clear.

I would just as soon avoid having two URLs like this:
somesite.com/USD/showProduct/125
somesite.com/CAD/showProduct/125

The only difference would be the price, and I am afraid that Google would penalize it for being duplicated content. I would much prefer having only the one URL:

somesite.com/showProduct/125
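
One way to keep a single URL is to read the currency from a cookie on the server and fall back to a default when the cookie is missing, which is what a spider would see. This is only a sketch in Python under those assumptions: the cookie name `currency`, the USD default and the function name are all made up for the example.

```python
from http.cookies import SimpleCookie

SUPPORTED_CURRENCIES = {"CAD", "USD"}
DEFAULT_CURRENCY = "USD"  # assumption: what cookie-less visitors (and spiders) get


def currency_from_cookie(cookie_header: str) -> str:
    """Pick the display currency from a Cookie header, with a safe default."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("currency")
    # Ignore missing or unsupported values rather than trusting the client.
    if morsel is not None and morsel.value in SUPPORTED_CURRENCIES:
        return morsel.value
    return DEFAULT_CURRENCY
```

Because the currency never appears in the address, there is only one URL per product for GoogleBot to find.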

g1smd

10:23 pm on Jan 15, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not a database programmer so I don't know the syntax but this is what you need:

READ $word FROM database.
$description = $word.
$thepagename = REPLACE (" " WITH "-") USING $word

Then you use one of these two variables ($description or $thepagename) in place of each occurrence of $word in the script.

One of the regular PHP or ASP gurus here could probably write a few lines of real code in as many minutes.
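
For what it's worth, here is roughly what that pseudocode looks like as real code. It is sketched in Python rather than PHP or ASP, and the same two string replacements should port directly. The function names are made up, and the reverse step only works under danieljean's condition that a dash is never used in the database names.

```python
def page_name(word: str) -> str:
    """Build the URL form of a database name: spaces become dashes."""
    return word.replace(" ", "-")


def name_from_page_name(name: str) -> str:
    """Recover the database name from the URL form.

    Assumption: database names never contain a dash themselves.
    """
    return name.replace("-", " ")


word = "Product Name XYZ"      # value read from the database
description = word             # shown on screen, spaces intact
thepagename = page_name(word)  # used in links and filenames
```

The database itself is never changed. Only the value written into links gets the dashes.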