
Google SEO News and Discussion Forum

    
Google including pages with garbage after question mark
smallcompany




msg:3995188
 7:59 pm on Sep 24, 2009 (gmt 0)

For some time now, Google Webmaster Tools has been showing me pages that are linked to with variables after the question mark.

For example, under HTML suggestions, two pages are flagged as having duplicate tags, although they are really the same page. They're just:

page.html
page.html?something

This was not the case a few months ago.

Do I have to take care of this myself, by configuring robots.txt and Apache so such pages do not get picked up?

Thanks

 

tedster




msg:3995674
 2:54 pm on Sep 25, 2009 (gmt 0)

URLs with different query strings definitely are different URLs, so this is a canonical URL issue and you do need to take care of it.

There are many ways to go:

1. Configuring the server so that spurious query strings receive a 404 response is ideal.

2. Blocking the bad query string in robots.txt is also OK, but since these urls are already indexed, you may also need to do a removal request.

3. The canonical tag (a link element with rel="canonical" in the page head, not strictly a meta tag) is another way to go.

4. There's also the new query parameters function from within Webmaster Tools.
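
As a rough sketch of option 1 on Apache, and assuming mod_rewrite is enabled and that your static .html pages never legitimately take parameters (both are assumptions, not something stated in the question), an .htaccess rule along these lines would answer any .html request that carries a query string with a 404:

```apache
RewriteEngine On

# Assumption: no .html page on this site ever needs a query string,
# so any non-empty query string on such a page is treated as spurious.
RewriteCond %{QUERY_STRING} .
# "-" means no substitution; R=404 makes Apache return 404 Not Found.
RewriteRule \.html$ - [R=404,L]
```

Test it on a non-production copy first. If any of your pages rely on parameters (tracking codes, for instance), this rule would break those links.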

jd01




msg:3995713
 3:48 pm on Sep 25, 2009 (gmt 0)

I'm not sure if this is what you are saying, but most servers will serve a page requested with a Query_String exactly like the original... This means that if I were your competitor, I could (yes, without any access to your site) 'duplicate' your content in the eyes of an algo, simply by linking to:

yourpage.html?duplicate=1
yourpage.html?duplicate=2
yourpage.html?duplicate=3

Then submitting MY page containing the links to the search engine of my choice... This is why I strip Query_Strings from my URLs with Mod_Rewrite:

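# Match any request that carries a non-empty query string
RewriteCond %{QUERY_STRING} .
# 301 to the same path; the trailing ? drops the query string
RewriteRule .? http://www.example.com%{REQUEST_URI}? [R=301,L]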

But, as I was just reading, there is Almost Nothing a competitor can do to harm your rankings... (Unless, of course they know more about search engines and websites than the lawyer reviewing the terminology on the page. Then they might be able to do quite a bit...)

smallcompany




msg:3995804
 6:01 pm on Sep 25, 2009 (gmt 0)

Many thanks!

There's also the new query parameters function from within Webmaster Tools.

I like how you can add specific parameters to be ignored. I just did it. Now I wonder how this affects existing incoming links that include those parameters. Are only the parameters ignored, or the links as a whole?
To me it sounds like it ignores parameters only.

RewriteCond %{QUERY_STRING} .
RewriteRule .? http://www.example.com%{REQUEST_URI}? [R=301,L]

This is what I ideally wanted to do but always dreaded because I was afraid I would lose my data carried through variables and written into cookies.
I guess I should (finally) test it, rather than just talk about it.

jd01




msg:3995974
 11:57 pm on Sep 25, 2009 (gmt 0)

This is what I ideally wanted to do but always dreaded because I was afraid I would lose my data carried through variables and written into cookies.
I guess I should (finally) test it, rather than just talk about it.

You would with that, because it strips all of them...

You have a few options:

1.) Change the .htaccess Query_String pattern from . (dot) to a more specific pattern, so it checks whether the Query_String is valid and removes it only if it does not match the pattern.

(Your script would have to be adjusted to work correctly, and it depends on what your Query_Strings are, but you could add a 'start' or 'end' pattern to all variables allowed to be passed in a valid Query_String. E.g. var=abc and var=bca could both be valid now, but could be changed to var=Aabc and var=Abca; then you could use Mod_Rewrite to check for the A at the beginning of each value, and if it's not there, you know it's spoofed.)

2.) If you only allow Query_Strings on certain pages, check them for a proper match of the pattern you use, then strip the Query_String from the rest.

3.) Remove the rule from the .htaccess file, and move the removal process to PHP. Then in the PHP check for an 'exact match' of allowed Query_Strings generated and if the Query_String is not valid, strip it or return a 404 error.
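
Options 1 and 2 above could be sketched roughly like this in .htaccess. The page name search.php and the exact value pattern are hypothetical, chosen only to match the var=Aabc example; the host www.example.com is the placeholder from the earlier rule. Treat it as a starting point to adapt, not a drop-in rule set:

```apache
RewriteEngine On

# Option 2: only /search.php (hypothetical) may carry a query string.
# Every other URL with a non-empty query string is redirected to its bare path.
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/search\.php$
RewriteRule ^ http://www.example.com%{REQUEST_URI}? [R=301,L]

# Option 1: on the allowed page, keep the query string only when it is
# exactly var=A followed by lowercase letters (e.g. var=Aabc or var=Abca).
# Anything else is assumed to be spoofed and is stripped.
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} ^/search\.php$
RewriteCond %{QUERY_STRING} !^var=A[a-z]+$
RewriteRule ^ http://www.example.com%{REQUEST_URI}? [R=301,L]
```

With these rules, /page.html?duplicate=1 and /search.php?var=abc would both be redirected to the URL without the query string, while /search.php?var=Aabc would pass through untouched. If your scripts accept several parameters, the pattern in the last RewriteCond would need to be widened to cover them.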

There are some other things you can do using a combination of the above or by expanding on them a bit, but I hope this gives you some ideas.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved