Forum Moderators: Robert Charlton & goodroi


Home Page is Listed 100's of Different Ways

example.com?123, etc.

         

toughturkey

9:43 pm on Jun 29, 2005 (gmt 0)

10+ Year Member



When I do a site:example.com search, I see Google has my index page listed about 200 times in the following way:

www.example.com
www.example.com?x=1
www.example.com?x=2
www.example.com?x=3

You get the picture? Is there a way to return a 410 for the extra results without affecting the original page?

Any comments appreciated. thanks

[edited by: ciml at 1:47 pm (utc) on June 30, 2005]
[edit reason] Examplified [/edit]

toughturkey

2:55 pm on Jun 30, 2005 (gmt 0)

10+ Year Member



no takers?

SlyOldDog

3:04 pm on Jun 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Change the variable x to id.

Google hates variables called id, so that might work.

bird

3:31 pm on Jun 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't link to your homepage with parameters in the URL.

If anyone arrives at your homepage with parameters in the URL, redirect them via 301 to the parameterless URL.
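
In .htaccess terms, that 301 might look something like the following sketch (assuming Apache with mod_rewrite enabled and the rules placed in the site root's .htaccess; www.example.com stands in for the real domain):

```apache
RewriteEngine On
# Homepage requested with a non-empty query string?
RewriteCond %{QUERY_STRING} !^$
# Redirect to the clean homepage; the trailing "?" discards the query string
RewriteRule ^$ http://www.example.com/? [R=301,L]
```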

toughturkey

4:13 pm on Jun 30, 2005 (gmt 0)

10+ Year Member



No one is coming via those ?x= links anymore except the SEs, so a 301 sounds like the answer all right.

What would that look like? Something like this, perhaps?

redirect 301 /?x=* http://www.example.com

eduardomaio

8:03 pm on Jun 30, 2005 (gmt 0)

10+ Year Member



Are you using PHP? If you are, put this code at the top of everything:


<?php
// 301 to the clean homepage if the "x" parameter is present
if (isset($_GET['x']) && $_GET['x'] != "") {
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.com");
exit;
}
?>

Hope it helps!

walkman

9:03 pm on Jun 30, 2005 (gmt 0)



You need to 301 all of them to /, either via PHP or an Apache rewrite. Getting the existing ones removed first is best, though.
Otherwise it's really, really bad: according to Google you have 200+ IDENTICAL pages. I use rewrites for all my pages, so I have banned Google from indexing any URLs with a ?. Otherwise no one can stop someone from linking to www.yourdomain.com?you-just-got-a-dupe
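
walkman doesn't show how he blocked those URLs, but one common way is a wildcard pattern in robots.txt. The * wildcard is not part of the original robots.txt standard, though Googlebot honors it (a sketch, not his actual file):

```
# robots.txt - ask Googlebot not to crawl any URL containing a query string
User-agent: Googlebot
Disallow: /*?
```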

toughturkey

9:53 pm on Jun 30, 2005 (gmt 0)

10+ Year Member



Thanks for the tips. I am on a steep learning curve all right. I just found the Apache forum, so I'll see what's there.

thanks!

enotalone

1:56 am on Jul 1, 2005 (gmt 0)

10+ Year Member



2-3 weeks ago I noticed the same thing on our site: in our case it was a phpbb session variable, sid, with 503 copies of the main page in the index.

As soon as I noticed, I did what bird suggested you do (301s), but about 2-3 weeks have passed and these 503 copies are still in the index. They are only supplemental, but I would still rather not have them.

toughturkey

6:31 am on Jul 1, 2005 (gmt 0)

10+ Year Member



enotalone - yes, that sounds like the same situation. Mine also appear as supplemental. Could you share the code you used? I am struggling with this a bit.

enotalone

10:40 am on Jul 1, 2005 (gmt 0)

10+ Year Member



My code is in PHP and almost the same as what eduardomaio posted above.

It simply checks for the presence of a variable and redirects if the variable is found.

You could do it through .htaccess too, but I am not good at that.

If you can use PHP, just copy eduardomaio's code to the top of your main page.

Wizard

12:33 pm on Jul 1, 2005 (gmt 0)

10+ Year Member



If you want to block any URL with a ?, like the www.example.com?you-got-a-dupe example mentioned before, I'd recommend something like the following PHP code:

<?php

// Catch any request carrying a query string,
// regardless of the variable names it contains
if (isset($_SERVER['QUERY_STRING']) && $_SERVER['QUERY_STRING'] != "") {

// you can either send 410 Gone:
header("HTTP/1.1 410 Gone");

// ...or instead redirect permanently with 301:
// header("HTTP/1.1 301 Moved Permanently");
// header("Location: http://www.example.com/");

exit;
}

?>

In contrast to the previous PHP example, this is independent of the names of the variables after the ? mark.

Using mod_rewrite is a bit trickier, because the pattern in a plain RewriteRule never sees what's after the question mark. You'd have to use a RewriteCond on the QUERY_STRING variable.
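
For reference, the mod_rewrite variant described here might look like this (a sketch, assuming the rules live in the site root's .htaccess and that only the homepage needs the treatment):

```apache
RewriteEngine On
# The RewriteRule pattern only sees the path, so test the
# query string in a RewriteCond instead
RewriteCond %{QUERY_STRING} !^$
# 301 the homepage to itself without parameters; the trailing "?"
# drops the query string from the target
RewriteRule ^$ http://www.example.com/? [R=301,L]
```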

toughturkey

3:34 pm on Jul 1, 2005 (gmt 0)

10+ Year Member



Hmmm, so I would have to change my index.html to index.php to be able to use that snippet?

Thanks again for everyone's help.

claus

10:29 pm on Jul 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I saw it coming in February, so here's the complete recipe on how to fix that particular problem on the Apache web server:

[webmasterworld.com...]

(you don't need php and you don't have to change any file names)

garyr_h

3:49 am on Jul 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For those of you looking at this thread who need to use the ?x=1 in the URL, I strongly suggest using eduardomaio's method. Just be sure to put your cookie/script code above the redirect so it still runs.

This will tell Google and others not to index that page, but will still allow you to set cookies in the visitor's browser.

eduardomaio

11:37 am on Jul 2, 2005 (gmt 0)

10+ Year Member



I'm using that method on one website and Google has never indexed an old variable again.

If you have index.html and want to keep it, you need to add this to your .htaccess file:


AddType application/x-httpd-php .php .html

toughturkey

2:06 pm on Jul 2, 2005 (gmt 0)

10+ Year Member



Well, nothing seems to affect the header output. After reading a ton of posts here on the topic and toying with code for hours, it always returns a 200. Here is what I am using:

RewriteEngine on
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !/boards/
RewriteRule .* http://www.example.com/ [R=301,L]

My .htaccess works fine otherwise, with about 10 functioning 410 redirects in it. Any thoughts on what my next step could be?

I'd really like to avoid going to PHP.

claus

10:20 pm on Jul 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Toughturkey, I read your example code as follows:

-------------------------
1) Turn on the rewrite engine
2) If there is a query string, AND
3) The directory "/boards/" is not the current one, THEN
4) Redirect all requests to the home page with a 301, AND
5) Do no more for that request ( the [L] flag)
-------------------------

So, it ought to work.

Have you used the [L] flag earlier in your ".htaccess" file? If yes, try removing it from the previous rule(s).

Have you turned on the rewrite engine two or more times in your ".htaccess" file? If yes, try doing it only once.

Is "/boards/" a physical directory, or is it a virtual one that only exists as a rewrite (i.e. it's really a parameter in a dynamic URL)? If so, the real URL must be used instead.

Apart from the above three, I really can't see what could be wrong here. So, try posting in the Apache forum if these tips don't help.

MikeNoLastName

10:41 pm on Jul 5, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Claus,
I read and tried your earlier post (I read it yesterday; see the other thread I started a couple of days ago about duplicate listings with ?). Although simple and elegant, as it stands it will end up in an infinite loop, because the QUERY_STRING is not stripped: when the browser (or search engine) requests the 301-redirected page on the SAME SERVER again, the .htaccess takes over and keeps redirecting (or 404'ing, etc.).
The only effective way I found was to do a 301 redirect to Google.com, which returns a 404 error (or maybe a 302 if you prefer ;). I suppose you could also redirect to a non-existent page on some other server of your own, as long as you are NOT using this code on it.
Also, Claus's original code traps ALL QUERY_STRINGS, which is not so helpful if you have places where they ARE needed. In those cases you can simply replace the !^$ with "123" or whatever, for EACH AND EVERY ONE of the bad indexes you wish to remove, as in the following, which I am currently using and which seems to work fine:

RewriteCond %{QUERY_STRING} 123
RewriteRule .* http://www.example.com/ [R=301,L]
RewriteCond %{QUERY_STRING} 1234
RewriteRule .* http://www.example.com/ [R=301,L]

That's the easy part... Getting G to respider them quickly is the hard part :)
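
Those pairs of rules could also be collapsed into a single condition using regex alternation (a sketch; 123 and 1234 stand in for the actual unwanted query strings):

```apache
# Match any of the unwanted query strings in one condition
RewriteCond %{QUERY_STRING} ^(123|1234)$
# The trailing "?" discards the query string, avoiding a redirect loop
RewriteRule .* http://www.example.com/? [R=301,L]
```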

claus

8:21 pm on Jul 6, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> infinite loop

I'm sorry, I forgot to double-check the rewrite rule against this. If you pass everything on with the "$1" parameter, then of course the query string will be passed on as well.

So, this will make a loop:

RewriteRule (.*) http://www.example.com/$1 [R=301,L]

... while this will not:

RewriteRule .* http://www.example.com/ [R=301,L]

I personally redirect my unwanted query strings to a non-existing page (that way serving a 404), so I haven't experienced this problem myself.

Right this moment I have no idea about how to solve it, but I'll take a look at the threads to leave a remark at least (if possible).
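
A caveat for readers trying these rules: by default, mod_rewrite appends the original query string to the substitution, so even the second form above can carry the parameters through to the target and loop on some setups. Appending a bare ? to the substitution explicitly discards the query string:

```apache
RewriteCond %{QUERY_STRING} !^$
# The bare trailing "?" strips the query string from the redirect target,
# so the redirected request no longer matches the condition above
RewriteRule .* http://www.example.com/? [R=301,L]
```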

theBear

8:40 pm on Jul 6, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is what I'm using for two sites that have to have access to variables for some scripts.

All scripts are perl and reside inside the cgi-bin folder structure.

This ruleset issues a forbidden error, which is served by a custom error page on our servers.

You could issue a 410 gone by changing the [F] to a [G].

If you have other scripting systems you'll have to modify this rule set to cover them as well.

If your site is static html this would work as well.


RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !^/cgi-bin
RewriteRule ^(.*)$ - [F]

Claus, I'm chasing a better way to handle this mess, but I'm running into a wall currently.

Testing is a royal pain as the .htaccess file is getting complicated.

claus

12:03 am on Jul 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem with "forbidden" is that it assumes there is content to which access is denied. This is clearly not the case.

The problem with "gone" is that it assumes there was content that is no longer available. This is clearly not the case either.

You would want to issue a 404 instead:

--------------------------------------------- 
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !^/cgi-bin
RewriteRule .* /any-filename-that-does-not-exist.htm
---------------------------------------------

By making an internal redirect (no "R" flag) to a filename that does not exist, your normal 404-rules will take over and display whatever page you have set up to handle a 404 error (or the default, if none).

Of course, the server status code "404" will be issued as well.
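
For completeness, the page that handles that 404 is normally configured with Apache's ErrorDocument directive (assuming a /404.html you have created yourself):

```apache
# Serve a custom page, still with a 404 status, for missing files
ErrorDocument 404 /404.html
```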


I should add that Google in particular doesn't always consider pages non-existent just because Googlebot gets a 404. Perhaps this is intentional, so that they don't accidentally delete a page just because a server is down, or whatever.

However, using the URL-console you can remove your pages if they return a 404, or you can redirect them using the code above, to a real, existing, html file with the robots meta tag "noindex".

theBear

12:20 am on Jul 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Claus,

My version of Apache goes 500 on me if I do the rewriterule .* /no-such-file.htm

Otherwise I would have.

toughturkey

6:32 pm on Jul 10, 2005 (gmt 0)

10+ Year Member



To update the situation from the original post (and, by the way, thanks to everyone for the suggestions): what I did was switch to PHP and use eduardomaio's suggestion. It works great.

claus

9:01 pm on Jul 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> Apache goes 500

Well, that's odd, but then it is unexpected that somebody would redirect to something that's not there, so in a sense the server's right.

Just kidding, of course. The 500 error is sort of a fallback status code for when something went wrong and no other code seems to make sense, so it could be anything. I'd look through the error logs to see if I could get closer to the core of the problem.

In the meantime I'd probably use the [G] or [F] flag instead.

jd01

9:55 pm on Jul 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteCond %{THE_REQUEST} \?.+\ HTTP/ [NC]
RewriteRule (.*) /$1? [R=301,L]

Might help you guys out...

Justin