Page displays with spurious query string on URL

I need it to return 404-Not Found

     

troyid

12:45 am on Jan 25, 2010 (gmt 0)

10+ Year Member



I have detected a problem after discovering a page that Google has indexed.

Google has indexed identical copies of a page.

www.domain.com/directory/Oceania/Australia/

and

www.domain.com/directory/Oceania/Australia/?ID=462

The page with ?ID=462 should display a 404 error.

How can I set this up in my htaccess file?

jdMorgan

5:01 am on Jan 25, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Why does Google display this URL... Where did it find the link?

The correct solution depends on that.

Jim

troyid

5:14 am on Jan 25, 2010 (gmt 0)

10+ Year Member



Goodness knows how. Probably an external site. Anyway, I need the ?ID pages to 404.

g1smd

8:05 am on Jan 25, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Does Google Webmaster Tools hint where the link is?

Does the link send traffic? If it does, I might be tempted to 301 redirect to the correct URL.

troyid

8:27 am on Jan 25, 2010 (gmt 0)

10+ Year Member



No, it doesn't send any traffic. I really want to 404 it.

g1smd

11:12 am on Jan 25, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



This should work:

RewriteCond %{QUERY_STRING} &?ID=462&?
RewriteRule ^directory/Oceania/Australia - [F]

If you need id and Id and iD to also fail, add [NC] to the condition.

If you want any and all ID numbers to fail, remove the 462&? part from the condition.
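
A hedged aside on the pattern itself: because `&?` makes both ampersands optional, the condition above is effectively an unanchored search for "ID=462", so it also matches query strings such as ID=4620 or UID=462. Assuming only the exact parameter ID=462 should be caught, a minimal sketch anchors the match to parameter boundaries. It assumes the rules live in the .htaccess file in the document root; the rule line is unchanged from the reply above.

```apache
# Sketch only; assumes a root-level .htaccess file.
RewriteEngine On
# (^|&) requires "ID" to begin a parameter;
# (&|$) requires "462" to end that parameter.
RewriteCond %{QUERY_STRING} (^|&)ID=462(&|$)
RewriteRule ^directory/Oceania/Australia - [F]
```

To catch every ID value under the same assumption, the condition would become `(^|&)ID=`.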

troyid

6:37 pm on Jan 25, 2010 (gmt 0)

10+ Year Member



Thanks. Works great!

troyid

10:22 pm on Jan 26, 2010 (gmt 0)

10+ Year Member



I would like to know what the rewrite rule would be if this were the situation:

www.domain.com/?ID=462

g1smd

12:10 am on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Fails any request that has an ID parameter appended, on any path:

RewriteCond %{QUERY_STRING} &?ID=
RewriteRule .* - [F]

jdMorgan

12:27 am on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Gentlemen,

I believe a 404-Not Found was requested, and both code samples return 403-Forbidden.

Jus' sayin'...

Jim

troyid

3:25 am on Jan 27, 2010 (gmt 0)

10+ Year Member



Thanks for pointing that out, Jim. I was going to say something but thought a 403 was better than nothing. A 404 would be better though.

troyid

3:31 am on Jan 27, 2010 (gmt 0)

10+ Year Member



g1smd, I can't use the last rewrite rule you posted, as I have a script in my /scripts/ folder that appends ?ID= to the URL. I need something that only works in the root folder.

jdMorgan

4:12 am on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



On Apache 1.x or 2.x:

# Create 404 on all root folder requests whose query string
# contains "ID=" by rewriting to a file that does not exist
RewriteCond %{QUERY_STRING} &?ID=
RewriteRule ^[^/]*$ /non-existent-file.hmtl [L]

On Apache 2.x, you can try this (untested):

# Return a 404 response on all root folder requests
# whose query string contains "ID="
RewriteCond %{QUERY_STRING} &?ID=
RewriteRule ^[^/]*$ - [R=404,L]

Jim

troyid

4:26 am on Jan 27, 2010 (gmt 0)

10+ Year Member



Jim, I tried the untested one and it works perfectly. Exactly what I wanted.

troyid

4:29 am on Jan 27, 2010 (gmt 0)

10+ Year Member



Jim, what would be the equivalent of the untested rewrite rule for this?

RewriteCond %{QUERY_STRING} &?ID=462&?
RewriteRule ^directory/Oceania/Australia - [F]

jdMorgan

4:50 am on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Just swap in the RewriteRule pattern.
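
Read literally, that swap might look like the sketch below. It simply combines the condition and path pattern from the question with the Apache 2.x flags posted earlier; it is untested here and assumes Apache 2.x and a root-level .htaccess file.

```apache
# Sketch only; assumes Apache 2.x and a root-level .htaccess file.
RewriteEngine On
# Same condition and path pattern as in the question,
# but answering 404-Not Found instead of 403-Forbidden.
RewriteCond %{QUERY_STRING} &?ID=462&?
RewriteRule ^directory/Oceania/Australia - [R=404,L]
```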

You might do better by deciding exactly which directories should 404 if a query string is present and which should not, and then listing them. Then reduce these lists by removing all but the "common paths" from directories which should all be treated in common. For example, if all subdirectories of "/directory/" should be 404'ed when a query string is present, then "directory/" is all you need to match in the pattern. Take the shorter or least-likely-to-change list (should 404 or should not 404) and code for that.

It might be that simply excluding the specific path to your script directory, and perhaps the path to your Web-accessible "stats" directory would be an easy, compact solution. But you have to decide, as pretty much all hosts configure their servers differently.
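
As a rough illustration of the exclusion approach, and not a drop-in answer, the sketch below assumes that /scripts/ and /stats/ are the only paths where ?ID= is legitimate. Both directory names are assumptions taken from the discussion above and would need to be checked against the real site layout.

```apache
# Sketch only; assumes Apache 2.x and a root-level .htaccess file.
RewriteEngine On
# Only act when the query string carries an ID parameter...
RewriteCond %{QUERY_STRING} (^|&)ID=
# ...and the request is outside the excluded directories
# (hypothetical list; adjust to the paths your scripts really use).
RewriteCond %{REQUEST_URI} !^/(scripts|stats)/
RewriteRule ^ - [R=404,L]
```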

If you read our Forum Charter [webmasterworld.com], you'll find that while we're happy to get you started with examples or to help you to fix a *difficult* bug, we cannot write your code for you. Please check out the documents cited in that charter, using them to decipher the examples above, and see if you can help us help you...

Thanks,
Jim

crobb305

5:27 am on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



While we are talking about query strings, I can access my index page using /?
I can also access my internal pages with /file.htm? (or /file.htm?blah)

I have tested what I thought would work, but I am having trouble getting there:

rewriteCond %{QUERY_STRING} .
rewriteRule (.*) http://www.example.com/$1? [R=301,L]

I know I don't have it right yet, and I am probably not putting it in the right order with respect to my other redirects. Any suggestions?

g1smd

9:34 am on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



404? 403? Pfft, blame it on a late night.

"Thanks. Works great!"

"I was going to say something ..."

You should have, then I would have picked up the error right away. :)

I had meant to post this:

RewriteCond %{QUERY_STRING} &?ID=
RewriteRule .* /does-not-exist [L]

but the conversation has moved on a bit since then.

jdMorgan

3:30 pm on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



On review, I think it would also be a good idea to clear the query string. I'm not sure, but we might get a loop if we don't:

On Apache 1.x or 2.x:

# Create 404 on all root folder requests whose query string
# contains "ID=" by rewriting to a file that does not exist;
# the trailing "?" clears the original query string
RewriteCond %{QUERY_STRING} &?ID=
RewriteRule ^[^/]*$ /non-existent-file.hmtl? [L]

Since the documented behavior of Apache 2.x is to discard the substitution path whenever [R] carries a status code outside the 3xx range, it shouldn't be necessary to modify the Apache-2.x-specific code I posted above.

---

crobb305,

If you get 'naked question mark' requests, your code won't work because the question mark is only a delimiter between the URL-path and the query string. Therefore it is not visible in either the URL-path examined by RewriteRule or the %{QUERY_STRING} variable.

So in order to 'see' it, we have to look at %{THE_REQUEST}:


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?[^\ ]*\ HTTP/
RewriteRule ^ /non-existent-file.hmtl? [L]

Note that in both snippets above, the question mark not followed by a query string at the end of the RewriteRule substitution path serves only as a mod_rewrite operator to clear the originally-requested query string. This question mark *will not* appear in the rewritten path.
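
For readers on newer servers: Apache 2.4 and later also provide the QSD (query string discard) flag, which states the same intent explicitly. The sketch below is an assumed equivalent of the first snippet for Apache 2.4+ only; the target file name is arbitrary and only needs to be a path that does not exist.

```apache
# Sketch only; requires Apache 2.4 or later for [QSD].
RewriteEngine On
# [QSD] discards the originally-requested query string, which is
# what the bare trailing "?" does in the snippets above.
RewriteCond %{QUERY_STRING} &?ID=
RewriteRule ^[^/]*$ /non-existent-file.html [QSD,L]
```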

Jim

jdMorgan

3:49 pm on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Note also that the code I just posted is still somewhat specific to the case where you want to 404 the request. I generally recommend using a 301-Moved Permanently redirect to remove the spurious question mark or query string if the URL is otherwise "good."

# If a spurious query string delimiter and/or query string is appended
# to an otherwise-valid URL, externally redirect the request to strip
# off the query string delimiter and query (else just let it go 404).
RewriteCond $1 !^(forum/index\.php|stats/)
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?[^\ ]*\ HTTP/
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

The RewriteRule pattern shown here matches any URL; it will be necessary to exclude all scripts (or their entire directories) if your site uses server-side scripts (including "stats" and/or a "control panel"). The first RewriteCond is just an example of such an exclusion.

Note that as discussed in several recent threads, file- and directory-exists checks should always be done last in order to avoid wasting a lot of server resources.

Jim

crobb305

7:32 am on Mar 27, 2010 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



jdMorgan,

Your last post seems to be working perfectly for cases where a spurious question mark occurs after or in the middle of an otherwise legitimate filepath.

For example, it works perfectly for:
example.com/filename?blahblah -or-
example.com/filename?

But I am having problems getting the question mark removed when it occurs at the beginning of the filename as in:
example.com/?filename

In this case, it redirects to the homepage.

Do you have any suggestions? I have been spending some time trying to learn this, and searching around. My fear is adding code to strip that question mark, and messing up the portion of your code that is working perfectly.

Also, I want to verify that I have placed this code in the proper order relative to other rules. I have placed it at the end of all canonical redirects, non-www to www, etc.

This is such a huge undertaking for me, but I have learned a lot :)

I appreciate your help.

jdMorgan

12:06 pm on Mar 27, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The code is correct as posted, since a request for "example.com/?<anything or nothing>" *is* a request for the "home page." If this "doesn't work" on your site, then the problem is one of site design.

Again, you'd do well to stop for a few days, think about this very carefully, and then make a list of all "bad" URLs and their desired dispositions. Only with a very-solid list of requirements can any correct coded solution be created, and lack of solid requirements leads to too-long threads like this one.

"Searching around" is fine, as long as what you are reading are mod_rewrite and regular-expressions tutorials and documentation (such as that cited in our Forum Charter). mod_rewrite code tends to be extremely case-specific, and therefore, you may search for years before finding that the on-line resource with a solution that matches your problem most closely is... this thread.

There really is no alternative but to learn to read and write the code yourself, and that is what this forum is really intended to help you to do.

Jim

crobb305

6:17 pm on Mar 27, 2010 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Well by "searching around", I do mean in the mod_rewrite tutorials and on this forum. Certainly, I wouldn't use some random code from an untrusted resource. I am trying to be very careful and meticulous.

I guess the reason the example.com/?filename was bothering me is that it stems from a legitimate request for "filename". Someone, somewhere, is (or was) linking to me this way, and Googlebot is requesting it. I just thought there was a way to strip out that "?". The code you shared above is working perfectly. I have excluded all scripts and everything is working fine. I will just allow the /?filename to redirect to the homepage and not fret over it, considering that it is the last remaining "404" I have to cover (as showing up in Google Webmaster Tools).
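
For completeness, one possible way to handle that case is sketched here. It rests on a loud assumption: that a root request whose query string contains no "=" or "&" is really a mislinked file name, so /?filename should have been /filename. It would have to be placed before the general query-string-stripping rule, and it is untested.

```apache
# Sketch only; assumes a root-level .htaccess file and that this
# rule is placed before the general query-string-stripping rule.
RewriteEngine On
# Capture a query string made of a single bare token (no "=" or "&").
RewriteCond %{QUERY_STRING} ^([^=&]+)$
# Match only the root URL (an empty path in .htaccess context), then
# redirect to /<token>; the trailing "?" clears the query string.
RewriteRule ^$ http://www.example.com/%1? [R=301,L]
```

If the captured token does not correspond to a real file, the redirected request simply ends in a 404.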

By the way, I didn't realize that I somehow double posted last night (worked very late). It looks like I tried to edit a post to add a sentence and make corrections, but for some reason it reposted.

Thanks again Jim for all your help.

Chris
 
