Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Page displays with spurious query string on URL
I need it to return 404-Not Found
troyid




msg:4067007
 12:45 am on Jan 25, 2010 (gmt 0)

I have detected a problem after discovering a page that Google has indexed.

Google has indexed identical copies of a page.

www.domain.com/directory/Oceania/Australia/

and

www.domain.com/directory/Oceania/Australia/?ID=462

The page with ?ID=462 should display a 404 error.

How can I set this up in my htaccess file?

 

jdMorgan




msg:4067081
 5:01 am on Jan 25, 2010 (gmt 0)

Why does Google display this URL... Where did it find the link?

The correct solution depends on that.

Jim

troyid




msg:4067084
 5:14 am on Jan 25, 2010 (gmt 0)

Goodness knows how. Probably an external site. Anyway, I need the ?ID pages to 404.

g1smd




msg:4067113
 8:05 am on Jan 25, 2010 (gmt 0)

Does Google Webmaster Tools hint at where the link is?

Does the link send traffic? If it does, I might be tempted to 301 redirect to the correct URL.

troyid




msg:4067126
 8:27 am on Jan 25, 2010 (gmt 0)

No it doesn't send any traffic. I really want to 404 it.

g1smd




msg:4067183
 11:12 am on Jan 25, 2010 (gmt 0)

This should work:

RewriteCond %{QUERY_STRING} &?ID=462&?
RewriteRule ^directory/Oceania/Australia - [F]

If you need id and Id and iD to also fail, add [NC] to the condition.

If you want any and all ID numbers to fail, remove the 462&? part from the condition.
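For anyone who wants to check what this condition actually matches, here is a small Python sketch. It assumes Python's `re` behaves like Apache's PCRE for the simple constructs used here, and uses `re.IGNORECASE` to stand in for the [NC] flag. Because the posted pattern is unanchored, it also matches ID=4620 or PID=462; the anchored variant `(^|&)ID=462(&|$)` is shown only as a hypothetical tighter alternative, not as what was posted.

```python
import re

# Condition as posted: RewriteCond %{QUERY_STRING} &?ID=462&?
POSTED = r"&?ID=462&?"
# Hypothetical tighter alternative: ID=462 must be a whole parameter.
ANCHORED = r"(^|&)ID=462(&|$)"


def cond_matches(pattern: str, query_string: str, nocase: bool = False) -> bool:
    """Mimic a RewriteCond test: an unanchored search, optionally with [NC]."""
    flags = re.IGNORECASE if nocase else 0
    return re.search(pattern, query_string, flags) is not None


if __name__ == "__main__":
    for qs in ["ID=462", "a=1&ID=462&b=2", "id=462", "ID=4620", "PID=462"]:
        print(f"{qs!r:18} posted={cond_matches(POSTED, qs)} "
              f"posted[NC]={cond_matches(POSTED, qs, True)} "
              f"anchored={cond_matches(ANCHORED, qs)}")
```

Removing "462&?" as suggested leaves `&?ID=`, which by the same logic matches any query string containing "ID=" anywhere.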

troyid




msg:4067445
 6:37 pm on Jan 25, 2010 (gmt 0)

Thanks. Works great!

troyid




msg:4068499
 10:22 pm on Jan 26, 2010 (gmt 0)

I would like to know what the rewrite rule would be if this was the situation.

www.domain.com/?ID=462

g1smd




msg:4068602
 12:10 am on Jan 27, 2010 (gmt 0)

Clears any ID appended to any path:

RewriteCond %{QUERY_STRING} &?ID=
RewriteRule .* - [F]

jdMorgan




msg:4068611
 12:27 am on Jan 27, 2010 (gmt 0)

Gentlemen,

I believe a 404-Not Found was requested, and both code samples return 403-Forbidden.

Jus' sayin'...

Jim

troyid




msg:4068695
 3:25 am on Jan 27, 2010 (gmt 0)

Thanks for pointing that out, Jim. I was going to say something, but thought a 403 was better than nothing. A 404 would be better, though.

troyid




msg:4068702
 3:31 am on Jan 27, 2010 (gmt 0)

g1smd, I can't use the last rewrite rule you posted, as I have a script in my /scripts/ folder that appends ?ID= to the URL. I need something that only works in the root folder.

jdMorgan




msg:4068715
 4:12 am on Jan 27, 2010 (gmt 0)

On Apache 1.x or 2.x:

# Create 404 on all root folder requests with query strings
# appended by rewriting to a file that does not exist
RewriteCond %{QUERY_STRING} &?ID=
RewriteRule ^[^/]*$ /non-existent-file.hmtl [L]

On Apache 2.x, you can try this (untested):

# Return 404 response on all root folder
# requests with query strings appended
RewriteCond %{QUERY_STRING} &?ID=
RewriteRule ^[^/]*$ - [R=404,L]
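The difference between the `.*` pattern posted earlier and `^[^/]*$` here is what keeps the /scripts/ folder safe. In a root-level .htaccess file, mod_rewrite removes the leading slash before matching, so the RewriteRule sees "scripts/search.php" rather than "/scripts/search.php". The Python sketch below assumes that per-directory behavior and uses `re.search` to stand in for the RewriteRule match; the sample paths are made up.

```python
import re

ANY_PATH = r".*"        # RewriteRule .* ...        (every path, including /scripts/)
ROOT_ONLY = r"^[^/]*$"  # RewriteRule ^[^/]*$ ...   (no slash left = root-level request)


def rule_matches(pattern: str, url_path: str) -> bool:
    """Test a RewriteRule pattern as it would run in the root .htaccess.

    Assumption: in per-directory context the leading "/" is removed first,
    so "/scripts/x.php" is seen as "scripts/x.php" and "/" is seen as "".
    """
    per_dir_path = url_path[1:] if url_path.startswith("/") else url_path
    return re.search(pattern, per_dir_path) is not None


if __name__ == "__main__":
    for p in ["/", "/index.html", "/scripts/search.php"]:
        print(f"{p:22} any={rule_matches(ANY_PATH, p)} root_only={rule_matches(ROOT_ONLY, p)}")
```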

Jim

troyid




msg:4068718
 4:26 am on Jan 27, 2010 (gmt 0)

Jim, I tried the untested one and it works perfectly. Exactly what I wanted.

troyid




msg:4068720
 4:29 am on Jan 27, 2010 (gmt 0)

Jim, what would be the equivalent of the untested rewrite rule for this one?

RewriteCond %{QUERY_STRING} &?ID=462&?
RewriteRule ^directory/Oceania/Australia - [F]

jdMorgan




msg:4068724
 4:50 am on Jan 27, 2010 (gmt 0)

Just swap in the RewriteRule pattern.

You might do better by deciding exactly which directories should 404 if a query string is present, and which should not and then listing them. Then reduce these lists by removing all but the "common paths" from directories which should all be treated in common -- For example, if all subdirectories of "/directory/" should be 404'ed when a query string is present, then "directory/" is all you need to match in the pattern. Take the shorter or least-likely-to-change (should or should not 404) list and code for that.
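The "reduce the list to common paths" step can be sketched in Python as a thinking aid. The directory names are hypothetical, each entry is assumed to end in "/", and the resulting pattern is meant to be reviewed and pasted into a RewriteRule by hand, not generated automatically for production config.

```python
import re


def reduce_prefixes(dirs: list[str]) -> list[str]:
    """Drop any directory already covered by a shorter entry in the list.

    e.g. "directory/Oceania/Australia/" is redundant once "directory/" is listed.
    Assumes every entry ends in "/" so that "dir/" cannot swallow "dir2/".
    """
    kept: list[str] = []
    for d in sorted(set(dirs), key=len):
        if not any(d.startswith(k) for k in kept):
            kept.append(d)
    return sorted(kept)


def build_pattern(dirs: list[str]) -> str:
    """Build a RewriteRule-style pattern matching paths under any listed dir."""
    alternatives = "|".join(re.escape(d) for d in reduce_prefixes(dirs))
    return f"^({alternatives})"


if __name__ == "__main__":
    wanted_404 = ["directory/Oceania/Australia/", "directory/", "stats/"]
    print(reduce_prefixes(wanted_404))
    print(build_pattern(wanted_404))
```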

It might be that simply excluding the specific path to your script directory, and perhaps the path to your Web-accessible "stats" directory would be an easy, compact solution. But you have to decide, as pretty much all hosts configure their servers differently.

If you read our Forum Charter [webmasterworld.com], you'll find that while we're happy to get you started with examples or to help you to fix a *difficult* bug, we cannot write your code for you. Please check out the documents cited in that charter, using them to decipher the examples above, and see if you can help us help you...

Thanks,
Jim

crobb305




msg:4068731
 5:27 am on Jan 27, 2010 (gmt 0)

While we are talking about query strings, I can access my index page using /?
I can also access my internal pages with /file.htm? (or /file.htm?blah)

I have tested what I thought would work, but I am having trouble getting there:

rewriteCond %{QUERY_STRING} .
rewriteRule (.*) http://www.example.com/$1? [R=301,L]

I know I don't have it right yet, and I am probably not putting it in the right order with respect to my other redirects. Any suggestions?

g1smd




msg:4068797
 9:34 am on Jan 27, 2010 (gmt 0)

404? 403? Pfft, blame it on a late night.

Quote: "Thanks. Works great!"

Quote: "I was going to say something ..."

You should have; then I would have picked up the error right away. :)

I had meant to post this:

RewriteCond %{QUERY_STRING} &?ID=
RewriteRule .* /does-not-exist [L]

but the conversation has moved on a bit since then.

jdMorgan




msg:4068996
 3:30 pm on Jan 27, 2010 (gmt 0)

On review, I think it would also be a good idea to clear the query string. I'm not sure, but we might get a loop if we don't:

On Apache 1.x or 2.x:

# Create 404 on all root folder requests with query strings
# appended by rewriting to a file that does not exist
RewriteCond %{QUERY_STRING} &?ID=
RewriteRule ^[^/]*$ /non-existent-file.hmtl? [L]

Since the documented behavior of Apache 2.x is to discard the substitution path entirely when the [R] flag is given a status outside the 300-399 redirect range (such as R=404), it shouldn't be necessary to modify the Apache-2.x-specific code I posted above.
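The trailing ? added to the substitution matters because of how mod_rewrite treats the query string on a rewrite: if the substitution contains no ? at all, the original query string is passed through unchanged; a bare trailing ? clears it; and a substitution with its own query string replaces the original (unless [QSA] is used). The Python function below is a hypothetical model of those three documented cases, written only to make the rule concrete; it is not mod_rewrite's code.

```python
def rewrite_query(substitution: str, original_qs: str) -> tuple[str, str]:
    """Model how a RewriteRule substitution treats the original query string.

    Returns (new_path, new_query_string). The [QSA] flag is not modelled.
    """
    if "?" not in substitution:
        # No "?" in the substitution: the original query string is kept.
        return substitution, original_qs
    path, _, new_qs = substitution.partition("?")
    # A bare trailing "?" leaves new_qs empty, i.e. the query is cleared;
    # otherwise the substitution's own query string replaces the original.
    return path, new_qs


if __name__ == "__main__":
    print(rewrite_query("/non-existent-file.hmtl", "ID=462"))
    print(rewrite_query("/non-existent-file.hmtl?", "ID=462"))
```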

---

crobb305,

If you get 'naked question mark' requests, the code above won't work, because the question mark is the delimiter between the URL-path and the query string. It therefore appears neither in the URL-path examined by RewriteRule nor in the %{QUERY_STRING} variable.

So in order to 'see' it, we have to look at %{THE_REQUEST} :

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?[^\ ]*\ HTTP/
RewriteRule ^ /non-existent-file.hmtl? [L]

Note that in both snippets above, the question mark not followed by a query string at the end of the RewriteRule substitution path serves only as a mod_rewrite operator to clear the originally-requested query string. This question mark *will not* appear in the rewritten path.
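To see why %{QUERY_STRING} cannot catch a naked question mark while %{THE_REQUEST} can, the Python sketch below runs the THE_REQUEST pattern against some sample request lines and uses the standard `urllib.parse.urlsplit` to show that the query component of "/file.htm?" is empty. It assumes Python's `re` treats this pattern as PCRE does; the request lines are invented examples.

```python
import re
from urllib.parse import urlsplit

# RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?[^\ ]*\ HTTP/
HAS_QMARK = re.compile(r"^[A-Z]+\ /[^?\ ]*\?[^\ ]*\ HTTP/")


def request_has_qmark(request_line: str) -> bool:
    """True if the raw request line has a "?" in its URL, with or without a query."""
    return HAS_QMARK.search(request_line) is not None


if __name__ == "__main__":
    for line in ["GET /file.htm HTTP/1.1",
                 "GET /file.htm? HTTP/1.1",
                 "GET /file.htm?blah HTTP/1.1",
                 "GET /? HTTP/1.1"]:
        target = line.split(" ")[1]
        # An empty query here is what %{QUERY_STRING} would also see.
        print(f"{line!r:32} query={urlsplit(target).query!r:8} "
              f"caught={request_has_qmark(line)}")
```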

Jim

jdMorgan




msg:4069007
 3:49 pm on Jan 27, 2010 (gmt 0)

Note also that the code I just posted is still somewhat specific to the case where you want to 404 the request. I generally recommend using a 301-Moved Permanently redirect to remove the spurious question mark or query string if the URL is otherwise "good."

# If a spurious query string delimiter and/or query string is appended
# to an otherwise-valid URL, externally redirect the request to strip
# off the query string delimiter and query (else just let it go 404).
RewriteCond $1 !^(forum/index\.php|stats/)
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?[^\ ]*\ HTTP/
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

The RewriteRule pattern shown here matches any URL; it will be necessary to exclude all scripts (or their entire directories) if your site uses server-side scripts (including "stats" and/or a "control panel"). The first RewriteCond is just an example of such an exclusion.

Note that as discussed in several recent threads, file- and directory-exists checks should always be done last in order to avoid wasting a lot of server resources.
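The order of the conditions in this rule can be mirrored in a small Python function: cheap string tests first, filesystem checks last. Everything here is a hypothetical model: `EXCLUDED` copies the example exclusion from the first RewriteCond (with a solid pipe), the `existing` set stands in for the -d/-f checks, and www.example.com is the placeholder host from the post.

```python
import re
from typing import Optional, Set

# First RewriteCond's example exclusion: scripts that must never be redirected.
EXCLUDED = re.compile(r"^(forum/index\.php|stats/)")
# Second RewriteCond: the raw request line contains "?" (with or without a query).
HAS_QMARK = re.compile(r"^[A-Z]+\ /[^?\ ]*\?[^\ ]*\ HTTP/")


def redirect_target(path: str, request_line: str, existing: Set[str]) -> Optional[str]:
    """Return the 301 target for a spurious "?", or None to leave the request alone.

    path is the per-directory URL-path (no leading "/"), as $1 would capture it.
    """
    if EXCLUDED.search(path):                # RewriteCond $1 !^(...)
        return None
    if not HAS_QMARK.search(request_line):   # RewriteCond %{THE_REQUEST} ...
        return None
    if path not in existing:                 # -d [OR] -f: the costly check, done last
        return None
    return f"http://www.example.com/{path}"  # trailing "?" in the rule drops the query


if __name__ == "__main__":
    site = {"", "filename", "stats/"}
    print(redirect_target("filename", "GET /filename?blahblah HTTP/1.1", site))
    print(redirect_target("", "GET /?filename HTTP/1.1", site))
```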

Jim

crobb305




msg:4105616
 7:32 am on Mar 27, 2010 (gmt 0)

jdMorgan,

Your last post seems to be working perfectly for cases where a spurious question mark occurs after or in the middle of an otherwise legitimate filepath.

For example, it works perfectly for:
example.com/filename?blahblah -or-
example.com/filename?

But I am having problems getting the question mark removed when it occurs at the beginning of the filename as in:
example.com/?filename

In this case, it redirects to the homepage.

Do you have any suggestions? I have been spending some time trying to learn this, and searching around. My fear is that adding code to strip that question mark will mess up the portion of your code that is working perfectly.

Also, I want to verify that I have placed this code in the proper order relative to other rules. I have placed it at the end of all canonical redirects, non-www to www, etc.

This is such a huge undertaking for me, but I have learned a lot :)

I appreciate your help.

jdMorgan




msg:4105683
 12:06 pm on Mar 27, 2010 (gmt 0)

The code is correct as posted, since a request for "example.com/?<anything or nothing>" *is* a request for the "home page." If this "doesn't work" on your site, then the problem is one of site design.
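The reason example.com/?filename is a home-page request is simply how a URL splits: everything after the first ? is query string, so the path is just "/". Python's standard `urllib.parse.urlsplit` follows the same generic URL syntax and makes this easy to check; the URLs are the examples from this thread.

```python
from urllib.parse import urlsplit

for url in ["http://example.com/filename?blahblah",
            "http://example.com/filename?",
            "http://example.com/?filename"]:
    parts = urlsplit(url)
    # Only parts.path is mapped to a file; parts.query is what QUERY_STRING holds.
    print(f"{url:40} path={parts.path!r:12} query={parts.query!r}")
```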

Again, you'd do well to stop for a few days, think about this very carefully, and then make a list of all "bad" URLs and their desired dispositions. Only with a very-solid list of requirements can any correct coded solution be created, and lack of solid requirements leads to too-long threads like this one.

"Searching around" is fine, as long as what you are reading are mod_rewrite and regular-expressions tutorials and documentation (such as that cited in our Forum Charter). mod_rewrite code tends to be extremely case-specific, and therefore, you may search for years before finding that the on-line resource with a solution that matches your problem most closely is... this thread.

There really is no alternative but to learn to read and write the code yourself, and that is what this forum is really intended to help you to do.

Jim

crobb305




msg:4105781
 6:17 pm on Mar 27, 2010 (gmt 0)

Well by "searching around", I do mean in the mod_rewrite tutorials and on this forum. Certainly, I wouldn't use some random code from an untrusted resource. I am trying to be very careful and meticulous.

I guess the reason example.com/?filename was bothering me is that it stems from a legitimate request for "filename". Someone, somewhere, is (or was) linking to me this way, and Googlebot is requesting it. I just thought there was a way to strip out that "?". The code you shared above is working perfectly. I have excluded all scripts and everything is working fine. I will just allow /?filename to redirect to the homepage and not fret over it, considering that it is the last remaining "404" I have to cover (as shown in Google Webmaster Tools).

By the way, I didn't realize that I somehow double posted last night (worked very late). It looks like I tried to edit a post to add a sentence and make corrections, but for some reason it reposted.

Thanks again Jim for all your help.

Chris

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved