odd htaccess 301 redirect request

Google spidering "non-existent" pages

     
10:02 pm on Jun 4, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 31, 2004
posts:43
votes: 0


Hello,

I have an unusual problem with Google spidering pages that technically exist but have no content on them, just the header/logo and footer.

The site is PHP/MySQL driven and lists widgets by category, 10 to a page. It uses /category.php for the 1st page of widgets, .php?offset=10 for the 2nd page, .php?offset=20 for the 3rd page, etc. Due to poor programming, URLs with negative offsets (offset=-365) are being returned with no content and spidered by Google.

I need a short-term fix to 301 redirect each of these negative offsets to the correct main category page until I can get the PHP/DB recoded properly. (I'm not a programmer.)

I've tried this which does not work:

redirect 301 /sub-dir/redwidget.php?offset=-325 h**p://www.domain.com/sub-dir/redwidget.php

This format is working successfully with other pages/directories.

I have:

Options +FollowSymLinks
RewriteEngine On

before this directive (and others which are working).

What I basically need to do is:

redirect: /sub-dir/redwidget.php?offset=any-negative-value

to

domain.com/sub-dir/redwidget.php

I've successfully done htaccess redirects for non-www to www, old directory names to new ones, etc., but am stumped on this one.

Suggestions?

Thanks.

10:09 pm on June 4, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Query strings are not part of the URL-path that these directives match; they are data *attached* to a URL to be passed to the resource (i.e. script) *at* that URL. Therefore, query strings are handled separately, and are not visible to mod_rewrite's RewriteRule or to mod_alias directives such as Redirect.

See the RewriteCond [httpd.apache.org] directive, used with the %{QUERY_STRING} server variable.
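
As a rough sketch of that approach for the 301 redirect asked about above: this assumes the .htaccess file sits in the document root and reuses the /sub-dir/redwidget.php path from the original post, so adjust both to suit. The RewriteCond tests the query string, and the trailing "?" on the substitution drops the old query string from the redirect target.

```apache
RewriteEngine On
# Only act when the query string starts with a negative offset, e.g. offset=-325
RewriteCond %{QUERY_STRING} ^offset=-
# In a document-root .htaccess the pattern sees the path without a leading slash.
# The trailing "?" clears the query string so the redirect lands on the plain page.
RewriteRule ^sub-dir/redwidget\.php$ /sub-dir/redwidget.php? [R=301,L]
```

Written this way, each category page would need its own rule (or a more general pattern), which is one reason a URL-independent rule may be simpler.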

Jim

6:43 pm on June 5, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 31, 2004
posts:43
votes: 0


Jim -

Thanks for the QUERY_STRING advice, which I had missed, and for the Apache link. I started at apache.org, but unfortunately much of that doc is above my comprehension level, so I'm back at WebmasterWorld.

From your QUERY_STRING hint I've made some progress, uncovered and fixed a problem, and thought of a few more issues, but haven't gotten it right yet.

First, I added: RewriteRule ^\.htaccess$ - [F] because the file was exposed. That part works.

Then, to address the initial problem, I picked up part of your regex example from forum92/830.htm post 8 and edited it. I also came to the conclusion that I should be showing Google a true 404 (not a custom 404 page) rather than a 301.

I've edited your regex example to the following:

RewriteCond %{QUERY_STRING} !^offset=([0-9] {1,4})$ (there's a space before the !)

I don't want negative offsets spidered and want a 404 returned, so the above is meant to say:

offset is not a positive integer of up to 4 digits

So, to rewrite negative offsets to a 404:

RewriteRule ^$ [R=404]

I've tested various combinations of the above to no avail. I also tested patterns like: ^\/subdir\/greenwidget\.php?offset=\-$ which doesn't work either, and then realized I'd have to do this for all widget categories anyway. I'm not really sure which way to go next. The part of the htaccess file addressing this issue is:

Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteRule ^\.htaccess$ - [F]
#RewriteCond %{QUERY_STRING} (SPACE)!^offset=([1-9] {1,4})$
#RewriteRule ^$ [R=404]

# return a 404 code, not custom 404 error page
# above for any offset not starting from positive 1 to 9
# for all red, blue, green, etc. widget.php files
#http://www.domain.com/subdir/redwidget.php?offset=negative number

The pattern string may be correct, but the 404 rewrite rule seems to kill the whole site, so it's commented out for now. Am I even close?

Any further advice appreciated.

Thanks,

Jim

7:21 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Yes, the best advice is: don't invent stuff that is not documented, for example "[R=404]". mod_rewrite isn't a general programming language, and it's noticeably lacking in any general "flexibility" of directives, flags, or syntax; basically, your code has to be exactly right in order to work.

The closest you can come to a [R=404] is to use 410-Gone, which is even more specific than 404-Not Found.

Also, it seems to me that you're over-thinking this problem, in that it doesn't matter at all what URL is requested; negative offset queries aren't acceptable no matter what the URL is. So, simplifying, you could use something like:


RewriteCond %{QUERY_STRING} ^offset=[^0-9]
RewriteRule .* - [G]

That will reject, with a 410-Gone response, any request having a query string starting with "offset=" followed by anything except a digit 0-9. What follows after that first character need not be, and is not, specified; any non-digit in that first position will trigger rejection.
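
To make the matching concrete, here is the same pair of directives annotated with a few sample query strings. The sample values are illustrative only; the behaviour noted follows from the pattern being anchored at the start and requiring one non-digit character right after "offset=".

```apache
# offset=-325        -> 410 Gone (first character after "=" is "-")
# offset=abc         -> 410 Gone (first character after "=" is a letter)
# offset=20          -> not matched, request proceeds normally
# offset=            -> not matched, [^0-9] needs a character to be present
# page=2&offset=-5   -> not matched, the pattern is anchored to the start
RewriteCond %{QUERY_STRING} ^offset=[^0-9]
RewriteRule .* - [G]
```

If offset could ever appear after other parameters, the anchor would need loosening, for example to (^|&)offset=[^0-9]. That is an untried variation rather than something from this thread, and it would also reject an empty offset that is followed by another parameter.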

There is no difference in server response signalling between a "custom 404" and a "server 404." However, a common error is that Webmasters will use a canonical URL in an ErrorDocument directive instead of the (documented) local URL-path, and that creates a 302-Found response instead of the desired 404. It's clearly documented, but...
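
A small sketch of the difference, with hypothetical file names and www.example.com standing in for the real host:

```apache
# Local URL-path (begins with a slash): the server returns the error status
# itself and uses this page as the response body.
ErrorDocument 404 /errors/notfound.html
ErrorDocument 410 /errors/gone.html

# Full URL: the server redirects the client to that address instead, so the
# client receives a redirect status rather than the 404. Shown commented out.
# ErrorDocument 404 http://www.example.com/errors/notfound.html
```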

Jim

10:55 am on June 6, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 31, 2004
posts:43
votes: 0


Jim -

Installed your tip and it's working fine -

RewriteCond %{QUERY_STRING} ^offset=[^0-9]
RewriteRule .* - [G]

So simple once you see it.

Thanks again for the advice.

 
