Redirect Bad Extensions

301 .jpgamp incoming 404's to .jpg

         

kidcobra

4:02 pm on Nov 24, 2010 (gmt 0)

10+ Year Member



I have 404 errors in my log from Googlebot that are caused by image URLs with "amp" appended, for example:

http://example.com/2008/12/picture.jpgamp

I tried to figure out where Google is getting these from but could not. Maybe some badly programmed scraper site is trying to show the photos.

Anyway, this is my best .htaccess shot at a redirect. Being an amateur, I don't want to wreck my entire site, and I have no handle on possible unintended consequences, so I figured I'd ask here whether it's workable and a clean solution. Any help would be greatly appreciated.


RedirectMatch 301 (.*)\.jpgamp$ http://example.com$1.jpg

Greg

g1smd

9:20 pm on Nov 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



While RedirectMatch would work, I would use something like:

RewriteRule ^([^.]+)\.jpgamp$ http://www.example.com/$1.jpg [R=301,L]
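For context, here is a minimal sketch of how that rule might sit in a root-level .htaccess file. It assumes mod_rewrite is available and that RewriteEngine is not already switched on higher up in the file; the hostname should be whichever one your canonical redirect targets.

```apache
# Assumption: this is the .htaccess file in the document root
# and mod_rewrite is loaded on the server.
RewriteEngine On

# In .htaccess the pattern is matched against the URL-path without
# its leading slash, e.g. "2008/12/picture.jpgamp".
# [^.]+ captures one or more characters that are not a dot, so a path
# containing a dot before ".jpgamp" is not matched.
# R=301 sends a permanent redirect; L stops processing further rules.
RewriteRule ^([^.]+)\.jpgamp$ http://www.example.com/$1.jpg [R=301,L]
```

With this in place, a request for /2008/12/picture.jpgamp should be answered with a 301 pointing to /2008/12/picture.jpg on the stated hostname.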

kidcobra

10:10 pm on Nov 24, 2010 (gmt 0)

10+ Year Member



Hi g1smd, I went with your code and it works like a charm. I really appreciate you taking the time to answer my question and give me your expert advice. This is the second time you've helped me (the first was maybe 8 months ago), and I have a better website because of it.

All the best,

Greg

g1smd

10:35 pm on Nov 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No problem. Threads where you post what you have already tried are always the easiest to answer.

Make sure this redirect is ahead of any non-www/www canonical redirect code.


The reason for using RewriteRule rather than Redirect/RedirectMatch is simply making the code future-proof. If you ever need to incorporate a "rewrite" within the site, you'll have to use a RewriteRule for that. Since it's not a good idea to mix Redirect/RedirectMatch and RewriteRule directives, start by using RewriteRule for everything right from the beginning.
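As a rough sketch of what that conversion looks like, here is a single-URL Redirect next to a RewriteRule that does approximately the same job. The paths old-page.html and new-page.html are made up for illustration. Note that Redirect matches on a URL-path prefix, while the anchored RewriteRule below matches that one path exactly, so the two are close but not strictly identical.

```apache
# mod_alias version (hypothetical paths), shown commented out:
# Redirect 301 /old-page.html http://example.com/new-page.html

# Approximate mod_rewrite version for a root-level .htaccess file.
# The pattern has no leading slash, and the dot is escaped so that it
# matches a literal "." only.
RewriteEngine On
RewriteRule ^old-page\.html$ http://example.com/new-page.html [R=301,L]
```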

kidcobra

12:05 am on Nov 25, 2010 (gmt 0)

10+ Year Member



Hi. I did this rewrite without the www in front of example.com, since our canonical redirects all go to the non-www version. Our .htaccess file is configured as follows, in the order listed:

error docs

canonical rewrites

a rewrite to shorten a group of long URLs and get rid of a parameter at the end (you helped me with this)

a group of 7 straight 301s to pick up individual single-URL issues

two more rewrites (including the big one to handle incoming %-encoded URLs after the ? and convert them to unencoded URLs that match the ones we have)

then the one we just talked about

a rewrite to ban incoming requests from particular referrers

and then the usual Order Allow,Deny to ban specific IP addresses.

So we don't have RedirectMatch, but we do have the 7 one-line, single-URL redirects.

Two questions: First, is that mix a problem? And second, based on your last comment, should I move this rewrite to just after the canonical rewrites? Greg

jdMorgan

5:18 pm on Dec 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should arrange your rules so that the redirects are all first, in order from most-specific URLs and conditions to least-specific. Internal rewrites should follow, again in order from most- to least-specific.

Only the rewrite rules' order really matters, because code in your file is executed in per-module order, and not in the order that you write it. That is, each Apache module in turn reads your code and processes only those directives that it understands. And the module-execution order is determined by the server, not by your code. So it makes no difference if you put (for example) mod_access code (Allow, Deny) before or after mod_rewrite code -- The server will execute the code in the same order either way.

So, only the relative order of directives handled by the same module makes any difference.

I suggest this order for your RewriteRules -- for many, many reasons:
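To illustrate the point, here is a small sketch with one mod_rewrite rule and one group of access-control directives. The IP address is a placeholder from the documentation range, and the Order/Allow/Deny syntax is the older style referred to in this thread. Swapping the two groups in the file should not change what the server does, because each group is handled by a different module.

```apache
# Handled by mod_rewrite
RewriteEngine On
RewriteRule ^([^.]+)\.jpgamp$ http://example.com/$1.jpg [R=301,L]

# Handled by the access-control module (placeholder IP address)
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
```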

Access control first - There is no reason to waste time redirecting unwelcome requests:
  • rule to deny incoming requests from particular referrers

External URL-to-URL redirects
  • redirects to pick up individual single-URL issues (most-specific redirects)
  • redirect to correct ".jpgamp" problem caused by bad-HTML links
  • canonical redirects (least-specific redirect)

Internal URL-to-filepath rewrites
(Assuming that these really are internal rewrites with no protocol, hostname, or "R=" flag specified)
  • internal rewrite to shorten a group of long URLs and get rid of a parameter at the end
  • internal rewrites (including the big one to handle incoming %-encoded URLs after the ? and convert them to unencoded URLs that match the ones we have)
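The order above could be sketched roughly as follows. Everything except the ".jpgamp" rule is a placeholder: the referrer name, the single-URL paths, and the internal rewrite target are invented for illustration, and the canonical redirect assumes www is being redirected to non-www as described earlier in the thread.

```apache
RewriteEngine On

# 1. Access control: refuse requests from a particular referrer
#    (bad-referrer.example is a made-up name)
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?bad-referrer\.example [NC]
RewriteRule .* - [F]

# 2. External redirects, most specific first
# 2a. single-URL fix (hypothetical paths)
RewriteRule ^old-page\.html$ http://example.com/new-page.html [R=301,L]
# 2b. ".jpgamp" fix
RewriteRule ^([^.]+)\.jpgamp$ http://example.com/$1.jpg [R=301,L]
# 2c. canonical hostname redirect, least specific (www to non-www)
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

# 3. Internal rewrites: no protocol, hostname, or R flag
#    (hypothetical short URL mapped to a hypothetical script)
RewriteRule ^short/([^/]+)$ /long/path/page.php?id=$1 [L]
```

Because every specific redirect already points at the canonical hostname, a request that needs both fixes should get there in a single 301 rather than a chain of two.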

Jim

kidcobra

6:21 pm on Dec 1, 2010 (gmt 0)

10+ Year Member



Thanks for the direction. I will get it organized in the proper way after getting the specific URL redirects moved to rewrites and come back with a post after it's done.

Greg