Forum Moderators: phranque


Lots of 404 errors in google log

Tim_Mousel

7:09 am on May 18, 2009 (gmt 0)

10+ Year Member



Hi,

In Google Webmaster Tools, the following URL is reported as a 404 error:

http://www.example.com/review/review-item/2016.php?PHPSESSID=84af3787252c1e92e458f854120fd9bc

When I paste that link into the browser it works.

Something must be wrong with the code in my .htaccess file but I have no idea what.

Any ideas?

Thanks in advance!

IndexIgnore */*
Options +FollowSymlinks

RewriteEngine On

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f

RewriteRule ^review-item/(.*).php index2.php?item_id=$1 [QSA,L]
RewriteRule ^review-category/(.*).php review_categories_yahoo_cats2.php?category=$1 [QSA,L]
RewriteRule ^reviewer/(.*).php reviewer_about.php?username=$1 [QSA,L]
RewriteRule ^comments/review_comments/(.*)/(.*).php comments/review_comments.php?item_id=$1&review_id=$2 [QSA,L]

RewriteRule ^(.+)\.html$ $1.php [QSA,L]

RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "teoma" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Scooter" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mercator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "FAST" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MantraAgent" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Lycos" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ZyBorg" [NC]
RewriteCond %{QUERY_STRING} PHPSESSID
RewriteRule ^(.*)$ $1? [L,R=301]

# Skip the next two rewriterules if NOT a spider
RewriteCond %{HTTP_USER_AGENT}!(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&PHPSESSID=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&PHPSESSID=[0-9a-z]+$|^PHPSESSID=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]

[edited by: jdMorgan at 1:43 pm (utc) on May 18, 2009]
[edit reason] example.com [/edit]

jdMorgan

6:15 pm on May 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm hoping someone with a lot of time can help you with this. Unfortunately, the code has dozens of errors in it at various "levels," and fixing it will require a lot of work -- both to understand what was intended and to correct the problems. That kind of help isn't very likely here, so you will probably have to do most of this work yourself. A good place to start would be with the resources cited in our Forum Charter.

Jim

Tim_Mousel

6:35 pm on May 19, 2009 (gmt 0)

10+ Year Member



Hi,

Thanks for the reply.

I'm trying to hide the PHPSESSID in the links while rewriting the following:

RewriteRule ^review-item/(.*).php index2.php?item_id=$1 [QSA,L]
RewriteRule ^review-category/(.*).php review_categories_yahoo_cats2.php?category=$1 [QSA,L]
RewriteRule ^reviewer/(.*).php reviewer_about.php?username=$1 [QSA,L]
RewriteRule ^comments/review_comments/(.*)/(.*).php comments/review_comments.php?item_id=$1&review_id=$2 [QSA,L]

I copied the other code from various places in a misguided attempt at this.

If I just use the following, would I at least avoid the errors?

IndexIgnore */*
Options +FollowSymlinks

RewriteEngine On

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f

RewriteRule ^review-item/(.*).php index2.php?item_id=$1 [QSA,L]
RewriteRule ^review-category/(.*).php review_categories_yahoo_cats2.php?category=$1 [QSA,L]
RewriteRule ^reviewer/(.*).php reviewer_about.php?username=$1 [QSA,L]
RewriteRule ^comments/review_comments/(.*)/(.*).php comments/review_comments.php?item_id=$1&review_id=$2 [QSA,L]

RewriteRule ^(.+)\.html$ $1.php [QSA,L]

Thanks,

Tim

jdMorgan

9:58 pm on May 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are your three RewriteConds intended to apply to all of the following rules? They don't. RewriteConds only apply to the single RewriteRule that follows them.
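To illustrate the scoping, here are two of the rules from the first post (patterns left unchanged, redundant SCRIPT_FILENAME test dropped). Only the first RewriteRule is gated by the two conditions; the second is tested on every request, regardless of whether the file or directory exists:

```apache
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# The two RewriteConds above apply ONLY to this rule
RewriteRule ^review-item/(.*).php index2.php?item_id=$1 [QSA,L]

# No RewriteCond immediately precedes this rule, so it runs unconditionally
RewriteRule ^review-category/(.*).php review_categories_yahoo_cats2.php?category=$1 [QSA,L]
```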

Testing SCRIPT_FILENAME and REQUEST_FILENAME for the same condition is redundant -- the two server variable names are synonymous.

The "(.*)\.php$" construct is inefficient. Try using "([^.]+)\.php$" instead. The same applies to "(.+)\.html$", which can become "([^.]+)\.html$".

The "(.*)/(.*)\.php$" construct is dozens or even hundreds of times worse, as it will force dozens to hundreds of "back off and re-try" attempts before the matching engine can find a match. Try "([^/]+)/([^.]+)\.php$" instead.

In both cases, the function will be the same, but execution will be much faster, as the more-efficient patterns allow a match to be made (or not made) in a single left-to-right pass.
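A before/after sketch using two of the rules from the first post. The old lines are commented out so only the tighter versions are active. This assumes the captured values (item IDs, review IDs) never contain a period; if any value can contain one, "[^.]+" will not match it. The same change would apply to the review-category and reviewer rules.

```apache
# Before: ".*" first swallows the rest of the path, then backs off one
# character at a time to find ".php" (and, with two groups, the "/").
# The unescaped "." also matches any character, not just a period.
#RewriteRule ^review-item/(.*).php index2.php?item_id=$1 [QSA,L]
#RewriteRule ^comments/review_comments/(.*)/(.*).php comments/review_comments.php?item_id=$1&review_id=$2 [QSA,L]

# After: each group stops at the first "/" or "." it reaches, so the
# match succeeds or fails in one left-to-right pass. The literal dot
# is escaped and "$" anchors the pattern at the end of the path.
RewriteRule ^review-item/([^.]+)\.php$ index2.php?item_id=$1 [QSA,L]
RewriteRule ^comments/review_comments/([^/]+)/([^.]+)\.php$ comments/review_comments.php?item_id=$1&review_id=$2 [QSA,L]
```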

If you want the RewriteConds to apply to the four or five rules which follow, you have a couple of choices. After removing the redundant test of SCRIPT_FILENAME, reproduce the two RewriteConds for each of the four or five following rules. This isn't as inefficient as it may appear, because RewriteConds are not processed at all unless the RewriteRule pattern matches (see Apache mod_rewrite documentation). The other choice is to use a 'skip rule' and skip the following four or five rules if the required conditions are NOT met. Example:
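A sketch of the first choice, repeating the two conditions for each of the four rules from the first post. It uses the tighter patterns discussed above, so it carries the same no-periods-in-captured-values assumption:

```apache
RewriteEngine On

# Each rule gets its own copy of the conditions. They are only evaluated
# after that rule's pattern has already matched the requested path.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^review-item/([^.]+)\.php$ index2.php?item_id=$1 [QSA,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^review-category/([^.]+)\.php$ review_categories_yahoo_cats2.php?category=$1 [QSA,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^reviewer/([^.]+)\.php$ reviewer_about.php?username=$1 [QSA,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^comments/review_comments/([^/]+)/([^.]+)\.php$ comments/review_comments.php?item_id=$1&review_id=$2 [QSA,L]
```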


RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [S=4]

This will skip the following four rules if the requested URL resolves to a physically-existing file or directory.
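Applied to the rules from the first post, the skip-rule version might look like the sketch below (same tighter patterns and same no-periods assumption). The .html rule sits after the four skipped rules, so it is still tested on every request; whether that is what you want is a separate decision.

```apache
RewriteEngine On

# If the request maps to an existing file or directory, skip the next 4 rules.
# If neither condition is true, this rule does not match and nothing is skipped.
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [S=4]

RewriteRule ^review-item/([^.]+)\.php$ index2.php?item_id=$1 [QSA,L]
RewriteRule ^review-category/([^.]+)\.php$ review_categories_yahoo_cats2.php?category=$1 [QSA,L]
RewriteRule ^reviewer/([^.]+)\.php$ reviewer_about.php?username=$1 [QSA,L]
RewriteRule ^comments/review_comments/([^/]+)/([^.]+)\.php$ comments/review_comments.php?item_id=$1&review_id=$2 [QSA,L]

# Not covered by the skip rule above
RewriteRule ^([^.]+)\.html$ $1.php [QSA,L]
```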

I think you can see why anyone would be hesitant to tackle the original large chunk of code -- there are simply too many problems with it for the volunteer staff and contributing members here to take on. If it's not already clear, it's not really possible to cut-and-paste .htaccess code without understanding it. Doing so is even dangerous, in that (if you are lucky) it can crash your server. If you are not so lucky, a small typo, bug, or oversight can silently eat away at your search rankings over time, possibly without leaving any hints as to the problem(s) in reports like your stats and Webmaster Tools reports. So, the important thing to realize is that even though it's obscure and difficult, and you rarely have to 'adjust' it, this is server configuration code, and you'd do well to invest the time to thoroughly understand it before risking your entire site.

Perhaps the easiest way to do this is to print out the Apache mod_rewrite documentation, then go through the code one character at a time, and do not proceed until every character's role and purpose is clear. Then back off and work through the thought experiment of "What happens when I request a URL that matches any one of these rules? And if it gets redirected or rewritten, then what happens?" Make sure that you are happy with the answers after following each scenario through to the point where content is delivered.

And despite the fact that I must recuse myself whenever I post this statement in order to avoid a conflict of interest, if you don't have the time or inclination to research all of this, then it might be a good idea to hire a con$ultant.

Our Forum Charter contains links to useful and relevant resources, and there are some example threads in our Forum Library.

Jim

Tim_Mousel

5:36 am on May 20, 2009 (gmt 0)

10+ Year Member



Thank you very much Jim for your very informative answer. I'll take your advice....

Thanks again,

Tim

g1smd

10:21 am on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some extra comments...

When you do a 301 redirect, you need to include the protocol and domain name in the target URL.

Make sure you place redirects so that they are listed before rewrites.
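A minimal sketch of both points together, assuming www.example.com is the canonical hostname and that the query string consists of nothing but the session ID. It redirects every visitor, not just robots; the original code tried to limit this with user-agent tests, which are omitted here, and a visitor without cookies would lose their session.

```apache
RewriteEngine On

# 1. External redirects first. The target is a full URL including the
#    host name. %{REQUEST_URI} is the requested path (no query string),
#    and the trailing "?" discards the PHPSESSID query string.
RewriteCond %{QUERY_STRING} ^PHPSESSID=[0-9a-z]+$ [NC]
RewriteRule ^ http://www.example.com%{REQUEST_URI}? [R=301,L]

# 2. Internal rewrites only after all redirects
RewriteRule ^review-item/([^.]+)\.php$ index2.php?item_id=$1 [QSA,L]
```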

*** I'm trying to hide the PHPSESSID in the links while rewriting the following ***

Note that .htaccess code does not change 'links'. You change links by editing the HTML code on the page.
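If the session IDs in the links come from PHP itself (an assumption: PHP's transparent session-ID feature, session.use_trans_sid, adds PHPSESSID to links on the page when the visitor does not send a session cookie, and robots usually don't), that behavior can be switched off at the PHP level. A sketch, assuming PHP 5 running as an Apache module; under CGI/FastCGI the same settings belong in php.ini instead. Visitors who refuse cookies will then not keep a session.

```apache
# php_flag only works when PHP runs as an Apache module. The IfModule
# wrapper (module name assumed to be mod_php5.c) avoids a 500 error otherwise.
<IfModule mod_php5.c>
  # Stop PHP from appending PHPSESSID to links in the generated HTML
  php_flag session.use_trans_sid off
  # Accept the session ID only from a cookie, never from the URL
  php_flag session.use_only_cookies on
</IfModule>
```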

Tim_Mousel

4:37 pm on May 21, 2009 (gmt 0)

10+ Year Member



Thanks for the comments g1smd.

jdMorgan

6:59 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In order to simulate what a search spider would ask for, try the URL you requested (in your first post above), but omit the session ID from the query string. You can also use a "User Agent Switcher" add-on for Firefox or an online service like Wannabrowser to "spoof" Googlebot and other robot requests (copy a valid robot user-agent string from your raw server access logs into the spoofing tool).

Jim