

404 errors in search engines. Mod_rewrite help needed

Use of mod_rewrite generating 404 errors in search engines

cybereel

1:52 am on Nov 2, 2008 (gmt 0)

10+ Year Member



Hi guys. I have been doing some SEO work on my website over the past few months. I must say that I have come a long way, or at least I would like to believe so, until I realised that Google is not indexing my pages and is getting 404 responses. That brought me to the W3C validator, and to WebmasterWorld, where I have been reading several articles including [webmasterworld.com...] Excellent site, guys. I have been reading and trying several permutations of code, but to no avail. I am only getting 50% of the way there, so I'm hoping that someone will help me get to 100%. This is my problem:

I have this file on my site /books/books.php

When you go to .../books/books.php in the browser, you get the correct page.

But I want it to read /books/personal-development-bookstore/

When you type /books/personal-development-bookstore/ in the browser, the correct page shows, and the site functions well with my static URL.

It also performs well with my non-static URL. However, when I check the site with the W3C validator, there is a 404 error. When I look at Live HTTP Headers in Firefox, I also see a 404, but I am not sure what is taking place there.

What I want is for this link, /books/personal-development-bookstore/, to stop returning 404 errors in the search engines. The browser appears to work well...

This is my .htaccess code:
---------------
Options -Indexes +Includes +FollowSymLinks

RewriteEngine On
RewriteBase /books

RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{REQUEST_URI} ^($|/.*$)
RewriteRule ^.* http://www.example.com%1 [R=permanent,L]
#
#Internally rewrite search engine friendly static URL to dynamic filepath.
RewriteRule ^personal-development-bookstore/$ books.php [L]
#
# Externally redirect client requests for old dynamic URLs to equivalent new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /books\.php\ HTTP/
RewriteRule ^books\.php$ http://www.example.com/books/personal-development-bookstore/ [R=301,L]
#
-------------------------
I read that all redirects should be placed before any internal rewrites. This throws me into a loop. Any help will go a long way.

Once I get this basic case right, I will adapt it for all the other relevant query-string variables used on other pages.

Many thanks in advance. I would really like to finish this part of the site and move on to my other projects.

Dale

[edited by: eelixduppy at 7:48 am (utc) on Nov. 2, 2008]
[edit reason] exemplified [/edit]

jdMorgan

2:46 pm on Nov 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Where is the .htaccess file with this code located -- at example.com/.htaccess, or somewhere else?
Why are you using RewriteBase? -- Please state the reason you had to use this directive.

Please describe, in detail, the operation of the "loop" you see when you put the redirects ahead of the rewrites. You can get the info we need from Live HTTP Headers, and include only requests/responses for the page -- We don't need all the details about the objects that the page includes right now. We need a basic event outline such as:

I request the URL example.com/xyz
I get an unexpected 301 redirect to example.com/abc!
Then I get rewritten to books.php (because I see the right page)
etc.

This is basically just the requested URL and the server response codes from the HTTP Headers tool, but it's important to include all those details -- the URLs, server response status codes, and noting expected/unexpected events, so that we can understand your problem.

Aside from the redirect/rewrite order, the exclusion RewriteCond in the domain redirect that doesn't really do anything, and a few very minor optimizations and corrections that are possible, this code actually looks good. It's very odd that you get a 404 status response along with the page content, and this may actually be a script problem, not an .htaccess problem. But we can sort that out later, once we've eliminated problems in the .htaccess file.
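To see why that exclusion condition filters nothing, here is a minimal Python sketch that runs the poster's REQUEST_URI pattern (with the forum's broken bar, ¦, read as a pipe, |) against a few sample paths. It assumes Python's `re` handles this simple pattern the same way Apache's regex engine does, which holds for the constructs used here; `cond_matches` is a hypothetical helper name.

```python
import re

# The poster's RewriteCond pattern, with the broken bar read as "|":
# it matches either an empty string or anything that starts with "/".
REQUEST_URI_COND = re.compile(r"^($|/.*$)")


def cond_matches(request_uri: str) -> bool:
    """Return True if the RewriteCond pattern would match this REQUEST_URI."""
    return REQUEST_URI_COND.search(request_uri) is not None


# For ordinary requests REQUEST_URI always begins with "/",
# so every sample passes and the condition excludes nothing.
samples = ["/", "/books/books.php", "/books/personal-development-bookstore/"]
for uri in samples:
    print(uri, cond_matches(uri))
```

Its only real effect in the original rule is to capture the path into %1 for the redirect target.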

Jim

cybereel

5:01 pm on Nov 2, 2008 (gmt 0)

10+ Year Member



Thanks jd for the quick response.

The .htaccess is in a subfolder of the main directory, /books/.
The examples I found in my reading, and followed, used RewriteBase.

In reviewing the HTTP headers, this stands out.
HTTP/1.x 404 Not Found
.....
X-Pingback: http://www.example.com/blog/xmlrpc.php
....

However, when the link is ./books/books.php, there is no such error. It reports OK.

A small update: I reversed the redirect/rewrite order and changed the directive to RewriteBase /books/, and now I don't seem to have a loop problem.

jdMorgan

5:21 pm on Nov 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteBase is usually only needed when the server configuration is non-standard... I'd almost like to say "when it's wrong," but that's an over-simplification.

Here's the code as I'd implement it, with a few fixes and optimizations:


Options -Indexes +Includes +FollowSymLinks
RewriteEngine on
#
# Externally redirect client requests for old dynamic URLs to equivalent new static URLs
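# (%{THE_REQUEST} holds the full URL-path, including the /books/ directory)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /books/books\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^books\.php$ http://www.example.com/books/personal-development-bookstore/? [R=301,L]
#
# Final catch-all external redirect to canonicalize the hostname
# (in /books/.htaccess, $1 excludes the "books/" prefix, so it is added back here)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/books/$1 [R=301,L]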
#
# Internally rewrite search engine friendly static URL to dynamic filepath.
RewriteRule ^personal-development-bookstore/$ books.php [L]

This code includes all of the functions of the original, but fixes a few things. The biggest change is that it won't allow your hostname to be requested in mixed-case or uppercase; it will redirect to the all-lowercase version. It will also remove any query strings appended to the old dynamic URLs, if directly requested by a client. Beyond that, the changes are just for coding efficiency.
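Both behavioural changes can be checked with a small Python sketch. `needs_host_redirect` applies the negated hostname condition, and `redirect_query` models the mod_rewrite behaviour that a substitution ending in a bare `?` discards the original query string, while a substitution with no `?` passes it through unchanged. Both helper names are hypothetical, and the query-string model is deliberately simplified: it ignores the [QSA] flag and substitutions that carry their own new query string.

```python
import re

# Hostname condition from the improved rules. There is no [NC] flag, so the
# match is case-sensitive, and the trailing "?" also accepts an empty Host header.
CANONICAL_HOST = re.compile(r"^(www\.example\.com)?$")


def needs_host_redirect(http_host: str) -> bool:
    # The RewriteCond is negated with "!", so the redirect fires
    # only when the pattern does NOT match.
    return CANONICAL_HOST.search(http_host) is None


def redirect_query(substitution: str, original_query: str) -> str:
    """Simplified model of the query string in a redirect target.

    A substitution ending in a bare "?" drops the original query string;
    a substitution without "?" keeps it. New query strings and [QSA]
    are not modelled.
    """
    if substitution.endswith("?"):
        return substitution[:-1]
    if original_query:
        return substitution + "?" + original_query
    return substitution


target = "http://www.example.com/books/personal-development-bookstore/?"
print(needs_host_redirect("WWW.Example.com"))  # mixed case is redirected
print(needs_host_redirect("www.example.com"))  # canonical host is left alone
print(redirect_query(target, "id=42"))         # query string removed
```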

Note that the domain canonicalization redirect, being the least-specific, is almost always the final redirect in a properly-ordered group of external redirect rules. That's a good rule of thumb: External redirects first, ordered from most-specific patterns to least-specific, followed by internal rewrites, again ordered from most- to least-specific.
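To illustrate that rule of thumb, the following Python sketch models only the two external redirects and counts how many round trips a client makes when both the hostname and the old URL need fixing. The rule functions and `follow_redirects` are hypothetical simplifications: paths are written as full URL-paths without the leading slash, and the THE_REQUEST check is omitted because only direct client requests are simulated.

```python
import re
from typing import Callable, List, Optional, Tuple

# (Host header, URL-path without leading "/", query string)
Request = Tuple[str, str, str]
Rule = Callable[[Request], Optional[Request]]


def old_dynamic_url(req: Request) -> Optional[Request]:
    """Specific rule: old books.php URL -> new static URL, query string dropped."""
    _, path, _ = req
    if re.search(r"^books/books\.php$", path):
        return ("www.example.com", "books/personal-development-bookstore/", "")
    return None


def canonical_host(req: Request) -> Optional[Request]:
    """Catch-all rule: any non-canonical hostname -> www.example.com."""
    host, path, query = req
    if not re.search(r"^(www\.example\.com)?$", host):
        return ("www.example.com", path, query)
    return None


def follow_redirects(req: Request, rules: List[Rule], limit: int = 10) -> Tuple[int, Request]:
    """Apply the first matching rule (like [R=301,L]) until no rule matches."""
    hops = 0
    while hops < limit:
        for rule in rules:
            target = rule(req)
            if target is not None:
                req, hops = target, hops + 1
                break
        else:
            return hops, req  # no rule matched: the final URL is reached
    raise RuntimeError("redirect limit exceeded")


start = ("example.com", "books/books.php", "id=7")
print(follow_redirects(start, [old_dynamic_url, canonical_host]))  # specific first
print(follow_redirects(start, [canonical_host, old_dynamic_url]))  # catch-all first
```

Both orders end at the same URL, but putting the specific redirect first gets there in one hop instead of two.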

---

I looked at your X-Pingback header snippet, but there's no necessary connection between that X-Pingback header and the 404 response. We'll need to see the URL that was requested and resulted in the 404 response; the X-Pingback header may be irrelevant.

For each "log entry" in Live HTTP Headers, the requested URL is the first line, followed by the client request headers. After those come the server response headers, including the server status code. So we need to see those two items at minimum, and anything else you care to include. If in doubt, post the whole client-server header log entry for that 404 transaction, and I'll edit it if it's too long or contains sensitive info...
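As a sketch of which two items matter, the Python below pulls the requested URL and the status code out of one log entry. It assumes the layout described above (URL first, then request headers, then response headers starting with an `HTTP/` status line); `summarize_entry` is a hypothetical helper, and the sample entry is trimmed and uses the example.com placeholder.

```python
from typing import Tuple

# A trimmed Live HTTP Headers-style entry: the full URL, the request
# headers, then the response headers.
SAMPLE_ENTRY = """\
http://www.example.com/books/personal-development-bookstore/

GET /books/personal-development-bookstore/ HTTP/1.1
Host: www.example.com

HTTP/1.x 404 Not Found
X-Pingback: http://www.example.com/blog/xmlrpc.php
Content-Type: text/html; charset=UTF-8
"""


def summarize_entry(entry: str) -> Tuple[str, int]:
    """Return (requested URL, server status code) from one log entry."""
    lines = [line.strip() for line in entry.strip().splitlines() if line.strip()]
    url = lines[0]
    # The response status line is the first line starting with "HTTP/";
    # the request line starts with the method (GET, POST, ...) instead.
    status_line = next(line for line in lines if line.startswith("HTTP/"))
    status_code = int(status_line.split()[1])
    return url, status_code


print(summarize_entry(SAMPLE_ENTRY))
```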

Jim

[edited by: jdMorgan at 5:47 pm (utc) on Nov. 2, 2008]

cybereel

8:12 pm on Nov 2, 2008 (gmt 0)

10+ Year Member



Thanks for the comments and improved code. Here are the HTTP headers:

-----------
http://www.example.com/books/personal-development-bookstore/

GET /books/personal-development-bookstore/ HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.17) Gecko/20080829 Firefox/2.0.0.17
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.example.com/books/personal-development-bookstore/
Cookie: __utmz=76385800.1225589167.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utma=76385800.240774153.1225589167.1225637319.1225655932.4; __utmb=76385800; PHPSESSID=78206dc50f0012aa370106066bd51ebc; __utmc=76385800
If-Modified-Since: Sun, 02 Nov 2008 20:03:05 GMT

HTTP/1.x 404 Not Found
Date: Sun, 02 Nov 2008 20:07:22 GMT
Server: Apache/1.3.41 (Unix) mod_log_bytes/1.2 mod_bwlimited/1.4 mod_ssl/2.8.31 OpenSSL/0.9.8b
Cache-Control: no-cache, must-revalidate, max-age=0
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Pragma: no-cache
X-Pingback: http://www.example.com/blog/xmlrpc.php
X-Powered-By: PHP/5.2.6
Set-Cookie: bb2_screener_=1225656442+190.59.41.118; path=/blog/
Last-Modified: Sun, 02 Nov 2008 20:07:22 GMT
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
---------------------

jdMorgan

8:40 pm on Nov 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I'm going to suggest that you move this code to example.com/.htaccess -- for two reasons:
First, it will simplify debugging, and second, you need the domain canonicalization (the second redirect rule) to be applied at the top level, or its benefit will be quite limited.

If the code is to work properly in example.com/.htaccess, it will have to be corrected and slightly modified:


Options -Indexes +Includes +FollowSymLinks
RewriteEngine on
#
# Externally redirect client requests for old dynamic URL to equivalent new static URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /books/books\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^books/books\.php$ http://www.example.com/books/personal-development-bookstore/? [R=301,L]
#
# Final catch-all external redirect to canonicalize the hostname
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite search engine friendly static URL to dynamic filepath.
RewriteRule ^books/personal-development-bookstore/$ /books/books.php [L]
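The patterns change when the rules move to the root because, in .htaccess context, RewriteRule matches a path with the directory prefix removed, whereas %{THE_REQUEST} always holds the raw request line with the full URL-path. The Python sketch below models both; `per_dir_path` is a hypothetical, simplified stand-in for mod_rewrite's prefix stripping, not an Apache API.

```python
import re


def per_dir_path(url_path: str, htaccess_dir: str) -> str:
    """Path a RewriteRule pattern sees in a given .htaccess directory.

    Simplified model: the directory prefix, including its trailing slash,
    is stripped before matching, so the leading "/" is always removed.
    """
    prefix = htaccess_dir.rstrip("/") + "/"
    return url_path[len(prefix):] if url_path.startswith(prefix) else url_path


# %{THE_REQUEST} is never stripped, whichever directory the .htaccess is in.
THE_REQUEST = "GET /books/books.php?id=3 HTTP/1.1"
ROOT_COND = re.compile(r"^[A-Z]{3,9}\ /books/books\.php(\?[^\ ]*)?\ HTTP/")
SHORT_COND = re.compile(r"^[A-Z]{3,9}\ /books\.php(\?[^\ ]*)?\ HTTP/")

print(per_dir_path("/books/books.php", "/"))        # books/books.php
print(per_dir_path("/books/books.php", "/books/"))  # books.php
print(bool(ROOT_COND.search(THE_REQUEST)))          # full path spelled out
print(bool(SHORT_COND.search(THE_REQUEST)))         # /books/ prefix missing
```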

Jim

cybereel

9:26 pm on Nov 2, 2008 (gmt 0)

10+ Year Member



Thanks for that timely response. After my last post I focused on the HTTP error. There seems to be an issue with pages that sit outside of a WordPress blog (which these pages are), so I reconfigured the script, and all is now well. I did implement your previous code at the root and it works well. It seems faster as well.

Thanks a million jd.
Cybereel