htaccess stopping site from being crawled

         

karmargin

2:43 pm on Dec 7, 2006 (gmt 0)

10+ Year Member



Does it make sense that my htaccess is stopping my site from being crawled?
This is my htaccess:
RedirectMatch 301 (.*)\.html$ http://www.example.com$1.php

I've changed the site to php so I needed an htaccess redirect. Since I put that up, Google Webmaster Tools says my robots.txt file is restricting a bunch of pages, and the number is steadily increasing.

This is what my robots.txt looks like:

# Disallows folders to be indexed by all robots

User-agent: *
Disallow: /css/
Disallow: /redirect/
Disallow: /waiver/

I've already lost all my ranks in Google. Any ideas? Should I remove my htaccess file? All my results are now supplemental results. How can I get them back?

jdMorgan

3:49 pm on Dec 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> does it make sense that my htaccess is stopping my site from being crawled?

No. What you've done is tell the search engines to throw away all of your old .html URLs and to use the .php URLs instead.

> I've changed the site to php so I needed an htaccess redirect.

This was the mistake. There was no need to change the URL just because the filenames changed. Given that you've redirected your whole site, it's no wonder your rankings are gone. URLs and filenames are two different things, and need have nothing in common. It is the server's job to translate URLs used on the Web to the filenames used by the server operating system's file-management routines.

From the robots.txt example you've posted, it does not appear to be a robots.txt problem at all, unless the pages now appear to reside in some or all of your disallowed directories.

It's probably too late to implement this, but the proper solution to a wholesale change in your site technology from .html to .php (or any other technology change) is to keep the old URLs and have the server do the translation of old URLs to new filenames using internal rewrites, not external redirects:


Options +FollowSymLinks -MultiViews
RewriteEngine on
#
RewriteRule ^(.*)\.html$ /$1.php [L]

Using this method, you continue to link to .html URLs, and the server passes those requests to the same-named .php files. In other words, your URLs stay the same, only the filenames change.
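As a rough sketch of a slightly more defensive variant, the rewrite can be limited to requests where a same-named .php file really exists, so that any remaining static .html files are still served as-is. This assumes the .htaccess file sits in the document root; that placement is an assumption for illustration, not something stated above.

```apache
Options +FollowSymLinks -MultiViews
RewriteEngine on
#
# Assumption: this .htaccess file is in the document root, so $1 is the
# requested path without the leading slash or the .html extension.
# Rewrite only if the same-named .php file exists on disk.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
# Internal rewrite: the URL seen by browsers and robots stays .html.
RewriteRule ^(.*)\.html$ /$1.php [L]
```

With this in place, a request for a hypothetical /widgets.html would be served by /widgets.php if that file exists, and would otherwise fall through to normal handling.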

The question now is how much progress the search engines have made in indexing your new URLs. If it's been weeks since you implemented this change, then it's too late, and any changes you make now would only start the re-indexing clock all over again. If it's only been days, then there may be some hope.
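If it has only been days and you do go back to .html URLs, one possible companion rule is to redirect direct client requests for .php URLs back to their .html equivalents, so that only one URL per page stays visible. The following is only a sketch under assumptions: the .htaccess file is in the document root, www.example.com is the hostname you want, and none of your .php files are meant to be requested directly. Testing %{THE_REQUEST}, which holds the original client request line, keeps the external redirect from firing on requests that were rewritten internally, which would otherwise cause a loop.

```apache
Options +FollowSymLinks -MultiViews
RewriteEngine on
#
# Externally redirect only direct client requests for .php URLs.
# THE_REQUEST looks like: GET /page.php HTTP/1.1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?\ ]*\.php[?\ ]
RewriteRule ^(.*)\.php$ http://www.example.com/$1.html [R=301,L]
#
# Internally rewrite .html URLs to the same-named .php files.
RewriteRule ^(.*)\.html$ /$1.php [L]
```

Keep in mind that this is itself another site-wide redirect, so the caution above about restarting the re-indexing clock still applies.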

For more information on the subject of URLs versus filenames, see this article [w3.org] by one of the inventors of the Web.

I hope this turns out OK for you.

Jim

karmargin

4:40 pm on Dec 7, 2006 (gmt 0)

10+ Year Member



Wow, thanks for the help. I've modified my htaccess, so hopefully I can still salvage my site.

At least I still have MSN ranks... lol

Cheers