Search Engine Optimization with mod_rewrite - problems

Google's still catching dynamic urls

         

cygnus

4:13 pm on Aug 24, 2005 (gmt 0)

10+ Year Member



Hi!

I made a website a while ago which didn't have much Google success. It contained a lot of galleries stored in a MySQL database, which could be requested via /index.php?gid=xx. Almost everything went through this script. It quickly became clear to me that these dynamic URLs weren't exactly search-engine-friendly. The only thing I saw in the Google search results was the main list of my site.

So I tried mod_rewrite, successfully, or so I thought. This is my short but effective .htaccess file:


RewriteEngine On
RewriteRule search/(.*) /search.php?search=$1
RewriteRule gallery/(.*) /index.php?gid=$1

So /index.php?gid=200 would become /gallery/200 for the users and the spiders. I changed all my links to these new urls, so they would be used by everyone. It works for the users, but google still seems to be seeing them as dynamic urls. At the moment my main index (/) is being indexed like it's supposed to, but ALL my galleries appear like this :


(my url)/index.php?gid=(number)
Similar pages

... instead of the title of the specific gallery (and /gallery/(number) as the url). So Google somehow seems to be ignoring my mod_rewrite-rules and somehow it found my old urls (/index.php?gid=(number)) somewhere, although I don't use these anymore. I normally see only 1 seemingly uncrawled gallery url in the google search results, but when I click "repeat the search with the omitted results included" at the bottom, I get hundreds of uncrawled urls more.

Does anyone know what I'm doing wrong?
Some other facts that might be important :
- I'm using a free redirection subdomain from no-ip.com, which redirects to the subdomain of my free hosting provider.
- I wrote the .htaccess file almost 2 weeks ago and Google's spiders have visited my site several hundred times since then.

jd01

7:19 pm on Aug 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteEngine On
RewriteRule search/(.*) /search.php?search=$1
RewriteRule gallery/(.*) /index.php?gid=$1

This rewrites one way: the 'friendly' location is served the information from the 'real' location.

To keep people and spiders out of the real location, you must rewrite original requests for the 'real' location to the friendly location - otherwise both can be accessed.

The usual way to do this without creating a redirect loop is to test THE_REQUEST, which holds the original request line sent by the client.

RewriteRule search/(.*) /search.php?search=$1
RewriteRule gallery/(.*) /index.php?gid=$1

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /search\.php\?search=(.*)\ HTTP/
RewriteRule ^search\.php$ http://yoursite.com/search/$1

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?gid=(.*)\ HTTP/
RewriteRule ^index\.php$ http://yoursite.com/gallery/$1

Hope this helps.

Justin

jdMorgan

9:23 pm on Aug 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Better stick an [R=301,L] flag on each of those two new rules... You don't want a 302 redirect and all the trouble that can bring.
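
For illustration, here is a sketch of the search redirect with that flag added. It assumes the yoursite.com placeholder from the post above. It also uses %1 (a back-reference to the RewriteCond capture, whereas $1 refers to the RewriteRule pattern) and a trailing question mark to drop the old query string; neither of those is in the rules as first posted, so treat them as assumptions about the intended behaviour.

```apache
# Sketch only: yoursite.com is a placeholder hostname.
# Match only direct client requests for /search.php?search=...
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /search\.php\?search=([^\ ]+)\ HTTP/
# %1 is the value captured by the RewriteCond above; the trailing "?"
# drops the original query string. R=301 makes the redirect permanent,
# and L stops further rule processing for this request.
RewriteRule ^search\.php$ http://yoursite.com/search/%1? [R=301,L]
```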

Jim

jd01

9:29 pm on Aug 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Uh, yeah, that's what I meant - Thanks Jim.

Justin

cygnus

10:07 pm on Aug 24, 2005 (gmt 0)

10+ Year Member



Thanks for your help, jdMorgan and jd01, the explanation sounds logical. I updated my .htaccess and I'm now waiting for the googlebot to arrive :)

jdMorgan

11:08 pm on Aug 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can (and should) test it yourself. Just type the "unfriendly" URL into your browser, and you should see it redirect to the "friendly" URL (the browser address bar should update).

Jim

cygnus

1:31 am on Aug 25, 2005 (gmt 0)

10+ Year Member



Hmmm, it's good that you say that, jdMorgan, because something still seems wrong with my .htaccess.

Currently I have the following code (part that converts search-urls to friendly urls is left out at the moment):


RewriteEngine On
RewriteRule search/(.*) /search.php?search=$1
RewriteRule gallery/(.*) /index.php?gid=$1
RewriteRule out/(.*) /out.php?url=$1

Rewritecond %{the_request} ^[A-Z]{3,9}\ /index\.php\?gid=(.*)\ HTTP/
Rewriterule ^index\.php$ http://mysite.com/gallery/$1 [R=301,L]

When I go to [mysite.com...] I'm redirected to [mysite.com...] instead of [mysite.com...]. I think the problem is the $1 at the end, but I tried a lot of things (%1, %2, $2, ...) and I can't get it to work.

Can you help me?

jdMorgan

2:05 am on Aug 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Missing question mark in the new URL:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?gid=([^\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://mysite.com/gallery/$1? [R=301,L]

Also, corrected case of THE_REQUEST and improved the pattern for the gid value.

Jim

cygnus

11:10 am on Aug 25, 2005 (gmt 0)

10+ Year Member



Ok, now it seems to be working, I'm getting redirected correctly. Had to change $1? to %1? in the final url though...

jdMorgan

8:25 pm on Aug 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, my mistake. I meant to bold (highlight) the question mark, and accidentally changed % to $ while doing so.
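
Putting the corrections from this thread together, the gallery part of the file might look like the sketch below. mysite.com is the placeholder used earlier in the thread. The ^ and $ anchors and the [L] flags on the internal rewrites were not in the posted rules, so treat them as assumptions rather than as the file that was actually deployed.

```apache
RewriteEngine On

# 1. External redirect: a client that asks for /index.php?gid=N directly
#    is sent to /gallery/N with a permanent (301) redirect.
#    THE_REQUEST holds the original request line, so requests that were
#    rewritten internally by the rules in step 2 do not match and
#    cannot loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?gid=([^\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://mysite.com/gallery/%1? [R=301,L]

# 2. Internal rewrites: the friendly URLs are served by the real scripts.
RewriteRule ^search/(.*)$ /search.php?search=$1 [L]
RewriteRule ^gallery/(.*)$ /index.php?gid=$1 [L]
RewriteRule ^out/(.*)$ /out.php?url=$1 [L]
```

Placing the redirect first keeps the externally visible behaviour in one spot; the internal rewrites below it never change what the browser's address bar shows.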

Jim

cygnus

10:52 pm on Aug 29, 2005 (gmt 0)

10+ Year Member



Meanwhile Googlebot has visited my site several times, but I still see the same old uncrawled search results when I search for my site.

Am I too impatient?

jdMorgan

12:18 am on Aug 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd guess 90 days. After that, things should improve.

Jim

cygnus

1:01 am on Aug 30, 2005 (gmt 0)

10+ Year Member



Lol :)
Ok, guess I'll have to wait a little longer then :)

Thanks again for the help!

jdMorgan

1:12 am on Aug 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Now you've got lots of time to get some additional incoming links -- to the 'pretty' page URLs. :)

Jim