www.example.com/forum/
I have blocked one file so far, /forum/index.php, but Invision Board shows that Google is still spidering all my dynamic links. I don't want this; I want only the HTML pages spidered. How do I know which files to block?
There are 2 files that came with the mod, and both are stored in this directory: www.example.com/forum/
There is an .htaccess file and a PHP file, so does this mean I should block ALL files except the 2 mod files? Can someone help me figure out which files to block in robots.txt? I'd be willing to write out all the directory folders and files if someone thinks they can help me. Thanks!
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /path_to_dynamic_pages\.php
RewriteRule ^path_to_dynamic_pages\.php$ http://example.com/path_to_static_pages.html [R=301,L]
The use of %{THE_REQUEST} prevents the redirect from taking place if the original HTTP request was for a static URL, but was rewritten to a dynamic URL by your existing rule.
For reference, the value of %{THE_REQUEST} is exactly the request line you see in your raw access logs, something like
GET /widgets.php?color=blue HTTP/1.1
-------------code-------------------
RewriteEngine On
# DO THE TOPIC URLS
RewriteRule ^(.*)-t([0-9]*)-s([0-9]*)\.html(.*)$ index.php?showtopic=$2&st=$3
RewriteRule ^(.*)-t([0-9]*)\.html(.*)$ index.php?showtopic=$2$3
--------------end of code--------------------
So this changes all topic urls to html. However, the coder forgot to 301 redirect the urls! How would I integrate the 301's into this code?
Note above that I said you need to add additional rules to 301-redirect client-requested dynamic URLs to the static ones. So you need to add to what you have, not modify or replace it.
You'll need to do the 'reverse transform function' of the patterns in the rules you already have, and plug that into the example code I posted. Post your best effort and we can help.
The major problem I see is that it may be impossible to determine the part of the .html filepath that precedes "-t" from the information in the dynamic URL. That is a problem you'll need to solve at the page-naming level before implementing the redirect code. In order to make this work, the static and dynamic URLs must contain the same information; it can be in a different form, but each must contain all the information needed to unambiguously reconstruct the other. Maybe this is not a problem; if any dynamic URL that contains "-t" maps to a static URL that begins with "topic" then there is no problem, but I can't tell from the examples posted here.
Jim
If any dynamic URL that contains "-t" maps to a static URL that begins with "topic" then there is no problem, but I can't tell from the examples posted here.
An example of how the -t works:
The forum is labeled "Music", so its page is tagged with -f. The HTML filename will look like music-f32.html
Let's say the topics in the forum are labeled as follows:
Mikes Piano Composition
Chopin's sonata
Listen to my music now
The html's will look like this:
Mikes-Piano-Composition-t45.html
Chopins-sonata-t23.html
Listen-to-my-music-now-t345.html
Now the dynamics may look like this:
index.php?showtopic=214
The 214 would be what would follow the -t. (example-t214.html)
Everything follows this format. Everything works beautifully, except now I have duplicate content. I don't know if this helps, but let me know what else I can do. Do you need more code?
PS. I sent you the URL of my site if it would help.
A mod_rewrite solution only works with static and dynamic URLs that mirror each other, so that each type of URL can be constructed, given only the other. The static and dynamic URLs must be 'symmetrical' -- easy to convert from one to the other and back again with no loss of information.
So the problem is that, given "Chopins-sonata-t23.html", mod_rewrite can correctly produce "index.php?showtopic=23".
But given only "index.php?showtopic=23", mod_rewrite cannot produce "Chopins-sonata-t23.html"; the "Chopins-sonata" information is lost.
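To illustrate what 'symmetrical' means, here is a minimal sketch that assumes the static URLs carried no title at all, for example a hypothetical topic-t23.html. That naming scheme, the /forum/ path and the www.example.com host are assumptions for illustration only, not the actual setup of this board. Under those assumptions both directions could be handled in .htaccess alone:

```apache
RewriteEngine On

# External 301: a client asked directly for index.php?showtopic=N.
# %1 is the topic number captured by the RewriteCond pattern.
# The trailing "?" on the target drops the original query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /forum/index\.php\?showtopic=([0-9]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/forum/topic-t%1.html? [R=301,L]

# Internal rewrite: serve the static-looking URL from the script.
RewriteRule ^topic-t([0-9]+)\.html$ index.php?showtopic=$1 [L]
```

Because the number is the only variable part, each URL can be rebuilt from the other. As soon as the topic title is part of the filename, the first rule has nothing to build it from.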
The simplest answer may be to move the dynamic-to-static 301-redirect function into PHP, where you can look up the topic number (23) in your database, produce the full static URL (Chopins-sonata-t23.html), and then issue a 301 redirect from within PHP. You will still need to use the server variable %{THE_REQUEST} and pass that to your PHP script, in order to avoid the rewrite-redirect loop problem. From your posted examples, it looks like you can't do it with a pure mod_rewrite approach.
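One way the .htaccess side of that hand-off could look is sketched below. It assumes the forum lives in /forum/ and that a separate script, given the hypothetical name topic-redirect.php here, looks up the title and sends the 301 header itself. Neither that script nor its topic parameter exists in the mod as posted.

```apache
# Only a direct client request for index.php?showtopic=N matches here,
# because %{THE_REQUEST} still holds the original request line even
# after an .html URL has been internally rewritten to index.php.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /forum/index\.php\?showtopic=([0-9]+)
# Hand the topic number (%1) to the hypothetical lookup script.
RewriteRule ^index\.php$ topic-redirect.php?topic=%1 [L]
```

The existing rules that map the .html URLs to index.php would stay as they are. A request that arrives as Chopins-sonata-t23.html never matches the condition, so there is no loop. A start offset (&st=) would need a second capture in the condition.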
Jim
This is what I am thinking of putting in the .htaccess file for the topics section, according to what you said:
------------htaccess code--------------------------
#Do the topics
RewriteCond {THE_REQUEST} ^[A-Z]+\ /index.php?showtopic=$2$3\.php
RewriteRule ^(.*)-t([0-9]*)-s([0-9]*)\.html(.*)$ index.php?showtopic=$2&st=$3 [R=301,L]
RewriteRule ^(.*)-t([0-9]*)\.html(.*)$ index.php?showtopic=$2$3 [R=301,L]
-------------------end-----------------
Along with this code, the corresponding php code exists...and this is it:
------------php code---------------------------
// Do the topics
$ibforums->skin['_wrapper'] = preg_replace("#index.php\?showtopic=([0-9]*)\"#ie","\$FURL->create_topic_url('\\1')", $ibforums->skin['_wrapper'],1);
$ibforums->skin['_wrapper'] = preg_replace("#index.php\?showtopic=([0-9]*)&hl=\"#ie","\$FURL->create_topic_url('\\1')", $ibforums->skin['_wrapper'],1);
$ibforums->skin['_wrapper'] = preg_replace("#index.php\?showtopic=([0-9]*)&st=([0-9]*)\"#ie","\$FURL->create_topic_url('\\1','\\2')", $ibforums->skin['_wrapper'],1);
-------------------end----------------------------
Now, from what you say, I have to modify these php lines in order to make the 301's work?
This is a slight variation of what I came up with over in the Yahoo forum. It appeared to accomplish the goal and be more efficient than the code that was being presented. By removing the (.*) catch-all from the beginning of the rule, we only have to verify the end of the URL for a match; we aren't passing the beginning along anyway.
I had not seen the other letters (EG -f for forum) involved, so my example was not quite accurate, but I think we could accomplish the same thing in a better way for the future growth of the board:
RewriteRule ([a-z]{1})([0-9]+)\.html$ /index.php?showtopic=$2 [L]
Added: Obviously won't work the way it is, but wanted to let you know the direction I was going.
We could also use the same rules, and just remove the hard line beginning and the 'catch-all' - I am not sure why the (.*) is after the .html, but if the URL's end in .html this one is also unnecessary. I have removed the R=301 flag from below, since our goal is static and it was saying the static URL had permanently moved to the dynamic version.
RewriteRule -t([0-9]+)-s([0-9]+)\.html(.*)$ index.php?showtopic=$1&st=$2 [L]
RewriteRule -t([0-9]+)\.html(.*)$ index.php?showtopic=$1$2 [L]
Chopin - Sorry, I have been so long in getting back to you - I tried to think of a solution that you could implement without a great deal of work and learning a new language or two, but the only thing I could come up with is Jim's idea - do it in the php or you really can't do it...
My suggestion is this:
1. Make sure every link points to the static version of the page.
2. Double check and make sure all rewrites to static URL's are working.
3. Deny access to the php file with a query string using:
RewriteCond %{THE_REQUEST} index\.php\? [NC]
RewriteRule \.php$ - [F]
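This will deny any direct client request for index.php that is followed by a ? character, that is, one with a query string. You will still be able to access plain index.php, and the static .html URLs keep working because %{THE_REQUEST} holds the original client request, not the internally rewritten one.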
Since the forum is new and there probably are not too many URL's I think it's best to just send anyone who would like to access them away.
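If you do not want to block access, you could always redirect back to the index. That is not ideal, but it would send the spiders back through the site instead of sending them away. The trailing ? on the target drops the original query string; without it, the redirected request would still contain index.php? and match the same rule again.
RewriteCond %{THE_REQUEST} index\.php\? [NC]
RewriteRule \.php$ /index.php? [R=301,L]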
Justin