
Apache Web Server Forum

    
How to Prevent Dynamic Content
How can I do this without causing a conflict?
jmdb71




msg:1505208
 8:49 pm on Oct 18, 2005 (gmt 0)

Hi,

I use osCommerce for my store. I have a contribution which transforms the dynamic URLs into static pages. This change takes place mainly in the .htaccess file, which I have included below this message. However, these pages may be sorted by price, etc., and there are multiple pages, so sorting or paging produces URLs like thispage.html?page=2 or something like that.

I want all spiders to be sent back to the base URL (for example, thispage.html) whenever they request any page like "thispage.html?whatever".

I think I have the code for this correct in the .htaccess, but I am not sure whether it will conflict with the code above it. I don't want to get caught in a loop where Google will not obey my original rewrites to the static pages.

Could someone check my .htaccess for me?

Thank you so much in advance!

DirectoryIndex index.php
AddType text/html asp
ErrorDocument 404 /
ErrorDocument 403 /v-web/errdocs/403.html
ErrorDocument 401 /v-web/errdocs/401.html
ErrorDocument 500 /v-web/errdocs/500.html
ErrorDocument 400 /v-web/errdocs/400.html
Options +FollowSymLinks
RewriteEngine On
RewriteBase /

RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteRule ^(.*) http://www.example.com/$1 [L,R=permanent]

#change dynamic pages to static html
RewriteRule ^(.*)-p-(.*).html$ product_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-c-(.*).html$ index.php?cPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-m-(.*).html$ index.php?manufacturers_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pi-(.*).html$ popup_image.php?pID=$2&%{QUERY_STRING}
RewriteRule ^(.*)-t-(.*).html$ articles.php?tPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-a-(.*).html$ article_info.php?articles_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pr-(.*).html$ product_reviews.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pri-(.*).html$ product_reviews_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-i-(.*).html$ information.php?info_id=$2&%{QUERY_STRING}

# redirect search engine spider requests which include a query string to same url with blank query string
RewriteCond %{HTTP_USER_AGENT} ^FAST(-(Real)?WebCrawler/|\ FirstPage\ retriever) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot(-Image)?/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mediapartners-Google/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*(Ask\ Jeeves|Slurp/|ZealBot|Zyborg/) [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot/ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Overture-WebCrawler/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robozilla/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Scooter/|Scrubby/) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teoma
RewriteCond %{query_string} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

[edited by: jdMorgan at 11:50 pm (utc) on Oct. 18, 2005]
[edit reason] Example.com [/edit]

 

jdMorgan




msg:1505209
 3:50 pm on Oct 25, 2005 (gmt 0)

I'm not exactly sure which type of URLs you want to strip the query string from, but if the goal is to avoid having your dynamic URLs listed in the search engines, then the following code should work:

# change dynamic pages to static html
RewriteRule ^([^-]+)-p-([^.]+)\.html$ product_info.php?products_id=$2&%{QUERY_STRING} [L]
RewriteRule ^([^-]+)-c-([^.]+)\.html$ index.php?cPath=$2&%{QUERY_STRING} [L]
RewriteRule ^([^-]+)-m-([^.]+)\.html$ index.php?manufacturers_id=$2&%{QUERY_STRING} [L]
RewriteRule ^([^-]+)-pi-([^.]+)\.html$ popup_image.php?pID=$2&%{QUERY_STRING} [L]
RewriteRule ^([^-]+)-t-([^.]+)\.html$ articles.php?tPath=$2&%{QUERY_STRING} [L]
RewriteRule ^([^-]+)-a-([^.]+)\.html$ article_info.php?articles_id=$2&%{QUERY_STRING} [L]
RewriteRule ^([^-]+)-pr-([^.]+)\.html$ product_reviews.php?products_id=$2&%{QUERY_STRING} [L]
RewriteRule ^([^-]+)-pri-([^.]+)\.html$ product_reviews_info.php?products_id=$2&%{QUERY_STRING} [L]
RewriteRule ^([^-]+)-i-([^.]+)\.html$ information.php?info_id=$2&%{QUERY_STRING} [L]
#
# redirect search engine spider requests which include a query string to same url with blank query string
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(product_info|index|<...>|product_reviews_info|information)\.php\?[^\ ]+\ HTTP/
RewriteCond %{HTTP_USER_AGENT} ^FAST(-(Real)?WebCrawler/|\ FirstPage\ retriever) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot(-Image)?/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mediapartners-Google/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*(Ask\ Jeeves|Slurp/|ZealBot|Zyborg/) [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot/ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Overture-WebCrawler/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robozilla/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Scooter/|Scrubby/) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teoma
RewriteRule ^([^.]+\.php)$ http://www.example.com/$1? [R=301,L]

I have changed the patterns in your first set of rules to use much more efficient negative-match patterns; this may speed up your server noticeably if you have a busy site. I also added the [L] flag to tell the rewrite engine that it can stop processing if the rule matches and immediately call your script(s). Neither of these changes is directly related to your question; they're just efficiency improvements for people who may see and try to copy this code later.
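
To see how the old and new patterns behave on concrete filenames, here is a rough sketch that feeds a few paths to both using Python's re module. The sample filenames are made up for illustration, and Python's regex engine is only standing in for Apache's, so treat it as an approximation rather than a test of the live rules.

```python
import re

# Patterns transcribed from the thread; the sample paths below are hypothetical.
LOOSE = re.compile(r"^(.*)-p-(.*).html$")           # original rule: greedy, unescaped dot
STRICT = re.compile(r"^([^-]+)-p-([^.]+)\.html$")   # revised rule: negated character classes

def product_id(pattern, path):
    """Return what $2 would capture (the products_id), or None if the rule would not fire."""
    match = pattern.match(path)
    return match.group(2) if match else None

# In .htaccess context the leading slash is already stripped from the path.
samples = ["widget-p-42.html", "widget-p-42xhtml", "blue-widget-p-42.html"]
for path in samples:
    print(path, product_id(LOOSE, path), product_id(STRICT, path))
```

Both patterns pull 42 out of widget-p-42.html, and only the loose one accepts widget-p-42xhtml, because its unescaped dot matches any character. The last sample is worth checking against your own URLs: [^-]+ cannot span a hyphen, so a hyphenated name such as blue-widget-p-42.html is not matched by the revised pattern as written.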

In the second ruleset, a method must be used that will discriminate between requested URLs and rewritten URLs, in order to prevent the second ruleset from redirecting URLs that were rewritten by the first ruleset. The key to this is the use of the variable %{THE_REQUEST}, which always contains the original client (spider or browser) request line and is not changed by internal rewrites. The pattern there should be the entire list of your php pages -- I left some out and replaced them with "<...>" to prevent the line from wrapping on the screen.
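
As a rough illustration of why the THE_REQUEST test avoids a loop, the sketch below applies the same pattern to two example request lines in Python. The request lines and the helper name are made up, and only the four script names spelled out in the rule are listed, since the full list is elided in the post.

```python
import re

# Same pattern as the THE_REQUEST condition. Apache needs "\ " because a bare
# space separates directive arguments; in Python a plain space is fine.
# Only the script names spelled out in the post are listed here.
DIRECT_PHP_REQUEST = re.compile(
    r"^[A-Z]{3,9} /(product_info|index|product_reviews_info|information)\.php\?[^ ]+ HTTP/"
)

def is_direct_dynamic_request(request_line):
    """True when the client itself asked for a listed .php script with a query string."""
    return DIRECT_PHP_REQUEST.search(request_line) is not None

# The spider asked for the static URL. The first ruleset rewrites it internally
# to product_info.php, but THE_REQUEST still holds this original line.
static_request = "GET /widget-p-42.html HTTP/1.1"

# The spider asked for the dynamic URL directly.
dynamic_request = "GET /product_info.php?products_id=42 HTTP/1.1"

print(is_direct_dynamic_request(static_request))   # False: no redirect, no loop
print(is_direct_dynamic_request(dynamic_request))  # True: eligible for the 301
```

Because the condition looks at what the client sent rather than at the current (possibly rewritten) URL, the internally rewritten product_info.php request never qualifies for the redirect.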

If any broken pipe "¦" characters appear in the code above, replace them with solid pipe "|" characters before use. Posting on this forum modifies those characters.

Jim


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved