Forum Moderators: Robert Charlton & goodroi
To put it in a different perspective: if your URL can be dynamically changed to another URL which does not reflect the actual keyword you programmed for your URL, and it still resolves with a 200 OK header, you've got a problem.
[edited by: tedster at 2:56 pm (utc) on May 8, 2010]
[edited by: tedster at 2:43 pm (utc) on May 8, 2010]
[edit reason] switch to example.com (just one spot) [/edit]
Again, if I understand Dusky correctly, he is saying that if the page resolves at that URL and doesn't throw a 404 Not Found error, then you potentially have duplicate content issues.
[edited by: tedster at 3:04 pm (utc) on May 8, 2010]
Not quite. example.com/keyword-here/?q=spam-keyword should be example.com/keyword-here. The problem arises when someone pastes example.com/keyword-here?spam-here and gets the page example.com/keyword-here BUT with the URL in the address bar still showing example.com/keyword-here?spam-here
Also, if you are seeing bogus backlinks such as example.com/keyword-here?q=spam-keyword a question that should come up is "Why would someone do that?"
Especially if you are using a common CMS like WordPress, Joomla, PHPNuke etc. (but really, in any case) there is a chance that you've been hacked. In other words, your page may be hosting parasite links that are cloaked so only googlebot sees them. The hacker/spammer would be creating those backlinks to build some ranking power for your page, so that THEIR parasite links gain ranking power.
You can use the "fetch as googlebot" utility in WebmasterTools to quickly check if this is the case.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /\?(.*)\ HTTP/ [NC]
RewriteRule ^/?$ /404\.shtml? [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Za-z]{3,9}\ /(.*)\.html\?(.*)\ HTTP/ [NC]
RewriteRule ^index\.php$ /%1\.html? [R=301,L]
See the index.php in the RewriteRule pattern: that's for when your URLs are of the form example.com/index.php&name=blabla&whatever=blabla
If they are as example.com/modules.php&name=blabla&whatever=blabla change it to modules.php.
The first two lines say redirect to a 404 custom error page 404.shtml
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^\?]+)?\?([^\ ]+)\ HTTP/ [NC]
RewriteRule .* /path-does-not-exist [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /\?([^\ ]+)\ HTTP/ [NC]
RewriteRule !. /path-does-not-exist [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*([^.]+)\.html\?([^\ ]+)\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*)([^.]+)\.html$ http://www.example.com/$1$3.html? [R=301,L]
I get you, but the 404.shtml page is a custom 404 error page and returns a 404 error not found header if done right
example.com/404.shtml may very well return a 404 status code, but the very important point here is that example.com/filename.html?duff-url-request does not return a 404 response. It returns a 301 redirect to a different URL. The 301 redirect is not a 404 status. The browser isn't told that the URL does not exist. Instead it is told to make a new request for a different URL. When it does so, it is then told that this new URL does not exist.
The index.php thing, yes I explained that and said if you have index.php as the URL constructor. What I posted above works OK
[edited by: g1smd at 10:42 pm (utc) on May 8, 2010]
Just a word of caution. Some people are tempted to preserve the link juice coming from such spam backlinks, and that's why they use a 301 to the same URL without the query string.
The original rule could never work. It required the filename in the RewriteRule pattern to match index.php but needed the filename in the RewriteCond pattern to end in .html, and it could never meet both requirements at the same time.
[A-Z] when used with the [NC] flag is processed twice as fast as using the [A-Za-z] pattern. Note that /path-does-not-exist would redirect to the custom 404 error page if you have one anyway!
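For anyone who wants the bad URL itself to answer with a 404 (no 301 hop first), here is a minimal .htaccess sketch. It assumes mod_rewrite is enabled, that the file sits in the document root, that the site has no legitimate externally requested query strings, that /404.shtml is the custom error page discussed above, and that your Apache version accepts a non-3xx code in the R flag. Treat it as an illustration, not a drop-in fix.

```apache
# Serve the custom page for 404 responses (path taken from the thread).
ErrorDocument 404 /404.shtml

RewriteEngine On

# Leave the error page itself alone, so the internal hand-off to it
# is not caught by this same rule a second time.
RewriteCond %{REQUEST_URI} !^/404\.shtml$
# Test the raw request line sent by the client, so that internal
# rewrites which add their own query string are not affected.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?\ ]*\?
# A non-3xx status in the R flag drops the substitution and stops
# rewriting: the requested URL itself is answered with 404.
RewriteRule ^ - [R=404,L]
```

With this in place, a request for example.com/keyword-here?spam-here should get a 404 status on that exact URL, which is the behaviour argued for above.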
[edited by: tedster at 11:03 pm (utc) on May 8, 2010]
The easiest way out is often a 301 redirect to the base URL, which then returns a 200 OK, and then just accept the spammy link juice if Google actually sends it through.
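A rough sketch of that 301 approach, for comparison. It assumes a .htaccess file in the document root, www.example.com as the canonical hostname, and no legitimate externally requested query strings; it should sit before any internal rewrites.

```apache
RewriteEngine On

# Match only when the client's original request line contains a "?".
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?\ ]*\?
# Redirect to the same path; the trailing "?" on the target clears
# the query string, so the follow-up request no longer matches.
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

Whether to use this or a straight 404 is the judgment call discussed in this thread: the redirect keeps whatever link value arrives, the 404 refuses it.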
The important thing, in my opinion, is NOT to serve the content of /keyword-title/ when /keyword-title/?q=spam-keyword is requested.
hack attack / injection attempt
you can rewrite to URLs containing a query_string
URLs and filepaths are not at all the same thing.
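To illustrate those last two points, a small sketch of an internal rewrite: the public URL has no query string, while the script that serves it receives one. The mapping of /keyword-here/ to index.php?name=keyword-here is hypothetical and only there for illustration.

```apache
RewriteEngine On

# Internal rewrite (no R flag): the visitor still sees /keyword-here/
# in the address bar, while Apache hands the request to the script.
RewriteRule ^keyword-here/?$ /index.php?name=keyword-here [L]
```

This is also why the rules in this thread test THE_REQUEST and not the current query string: the internally added name= parameter never appears in what the client actually asked for.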