Entire site "mis"-crawled with appended %20 codes


jdMorgan - 2:58 am on Dec 29, 2007 (gmt 0)


I use something similar to the following .htaccess code to prevent abuse on Apache servers, but it also seems to put Teoma back on track after one of its errant "%20" requests:

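# %1 captures the leading run of allowed characters. The negated class then
# requires one disallowed character (not "?" or a space, so clean URLs with a
# query string are left alone), and "[^?\ ]*" discards the rest of the path.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([/0-9a-z._\-]*)[^/0-9a-z._\-?\ ][^?\ ]*(\?[^\ ]*)?\ HTTP/ [NC]
# Redirect only if the truncated path is an existing file or directory.
RewriteCond %{DOCUMENT_ROOT}/%1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/%1 -d
RewriteRule .* http://www.example.com/%1 [R=301,L]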

This basically allows the URL-path to contain only slashes and the characters 0-9, a-z, A-Z, periods, underscores and hyphens. If any other character is found in the URL-path, the URL-path is truncated at that point and, if the resulting URL resolves to an existing file or directory, a 301 Moved Permanently redirect to that truncated URL is issued.

The original query-string attached to the URL (if any) is retained.
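
To see how the truncation and the query-string handling play out on sample request lines, here is a rough Python sketch of the same matching logic. It is only an illustration, not something to deploy: the pattern is an assumed variant in which the first disallowed character (anything outside the allowed list, other than "?" or a space) is followed by "[^? ]*" so the rest of the path is discarded, and the function name and the existing_paths set are hypothetical stand-ins for the -f / -d checks.

```python
import re

# Allowed URL-path characters, shared by both bracketed groups.
ALLOWED = r"/0-9a-z._\-"

# Python transcription of the THE_REQUEST pattern (assumed variant; Apache's
# "\ " escapes are written as plain spaces here).
REQUEST_RE = re.compile(
    r"^[A-Z]{3,9} /"
    + "([" + ALLOWED + "]*)"          # group 1: leading run of allowed characters
    + "[^" + ALLOWED + "? ][^? ]*"    # first disallowed character plus the rest of the path
    + r"(\?[^ ]*)? HTTP/",            # group 2: optional query string, then the protocol
    re.IGNORECASE,                    # mirrors the [NC] flag
)


def truncated_target(request_line, existing_paths):
    """Return the redirect target for a request line, or None if no redirect applies.

    existing_paths is a hypothetical stand-in for the file-system checks.
    """
    match = REQUEST_RE.match(request_line)
    if match is None:
        return None                   # clean URL: the rule does not fire
    path = match.group(1)
    if path not in existing_paths:
        return None                   # truncated path is not a real file or directory
    query = match.group(2) or ""      # the original query string is carried over
    return "/" + path + query


site = {"page.html", "dir/page.html"}
print(truncated_target("GET /page.html%20 HTTP/1.1", site))
print(truncated_target("GET /dir/page.html%20?x=1 HTTP/1.1", site))
print(truncated_target("GET /page.html?x=1 HTTP/1.1", site))
```

The first two calls give the truncated path (the second one with "?x=1" still attached), while the third returns None because a clean URL with a query string must not trigger a redirect.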

If you modify the bracketed character groups in the pattern above, keep them in sync: every character you allow in the first group must also appear in the negated group that follows it (the one beginning with the "^" negation operator).
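
One way to avoid the two groups drifting apart is to generate the RewriteCond line from a single list of allowed characters. The following Python sketch does that; build_request_cond is a hypothetical helper name, and the emitted pattern is an assumed variant in which the negated group also excludes "?" and the space and is followed by "[^?\ ]*".

```python
def build_request_cond(allowed=r"/0-9a-z._\-"):
    """Build the THE_REQUEST RewriteCond line from one allowed-character list.

    Keep the hyphen escaped and last in the list so it is not read as a range.
    """
    return (
        r"RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /"
        + "([" + allowed + "]*)"                # capturing group: allowed characters
        + "[^" + allowed + r"?\ ]"              # negated group: same list, plus "?" and space
        + r"[^?\ ]*(\?[^\ ]*)?\ HTTP/ [NC]"     # rest of path, optional query, protocol
    )


# Example: additionally allow "~" in URL-paths; both groups pick it up.
print(build_request_cond(r"/0-9a-z._~\-"))
```

Paste the printed line into .htaccess in place of the first RewriteCond; the two file and directory conditions and the RewriteRule stay unchanged.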

Jim


Thread source: http://www.webmasterworld.com/ask_jeeves_teoma/3536339.htm