I have a site that uses mod_rewrite for all of its URLs.
I have a very large set of URLs that I no longer want bots to visit.
These urls are something like this:
/dir1/blah/blah/blah
/dir2/example/blah/
/dir3/etc/etc/etc
and so on.
In my robots.txt I disallowed robots from visiting
/dir1/
/dir2/
/dir3/
but robots keep visiting all of those URLs. It seems that they don't treat the rewritten directories as real. Of course, they don't actually exist. The real URLs are something like this:
index.php?var1=blah&var2=etc&var3=whatever
I checked the robots.txt with a validator and everything was ok.
How should I proceed with Mod rewritten urls?
Thanks in advance,
Enrique
Nowhere above have I mentioned filenames. Robots don't see or know about filenames. They only use URLs. So URL-to-filename translations done in mod_rewrite have nothing to do with robots or with robots.txt. The only time mod_rewrite will affect robots is when you use it to do external redirects instead of internal rewrites.
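To illustrate that robots.txt rules are matched against the requested URL path and not against whatever file the server maps it to, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt text below is an assumption that mirrors the three Disallow lines from the question; the "User-agent: *" line and the is_allowed helper are hypothetical, added only for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, mirroring the rules described in the question.
# The "User-agent: *" record is an assumption; the original file was not posted.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dir1/
Disallow: /dir2/
Disallow: /dir3/
"""


def is_allowed(url_path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the assumed robots.txt lets user_agent fetch url_path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url_path)


# The rewritten URLs are blocked by prefix match on the URL path.
print(is_allowed("/dir1/blah/blah/blah"))
# The internal script URL is NOT covered by these Disallow lines.
print(is_allowed("/index.php?var1=blah&var2=etc&var3=whatever"))
```

A compliant robot that requests only /dir1/... never sees index.php at all, so the Disallow lines apply as written. If a redirect exposes the index.php URL, that URL is a different one and these rules do not cover it.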
If you are seeing well-known robots such as Googlebot, Slurp, and msnbot fetching your disallowed URLs, then it is likely that you have errors in your robots.txt file or that you have redirected the disallowed URLs instead of just rewriting them.
Use the robots.txt validator [webmasterworld.com] to check your robots.txt file, and make sure that you are using the correct syntax for internal rewrites and not redirects in mod_rewrite.
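One way to tell a redirect from an internal rewrite is to request one of the disallowed URLs without following redirects and look at the status code. The following is a rough Python sketch of that check using only the standard library. The function names and the User-Agent string are made up for this example, and the URL you pass in would be one of your own.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_status(url: str, timeout: float = 10.0):
    """Send one GET request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Hypothetical User-Agent string, used only for this check.
        conn.request("GET", path, headers={"User-Agent": "rewrite-check/0.1"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status: int, location) -> str:
    """Summarize whether the response was an external redirect."""
    if status in REDIRECT_CODES:
        return f"external redirect ({status}) to {location}"
    return f"no redirect (status {status})"
```

For example, describe(*fetch_status("http://www.example.com/dir1/blah/blah/blah")) should report no redirect if the rule is a true internal rewrite. If it reports a 301 or 302 pointing at index.php, robots are being sent to a URL that your Disallow lines do not cover.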
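This is a rewrite (as it might appear in httpd.conf):
RewriteRule ^/dir1/([^/]+)/([^/]+)/([^/]+)/?$ /index.php?var1=$1&var2=$2&var3=$3 [L]
This is a 301 redirect (the full URL and the R=301 flag are what make it one):
RewriteRule ^/dir1/([^/]+)/([^/]+)/([^/]+)/?$ http://www.example.com/index.php?var1=$1&var2=$2&var3=$3 [R=301,L]
And this can also end up as a redirect (a 302 by default), because the substitution is an absolute URL; mod_rewrite performs an external redirect for it unless the hostname matches the current host:
RewriteRule ^/dir1/([^/]+)/([^/]+)/([^/]+)/?$ http://www.example.com/index.php?var1=$1&var2=$2&var3=$3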
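To show what the pattern in these rules captures, here is a small Python sketch that applies the same regular expression to a URL path and builds the internal target. It assumes the third group is meant to be closed as ([^/]+) before the optional trailing slash. The internal_target function is hypothetical and only imitates the substitution; it is not how Apache itself evaluates the rule.

```python
import re

# Same pattern as the RewriteRule, assuming the third group is closed as ([^/]+).
PATTERN = re.compile(r"^/dir1/([^/]+)/([^/]+)/([^/]+)/?$")


def internal_target(url_path: str):
    """Return the index.php target for a matching URL path, else None."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    var1, var2, var3 = match.groups()
    return f"/index.php?var1={var1}&var2={var2}&var3={var3}"


print(internal_target("/dir1/blah/blah/blah"))
print(internal_target("/dir1/only/two"))
```

With an internal rewrite, this mapping happens entirely inside the server and the robot only ever sees the /dir1/... URL it asked for.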
Jim