shazam - 3:05 am on Sep 28, 2011 (gmt 0)
On a few WordPress sites I tried the Global Translator plugin. It was not a good decision, so I removed it, along with all the directories it created:
I then blocked all these directories in the robots.txt file.
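For reference, a block of that kind might look roughly like the sketch below. The directory names de, fr and es are placeholders assumed for illustration, not the actual directories the plugin created.

```text
# robots.txt in the site root
# Assumption: /de/, /fr/ and /es/ stand in for the removed translation directories.
User-agent: *
Disallow: /de/
Disallow: /fr/
Disallow: /es/
```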
As we all know, Google doesn't respect robots.txt these days, and I keep getting 126.96.36.199 poking around where it doesn't belong. Since these directories no longer exist and requests for them automatically redirect to the same URL with the /gl/ removed, the end result is the pesky duplicate content.
The only option I know of is to manually rebuild all these directories and place something like:
Deny from All
in the .htaccess for each of these directories.
I can't seem to find an .htaccess rule or any other method for stopping Googlebot without rebuilding directories that have no business being rebuilt. They were deleted because I don't want them, and I don't want pesky bots poking around in them.
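One possibility, sketched here under assumptions and not tested on these sites, is a mod_rewrite rule in the root .htaccess that matches the old paths by URL, so the directories never need to exist on disk. The names de, fr and es are hypothetical placeholders for the removed directories, and the sketch assumes mod_rewrite is enabled and that the rule sits above the WordPress rewrite block.

```apache
# Root .htaccess, placed above the WordPress rewrite block.
# Assumption: de, fr and es are placeholders for the removed translation directories.
<IfModule mod_rewrite.c>
RewriteEngine On
# Only act on requests whose User-Agent contains "Googlebot" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Answer 410 Gone for anything under the listed directories; "-" means no substitution.
RewriteRule ^(?:de|fr|es)/ - [G]
</IfModule>
```

Swapping [G] for [F] would return 403 Forbidden instead, which is closer to what Deny from All does. Removing the RewriteCond line would apply the rule to every visitor, at the cost of losing the redirect to the English pages.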
Before it gets suggested: no, I am not going to give Google more data by putting these sites in Webmaster Tools and using their URL removal tool.
I would prefer to have these directories redirect to the English pages like they already do. This offers the best user experience, but of course it is in conflict with Google, as I am surely getting dinged for duplicate content.
Has anyone ever tested simply blocking 188.8.131.52 from their sites? Does Google just send another IP to ignore the robots.txt? Does your site get penalized or de-indexed?