I have my server set up to 301-redirect any request for the non-preferred hostnames example.com or 123.123.123.123 to the preferred domain www.example.com.
This is intended to prevent duplicate-content indexing and keep everything on a single, consistent domain. The redirect has been in place for over a year, probably more like 18+ months.
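For anyone unfamiliar with this kind of setup, here is a minimal sketch of a host-based 301, written as a Python WSGI app. This is an assumption for illustration only. The post doesn't say which server is in use, and on Apache or nginx this would normally be a rewrite or server-block rule instead. `PREFERRED_HOST` and `redirect_app` are hypothetical names. The path and query string are carried over so each old URL maps to its www equivalent rather than to the home page.

```python
from urllib.parse import quote

# Hypothetical sketch: the preferred hostname comes from the post,
# but the WSGI front end is an assumption.
PREFERRED_HOST = "www.example.com"


def redirect_app(environ, start_response):
    # Strip an optional ":port" (fine for hostnames and IPv4 literals).
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()

    if host != PREFERRED_HOST:
        # Rebuild the same path/query on the preferred host. PATH_INFO is
        # latin-1 decoded under WSGI, so re-quote it with that encoding.
        scheme = environ.get("wsgi.url_scheme", "http")
        path = quote(environ.get("PATH_INFO", "/"), safe="/;=,", encoding="latin1")
        query = environ.get("QUERY_STRING", "")
        location = f"{scheme}://{PREFERRED_HOST}{path}" + (f"?{query}" if query else "")
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Type", "text/plain")],
        )
        return [b"Moved permanently\n"]

    # Requests that already use the preferred host are served normally.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Content served from the preferred host\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()` will serve it. Any request whose Host header isn't www.example.com should come back as a 301.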
This doesn't seem to have stopped Googlebot, though: at the moment, roughly every fourth page fetch is Googlebot trying to index the site by its IP, even though every such attempt is 301'd to www.example.com.
Shouldn't a 301 response effectively remove that URL from the index? Google still shows 600k+ pages when I search for "site:123.123.123.123", and remember, this 301 redirect has been in place for some time.
There are three things I can think of doing:
1) Add [123.123.123.123...] to GWT (Google Webmaster Tools) and set a lower crawl rate. This probably won't remove the pages from the index, and the crawl rate will revert to whatever Google decides is appropriate after 90 days anyway.
2) Start returning a 404 or 410 (Gone) status when Googlebot requests a page from 123.123.123.123. (I can't do this for every visitor, because Google is still sending searchers to the IP-only hostname!)
3) Set up a custom robots.txt for [123.123.123.123...] that disallows everything. I presume this will remove all [123.123.123.123...] URLs from the SERPs, but I also presumed that 301'ing everything would too :)
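Options 2 and 3 could be sketched together in the same hypothetical WSGI style, assuming the IP-only hostname and www.example.com are handled by one Python app. Everything here (`make_app`, `IP_HOST`, and the two flags) is illustrative, not the actual setup described above. On the IP host, `/robots.txt` returns a disallow-all file (option 3). Other requests whose User-Agent contains "Googlebot" get 410 Gone (option 2). Ordinary visitors still get the 301, so searchers who arrive via the IP keep landing on www.example.com.

```python
from urllib.parse import quote

# Hypothetical sketch; hostnames are the placeholders used in the post.
PREFERRED_HOST = "www.example.com"
IP_HOST = "123.123.123.123"

# robots.txt that disallows crawling of the whole host (option 3).
DISALLOW_ALL = b"User-agent: *\nDisallow: /\n"


def _redirect(environ, start_response):
    # 301 to the same path/query on the preferred host.
    scheme = environ.get("wsgi.url_scheme", "http")
    path = quote(environ.get("PATH_INFO", "/"), safe="/;=,", encoding="latin1")
    query = environ.get("QUERY_STRING", "")
    location = f"{scheme}://{PREFERRED_HOST}{path}" + (f"?{query}" if query else "")
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Type", "text/plain")],
    )
    return [b"Moved permanently\n"]


def make_app(robots_block_ip=True, gone_for_googlebot=True):
    def app(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        path = environ.get("PATH_INFO", "/")

        if host == IP_HOST:
            # Option 3: serve a disallow-all robots.txt on the IP host only.
            if robots_block_ip and path == "/robots.txt":
                start_response("200 OK", [("Content-Type", "text/plain")])
                return [DISALLOW_ALL]
            # Option 2: 410 for Googlebot only (naive User-Agent match).
            ua = environ.get("HTTP_USER_AGENT", "")
            if gone_for_googlebot and "Googlebot" in ua:
                start_response("410 Gone", [("Content-Type", "text/plain")])
                return [b"Gone\n"]

        # Everyone else on a non-preferred host still gets the 301.
        if host != PREFERRED_HOST:
            return _redirect(environ, start_response)

        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Content served from the preferred host\n"]

    return app
```

The User-Agent check is deliberately naive. The header is trivially spoofed, and Google's own guidance for verifying Googlebot is a reverse DNS lookup on the requesting IP followed by a forward lookup to confirm it.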
Thanks in advance for any tips.