Changing IP to yield 404 for all non-google-verification URLs

mod_rewrite of IP to www now yields 404 not found

         

martinmartin

3:39 pm on Apr 3, 2010 (gmt 0)

10+ Year Member



What would happen (and would it be a good idea) if you made the IP address return a 404 to Google's spiders for all non-Google-verification URLs?

jdMorgan

4:46 pm on Apr 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sounds like a bad idea to me... although the description is a bit ambiguous.

I suspect you'd immediately lose all "trustrank" with this approach. I would think robots.txt or on-page <meta name="robots" content="noindex"> tags would be a better solution.

What is the end purpose you're trying to accomplish?

Jim

g1smd

6:50 pm on Apr 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I guess he doesn't want Google to know that there are verification files for other search engines present on the server. Why he wants to do that, or what he thinks it will achieve, I have no idea.

martinmartin

11:26 pm on Apr 3, 2010 (gmt 0)

10+ Year Member



Our site has tried unsuccessfully for a year to stop Google from indexing our IP address. We only recently put a proper mod_rewrite redirect in place, and an advisor then recommended moving the IP on our Apache server so that it would not touch any of the domain's pages and would thus deliver a 404 when spidered by Google. I'm trying to find out whether this is a stupid idea or okay to do. Our Google foo has dropped from 500k pages to only a few hundred as of today, so upsetting Google seems the least of our problems at this point.

jdMorgan

1:27 am on Apr 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A 301-Moved Permanently redirect from all non-canonical hostnames (including the IP address) to the canonical hostname would be the proper approach.

Assuming your preferred hostname is "www.example.com," then all of the following hostname variations should 301-redirect to it:

12.34.56.78 (IP address)
example.com (non-www)
<any-subdomain>.example.com (i.e., any subdomain not matching "www")
FQDN, such as www.example.com. (ends with a period)
Appended port number (www.example.com:80)
FQDN and appended port (www.example.com.:80)

For single-hostname sites, it's usually easier to test for an exact match on the canonical hostname or blank, and redirect otherwise.
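For what it's worth, the "exact match or blank" test described above might look something like this in a root .htaccess file. This is only a sketch: it assumes mod_rewrite is enabled, that the site is served over plain http, and that www.example.com stands in for the real canonical hostname.

```apache
RewriteEngine On
# Do nothing if the Host header is exactly the canonical hostname, or blank
# (a blank Host can come from HTTP/1.0 clients, and redirecting those can loop).
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# Everything else (IP address, non-www, trailing dot, appended port) gets a 301.
# In .htaccess the matched path has no leading slash, so one is added here.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

If the rule goes in httpd.conf or a <VirtualHost> block instead, the pattern sees the leading slash, so it would need to be ^/(.*)$ with the same substitution.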

But regardless of the implementation, a 301 is the proper solution. At the same time, the site should be thoroughly checked to be sure that all hard-coded on-site links refer to the correct canonical hostname. Xenu Link Sleuth (or similar) may be helpful in this regard.

You want to assertively tell the search engines what your hostname is with a 301 redirect, rather than trying to lock them out. If you *were* able to successfully lock them out, then they would just keep those old IP-address-based URLs in their index for years, because they'd be unable to fetch them to find out that they were no good.

Jim

g1smd

4:46 pm on Apr 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One issue when IP addresses are involved is that
www.example.com
might appear at
11.22.33.44/example.com/
or similar.

In this case, the IP redirection rule needs some small modifications in order to properly redirect to the correct site:

RewriteCond %{HTTP_HOST} ^11\.22\.33\.44$
RewriteRule ^([^./]+\.[^/]+)/(.*)$ http://www.$1/$2 [R=301,L]
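Rule order probably matters if this is combined with the general canonical-hostname redirect described earlier in the thread. A rough sketch, assuming both rules live in the same root .htaccess, that 11.22.33.44 is the server's IP, and that www.example.com is the canonical hostname (all placeholders):

```apache
RewriteEngine On

# 1) More specific rule first: on the bare IP, a first path segment that looks
#    like a hostname is taken to be the domain, so
#    11.22.33.44/example.com/page.html -> www.example.com/page.html
#    Caveat: a real directory with a dot in its name would also match.
RewriteCond %{HTTP_HOST} ^11\.22\.33\.44$
RewriteRule ^([^./]+\.[^/]+)/(.*)$ http://www.$1/$2 [R=301,L]

# 2) General rule second: any other non-canonical (and non-blank) hostname.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With the order reversed, a request for 11.22.33.44/example.com/page.html would be caught by the general rule and sent to www.example.com/example.com/page.html, which is most likely not what you want.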

martinmartin

5:52 pm on Apr 10, 2010 (gmt 0)

10+ Year Member



Thank you so much for your help. I have one more concern that I think is related.
Every part of the site is being indexed except our most important content area. Sitemaps for this area were submitted long ago and have been downloaded, but zero pages are indexed. We have asked numerous experts, to no avail. It is not a new site. A real conundrum. Details if you wish.

jdMorgan

8:41 pm on Apr 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check robots.txt to be sure those sections/pages are not Disallowed. Check the on-page <meta name="robots"> tags to be sure the pages are not "noindexed". Fix any canonicalization problems (any given page should have one and only one unique URL that 'works' to reach it, and any other variations whatsoever in the URL should be redirected to that canonical URL). Be sure that each page has a unique title and a unique description, and that both accurately reflect the content of that page. The "company name", if included in either, should be last, except perhaps on the home page. Examine on-site linking to be sure that 'link juice' is distributed to those pages you want indexed, and that link text is relevant to the target pages. Examine inbound links from other sites to be sure that they use good and accurate link text, and that those sites that link to you are related to whatever it is that your site is about.

In addition to the above, there are several years' worth of reading for SEO-related issues in our Google, Yahoo, Bing, and other search engine forums -- Check the Library section of each forum (the link to the Library is at the top of every forum page).

Jim

martinmartin

1:55 am on Apr 13, 2010 (gmt 0)

10+ Year Member



Thanks. Funny thing is, those areas that are not indexed under the domain have been indexed under our IP. We have since redirected and 404'd as described above.