Forum Moderators: Robert Charlton & goodroi


Is Google crawling non-existent subdomains? (iphone., mobile., wap., etc.)

         

bumpski

5:16 pm on Jan 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've recently discovered that Google has been indexing pages from some of my sites under non-existent subdomains, such as iphone.example.com, mobile.example.com and wap.example.com. As a result, many pages end up indexed multiple times by Google, which of course is not good.

Unfortunately, my server setup and my brute-force fix for this problem do not let me investigate whether Google is inventing subdomains in an attempt to find new content for mobile devices, iPhones, etc. The Apache server does not record the requested domain/subdomain in its logs, so this problem is very easy to miss!
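For anyone who can change their logging: Apache's LogFormat directive can record the requested Host header with %{Host}i, which the stock common/combined formats leave out. Assuming you prepend that field to the usual combined format (the "hostcombined" nickname in the comment is arbitrary), a rough Python sketch like this would tally which hosts Googlebot-labelled requests asked for. CANONICAL_HOST and unexpected_hosts are hypothetical names, and keep in mind that a user-agent string can be faked by anyone.

```python
import re
from collections import Counter

# ASSUMPTION: the access log format has been changed to prepend the requested
# Host header (%{Host}i) to the usual "combined" fields, e.g.
#   LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" hostcombined
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

CANONICAL_HOST = "www.example.com"  # placeholder for your real primary host


def unexpected_hosts(lines, canonical=CANONICAL_HOST, agent_substring="Googlebot"):
    """Count requests for any Host other than `canonical`, limited to
    user agents that contain `agent_substring`."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines written in a different format
        host = m.group("host").lower().split(":")[0]  # drop an explicit :port
        if host != canonical and agent_substring in m.group("agent"):
            counts[host] += 1
    return counts
```

Usage would be something like `unexpected_hosts(open("access_log")).most_common()`, which lists the stray hosts with the busiest first.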

Fortunately, the sites in question use the www subdomain prefix. Because of this, the following query:
site:example.com -site:www.example.com
reveals all the pages Google has indexed under the non-existent subdomains. That's one reason to consider putting your primary site content under a subdomain!

Sometime in the past several months, I believe my trusty webhost enabled wildcard subdomains without my knowledge. I do not believe this is the default Apache configuration. Normally this wouldn't be much of an issue unless someone linked to, or attempted to crawl, pages at these non-existent subdomains.
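A quick way to see whether wildcard DNS is on for a domain is to look up a few random subdomain names that nobody could have linked to; if they all resolve, there is very likely a *.example.com record. Here is a minimal Python sketch using only the standard library (has_wildcard_dns is a hypothetical name; the resolve parameter is only there so it can be tested offline). Two caveats: some ISP resolvers answer every failed lookup with their own address, which gives a false positive, and whether the web server then serves your site on those names depends on its virtual host setup.

```python
import secrets
import socket


def has_wildcard_dns(domain, resolve=socket.gethostbyname, probes=3):
    """Return True if several randomly named subdomains of `domain` all
    resolve, which suggests a wildcard (*.domain) DNS record."""
    for _ in range(probes):
        # A random label that nobody could plausibly have linked to.
        name = f"probe-{secrets.token_hex(8)}.{domain}"
        try:
            resolve(name)
        except socket.gaierror:
            return False  # a random name failed to resolve: no wildcard
    return True
```

Running `has_wildcard_dns("example.com")` before and after asking the host to disable wildcards is a cheap way to confirm the change took effect.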

I have not found any external sites linking to these non-existent subdomains, which is why I strongly suspect a crawler: Google itself, inventing subdomains in an attempt to find content for mobile devices, etc.

My brute-force fix was to turn off wildcard subdomains, and I believe the number of pages indexed under spurious subdomains is declining. For me, this was the most definitive and easiest fix to implement.
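Turning off wildcard subdomains is the fix described above. A different approach, sketched here only as an alternative, is to keep answering every host but send a 301 redirect for any Host other than the canonical one, so stray subdomain URLs consolidate onto www. This assumes the site runs as a Python WSGI application, which the thread doesn't say; canonical_host_middleware is a hypothetical name, and the URL rebuilding follows the recipe in PEP 3333.

```python
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # placeholder for your real primary host


def canonical_host_middleware(app, canonical=CANONICAL_HOST):
    """Wrap a WSGI app so any request for a non-canonical Host gets a
    301 to the same path and query on the canonical host."""
    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "").lower().split(":")[0]  # drop :port
        if host == canonical:
            return app(environ, start_response)
        # Rebuild the requested path and query (PEP 3333 style).
        path = quote(environ.get("SCRIPT_NAME", "")) + quote(environ.get("PATH_INFO", ""))
        location = f"{environ.get('wsgi.url_scheme', 'http')}://{canonical}{path or '/'}"
        if environ.get("QUERY_STRING"):
            location += "?" + environ["QUERY_STRING"]
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]
    return wrapper
```

It would be applied as `application = canonical_host_middleware(application)`. A 301 rather than a 404 is the usual choice here because it tells crawlers which copy of each page to keep.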

Webmaster Tools shows many links to these non-existent subdomains, but only from my own pages! I can guarantee that virtually all the internal links on the sites are correct and point only to the www.example.com subdomain. (Especially now, with the fix in place; otherwise the sites wouldn't work well at all!)
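Something worth checking here: relative links are resolved against whatever host the page was fetched from, so a page served as iphone.example.com with an href like "/page.html" will itself appear to link to iphone.example.com. That may explain links to the bogus subdomains showing up "only from my pages" even when every link in the HTML is correct. The hypothetical helper below (a sketch using Python's standard html.parser and urllib.parse) lists links on a page that point into example.com but not at www.example.com, resolving relative hrefs against a page URL you supply.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def off_host_internal_links(html, page_url, site_domain="example.com",
                            canonical="www.example.com"):
    """Return absolute links into `site_domain` whose host is not `canonical`."""
    parser = LinkCollector()
    parser.feed(html)
    bad = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # relative hrefs inherit page_url's host
        host = (urlsplit(absolute).hostname or "").lower()
        in_site = host == site_domain or host.endswith("." + site_domain)
        if in_site and host != canonical:
            bad.append(absolute)
    return bad
```

Calling it once with the www URL and once with an iphone. URL for the same HTML shows the difference: the explicit bad links appear both times, while relative links only "move" in the second run.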

So my main question is: Is Googlebot (mobile?) experimenting with crawling non-existent subdomains?
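On the main question, the user-agent string alone doesn't prove a request came from Google, since anyone can send "Googlebot". Google's documented way to verify a crawler is a reverse DNS lookup on the requesting IP, checking that the name ends in googlebot.com or google.com, then a forward lookup on that name to confirm it returns the same IP. A minimal Python sketch of that check using only the standard socket module follows; is_verified_googlebot is a hypothetical name, the resolver parameters exist only so it can be tested offline, and it handles IPv4 only because gethostbyname_ex is IPv4-only.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Reverse-resolve `ip`, require a googlebot.com/google.com name, then
    forward-resolve that name and require it to map back to `ip`."""
    try:
        hostname = reverse(ip)[0]  # (hostname, aliases, addresses)
    except (socket.herror, socket.gaierror):
        return False  # no PTR record
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]
    except (socket.herror, socket.gaierror):
        return False
    return ip in addresses  # forward lookup must confirm the reverse one
```

Feeding the IPs from the stray-subdomain hits through a check like this would show whether they really came from Google or from something merely claiming to be Googlebot.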

And I'd suggest webmasters look carefully to see whether their content might be indexed under subdomains they don't know about!

In some cases this could turn into a major duplicate content problem, and certainly supplemental status for some of your website's pages. Also note that if your pages do go supplemental, your images will disappear from Google Images.
Supplemental test:
-site:www.example.com/* site:www.example.com

tedster

7:46 pm on Jan 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's a good heads-up. Google often does various kinds of exploratory crawling for other purposes, so this would be right in line with their previous practice. Bottom line: make sure you don't have wildcard subdomains active, I guess.