|Indexing new domains with no links|
How is Google doing this?
Three months ago I registered 8 new domain names of the form
Before registering I checked both "unusualword(s)" on all major SEs first and they appeared nowhere, not even in the contents of any other page. They are very odd "made up" words.
The .net, .org, and .us names were originally registered only for protection, as "unusualword(s)" are also the company's names.
Having put sites up at the .com versions, I left the other domains parked at the registrar as their use wasn't critical. I then set up some links to the .com's and waited for Googlebot.
All worked out fine and it indexed the .com's as expected.
But when I searched today for "unusualword1" and "unusualword2", in addition to the expected .com's it now also returns the .net and .org parked pages in the SERPs, but not the .us versions. It has also cached the registrar's parked pages for these domains.
This isn't really a problem as I will unpark them from the registrar and redirect them to the .com versions.
But, the question is how did Google find the .net and .org domain names? There are no known/findable links to them, and they do not appear on any web page as plain text other than the parked "coming soon" style registrar pages.
Does Google now automatically attempt to index anything that has a .net or .org version of a domain when it finds a .com?
Any other thoughts on how Google is doing this?
Most probably someone who has been at these sites uses the Google Toolbar.
The toolbar collects urls and sends them to G.
I have over two hundred parked domains that the same thing has happened to. Most are on redirects to other sites that I have developed. These domains are pointed to the other sites simply for pure type-in traffic.
My best guess is that they are using the whois.
I have heard of many others having this happen to them also.
Google's database is awash with duplicate content.
IMHO this was a major screw-up on their part. There is no way they should spider parked domains.
Not one of the domains that I have parked even had a page up; they were simply 301 redirects.
> using the whois
Yes, for a long time now. I don't think it's a problem; new domains are unlikely to rank for anything other than unique or almost-unique words.
|My best guess is that they are using the whois. |
But I also have lots of requests for typos. These definitely come from the TB.
looks like **** is associated with them .. in my stats, before submitting the domain to google, i get whois.sc querying my site.
I just registered a deleted domain and pointed it to one of my existing domains. It now comes up with the same PR as the site it points to, and comes up number 1 for the word in the domain part of domain.com. I have been getting a little traffic because it is a misspelling of a very popular website. Now I get type-in traffic, and SE traffic if they misspell it in an SE.
Nice to see others have seen the same issues.
Personally I am doubting the Toolbar in my circumstances as these examples are unlikely to be visited by anyone by accident. I had never even been to the parked pages myself.....but you never know ;)
Using the domain registry info seems to be the most likely answer.
The thread at [webmasterworld.com...] started a couple of hours later got me thinking (bad thoughts) about this scenario.
I've spent the morning redirecting all previously parked domain names :)
I had the same thing with a domain parked at GoDaddy.
I have a domain that has never been used that has been picked up by mirago.co.uk. Google has left it alone though.
Do you have adsense on the pages? It could be the MediaPartners bot telling Gbot?
>Do you have adsense on the pages? It could be the MediaPartners bot telling Gbot?
No, AdSense is not used on these domains. They are sites for companies that only want to sell their own products.
Hence the concern that if you type in the company name it returns the parked domains as well (albeit below), and the major concern that Google's DomainPark feature could potentially have been used in future to advertise competitors.
Well, it must be the toolbar then, for I had the same thing, but with just a newsletter.html page with no inbound links (except from a text email newsletter), and not a domain name.
maybe the hosting company has an auto submission program?
GoogleGuy swears up and down that the toolbar does not find pages to be put in the index. I have seen him and Brett go back and forth about that subject. Here is the thread [webmasterworld.com]
Well, I don't know of any other way my newsletter page could have been crawled by gbot. I didn't submit it, and I know my host does not do submissions. It had to be either the MediaPartners bot, because I have AdSense ads at the top, or the toolbar. Of course I am open to suggestions. I have had no ref links to it except from the likes of mail.yahoo.com, etc. And believe me, it's not that good for someone to have linked to it anyway. Not the entire newsletter. The good articles, they read, change the wording, and then publish themselves.
maybe someone got the newsletter and had the Alexa or Yahoo toolbar installed?
Thanks to all and especially ogletree for that linked thread. I believe it throws a lot of light on this situation.
At [webmasterworld.com...] msg: #20:
>I still differ with you Brett, but feel free to mail me some examples (deep pages--none of this root page stuff ;) .
GoogleGuy seems to be pretty definitive about the ToolBar in that thread, but doesn't seem to want to confuse the situation with "root page stuff".
There could be other reasons for unlinked deep pages getting indexed, and "post hoc, ergo propter hoc" is probably the case here with regard to the ToolBar. I know of several deep unlinked pages that are frequented by ToolBar users and have never been indexed, so there seems to be evidence both ways. But I personally conclude the ToolBar is not likely to be able to differentiate: it should either crawl/index all non-linked pages it sees, or none of them.
Based upon the above, and based upon what max_rk said a few hours ago at [webmasterworld.com...] msg: #31 (was more shocking before the snip), I am now strongly leaning towards Google using the whois/registry info to attempt to index parked domains (.com's, .org's and .net's, but not .us's).
There has to be money to be made from selling AdWords on parked domains. The more parked domains indexed the stronger the incentive for registrars to change their parked domain pages to those selling AdWords via the DomainPark program. Not to mention Google gets to remain the biggest index on the net.
Whois info will only tell you if a domain is owned by someone, and in that case who. It will not tell you if that domain is active or not.
claus, if i got u right, whois does say if it is active or not
try using w h o i s . s c /domain.com
> GoogleGuy swears up and down that the toolbar does not find pages to be put in the index.
Each time GoogleGuy has addressed the subject he's said that they did not, but he's been careful not to say that they would not.
I'm not saying that the behaviour has changed, just that it could.
seofreak you didn't get me entirely right, and i wasn't very specific either ;) The particular "whois service provider" you mention is not the same as the generic term "whois service". That particular service provider adds extra services to the "whois" service which, really, is just an owner lookup and nothing else.
The source for which domains are active and pointing at which IPs is largely independent of who owns these domains, as domains can be pointed towards this, that, or nothing, even if the owner doesn't change. Also, sale and transfer of domains can take place whether or not the domain points to an active server.
The Domain Name System decides which domains point at what, or not. Whois info is used to decide who is owner, admin, and the person who pays the fee for the domains. It's just two different things, really. (There's even a third thing; "does the server respond or not?" - the server can easily be down or unreachable, even if the domain is active and owned/paid for.)
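claus's three distinctions (owned, resolving, responding) are independent of each other, which a tiny classifier makes concrete. This is purely illustrative Python, nothing Google-specific, and the status strings are my own wording:

```python
# Illustrative sketch: the three checks claus describes are independent,
# so a domain can be in any combination of them.
def domain_status(has_whois_record: bool, resolves_in_dns: bool,
                  server_responds: bool) -> str:
    """Classify a domain by the three separate checks described above."""
    if not has_whois_record:
        return "unregistered"
    if not resolves_in_dns:
        return "registered but not pointed anywhere"
    if not server_responds:
        return "registered and resolving, but server down or unreachable"
    return "registered, resolving, and serving"

# A typical parked domain is registered AND resolving - it just serves
# nothing but the registrar's placeholder page.
print(domain_status(True, True, True))
```

The point of the exercise: whois only answers the first question; DNS answers the second; only an actual request answers the third.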
Just launched a new site today, i'll give it 24 hours before visiting it with the Tbar (can't guarantee others aren't doing that though) - i'm not putting any links anywhere for a day or two. Let's see what happens ;)
I think they may have a source for getting new domains, as **** does. All domains are ultimately stored in InterNIC and the other root zone DNSes. What do you say? Maybe, or maybe not.....
|The Domain Name System decides which domains point at what, or not. Whois info is used to decide who is owner, admin, and the person who pays the fee for the domains. It's just two different things, really. |
Have I been imagining the name servers listed in whois?
claus, you are splitting hairs so fine they're nearly microscopic. Talking about the whois database and the DNS system like they're two unrelated things is disingenuous. The whole point of the whois system is to associate domain names with name servers. Domain registries exist to create the zone files. No zone file, no DNS system.
That said, I doubt Google is using zone files. I'm thinking this may be an effect of domain/IP uncouplings. Remember all those people who complain that Googlebot is caching IP addresses instead of domain names? Maybe that's allowing Googlebot to accidentally find new domains who take over previously-used IP addresses.
mbauser, I can't agree. The whois changes are propagated so it is possible to keep track of changes and to see when new domains are registered.
Other DNS changes occur on individual nameservers; if I switch a domain pointing at my nameservers to a different IP then you won't know unless you ask my nameserver, the same applies to setting up www.example.com, www.calum-s.new.domain.example.com, etc.
Several Internet related services visit new domains shortly after domains are registered, Googlebot is one of them.
My above posts should really be interpreted as a guess that the DNS records might be of some value to Google and as such, they might find it valuable to use them. It might even be a nice tool to find new domains and isolated domains as well.
Then i found that i had to explain this line of thought, and as i was not doing this properly, it became confused with whois information. I'm sorry about that; here's the whole explanation. It's still just guesswork:
>> the name servers listed in whois
I'm sorry if i have caused confusion by not mentioning that the names of the nameservers are part of your whois records. Afaik, it is required that you have at least two name servers for a domain. Preferably they should not be the same. The names of these servers are registered in your whois records; the content of them is not.
You can't in any way get DNS information from your whois records. These are two separate - albeit related - things. The zone files that you mention are simply not part of your whois records, only the names of the name servers are. The name of a server is not the same as the content that is on it.
So, when a domain is registered, it is assigned a (few) name server(s). This information is stored in the whois records. The whois records do not tell where these domains point to (if anything), as that is not what whois records are used for; they only tell you which name servers are authoritative for that particular domain.
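To see how little whois actually gives you, here's a sketch that pulls only the name-server names out of a whois record. The sample text is made up (real registry output formats vary), but the point stands: name-server names are there, zone contents and IPs are not:

```python
# Hypothetical whois output for an example domain; real formats vary by registry.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLE-HOST.NET
Name Server: NS2.EXAMPLE-HOST.NET
Status: ACTIVE
"""

def name_servers(whois_text):
    """Extract only the name-server *names* - whois gives nothing more.

    Note what is absent: no A records, no IP for the domain itself, no
    zone contents. To learn where the domain actually points, you must
    go on and query those name servers via DNS."""
    servers = []
    for line in whois_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "name server":
            servers.append(value.strip().lower())
    return servers

print(name_servers(SAMPLE_WHOIS))
# ['ns1.example-host.net', 'ns2.example-host.net']
```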
In any case, to resolve a domain, one would query the name servers like this:
1) first ask the primary nameserver for the TLD
2) this server responds with nameservers for (second level) domain
3) ask primary nameserver for (second level) domain
4) if this one can't find the name, ask secondary nameserver
5) if this one can't find the name, ask third nameserver etc.
The whois records are usually maintained by the same entity that operates the primary name servers for a TLD. That is how they do their part of this chain of events, which is (1) and (2). Steps (3) and onwards are the ones that will actually reveal whether your domain is pointing towards anything, and if so, where your webserver is located.
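The lookup chain in steps (1)-(5) can be modelled with a toy resolver. The delegation data, server names, and addresses below are all made up for illustration; a real resolver speaks the DNS protocol over the network:

```python
# Toy model of the lookup chain in steps (1)-(5) above.

# Steps (1)-(2): the TLD's servers only know which name servers are
# authoritative for each second-level domain (the delegation).
TLD_DELEGATIONS = {
    "example.com": ["ns1.example-host.net", "ns2.example-host.net"],
}

# Steps (3)-(5): each authoritative server may (or may not) hold the
# actual address record for the domain.
AUTHORITATIVE_ZONES = {
    "ns1.example-host.net": {},                            # primary: no record yet
    "ns2.example-host.net": {"example.com": "192.0.2.10"}, # secondary has it
}

def resolve(domain):
    ns_list = TLD_DELEGATIONS.get(domain)
    if ns_list is None:
        return None              # no delegation at the TLD: not registered
    for ns in ns_list:           # try primary, then secondary, and so on
        answer = AUTHORITATIVE_ZONES.get(ns, {}).get(domain)
        if answer is not None:
            return answer
    return None                  # delegated (e.g. parked) but pointing nowhere

print(resolve("example.com"))
# 192.0.2.10  (found via the secondary)
```

A domain can thus be fully registered and delegated while `resolve` still returns nothing, which is exactly the "registered but inactive" case discussed above.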
This chain of querying is carried out each time you request a page from a domain. For a Search Engine, doing this for millions of requests might add some overhead and perhaps slow the crawling operation down. It is possible and perhaps even tempting to cache DNS records, i.e. to hold a local copy. This is, however, not a good idea, as domains keep changing locations all the time, and hence you need to update that cache very frequently in order to stay current.
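The caching trade-off described here can be sketched as a tiny TTL cache. The structure and TTL value are arbitrary choices of mine, just to show why stale entries are the problem:

```python
import time

# Minimal sketch of the trade-off above: caching saves repeated lookups,
# but cached answers go stale, so every entry must expire eventually.
class DnsCache:
    def __init__(self, lookup, ttl_seconds=3600, clock=time.monotonic):
        self._lookup = lookup      # the (slow) real resolver function
        self._ttl = ttl_seconds
        self._clock = clock        # injectable clock, handy for testing
        self._entries = {}         # domain -> (address, expiry_time)

    def resolve(self, domain):
        entry = self._entries.get(domain)
        if entry is not None and entry[1] > self._clock():
            return entry[0]        # fresh cached answer: no lookup cost
        address = self._lookup(domain)   # stale or missing: ask again
        self._entries[domain] = (address, self._clock() + self._ttl)
        return address
```

With a long TTL a crawler keeps fetching a moved site from the old host; with a short TTL the cache saves little. That tension is one plausible reading of why Google used to be slow to notice host moves.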
Previously, Google was notoriously s-l-o-w to realize that domains had moved from one host to another, and this caused all kinds of problems. As the thread Is Google getting faster in all aspects (supporters forum(*), login req'd.) [webmasterworld.com] and others show, they have been getting increasingly faster with this during their "technology update" (the period between "Florida" and the previous SERP update). This is good, as now you don't really need to know a whole lot of technical server stuff in order to make sure that you're indexed properly; just move, wait for the googlebot, and then close down the old account - that's pretty simple.
So, what is the common denominator between moving from one host to another and setting up a new domain? Your DNS records need to change.
Now, feel free to believe anything you like, and especially do make those "reality checks" whenever you see anything posted in a public forum. Not everything you read is the whole truth, or indeed the truth, so i can only recommend that you question most if not all you read. Yes, even the things i post myself, i only welcome that as i do realize that sometimes i'm plainly wrong about things.
In this case, i might be wrong as well as in any other case. The only people who will know for sure are those employed by Google who know about such matters. I have never seen Google confirm or deny that they used DNS records or DNS caching for any purpose, either officially or unofficially. Not even as an indication, and the same goes for whois records, by the way.
Added: this is Google's official statements about new sites as presently found on their webmaster pages:
|Google is a fully automated search engine, which employs robots known as 'spiders' to crawl the web on a monthly basis and find sites for inclusion in the Google index. Since this process does not involve human editors, it is NOT necessary to submit your site to Google in order to be included in our index. In fact, the vast majority of sites listed are not manually submitted for inclusion. |
The best way to ensure Google finds your site is for your page to be linked from lots of pages on other sites. Google's robots jump from page to page on the Web via hyperlinks, so the more sites that link to you, the more likely it is that we'll find you quickly.
(*)Note: It originated outside the supporters forum, so i did contribute to it before it was moved
Why do you two think name servers aren't part of the DNS?
From the Holiest of Holies, RFC 1034 [faqs.org]:
The DNS has three major components:
NAME SERVERS are server programs which hold information about the domain tree's structure and set information.
Name servers are part of the domain name system. Name server locations are part of the domain name system. There's DNS info in the whois database -- it's used to create the zone files. The DNS wouldn't work without the zone files.
Apparently, the problem is that when I write "DNS", I actually mean "Domain Name System". When you write "DNS", you guys actually mean "IP number".
>> do you two think name servers aren't part of the DNS?
mbauser, there's no need to be rude. I think you have misunderstood something, as i think i've been pretty clear on this issue. Of course your name servers are part of the DNS system, see step 3 and onwards in my post above.
Still, whois information is not the same as DNS records.
Added for clarity:
Whois is a protocol (RFC 954 [faqs.org], RFC 812 [faqs.org]) used for a directory service, like a telephone book or something. It's not DNS, even though names of nameservers are listed. I.e. it may list your nameservers, but it does not do the work of your nameservers.
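For the curious, the whois protocol really is that simple: open TCP port 43, send the query plus CRLF, read until the server closes the connection. A sketch under my own assumptions (the server name is the .com registry's public whois host; the actual network call is optional and kept separate):

```python
import socket

def whois_query_bytes(domain):
    # RFC 954: the client sends the query followed by CRLF; the server
    # replies with free-form text and then closes the connection.
    return domain.encode("ascii") + b"\r\n"

def whois(domain, server="whois.verisign-grs.com"):
    """Plain directory lookup over TCP port 43 - no DNS machinery involved."""
    chunks = []
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall(whois_query_bytes(domain))
        while True:
            data = sock.recv(4096)
            if not data:            # server closed: response is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(whois("example.com"))     # requires network access
```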
A week has now passed since launching that new domain from message #23 - it was launched with four pages of indexable html content, mostly text. It's still an "isolated island" in terms of inbound links, but it has outbounds.
Unfortunately the host i used does not allow me to view log files, so i'll have to move it to another host. In any case, i cannot see whether googlebot has passed by - the page is nowhere to be found in the Google SERPs (as in not indexed, not as in low ranking).
In this case it seems i can rule out DNS as a source for new domains - or at least it takes some time for the domain to make it to the index (as i can't see from the logs whether googlebot has really visited or not). Also, this is a regional TLD (a ".dk" domain), which might be treated differently than a ".com". I have visited the domain once with the Toolbar after 5 days and will revisit it today.