unusualword1.com
unusualword2.com
unusualword1.net
unusualword2.net
unusualword1.org
unusualword2.org
unusualword1.us
unusualword2.us
Before registering, I checked both "unusualword"s on all the major search engines and they appeared nowhere, not even in the content of any other page. They are very odd, made-up words.
The .net, .org, and .us names were originally registered only for protection, as each "unusualword?" is also a company name.
Having put sites up at the .com versions, I left the other domains parked at the registrar as their use wasn't critical. I then set up some links to the .coms and waited for Googlebot.
All worked out fine and it indexed the .com's as expected.
But when I searched today for "unusualword1" and "unusualword2", in addition to the expected .coms it now also returns the parked .net and .org pages in the SERPs, but not the .us versions. It has also cached the registrar's parked pages for these domains.
This isn't really a problem as I will unpark them from the registrar and redirect them to the .com versions.
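For anyone wanting to do the same, here is a minimal sketch of such a redirect in Python. It assumes the unparked domains are pointed at a small server you control; the CANONICAL mapping, the redirect_target helper and the RedirectHandler class are hypothetical names made up for illustration, and most hosts would do this in the web server configuration instead.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of formerly parked hostnames to the main .com site.
CANONICAL = {
    "unusualword1.net": "unusualword1.com",
    "unusualword1.org": "unusualword1.com",
    "unusualword1.us": "unusualword1.com",
}


def redirect_target(host, path):
    """Return the .com URL for a parked host, or None if the host is unknown."""
    name = host.split(":")[0].lower()  # drop any port, ignore case
    if name.startswith("www."):
        name = name[4:]
    canonical = CANONICAL.get(name)
    if canonical is None:
        return None
    return "http://" + canonical + path


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a permanent (301) redirect to the .com."""

    def do_GET(self):
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target is None:
            self.send_error(404)
            return
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

    do_HEAD = do_GET
```

To try it locally, something like `HTTPServer(("", 8080), RedirectHandler).serve_forever()` (with HTTPServer imported from http.server) would serve it on port 8080. A 301 rather than a 302 is used because the move is meant to be permanent.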
But, the question is how did Google find the .net and .org domain names? There are no known/findable links to them, and they do not appear on any web page as plain text other than the parked "coming soon" style registrar pages.
Does Google now automatically attempt to index anything that has a .net or .org version of a domain when it finds a .com?
Any other thoughts on how Google is doing this?
My best guess is that they are using the whois.
I have heard of many others having this happen to them also.
Their (Google's) database is awash with duplicate content.
IMHO this was a major screw-up on their part. There is no way they should spider parked domains.
Not one of the domains that I have parked even had a page up; they were simply 301 redirects.
Personally I am doubting the Toolbar in my circumstances as these examples are unlikely to be visited by anyone by accident. I had never even been to the parked pages myself.....but you never know ;)
Using the domain registry info seems to be the most likely answer.
The thread at [webmasterworld.com...], started a couple of hours later, got me thinking (bad thoughts) about this scenario.
I've spent the morning redirecting all previously parked domain names :)
No, AdSense is not used on these domains. They are sites for companies that only want to sell their own products.
Hence the concern that typing in the company name returns the parked domains as well (albeit lower down), and the bigger concern that the Google DomainPark feature could in future have been used to advertise competitors.
[edit]Maybe someone got the newsletter and had the Alexa or Yahoo toolbar installed?[/edit]
At [webmasterworld.com...] msg: #20:
GoogleGuy said:
>I still differ with you Brett, but feel free to mail me some examples (deep pages--none of this root page stuff ;) .
GoogleGuy seems to be pretty definitive about the ToolBar in that thread, but doesn't seem to want to confuse the situation with "root page stuff".
There could be other reasons for unlinked deep pages getting indexed and "post hoc, ergo propter hoc" is probably the case here with regards to the ToolBar. I know of several deep unlinked pages that are frequented by ToolBar users that have never been indexed....so there seems to be evidence both ways, but I personally conclude the ToolBar is not likely to be able to differentiate. It should either crawl/index all non-linked pages or none when accessed via the ToolBar.
Based upon the above, and based upon what max_rk said a few hours ago at [webmasterworld.com...] msg: #31 (was more shocking before the snip) I am now strongly swaying to Google using the whois/registry info to attempt to index parked domains (.com's, .org's and .net's but not .us's).
There has to be money to be made from selling AdWords on parked domains. The more parked domains indexed the stronger the incentive for registrars to change their parked domain pages to those selling AdWords via the DomainPark program. Not to mention Google gets to remain the biggest index on the net.
The source for which domains are active and pointing at which IPs is largely independent of who owns these domains, as domains can be pointed towards this, that, or nothing, even if the owner doesn't change. Also, sale and transfer of domains can take place whether or not the domain points to an active server.
The Domain Name System decides which domains point at what, or not. Whois info is used to decide who is owner, admin, and the person who pays the fee for the domains. It's just two different things, really. (There's even a third thing; "does the server respond or not?" - the server can easily be down or unreachable, even if the domain is active and owned/paid for.)
/claus
> The Domain Name System decides which domains point at what, or not. Whois info is used to decide who is owner, admin, and the person who pays the fee for the domains. It's just two different things, really.
Have I been imagining the name servers listed in whois?
claus, you are splitting hairs so fine, they're nearly microscopic. Talking about the whois database and the DNS system like they're two unrelated things is disingenuous. The whole point of the whois system is to associate domain names with name servers. Domain registries exist to create the zone files. No zone file, no DNS system.
That said, I doubt Google is using zone files. I'm thinking this may be an effect of domain/IP uncouplings. Remember all those people who complain that Googlebot is caching IP addresses instead of domain names? Maybe that's allowing Googlebot to accidentally find new domains that take over previously used IP addresses.
Other DNS changes occur on individual nameservers; if I switch a domain pointing at my nameservers to a different IP then you won't know unless you ask my nameserver, the same applies to setting up www.example.com, www.calum-s.new.domain.example.com, etc.
Several Internet related services visit new domains shortly after domains are registered, Googlebot is one of them.
>> the name servers listed in whois
I'm sorry if i have caused confusion by not mentioning that the names of the nameservers are part of your whois records. AFAIK, it is required that you have at least two name servers for a domain. Preferably they should not be the same. The names of these servers are registered on your whois records, the content of them is not.
You can't in any way get DNS information from your whois records. These are two separate - albeit related - things. The zone files that you mention are simply not part of your whois records, only the names of the name servers are. The name of a server is not the same as the content that is on it.
So, when a domain is registered, it is assigned a (few) name server(s). This information is stored in the whois records. The whois records do not tell you where these domains point to (if anywhere), as that is not what whois records are used for; they only tell you which name servers are authoritative for that particular domain.
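To illustrate the point, here is a small sketch that pulls the name server names out of a whois response. The SAMPLE_WHOIS text is made up for illustration (real output differs from registry to registry) and name_servers is a hypothetical helper. Note that what comes back is only a list of host names; no address for the domain itself appears anywhere in the record.

```python
# Made-up whois response for illustration only; real layouts vary by registry.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLE-HOST.NET
Name Server: NS2.EXAMPLE-HOST.NET
"""


def name_servers(whois_text):
    """Collect the values of all 'Name Server:' lines, lower-cased."""
    servers = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "name server":
            servers.append(value.strip().lower())
    return servers


print(name_servers(SAMPLE_WHOIS))
```

To find out where www.example.com actually points, you would still have to ask one of those name servers.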
In any case, to resolve a domain, one would query the name servers like this:
1) first ask the primary nameserver for the TLD
2) this server responds with nameservers for (second level) domain
3) ask primary nameserver for (second level) domain
4) if this one can't find the name, ask secondary nameserver
5) if this one can't find the name, ask third nameserver etc.
The whois records are usually maintained by the same entity that operates the primary name servers for a TLD. That is how they do their part of this chain of events, which is (1) and (2). Step (3) and onwards are the ones that will actually reveal if your domain is pointing towards anything, and if so, where your webserver is located.
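The chain above can be sketched as a toy simulation. This is only an illustration under stated assumptions: TLD_DELEGATIONS and ZONES are made-up dictionaries standing in for real servers, every name and address in them is invented, and resolve is a hypothetical function, not how any real resolver is implemented. In practice a resolver usually moves on to the next name server when one fails to respond.

```python
# Toy stand-in for the TLD servers: domain -> its authoritative name servers.
TLD_DELEGATIONS = {
    "example.com": ["ns1.example-host.net", "ns2.example-host.net"],
}

# Toy stand-in for the name servers themselves: server -> records it holds.
# ns1 is deliberately missing here to model a server that gives no answer.
ZONES = {
    "ns2.example-host.net": {"www.example.com": "192.0.2.10"},
}


def resolve(hostname):
    """Follow steps 1-5: ask the TLD, then each listed name server in turn."""
    domain = ".".join(hostname.split(".")[-2:])
    nameservers = TLD_DELEGATIONS.get(domain)  # steps 1 and 2
    if nameservers is None:
        return None  # the TLD knows nothing about this domain
    for ns in nameservers:  # steps 3, 4, 5
        records = ZONES.get(ns)
        if records is None:
            continue  # no answer from this server, try the next one
        address = records.get(hostname)
        if address is not None:
            return address
    return None


print(resolve("www.example.com"))
```

The split between the two dictionaries mirrors the split between steps (1)-(2) and steps (3) onwards: the first only knows names of name servers, the second knows where the site lives.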
This chain of querying is carried out each time you request a page from a domain. For a Search Engine, doing this for millions of requests might add some overhead and perhaps slow the crawling operation down. It is possible and perhaps even tempting to cache DNS records, ie. to hold a local copy. This is, however, not a good idea, as domains keep changing locations all the time, and hence you need to update that cache very frequently in order to stay updated.
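The trade-off can be sketched with a small cache that remembers each answer for a fixed time. This is an assumption-laden illustration, not a description of what any search engine does: DnsCache, its lookup callback and the ttl_seconds value are all hypothetical.

```python
import time


class DnsCache:
    """Remember lookups for ttl_seconds, then ask the real source again."""

    def __init__(self, lookup, ttl_seconds=300, clock=time.monotonic):
        self._lookup = lookup    # function: hostname -> address
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # hostname -> (address, expiry time)

    def resolve(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and entry[1] > now:
            return entry[0]      # still fresh: no query is made
        address = self._lookup(hostname)
        self._entries[hostname] = (address, now + self._ttl)
        return address
```

With a long ttl_seconds the crawler saves many queries but keeps visiting the old host after a site moves; with a short one it notices moves quickly at the cost of more lookups.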
Previously, Google was notoriously s-l-o-w to realize that domains had moved from one host to another, and this caused all kinds of problems. As the thread "Is Google getting faster in all aspects" (supporters forum(*), login req'd.) [webmasterworld.com] and others show, they have been getting increasingly faster with this during their "technology update" (the period between "Florida" and the previous SERP update). This is good, as now you don't really need to know a whole lot of technical server stuff in order to make sure that you're indexed properly; just move, wait for the googlebot, and then close down the old account - that's pretty simple.
So, what is the common denominator between moving from one host to another and setting up a new domain? Your DNS records need to change.
Now, feel free to believe anything you like, and especially do make those "reality checks" whenever you see anything posted in a public forum. Not everything you read is the whole truth, or indeed the truth, so i can only recommend that you question most if not all you read. Yes, even the things i post myself, i only welcome that as i do realize that sometimes i'm plainly wrong about things.
In this case, i might be wrong as well as in any other case. The only people who will know for sure are those employed by Google who know about such matters. I have never seen Google confirm or deny that they used DNS records or DNS caching for any purpose, either officially or unofficially. Not even as an indication, and the same goes for whois records, by the way.
Added: these are Google's official statements about new sites as presently found on their webmaster pages:
Google is a fully automated search engine, which employs robots known as 'spiders' to crawl the web on a monthly basis and find sites for inclusion in the Google index. Since this process does not involve human editors, it is NOT necessary to submit your site to Google in order to be included in our index. In fact, the vast majority of sites listed are not manually submitted for inclusion.
(...)
The best way to ensure Google finds your site is for your page to be linked from lots of pages on other sites. Google's robots jump from page to page on the Web via hyperlinks, so the more sites that link to you, the more likely it is that we'll find you quickly.
[google.com...]
/claus
(*)Note: It originated outside the supporters forum, so i did contribute to it before it was moved
From the Holiest of Holies, RFC 1034 [faqs.org]:
The DNS has three major components:<snip>
NAME SERVERS are server programs which hold information about the domain tree's structure and set information.
Name servers are part of the domain name system. Name server locations are part of the domain name system. There's DNS info in the whois database -- it's used to create the zone files. The DNS wouldn't work without the zone files.
Apparently, the problem is that when I write "DNS", I actually mean "Domain Name System". When you write "DNS", you guys actually mean "IP number".
[edited by: ciml at 1:44 pm (utc) on Dec. 11, 2003]
[edit reason] please see sticky [/edit]
mbauser, there's no need to be rude. I think you have misunderstood something, as i think i've been pretty clear on this issue. Of course your name servers are part of the DNS system, see step 3 and onwards in my post above.
Still whois information is not DNS records.
Added for clarity:
Whois is a protocol (RFC 954 [faqs.org], RFC 812 [faqs.org]) used for a directory service, like a telephone book or something. It's not DNS, even though names of nameservers are listed. Ie. it may list your nameservers, but it does not do the work of your nameservers.
/claus
Unfortunately the host i used does not allow me to view log files, so i'll have to move it to another host. In any case, i cannot see whether googlebot has passed by - the page is nowhere to be found in the Google SERPS (as in not indexed, not as in low ranking).
In this case i can rule out DNS as a source for new domains, or at least it takes some time for the domain to make it to the index (as i can't see from the logs whether googlebot really has visited or not). Also, this is a regional TLD (a ".dk" domain, which might be treated differently than a ".com"). I visited the domain once with the Toolbar after 5 days and will revisit it today.
/claus