UK: 1635678
Germany: 1512622
France: 715570
Italy: 372153
Spain: 329572
Netherlands: 261752
Switzerland: 212205
Sweden: 166869
Denmark: 99477
Norway: 97860
Austria: 66719
Finland: 45088
Belgium: 45076
Portugal: 15912
Greece: 5432
Luxembourg: 2664
Iceland: 1359
I have not run the domain-to-website routines yet. Running website checks for about 5.6 million domains takes some time. :)
Regards...jmcc
As for Germany, you might want to look at Denic (Germany's cc registry) at [denic.de...]
There they say:
Domains gesamt (total): 5.938.340 (and that is the cc ".de" alone). The real number will be much higher, adding all the German .com's, .net's, .org's ... and all the other tld's.
What are your criteria for establishing country relatedness - whois data?
The crossover between registrars in one country and customers in another country sharing the same language is the source of a lot of the confusion. Many Irish domain owners use UK hosting/registrars and, as a result, the simple algorithm tags these domains as being UK. However, by developing a more complex algorithm that produces a model of a country's internet business, these errors can be reduced. The big challenge is domains that are registered on US nameservers and hosted on US servers. The more complex internet business model for a country will identify a lot of these, but in many cases the whois data is necessary to accurately identify whether they really belong to a country's domains. Even the whois data can be wrong: I have seen Irish domains registered with addresses of "Dublin, Ireland, UK" or in some cases "Dublin, UK". Again, this kind of iffy whois data would require a manual decision or a very good parsing of the whois data.
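As a rough illustration of the "iffy whois data" problem, here is a minimal sketch that flags addresses such as "Dublin, UK" for a manual decision. The lookup tables and the function name classify_whois_address are hypothetical and only cover the examples mentioned above; a real parser would need much larger tables and would have to cope with free-form addresses.

```python
# Hypothetical lookup tables, only large enough for the examples above.
COUNTRY_TOKENS = {"ireland": "IE", "uk": "GB", "united kingdom": "GB"}
CITY_HINTS = {"dublin": "IE", "cork": "IE", "london": "GB"}


def classify_whois_address(address):
    """Return (country_code_or_None, needs_manual_review) for a whois address."""
    parts = [p.strip().lower() for p in address.split(",") if p.strip()]
    countries = {COUNTRY_TOKENS[p] for p in parts if p in COUNTRY_TOKENS}
    cities = {CITY_HINTS[p] for p in parts if p in CITY_HINTS}

    # Exactly one country named, and any recognised city agrees with it.
    if len(countries) == 1 and (not cities or cities == countries):
        return next(iter(countries)), False

    # Conflicting or missing evidence: offer the city's country as a guess
    # (if unambiguous) and flag the record for a manual decision.
    guess = next(iter(cities)) if len(cities) == 1 else None
    return guess, True


for addr in ("Dublin, Ireland", "Dublin, Ireland, UK", "Dublin, UK"):
    print(addr, "->", classify_whois_address(addr))
```

The design choice here is to never silently trust a conflicting record: the function still offers a guess, but the review flag keeps the final call with a human.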
After a while, from building up an image based on nameservers/SOAs/website data, it becomes possible to refine the data to produce a fairly accurate set of figures for each country, and patterns of domain registration become more apparent. It is probably the closest you can get to a precise figure without individually checking the whois data for each domain. Though at 5.6 million domains, that would take a while. :)
Regards...jmcc
(John McCormac)
> Domains gesamt (total): 5.938.340 (and that is the cc ".de" alone). The real number will be much higher, adding all the German .com's, .net's, .org's ... and all the other tld's.
Probe, the German cctld is probably the biggest in the world at the moment. (I am not sure how .us will grow.) The use of US hosting by registrants in each country means that these figures will be on the low side. However, the patterns are clear: where there is a reasonably priced cctld and good internet connectivity, there will be more cctld domains registered in that country. One of the best examples of that is Belgium: it has about 200K .be domains, but in CNO (.com/.net/.org) it has only 45076 domains. The Irish (.ie) cctld is a very good example of what happens when the cost of a cctld is too high and the connectivity is poor. The official figure for .ie registrations is 32488, but only 29801 of these domains had valid SOAs. The cctld is so badly run that it had not been actively deleting dead domains.
The next step in this process is to check the delegation of each domain (does it have a Start Of Authority (SOA) record?) and then whether the domain has an associated website. Then the website and other associated data are checked. Patterns of domain registration tend to become more apparent as the data on each particular domain increases. In some respects, it is all a question of reducing the margin of error on the present dataset before going for the big US hosted dataset. By using a crawler to detect links on some of the identified websites, it is possible to pick out some of the US hosted domains/sites. It has a lot of parallels with codebreaking: most of the work is really traffic analysis. However, it is the magnitude of the problem that can be, at times, quite terrifying. With codebreaking, you are either right or wrong. With this kind of work, it is all about reducing the possibility of 'wrong' and increasing the probability of 'right'. :)
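The domain, then SOA, then website sequence described above could be sketched roughly as follows. This is only an outline under stated assumptions: check_domains, has_soa and has_website are hypothetical names, and the two lookups are passed in as plain functions so that whatever DNS and HTTP checks are actually used can be plugged in. The fake lookups in the usage part are made-up test data, not real results.

```python
from collections import Counter


def check_domains(domains, has_soa, has_website):
    """Count domains at each stage: listed, delegated (SOA), with a website."""
    counts = Counter()
    for domain in domains:
        counts["domains"] += 1
        if not has_soa(domain):
            continue  # not delegated, so no point looking for a website
        counts["soa"] += 1
        if has_website(domain):
            counts["websites"] += 1
    return counts


# Usage with fake lookups standing in for real DNS/HTTP checks.
delegated = {"example.com", "example.net"}
with_site = {"example.com"}
result = check_domains(
    ["example.com", "example.net", "example.org"],
    has_soa=lambda d: d in delegated,
    has_website=lambda d: d in with_site,
)
print(dict(result))
```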
Regards...jmcc
When will you be able to estimate numbers of websites by country?
I should have the French figures and the smaller country counts later today (when I wake up). The larger ones (Germany, UK, France, Spain, Italy) will take up to a day each to process. The results should be completed in the next few days. At a guess, about 75% would have websites, though only spidering/linkswamp [1] analysis will determine whether these websites are active.
Regards...jmcc
[1] Identifying IPs with a large number of websites. These are often 'on hold' or 'coming soon' websites or redirection sites.
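A minimal sketch of the linkswamp idea in footnote [1]: group domains by the IP they resolve to and flag any IP carrying an unusually large number of them. The function name find_linkswamps and the threshold value are assumptions for illustration only; the cut-off would have to be tuned against real data, and the IPs below are documentation addresses, not real hosts.

```python
from collections import defaultdict


def find_linkswamps(domain_ips, threshold=1000):
    """Return {ip: [domains]} for every IP hosting at least `threshold` domains."""
    by_ip = defaultdict(list)
    for domain, ip in domain_ips:
        by_ip[ip].append(domain)
    return {ip: names for ip, names in by_ip.items() if len(names) >= threshold}


# Usage with made-up data and a deliberately tiny threshold.
sample = [
    ("parked1.example", "192.0.2.10"),
    ("parked2.example", "192.0.2.10"),
    ("parked3.example", "192.0.2.10"),
    ("realsite.example", "192.0.2.20"),
]
print(find_linkswamps(sample, threshold=3))
```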
France: 715570 SOA: 632788 Websites: 599228 *
Greece: 5432 SOA: 4194 Websites: 3913
Luxembourg: 2664 SOA: 2121 Websites: 2028
Iceland: 1359 SOA: 1146 Websites: 1116
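For comparison with the 75% guess, the SOA and website shares can be worked out directly from the four rows above. This is just arithmetic on the posted figures; the helper name rates is made up for the example.

```python
# (domains, valid SOA, websites) as posted above
figures = {
    "France": (715570, 632788, 599228),
    "Greece": (5432, 4194, 3913),
    "Luxembourg": (2664, 2121, 2028),
    "Iceland": (1359, 1146, 1116),
}


def rates(domains, soa, websites):
    """Return (SOA %, website %) of all domains, rounded to one decimal."""
    return round(100 * soa / domains, 1), round(100 * websites / domains, 1)


for country, row in figures.items():
    soa_pct, web_pct = rates(*row)
    print(f"{country}: SOA {soa_pct}%  websites {web_pct}%")
```

On these preliminary numbers the website share runs from about 72% (Greece) to about 84% (France).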
These are preliminary website figures. The in-depth analysis will take a while because it involves following website redirects, historical nameserver migration, etc. The initial figure for France is a bit high due to the existence of registrars used by clients in other countries. The free hosting packages offered by these registrars often mean that the website of a user in country A will actually appear on the servers of the free hosting service in country B. (Sorting this kind of thing out produces a headache worse than a hangover. :) )
Regards...jmcc
> the website of a user in country A will actually appear on the servers of the free hosting service in country B.
I (in the UK) host dozens of sites in the US (not free hosting, but a good service). The names are all registered with a UK address but the sites are on US servers. I bet there are a lot of people who do this kind of thing.
How does that affect your figures?
It can skew the figures considerably for some countries, kapow.
The effect has two extremes: at one end, the country with more sites outside its IP space than hosted locally, and at the other, countries with one or more registrars in their IP space. The simplistic IP/hosted sites/cctld geofiltering model used by the big SEs will be badly affected by this problem.
What I am working on is a set of algorithms and domain usage and registration models for each country/area that can work around the geofiltering issues. The CNO sites hosted in the US are the hardest to find and tie to a particular country. This is what makes it more of a codebreaking problem than a simple domains problem.
The brute force attack (BFA) method of checking all whois data for every domain is one solution, but it is not the most effective way of solving the problem. There are other techniques that can be used to identify domains relevant to each country. From work on Irish owned CNO domains, certain clustering patterns emerge quickly from the data and, as a result, you can begin to assign probabilities to particular registrars. Some rules are pretty easy to establish, such as linkswamps and regions from the domains to be checked for a particular country. I haven't enough data yet to estimate the local:non-local percentage for the UK. But for Ireland, the percentage is probably 40% and upwards on websites. Hopefully this research will produce better search results.
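One simple way to read "assign probabilities to particular registrars" is to take a sample of domains whose country is already known and count how each registrar's domains split by country. The sketch below does only that; it is an assumption about how such probabilities might be estimated, not the actual model, and the registrar names and labels are invented.

```python
from collections import Counter, defaultdict


def registrar_probabilities(labelled):
    """Estimate P(country | registrar) from (registrar, country) pairs."""
    counts = defaultdict(Counter)
    for registrar, country in labelled:
        counts[registrar][country] += 1
    return {
        registrar: {c: n / sum(by_country.values()) for c, n in by_country.items()}
        for registrar, by_country in counts.items()
    }


# Usage with an invented, manually labelled sample.
sample_labels = (
    [("registrar-a", "IE")] * 3
    + [("registrar-a", "GB")]
    + [("registrar-b", "GB")] * 2
)
print(registrar_probabilities(sample_labels))
```

A domain on a registrar with a strong lean towards one country could then be tagged with that probability and only the uncertain ones sent for a whois check.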
Regards...jmcc
Of those:
2.1 million .de
622K .com
140K .info
41K .biz
The rest is .org, .ch, .at and some .co.uk.
I'd suggest this is pretty representative of the German market, where 1 & 1 holds some 35%.