I heard somewhere that SE's were going to start looking for this?
Any help is appreciated.
Thank you,
Carolyn
If the contents from step 2 and step 3 are different, then it is obvious that domain.com is using name-based virtual hosting.
As ashear already mentioned, if your 200 domains all have unique content, you can be sure that all of them are getting into Google & Co. But there is one thing that could happen: it's possible that sometime in the future Google will ignore links from the very same IP, or even from within the same subnet, in their algorithm. So, if you intend to do heavy crosslinking between these 200 domains, be careful.
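To make the "same IP or same subnet" idea concrete, here is a minimal sketch in Python. It is purely illustrative: nobody outside Google knows whether or how they would do this. The function names, the /24 subnet size and the weights 0.0 / 0.5 / 1.0 are all assumptions made up for the example.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """Return True if both addresses fall in the same /prefix_len network.

    The /24 default is an assumption, not a known search engine setting.
    """
    network = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in network


def link_weight(source_ip: str, target_ip: str) -> float:
    """Hypothetical weighting: discount links between closely hosted sites."""
    if source_ip == target_ip:
        return 0.0  # same IP: link ignored entirely (assumed policy)
    if same_subnet(source_ip, target_ip):
        return 0.5  # same subnet: link counted at half value (assumed policy)
    return 1.0      # unrelated hosts: link counted in full
```

With weights like these, a link between two sites on 192.0.2.10 would count for nothing, while a link from 192.0.2.10 to 198.51.100.7 would count in full.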
Have any idea how Google or other SEs handle personal web pages or pages with 'domain.com/~name/'?
I think that these pages would require a crawler/submission approach, George. Sometimes it would be possible to build a list by crawling large directories such as ODP/Dmoz. Over the past few years, though, there has been a noticeable trend away from the /~ webspace towards full domain name websites (e.g. www.personaldomain.com). The trigger for this was the collapse in the price of the .com domain when, with deregulation, it went from $100 to $70 for two years, and then to $15 or less for a year.
I am doing some research on locating websites in their relevant countries (hopefully I can turn SE index and country/niche index generation into a real business). One element of it involves identifying the IPs of all the websites in the com/net/org/info TLDs. Apart from the obvious sorting on IP, this allows for the identification of linkswamps (thousands of websites pointing to a single IP) and the generation of a cleaner search index. The last thing that any SE operator would want is to have thousands of 'coming soon' sites in his index.
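As a rough illustration of the sort-on-IP idea, here is a small Python sketch that groups domains by IP and flags any IP hosting a suspiciously large number of them. It assumes the domain-to-IP mapping has already been collected; the function names and the threshold of 1000 are hypothetical, chosen only to match the "thousands of websites" description.

```python
from collections import defaultdict


def group_by_ip(domain_ips: dict[str, str]) -> dict[str, list[str]]:
    """Invert a domain -> IP mapping into IP -> list of domains."""
    groups: dict[str, list[str]] = defaultdict(list)
    for domain, ip in domain_ips.items():
        groups[ip].append(domain)
    return dict(groups)


def find_linkswamps(domain_ips: dict[str, str],
                    threshold: int = 1000) -> dict[str, list[str]]:
    """Return the IPs that host at least `threshold` domains.

    The threshold is an assumed cut-off; a real index would need to tune it,
    since large shared hosts also put many legitimate sites on one IP.
    """
    return {
        ip: sorted(domains)
        for ip, domains in group_by_ip(domain_ips).items()
        if len(domains) >= threshold
    }
```

A count alone will not separate a linkswamp from an ordinary shared hosting server, so this would only be a first filter before looking at the content of the flagged sites.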
There is also a qualitative issue with /~ websites. As the web evolves, domains are getting cheaper and thus a 'proper' website is more likely to have its own domain. The /~ may therefore indicate that the website is purely a hobbyist one rather than a business one. Unless it has some good PR/linkage, it is not likely to be an 'important' site. This trend towards quality of search rather than quantity of results is going to become more apparent as Google/Yahoo/MSN all start to fight it out for the number one position.
Purely from indexing Irish websites with my own SEs, I've noticed that a significant proportion of the sites have not been modified this year. Results in the com/net/org sites also followed this pattern. I don't like extrapolating these figures to cover the full web but I would not be surprised to see that 50% or so of all websites are effectively dead/unmaintained/brochureware or speculative domain registrations.
Regards...jmcc
There is a way for an engine to determine whether a page is hosted on its own IP or is using name-based virtual hosting:
1. Get the IP of domain.com
This would be a good approach where an SE suspects crosslinking on a group of websites all hosted on the same IP or on a number of close IPs. However, if it is to be applied on a global basis, it would really be as simple as step 1, Fischerlaender. It would require a number of large database tables to correlate website IPs and identify linkswamps and potential crosslinking.
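A very small sketch of what such tables might look like, using Python's built-in sqlite3 module. The two-table schema (sites and links), the column names and the query are assumptions for illustration only, not a description of how any real SE stores its data.

```python
import sqlite3


def build_db(sites, links):
    """Create an in-memory database with hypothetical sites and links tables.

    sites: iterable of (domain, ip) pairs
    links: iterable of (source_domain, target_domain) pairs
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (domain TEXT PRIMARY KEY, ip TEXT NOT NULL)")
    conn.execute("CREATE TABLE links (src TEXT NOT NULL, dst TEXT NOT NULL)")
    conn.executemany("INSERT INTO sites VALUES (?, ?)", sites)
    conn.executemany("INSERT INTO links VALUES (?, ?)", links)
    return conn


def crosslinks_per_ip(conn):
    """Count links whose source and target domains resolve to the same IP."""
    query = """
        SELECT s1.ip, COUNT(*) AS n
        FROM links AS l
        JOIN sites AS s1 ON s1.domain = l.src
        JOIN sites AS s2 ON s2.domain = l.dst
        WHERE s1.ip = s2.ip AND l.src <> l.dst
        GROUP BY s1.ip
        ORDER BY n DESC, s1.ip
    """
    return conn.execute(query).fetchall()
```

IPs with a high count from a query like this would be the candidates for a closer look at crosslinking. At full web scale the same join would need indexes on the ip and domain columns and a much heavier database than SQLite.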
I am not sure which method Google would use though - a lot of clever people work there but the Google SE has some serious flaws (accurate geographical localisation of results for one) and an elegant general solution for crosslinking identification is probably the least of their problems.
Regards...jmcc