I've been working on a project categorising approximately 2 million .eu websites over the last few weeks. One of the final issues is determining whether a website is genuinely a .eu website or a site from another TLD that is merely being served on a .eu domain. The theory is that a site belonging purely to another TLD will have no <a href= tags that are .eu or site-relative, whereas a genuine .eu site will potentially have .eu or site-relative anchors. (I've also used the link rel="canonical" element to identify some non-.eu sites, as the canonical element is supposed to be domain specific.)
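To make the heuristic concrete, here is a minimal sketch of the anchor check using only the Python standard library. The names `LinkCollector` and `classify_links` are hypothetical, and the bucket labels ("relative", "eu", "other") are my own choice for illustration; this is not the code I actually run, just the idea.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects <a href> values and the first rel="canonical" link, if any."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "canonical" in rel and self.canonical is None:
                self.canonical = attrs["href"]


def classify_links(html, page_url):
    """Count anchors as site-relative, .eu, or other-TLD, and report the canonical host."""
    parser = LinkCollector()
    parser.feed(html)
    counts = Counter()
    for href in parser.hrefs:
        parts = urlsplit(href)
        if parts.scheme not in ("", "http", "https"):
            continue  # skip mailto:, javascript:, tel: and similar
        if not parts.netloc:
            counts["relative"] += 1  # path-only or fragment-only link
            continue
        host = (parts.hostname or "").lower()
        counts["eu" if host.endswith(".eu") else "other"] += 1
    canonical_host = None
    if parser.canonical:
        # Resolve against the page URL in case the canonical href is relative.
        canonical_host = urlsplit(urljoin(page_url, parser.canonical)).hostname
    return {"counts": dict(counts), "canonical_host": canonical_host}
```

A page with zero "relative" and zero "eu" anchors, or with a canonical host outside .eu, would then be a candidate for the "other TLD served as .eu" bucket.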
While some outbound links will be to stats sites or social media networks, does the following logic hold up: a site whose apparent navigation links all point to the same non-.eu website is actually a non-.eu site being served as a .eu site, and is therefore a duplicate content website? Or would it be necessary to compare these pages with the corresponding pages on the other TLD's website to see whether they are identical and thus duplicate content?
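For reference, the two options in the question could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the `IGNORED_HOSTS` list, and the `min_links` / `min_share` thresholds are arbitrary placeholders, and the page texts passed to `text_similarity` are assumed to have been fetched and stripped of markup elsewhere.

```python
from collections import Counter
from difflib import SequenceMatcher
from urllib.parse import urlsplit

# Hypothetical ignore list for stats and social hosts; a real one would be longer.
IGNORED_HOSTS = {"facebook.com", "twitter.com", "google-analytics.com"}


def dominant_external_host(hrefs, min_links=3, min_share=0.5):
    """Return the non-.eu host that most outbound links point to, or None."""
    hosts = Counter()
    for href in hrefs:
        host = (urlsplit(href).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if not host or host.endswith(".eu") or host in IGNORED_HOSTS:
            continue  # relative, .eu, or stats/social link
        hosts[host] += 1
    if not hosts:
        return None
    host, n = hosts.most_common(1)[0]
    if n >= min_links and n / sum(hosts.values()) >= min_share:
        return host
    return None


def text_similarity(text_a, text_b):
    """Word-level similarity in [0, 1]; 1.0 means identical word sequences."""
    matcher = SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
    return matcher.ratio()
```

The first function only encodes the link-pattern guess; the second is the direct comparison, which is slower (`SequenceMatcher` can be quadratic on long pages) but would confirm whether the .eu page and the other-TLD page really carry the same content.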