TypicalSurfer - 12:09 pm on Oct 9, 2012 (gmt 0)
discarding 'non-UK' sites
A web crawler can be designed to store every link it finds but only crawl those that meet certain criteria (TLD in this case). So it wouldn't be a matter of discarding documents; you simply never crawl the off-target pages.
crawl seed list > collected links stored in a crawl db > crawl db cleaned of unwanted TLDs > crawl selected urls > repeat
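For what it's worth, here is a rough Python sketch of that loop. It is only an illustration, not how any particular search engine does it. The names `is_on_target`, `crawl` and `fetch_links` are made up for the example, `fetch_links` stands in for whatever actually downloads a page and extracts its links, and I am assuming that matching the hostname suffix `.uk` is an acceptable stand-in for "UK site". The sketch keeps every found link in the crawl db and applies the TLD filter when selecting what to crawl next, rather than deleting rows.

```python
from urllib.parse import urlparse


def is_on_target(url, suffixes=(".uk",)):
    """True if the URL's hostname ends with one of the wanted suffixes."""
    host = urlparse(url).hostname or ""
    return host.endswith(tuple(suffixes))


def crawl(seeds, fetch_links, suffixes=(".uk",), max_rounds=3):
    """Store all found links, but only crawl the on-target ones.

    fetch_links is a hypothetical callable: url -> iterable of linked URLs.
    Returns (crawl_db, crawled): every link seen, and the subset fetched.
    """
    crawl_db = set(seeds)  # every link found so far, on target or not
    crawled = set()        # URLs that were actually fetched
    for _ in range(max_rounds):
        # Filter step: pick uncrawled URLs whose host matches the suffixes.
        selected = sorted(
            u for u in crawl_db - crawled if is_on_target(u, suffixes)
        )
        if not selected:
            break
        for url in selected:
            crawl_db.update(fetch_links(url))  # store all links found
            crawled.add(url)
    return crawl_db, crawled
```

With a small fake link graph in place of real fetching, a `.com` page ends up recorded in the crawl db but is never fetched, so nothing it links to is ever discovered.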