I want to clean this list out and remove duplicate company e-mail addresses. I'm not just talking about exact duplicates, i.e. x@y.com and x@y.com, as my software does this already. What I mean is that I want to keep only one address for each server, e.g. if x@company1.com has subscribed, everything else such as y@company1.com is deleted. Furthermore, if a company that has already subscribed tries to subscribe with another e-mail address, this should not be permitted.
This seems simple enough, but I've been pulling my hair out searching Google and various CGI sites for ages without any results. Any ideas?
<added>
Something like this should do it (JET SQL):
SELECT DISTINCT Mid([EmailAddress],InStr([EmailAddress],"@")+1) AS [Domain], Left([EmailAddress],InStr([EmailAddress],"@")-1) AS [User] INTO Temp
FROM [Mail List];
SELECT First([User]) & '@' & [Domain] AS [EmailAddress] FROM Temp
GROUP BY [Domain];
Though there is probably a much easier way :)
</added>
[cgi-factory.com...]
However, what I ideally need is a CGI script that I could run to perform the task, i.e. I input my e-mail list (initially and then at regular intervals) and it gives me a clean one without the duplicate servers.
If this doesn't make sense, apologies; as I said, I am very new to CGI/Perl programming.
sort -i -t @ -k 2 -u old_email_list.txt > new_email_list.txt
This assumes that your addresses are stored one per line in a plain text file. It will ignore non-printing characters (-i), split each line (internally) at the separator "@" (-t @), sort on the second key, i.e. everything after the "@" (-k 2), and return a unique (-u) list, where each of the sorted keys occurs only once. The comparison is case-sensitive; add -f if domains that differ only in letter case should count as the same.
The only potential problem with this approach is that you'll have no control over which of several addresses from each domain gets through and which are discarded. But this problem will be common to all approaches other than manual elimination.
Preventing someone from subscribing with an already registered domain would need to be integrated into your existing subscription mechanism, and will therefore be more complex. I don't expect any existing software to support this, as it is quite an unusual requirement (do you really only want to allow e.g. one hotmail user?).
bird: Cheers! This worked a treat. Now that I have 2 files, old_email_list.txt and new_email_list.txt, is there any way I can get a list of the differences between the 2 files (i.e. the ones discarded, for my records) using GNU tools? Then I can check manually for things like hotmail, aol etc. addresses that have been accidentally deleted.
As regards preventing this happening in the future, surely a simple CGI script could do the trick. I could have a form on my website (unrelated to the existing mailing list software), and if someone typed in their e-mail address it would check the domain against the existing list of subscribers; if unique, it adds the address to file 1, but if not, to file 2. The unique ones can then be added to my mailing list software later, without having to interfere with the existing software. Is there no CGI script that will do this (mailing list functionality not needed)?
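The check described above could be sketched roughly as follows, as a plain shell function rather than a full CGI script. The function name check_subscription and the idea of passing the two file names as arguments are made up for illustration; parsing the form input and locking the files against simultaneous requests are left out entirely, so treat it as a starting point only.

```shell
# Hypothetical helper (name and arguments are illustrative, not from any
# existing package).
# Usage: check_subscription ADDRESS UNIQUE_FILE DUPLICATE_FILE
check_subscription() {
    addr=$1
    unique_file=$2
    dup_file=$3

    # Reject input that has no "@" at all.
    case $addr in
        *@*) ;;
        *) echo "invalid"; return 2 ;;
    esac

    # Domain = everything after the last "@", lower-cased for comparison.
    domain=$(printf '%s\n' "${addr##*@}" | tr '[:upper:]' '[:lower:]')

    # Make sure the subscriber file exists before reading it.
    touch "$unique_file"

    # awk exits with status 0 if some stored address has the same domain.
    if awk -F'@' -v d="$domain" \
        'tolower($NF) == d { found = 1 } END { exit !found }' "$unique_file"
    then
        printf '%s\n' "$addr" >> "$dup_file"
        echo "duplicate"
    else
        printf '%s\n' "$addr" >> "$unique_file"
        echo "unique"
    fi
}
```

Called as check_subscription "$address" file1.txt file2.txt, it appends the address to the first file when the domain is new and to the second file otherwise, and prints which of the two happened.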
Thanks.
1. Is there any way of stopping it re-sorting the list into alphabetical order by server when using 'sort'? I would prefer the list to be left as it is, with repeated instances of any given server deleted.
2. Is there any way of comparing two lists and outputting the difference using GNU tools (see above question) or not?
Thanks.
diff is used to compare files, but it too will only be useful with sorted files.
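For what it's worth, both questions can probably be handled without sorting at all. The sketch below uses awk to keep the first address seen for each domain in the original order, and grep to list the lines of the old file that are missing from the new one. The function names dedupe_by_domain and list_discarded are invented for this example, and it assumes one address per line with exact duplicates already removed, as described earlier in the thread.

```shell
# Hypothetical helpers (names are illustrative only).

# Print the first address seen for each domain, keeping the original order.
# The domain is the text after the last "@", compared case-insensitively;
# lines without an "@" are skipped.
dedupe_by_domain() {
    awk -F'@' 'NF >= 2 && !seen[tolower($NF)]++' "$1"
}

# Print every line of the old list that does not appear verbatim in the
# new list: -F fixed strings, -x whole-line match, -v invert, -f patterns
# read from a file.
list_discarded() {
    grep -vxFf "$2" "$1"
}
```

Typical use would be dedupe_by_domain old_email_list.txt > new_email_list.txt followed by list_discarded old_email_list.txt new_email_list.txt > discarded.txt. Because the first occurrence always wins, this also gives some control over which address survives: whichever comes first in the file.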