Forum Moderators: coopster
Every day I get an email landing in my inbox with a report in it. It's in XML format, zipped, and it's full of crucial data for my business.
I have this 5-minute routine that I'd like to automate. Here's how it goes:
1) open my email, see the new message
2) open the message, save the attachment (it's always called "report.zip")
3) unzip the archive, and get the one file out of it (which is always called "report.xml")
4) rename "report.xml" to "[yyyy-mm-dd]-report.xml" and archive it
5) upload the new XML file to my server
6) run a little parsing script that puts the data into a SQL database
The data isn't useful to me until it's in the SQL database. So this isn't an optional activity. Oh but I wish it were. Because much as I love programming and solving problems, I actually hate work. I'm quite lazy. But I've heard that's an admirable trait for a developer, which is good for my self-esteem. Besides, I don't have a lot of time to do this stuff.
Automating this is not going to be fun. But seeing as pretty soon I'll be getting not one report, but dozens, then by autumn potentially hundreds, every morning... Automation. Not fun, this, but definitely necessary, and soon.
I see this is going to be quite a challenge. It's full of things that I've never done before. Here are some ingredients:
a) CRON. I've got that covered.
b) conneting to email (POP3) with PHP to get the attachment
c) archive the file. That step is rather important
d) unzip the archive, get that one file outta there. Get its XML contents.
e) take that XML and parse it (that script is done)
there's a sixth requirement:
f) make sure I don't parse the same file twice, because then I'd get piles of duplicate data in the database, and I really would not like that to happen
The ones I'm just at a loss for are ingredients b) and d)
Even if I could automate steps 1-4 above, that would save me oodles of time
If you really prefer to do it asynchronously through cron, have a look at the imap/pop3 [us2.php.net] functions. You're mostly interested in open, bodystruct or fetchstructure, and fetchbody functions. Also have a look at the get_mailboxes, check, and fetch_overview functions. You could prevent duplication by logging the source and timestamp of the message as you process it.
At the top of the script you'd want to make sure it's one of your report emails and not spam or other irrelevance. If you look at the raw structure of one of the report emails you'll probably see that it's not too hard to parse out the attachment, then save it as a [temporary] file.
Then use the appropriate compression extension [us.php.net] to decompress it and archive it wherever, then it's on to the parsing.
Getting extensions installed on my server is not a trivial task for me - I'm renting a VPS half way across the globe - it requires some phone calls to the hosting company, but they did it in the past when I needed the XSLT extension, so they'll likely do it again.
I checked those links and you've pointed me in a good direction; I just hope I have the stamina to get to the finish line. Cheers ~