I've been searching for a while for a script that will go to a specified URL and take certain text from that page:
- Go to URL
- Split the text up based on user-supplied HTML tags (e.g. a new item for every table)
- Add each piece of text to a database
So basically, I need a script that can detect a tag such as "<TABLE>" or even "Products-", take the text from that point and then stop when it gets to "</TABLE>".
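For illustration, the "start at one marker, stop at the next" idea can be sketched in Python with a non-greedy regular expression. This is only a rough sketch, not a robust HTML parser: the function name `extract_between` and the sample page string are made up for the example, and it assumes the markers appear literally in the page (so `<TABLE>` would not match `<table border="1">`, and nested tables would be cut at the first closing tag).

```python
import re


def extract_between(html, start_marker, end_marker):
    """Return the text found between each start/end marker pair.

    Matching is case-insensitive and spans line breaks. The markers are
    treated as literal strings, not as parsed HTML tags.
    """
    pattern = re.compile(
        re.escape(start_marker) + r"(.*?)" + re.escape(end_marker),
        re.IGNORECASE | re.DOTALL,
    )
    return [chunk.strip() for chunk in pattern.findall(html)]


# Hypothetical page content, just to show the call.
page = "<TABLE>first</TABLE><p>skip</p><table>second</table>"
chunks = extract_between(page, "<TABLE>", "</TABLE>")
print(chunks)  # ['first', 'second']
```

The same call works with a plain-text start marker such as "Products-", since the markers are escaped and matched as ordinary strings.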
This should be fairly simple to achieve with an HTTP GET request, a few loops and some MySQL inserts, but I was wondering if anybody has done anything like this before, so I know where to start.
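As a rough idea of the "GET, then add to the database" part, here is a minimal Python sketch. It makes several assumptions: it uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the table name `scraped_items`, its columns, the function names and the example URL are all hypothetical. The demonstration at the bottom stores hand-written chunks, so nothing is actually downloaded.

```python
import sqlite3
import urllib.request


def fetch_page(url, timeout=10):
    """Download a page with an HTTP GET and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def store_chunks(conn, source_url, chunks):
    """Insert one row per chunk (table and column names are hypothetical)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped_items ("
        "id INTEGER PRIMARY KEY, source_url TEXT, body TEXT)"
    )
    # Parameter placeholders keep scraped text from being read as SQL.
    conn.executemany(
        "INSERT INTO scraped_items (source_url, body) VALUES (?, ?)",
        [(source_url, chunk) for chunk in chunks],
    )
    conn.commit()


# Demonstration with an in-memory database and hand-written chunks.
conn = sqlite3.connect(":memory:")
store_chunks(conn, "http://example.com/products", ["first item", "second item"])
row_count = conn.execute("SELECT COUNT(*) FROM scraped_items").fetchone()[0]
print(row_count)  # 2
```

In a real run, the text returned by `fetch_page` would be split into chunks first and the resulting list passed to `store_chunks`. Swapping in MySQL would mean using a MySQL client library instead of `sqlite3`; the placeholder syntax may differ between drivers.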
Cheers
PS: Sorry if this is in the wrong forum. I could've been more specific and put it in the *nix, Apache or PHP forum but I didn't want to rule any options out.
I've been reading about the Perl module - looks quite complicated but I'll keep reading :)
About the screen scraping - I'm aware of the possible implications, but it'll only be required to scrape a page or two out of hundreds, and it's essentially an affiliate site so it will be providing business. Thanks for mentioning that.
Anyone know a PHP solution?
BTW I've found the same problem listed here - but I don't yet have a subscription:
I'll post back if I get subscribed or have an update.
[edited by: trillianjedi at 10:41 am (utc) on Jan. 3, 2007]
[edit reason] TOS [/edit]
By the way, only the first one has the solution shown - I'll try that out.
[edited by: physics at 5:24 am (utc) on Jan. 4, 2007]
[edit reason] Snipped domain [/edit]