grab news from html sites

         

Daniel_74

4:04 pm on Jul 8, 2006 (gmt 0)

10+ Year Member



Hello,
Does anybody know a script that converts HTML news websites to RSS, or stores HTML content in a database?
Basically, what I'm looking for is what this service does: www.feed43.com

Best regards,
Daniel

rocknbil

8:47 pm on Jul 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome aboard, Daniel. First, review the Terms of Service for this site, particularly #13, about posting URLs.

What you are asking for is called "scraping" and is generally discouraged, especially by the sites being scraped. It pumps up entries in their logs with no real visits behind them, skewing any statistical analysis of site traffic, and is generally annoying to the "victim" site.

So chances are you will find few resources for this and will have to write a script on your own. You are better off learning a little about XML and getting your news from a site that offers it freely through RSS XML feeds. That's better anyway: you don't have to strip anything off, just parse the feed and format it the way you like.
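
To give an idea of what "just parse the feed" can look like, here is a minimal sketch in Python using only the standard library. The sample feed, the example.com URLs, and the function names parse_rss_items and format_items are made up for illustration; a real feed would be downloaded from a site that publishes one, and it may use extra elements or namespaces that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/news/1</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/news/2</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict (title, link, description) per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


def format_items(items):
    """Format each item as 'title (link)', one per line."""
    return "\n".join(f"{it['title']} ({it['link']})" for it in items)


if __name__ == "__main__":
    print(format_items(parse_rss_items(SAMPLE_RSS)))
```

From there, the list of dicts could just as well be written to a database table instead of printed, which covers the "store content into databases" part of the question without touching the site's HTML at all.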