

How does it work?


Frank_DR

2:00 pm on Apr 25, 2007 (gmt 0)

10+ Year Member



I don't know if I can ask this question here, but I need to know how I can extract text information from one site into a page on another site without retyping it.

I want to transfer country info from the CIA World Factbook to my site.

I have seen that some other sites make use of the subdivisions of the CIA Factbook HTML pages; see below:

<!-- FileName="Connection_cf_dsn.htm" "" -->
<!-- Type="CFDSN" -->
<!-- Catalog="" -->
<!-- Schema="" -->
<!-- HTTP="true" -->

Does anyone have experience with extracting info from another page?

Thanks
FDR

JAB Creations

12:29 am on Apr 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not that I would do that personally, but wouldn't using a server-side language make more sense?

- John

httpwebwitch

2:54 pm on May 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a technique lovingly dubbed "scraping".

You should use a server-side language. Doing this with JavaScript is well-nigh impossible because of the browsers' cross-domain security safeguards (the same-origin policy). Any language that can make HTTP requests and do string manipulation will do. Take your pick: ASP, PHP, Perl, Python, and about a dozen others.

1) create the HTTP request (usually a GET request)
2) send the request, get the results back as a string
3) parse the string and grab the parts you want out of it (I recommend using Regular Expressions)
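As a rough illustration of those three steps, here is a minimal Python sketch using only the standard library (`urllib.request` for the GET request, `re` for the regular expression). The markup pattern in `FIELD_RE` is a made-up assumption (`<div class="category">` / `<div class="category_data">`), not the Factbook's actual HTML; you would have to look at the real page source and adjust the pattern to match it. The `User-Agent` string is likewise just a placeholder.

```python
import re
import urllib.request


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Steps 1 and 2: build a GET request, send it, return the body as a string."""
    # Placeholder User-Agent; some servers reject requests without one.
    req = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Hypothetical markup pattern: assumes each field looks like
#   <div class="category">Label</div><div class="category_data">Value</div>
# Adjust this to whatever the real page actually uses.
FIELD_RE = re.compile(
    r'<div class="category">\s*(?P<label>[^<]+?)\s*</div>\s*'
    r'<div class="category_data">\s*(?P<value>[^<]+?)\s*</div>',
    re.IGNORECASE,
)


def extract_fields(html: str) -> dict:
    """Step 3: pull label/value pairs out of the page with a regular expression."""
    return {m.group("label"): m.group("value") for m in FIELD_RE.finditer(html)}
```

Typical use would be `extract_fields(fetch_page(url))`, with `url` pointing at the page you want. Regular expressions are quick to write but fragile: if the source site changes its markup, the pattern silently stops matching, so it pays to check the result for missing fields.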

The technique itself is not unethical; it's done all the time when building applications that use public-facing APIs, RSS feeds, and the like. Then it's not "scraping", it's... consuming a web service.
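When the data comes as a structured feed rather than a human-oriented HTML page, the parsing step gets much easier: you can hand the response to an XML parser instead of writing regular expressions. A minimal sketch, assuming a plain RSS 2.0 feed with un-namespaced `<item>`, `<title>` and `<link>` elements, using Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET


def rss_items(feed_xml: str) -> list:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    # iter() walks the whole tree, so it finds <item> elements under <channel>.
    for item in root.iter("item"):
        # findtext() looks only at direct children of <item>,
        # so the channel's own <title> is not picked up here.
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items
```

The feed text itself would be fetched over HTTP exactly as in steps 1 and 2 above.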

Of course I would never personally scrape content from another site. That's just wrong.

whoisgregg

4:13 pm on May 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, the CIA Factbook can be reused. From their FAQ:

Can I use some or all of The World Factbook for my Web site (book, research project, homework, etc.)?
The World Factbook is in the public domain and may be used freely by anyone at anytime without seeking permission. However, US Code prohibits use of the CIA seal in a manner which implies that the CIA approved, endorsed, or authorized such use. If you have any questions about your intended use, you should consult with legal counsel. Further information on The World Factbook's use is described on the Contributors and Copyright Information page. As a courtesy, please cite The World Factbook when used.

[cia.gov...]

Much of the data provided on US government websites is in the public domain, although you should always check to make sure, since the last entity you ever want legal trouble with is the U.S. government! ;)