How do I return specific code from other pages?

         

ag_01

2:37 am on May 9, 2003 (gmt 0)

I am writing a web page in XHTML 1.0. I know little to nothing about programming or scripting (JavaScript, PHP, Java, Perl, etc.). What I'm trying to do is parse a known web page for a known bit of code, "<img src=", that precedes an unknown file name that changes daily, then return the result to my own web page for display. What type of language should I learn in order to do this?

Thanks,
AG

BlobFisk

11:55 am on May 9, 2003 (gmt 0)

Welcome to WebmasterWorld, ag_01!

You would need a server-side language (ASP, JSP, PHP, Perl, etc.) to do this. The script would scan a directory for the file, insert its name into the markup, and then render the page.

It may be easier to hard-code the image file name and then just upload a new image with the same name every day.
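A minimal sketch of the directory-scan idea in Java, assuming the daily image is uploaded to a folder on your own server. The class name, method names, and the list of file extensions are illustrative assumptions, not anything from this thread:

```java
import java.io.File;

// Hypothetical helper: the names and the extension list are assumptions.
class LatestImage {

    // Returns the most recently modified .gif/.jpg/.png file in dir,
    // or null if dir is not a readable directory or holds no such file.
    static File newestImage(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return null;
        }
        File newest = null;
        for (File f : files) {
            String name = f.getName().toLowerCase();
            boolean isImage = name.endsWith(".gif")
                    || name.endsWith(".jpg")
                    || name.endsWith(".png");
            if (f.isFile() && isImage
                    && (newest == null || f.lastModified() > newest.lastModified())) {
                newest = f;
            }
        }
        return newest;
    }

    // Builds the XHTML tag to write into the rendered page.
    static String imgTag(File image) {
        return "<img src=\"" + image.getName() + "\" alt=\"\" />";
    }
}
```

A JSP page or servlet could call newestImage on the upload folder and print the result of imgTag where the picture belongs. This only applies when the file sits on your own server; an image on someone else's site has to be found by fetching and searching their page instead.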

FourDegreez

3:21 pm on May 9, 2003 (gmt 0)

I've done this in two languages, but there are many others.

Mivascript, see: [miva.com...]

Java, using this free API: [innovation.ch...]
download from here: [innovation.ch...]

I can show you a Java example if you like, which uses the above-named HTTPClient API:


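import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import HTTPClient.*;  // HTTPConnection, HTTPResponse, CookieModule, ModuleException, ParseException

public static void scrapeATW(PrintWriter pw) throws IOException, ModuleException, ParseException
{
    System.out.println(" Scraping AllTheWeb...");
    InputStream is = null;
    HTTPResponse rsp = null;

    // make the connection (the cookie policy handler is switched off first)
    HTTPConnection con = new HTTPConnection("www.alltheweb.com", 80);
    CookieModule.setCookiePolicyHandler(null);
    rsp = con.Get("/recentqueries");
    if (rsp.getStatusCode() >= 300)
        throw new IOException("scrapeATW(): " + rsp.getReasonLine() + "\r\n " + rsp.getText());
    else
        is = rsp.getInputStream();

    // read the page in 2 KB chunks (bytes are decoded with the platform default charset)
    StringBuffer buf = new StringBuffer(4000);
    byte[] b = new byte[2048];
    int n = -1;
    while ((n = is.read(b, 0, 2048)) != -1)
        buf.append(new String(b, 0, n));

    // scrape the search words: each one sits between the marker and the next "</a>"
    String page = buf.toString();
    String marker = "cs=iso-8859-1\">";
    n = 0;  // start searching from the beginning of the page
    while ((n = page.indexOf(marker, n)) != -1) {
        n += marker.length();
        int end = page.indexOf("</a>", n);
        if (end == -1)
            break;  // no closing tag after the marker: stop rather than fail
        String query = page.substring(n, end);
        writeWords(query, pw);  // separate helper that writes the words to a file
    }
    pw.flush();
}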

What this Java method does:
Makes an HTTP connection to alltheweb.com
Requests the page /recentqueries
Reads the page into a StringBuffer, then converts it to a String
Searches the string for occurrences of the text "cs=iso-8859-1">" and gets the substring between that and the next "</a>"
Calls another method, writeWords, which writes this substring to a file

In other words, it scrapes search terms from alltheweb.com.
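The same indexOf/substring approach can be pointed at the original question. Below is a minimal sketch using only the standard Java library rather than HTTPClient. The class and method names are hypothetical, and it assumes the page contains the exact text <img src=" with a double-quoted file name and is ISO-8859-1 encoded:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical helper class; the names are illustrative, not from this thread.
class ImageSrcFinder {

    // The known text that precedes the unknown file name.
    static final String MARKER = "<img src=\"";

    // Returns the text between the first MARKER and the next double quote,
    // or null if the marker or the closing quote is not found.
    static String firstImageSrc(String html) {
        int start = html.indexOf(MARKER);
        if (start == -1) {
            return null;
        }
        start += MARKER.length();
        int end = html.indexOf('"', start);
        if (end == -1) {
            return null;
        }
        return html.substring(start, end);
    }

    // Downloads a page into a String.
    // Assumes ISO-8859-1 encoding; change the charset for other pages.
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[2048];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toString("ISO-8859-1");
        }
    }
}
```

Calling firstImageSrc(fetch(address)) with the address of the known page would give the file name to write into your own page. Real pages often vary the markup (upper-case tags, single quotes, other attributes before src), so the marker may need adjusting for the page in question.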

(Note: This may violate alltheweb.com's terms of service, although I checked and it doesn't seem to. However, I'm going to add this disclaimer: For instruction only. I take no responsibility for your use of this code.)

ag_01

8:34 am on May 21, 2003 (gmt 0)

Thanks for all the help.