Navigating through web pages

         

Mummiez

8:22 am on Oct 18, 2002 (gmt 0)

10+ Year Member



I am importing some data from web pages using HttpWebResponse
and HttpWebRequest. The navigation structure on this site is bad.

You cannot jump from page 1 to page 49.
You have to navigate through 1, 2, 3 ... up to 49.
Let's say page 49 has links to 50, 51 and 52 (the number of links is not known until I download the page).
So to download 50, 51 and 52 I have to navigate 1, 2, 3 ... to 49 three times.
The code I use works fine, but is there a way I can just navigate the pages without downloading all the content over and over again?
Maybe an asynchronous response that stops downloading the page once I have got the headers from the server? (The server keeps track of what page I'm currently on and navigates to other pages based on the page I'm navigating from.)

The code I use is like this:

' Requires Imports System.IO and Imports System.Net
If MyCookieContainer.Count = 0 Then Log_on() ' Log on and get cookies

' Build the request and attach the session cookies
myWebRequest = CType(WebRequest.Create(Url), HttpWebRequest)
myWebRequest.CookieContainer = MyCookieContainer

myWebResponse = CType(myWebRequest.GetResponse(), HttpWebResponse)

' Read the whole page body into a string
sr = New StreamReader(myWebResponse.GetResponseStream())
result = sr.ReadToEnd()

myWebResponse.Close()

Xoc

12:49 pm on Oct 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mummiez, welcome to WebmasterWorld!

So let me see if I got it straight:

You have a .NET server. When a request is made to your page, you go off to another web site and screen scrape data from their pages. There are an arbitrary number of pages to scrape, all at the same URL. The first page you scrape forces you to log in. When you request the URL, it gives you the page. If you request the same URL again, it gives you the next page. It is storing which page you are on in a cookie (or is it storing it server-side in a session variable?).

Is this correct?

Mummiez

1:38 pm on Oct 25, 2002 (gmt 0)

10+ Year Member



Thanks :)

Almost correct, but it's not a server-side program. I'm making a Windows application that does almost exactly what you describe.

>It is storing which page you are on in a cookie (or is storing it server-side in a session variable?).
I think it's in a cookie, but I'm not sure.

>If you request the same url again, it gives you the next page.
When I receive the first page I scrape it for links to the next pages, which have the same address, only with this variable in the address: [mainpage.com&rootId=1...]

This page could contain links to maybe 8 different pages (rootId 0-7), and these 8 pages could also contain links to 8 pages each, and so on.
So it would help if I could "lie" to the server and tell it that I'm currently on rootId 0,0,8,1, so that I didn't have to download 4 pages to get to rootId 1.
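
A minimal sketch of one thing that might be worth trying. It rests on two assumptions that the thread does not confirm: that the server advances its idea of the current page as soon as it receives the request (so the body never has to be read), and that it tracks position through the cookies in the shared CookieContainer, possibly together with the Referer header. VisitWithoutBody and its parameter names are hypothetical; HttpWebRequest.Referer, CookieContainer and GetResponse are standard System.Net members. GetResponse returns once the response headers have arrived, so the helper closes the response without ever calling GetResponseStream:

```vb
Imports System.Net

Module NavigationSketch

    ' Hypothetical helper (not from the original post): requests a page so the
    ' server sees the navigation step, but never reads the response body.
    ' Assumption: the server updates its "current page" state on receiving the
    ' request, keyed by the session cookies and/or the Referer header.
    Function VisitWithoutBody(ByVal url As String, _
                              ByVal cookies As CookieContainer, _
                              ByVal referer As String) As HttpStatusCode
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.CookieContainer = cookies   ' reuse the logged-on session

        ' Optionally claim which page we are "coming from"
        If Not referer Is Nothing Then request.Referer = referer

        ' GetResponse returns after the headers arrive; this helper never
        ' calls GetResponseStream, so the page body is never read by our code.
        Dim response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
        Try
            Return response.StatusCode
        Finally
            response.Close()   ' release the connection without reading the page
        End Try
    End Function

End Module
```

For the intermediate pages you could call VisitWithoutBody(pageUrl, MyCookieContainer, previousUrl) and keep the ReadToEnd code only for the page whose links you actually need. How much of the body still crosses the wire before the connection is closed depends on the server and the framework, and if the site keeps the position purely server-side in the session, the Referer value will have no effect. Setting request.Method = "HEAD" is another option, provided the server treats a HEAD request as a navigation step.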