I have a syndicate of clients. They have around 30 smaller sites and one major site which is the daddy of them all and contains the general information. Each small site has a different person in charge. All the sites are static. They approached me and asked for a site search so that people can search across all the sites. I can do the search functionality no problem and I can code it to strip HTML from the pages and so forth ... my problem is getting their pages into a database.
To my knowledge there are two ways of doing this:
1. Use a component. This has some problems. They want the site search to be CHEAP. Does anyone know where I can get a cheap or FREE component that can fetch pages? It needs to be able to run on WinXP Pro with IIS5.
2. Component-less code. Is this possible? If so, does anyone know where I can see some code on this?
btw, this is ASP and not ASP.net
Chris.
Dim objXMLHTTP, strScrape, strURL
strURL = "http://www.TheSiteYouWantToScrape.com"
Set objXMLHTTP = Server.CreateObject("Microsoft.XMLHTTP")
objXMLHTTP.Open "GET", strURL, False
objXMLHTTP.Send
strScrape = objXMLHTTP.responseText
Set objXMLHTTP = Nothing
Then parse strScrape to extract what you want.
Onya
Woz
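For the "parse strScrape" step, here is a minimal sketch of stripping the markup with the built-in VBScript RegExp object, assuming VBScript 5.5 or later (needed for the non-greedy match). StripHtml is a hypothetical helper name, not something from the posts above, and it does not decode entities such as &amp;amp;, so treat it as a starting point only.

```vbscript
' Hypothetical helper: reduce fetched HTML to plain text for indexing.
Function StripHtml(strHtml)
    Dim objRegExp, strText
    Set objRegExp = New RegExp
    objRegExp.Global = True
    objRegExp.IgnoreCase = True

    ' Drop script and style blocks together with their contents.
    objRegExp.Pattern = "<(script|style)[^>]*>[\s\S]*?</\1>"
    strText = objRegExp.Replace(strHtml, " ")

    ' Replace every remaining tag with a space.
    objRegExp.Pattern = "<[^>]+>"
    strText = objRegExp.Replace(strText, " ")

    ' Collapse runs of whitespace into single spaces.
    objRegExp.Pattern = "\s+"
    strText = objRegExp.Replace(strText, " ")

    StripHtml = Trim(strText)
    Set objRegExp = Nothing
End Function
```

You would then call it as strText = StripHtml(strScrape) before writing the text to the database.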
Set xml = Server.CreateObject("Microsoft.XMLHTTP")
xml.Open "GET", "http://www.example.com/index.htm", False
xml.Send ""
xmloutput = xml.responseText
Set xml = Nothing
<added>Damn, I was way too slow!</added>
The problem is I keep getting the time out error:
error '80072ee2'
/alex/devzone/scrapeTest.asp, line 12
I can get it to work on sites on the same server... but it times out for anything externally. Could that be to do with a firewall setting or something?
Error Type: (0x800C0005)
/cf/site_search.asp, line 11
Browser Type: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Page: GET /cf/site_search.asp
Time: 07 February 2003, 09:23:59
More information: Microsoft Support
Note that if you have a proxy server, ServerXMLHTTP does not use the proxy settings from IE. Instead, you must run proxycfg [msdn.microsoft.com] to tell ServerXMLHTTP about the proxy server.
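While you sort out the proxy or firewall, it may also help to make the fetch fail gracefully instead of killing the page. This is a rough sketch using ServerXMLHTTP's setTimeouts method (resolve, connect, send and receive timeouts, in milliseconds). FetchPage is a hypothetical helper name and the timeout values are assumptions to tune, not recommendations.

```vbscript
' Hypothetical helper: fetch a page, returning "" if the request fails.
Function FetchPage(strURL)
    Dim objHTTP
    FetchPage = ""
    Set objHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")

    ' Resolve, connect, send, receive timeouts in ms (assumed values).
    ' setTimeouts should be called before Open.
    objHTTP.setTimeouts 5000, 5000, 10000, 30000

    ' Trap timeouts and connection errors rather than aborting the page.
    On Error Resume Next
    objHTTP.Open "GET", strURL, False
    objHTTP.Send
    If Err.Number = 0 Then
        ' Only accept a normal 200 response.
        If objHTTP.Status = 200 Then FetchPage = objHTTP.responseText
    End If
    Err.Clear
    On Error GoTo 0

    Set objHTTP = Nothing
End Function
```

An empty return value means the page could not be fetched, so the indexing loop can skip that site and move on. Shorter timeouts will not cure a blocked outbound connection; they only stop one unreachable site from stalling the rest.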
The code:
<html>
<head>
<title>Site Search</title>
</head>
<body>
<%
Dim objXMLHTTP, strScrape, strURL
strURL = "http://www.chrisfelstead.co.uk/default.asp"
Set objXMLHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")
objXMLHTTP.Open "GET", strURL, False
objXMLHTTP.Send
strScrape = objXMLHTTP.responseText
Set objXMLHTTP = Nothing
%>
Here is the page:
<br><br>
<% Response.Write(strScrape) %>
</body>
</html>
Chris