


Web scraping with .NET

3:53 am on Oct 27, 2006 (gmt 0)

10+ Year Member


Does anyone know of a simple way to copy an entire website from one server to another using ASP.NET? I need something that can start at the root of the site and grab everything under it.

I have FTP information for the site I want to copy from, but either I'm misunderstanding the documentation or the built-in classes only let me copy one file at a time.
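To be clearer about what I mean: with FtpWebRequest I can pull down one named file at a time, something like this (the server, credentials, and paths are placeholders). To copy a whole site this way I'd have to list every directory and repeat it for each file.

```csharp
using System;
using System.IO;
using System.Net;

class FtpOneFile
{
    static void Main()
    {
        // one request = one file
        FtpWebRequest req = (FtpWebRequest)WebRequest.Create(
            "ftp://ftp.example.com/mywebsitefolder/index.htm");
        req.Method = WebRequestMethods.Ftp.DownloadFile;
        req.Credentials = new NetworkCredential("myusername", "mypassword");

        using (FtpWebResponse resp = (FtpWebResponse)req.GetResponse())
        using (Stream src = resp.GetResponseStream())
        using (FileStream dst = File.Create(@"C:\copy\index.htm"))
        {
            // copy the response stream to disk in 4K chunks
            byte[] buffer = new byte[4096];
            int read;
            while ((read = src.Read(buffer, 0, buffer.Length)) > 0)
                dst.Write(buffer, 0, read);
        }
    }
}
```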

I could also use a web scraping utility, but I'm hesitant to ask my client to pay for it. If there's a free one anyone knows of, that would be a hugely appreciated alternative.

(And before anyone asks, I'm not doing anything shady. My client has an old site on a hosting service whose WYSIWYG editor she likes very much. I want to set her up so she can keep using that as a "development server" and import the content into her new site.)


6:41 am on Oct 27, 2006 (gmt 0)

10+ Year Member

Man, I knew I was going to end up doing this the hard way.. ;)

Here's the code for anyone who's interested. It could be improved through the use of regular expressions, but it was kind of quick and dirty.

using System;
using System.Collections;
using System.IO;
using System.Net;
using System.Text;

ArrayList scraped;

protected void Page_Load(object sender, EventArgs e)
{
    scraped = new ArrayList();
}

protected void btnSubmit_ServerClick(object sender, EventArgs e)
{
    scrapePage("default.htm"); // start page - change to suit the site
    results.Visible = true;
}

private void scrapePage(String filename)
{
    // keep track of what we've done
    scraped.Add(filename);

    WebClient objWebClient = new WebClient();
    UTF8Encoding objUTF8 = new UTF8Encoding();
    String page = "";
    Byte[] bytes = null;
    try
    {
        bytes = objWebClient.DownloadData("http://example.com/" + filename);
        page = objUTF8.GetString(bytes);
    }
    catch (Exception)
    {
        return; // couldn't download this one - skip it
    }

    if (filename.IndexOf(".htm") > 0)
    {
        // make everything lowercase so we can find what we need
        String pageCopy = page.ToLower();

        String[] keys = { "href=", "src=", "src = " };
        String[] links = pageCopy.Split(keys, StringSplitOptions.RemoveEmptyEntries);
        // get rid of html up to the first link
        links[0] = "";

        for (int i = 1; i < links.Length; i++)
        {
            int firstSpace = links[i].IndexOf(' ');
            if (firstSpace > 0)
            {
                // shorten these to include only what's before the first space
                links[i] = links[i].Substring(0, firstSpace).Replace("\"", "");
            }
            else
            {
                // if there are no spaces in the section, this section is junk - erase it
                links[i] = "";
            }
        }

        // call this function for each non-blank link we've produced
        foreach (String link in links)
            if (link != "" && !scraped.Contains(link)) scrapePage(link);
    }

    File.WriteAllBytes(MapPath("/") + filename, bytes);
}
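The regular-expressions improvement I mentioned would look something like this - a rough sketch, not tested beyond the sample markup below. One pattern covers href= and src= with optional spaces and quotes, instead of splitting on fixed strings:

```csharp
using System;
using System.Text.RegularExpressions;

class RegexLinks
{
    static void Main()
    {
        // sample markup standing in for a downloaded page
        String page = "<a href=\"page2.htm\">next</a> <img src=\"pics/logo.gif\">";

        // capture whatever follows href= or src=, up to a quote, space, or >
        MatchCollection matches = Regex.Matches(page,
            @"(?:href|src)\s*=\s*[""']?([^""'\s>]+)",
            RegexOptions.IgnoreCase);

        foreach (Match m in matches)
            Console.WriteLine(m.Groups[1].Value);  // page2.htm, pics/logo.gif
    }
}
```

Each captured value could then be fed to scrapePage the same way as the split-produced links.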

7:11 pm on Oct 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

If you have the FTP info, why not just open an FTP client and copy the stuff into another FTP window for the other server?
9:16 pm on Oct 30, 2006 (gmt 0)

5+ Year Member

how about xcopy?
1:01 am on Nov 14, 2006 (gmt 0)

10+ Year Member

Easy Coder, this is something that needs to be done automatically by the client.

oxbaker, that's a good idea.. Is it free?

12:40 pm on Nov 15, 2006 (gmt 0)

10+ Year Member

Have you looked at creating a batch file to ftp it? You could then get .NET to execute it, or schedule it.

For example create these two files and put them in the same directory:

New text file, save as data.ftp:

OPEN ftp.example.com
USER myusername mypassword
LCD C:\Inetpub\wwwroot\
CD mywebsitefolder
PUT myfile.html

New text file, save as transfer.bat

ftp -n -s:data.ftp

Now run the batch file to transfer the files listed with PUT in data.ftp.
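And since you'd want .NET to execute it, launching the batch file from code could look roughly like this (the C:\scripts path is made up - use wherever you saved the two files):

```csharp
using System.Diagnostics;

class RunTransfer
{
    static void Main()
    {
        ProcessStartInfo psi = new ProcessStartInfo("transfer.bat");
        psi.WorkingDirectory = @"C:\scripts"; // wherever data.ftp and transfer.bat live
        psi.UseShellExecute = true;           // let the shell run the .bat

        using (Process p = Process.Start(psi))
        {
            p.WaitForExit();                  // block until the transfer finishes
        }
    }
}
```

For scheduling, Windows Task Scheduler can also run transfer.bat directly with no .NET code at all.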

