Forum Moderators: phranque

Best website downloader

xFoundry

8:03 pm on Jun 19, 2015 (gmt 0)

10+ Year Member



I will be working on downloading a lot of pages like this:

domain-name.com/go/to/some-name01-here/12674747474/

I would like to start downloading from this page, include it, and also download all subdirectories, but only on this domain and only those descended from the main URL, like the one above. It is usually 10 or 20 pages max, and nothing else. There may be images on these pages, and I would like to include them too.

So this is basically -> enter a URL like the one above -> download 10-30 pages (this URL and everything underneath it, only on this domain)
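The scope rule described above (same domain, only URLs at or below the starting path) can be sketched as a small check. This is only a minimal sketch in Python using the standard library; the function name `in_scope` is hypothetical and not part of any of the tools mentioned in this thread.

```python
from urllib.parse import urlsplit


def in_scope(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url is on the same host as start_url
    and its path is at or below the path of start_url."""
    start = urlsplit(start_url)
    cand = urlsplit(candidate_url)

    # Only ordinary web links are considered (skips mailto:, javascript:, etc.).
    if cand.scheme not in ("http", "https"):
        return False

    # Same host only; host names are compared case-insensitively.
    if (cand.hostname or "").lower() != (start.hostname or "").lower():
        return False

    # Compare paths with a trailing slash so that ".../12674747474/" does not
    # accidentally match a sibling such as ".../126747474745/".
    base = start.path if start.path.endswith("/") else start.path + "/"
    path = cand.path if cand.path.endswith("/") else cand.path + "/"
    return path.startswith(base)
```

Note that images are often served from a different path or host than the pages that embed them, so a rule like this would probably need a separate allowance for page assets.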

I've been testing ScrapBook for Firefox, but there may be something better. I've also been trying HTTrack and Teleport Pro, but these, as far back as I can remember, have never worked.

What could be the best solution for this? Something fast would be good too. I may work on 10,000 separate URLs like this, let's say.

Thanks.

Terabytes

8:40 pm on Jun 19, 2015 (gmt 0)

10+ Year Member



Let's say that most members here work very hard to prevent exactly what you're attempting (myself included).
just sayin'

lucy24

9:15 pm on Jun 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What he said.
I've also been trying HTTrack and Teleport Pro, but these, as far back as I can remember, have never worked.

I'm not familiar with Teleport, but one reason HTTrack may not work is that its default UA string includes its name, which of course will be blocked by any self-respecting website.
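For illustration, the kind of name-based user-agent check a site might apply can be sketched as below. This is a hedged sketch in Python, not any particular server's configuration: the function name `is_blocked_agent`, the token list, and the sample UA strings in the comment are all assumptions made for the example.

```python
def is_blocked_agent(user_agent: str, blocked_tokens=("httrack",)) -> bool:
    """Return True if the User-Agent header contains any blocked token.

    The match is a case-insensitive substring test, so a UA that merely
    mentions the tool's name anywhere in the string is caught.
    """
    ua = user_agent.lower()
    return any(token in ua for token in blocked_tokens)


# Illustrative (assumed) UA strings:
#   "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"  -> blocked
#   "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0" -> allowed
```

In practice a rule like this usually lives in the web server's configuration rather than in application code; the logic is the same substring match.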

There's nothing to prevent a human visitor from using their browser's Save function to read offline at their leisure. But I don't think any human would have time to read 10,000 web pages, let alone 10,000 complete sites.

keyplyr

11:09 am on Jun 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



one reason HTTrack may not work is that its default UA string includes its name, which of course will be blocked by any self-respecting website.

Would it work if the website didn't respect itself?

lucy24

4:46 pm on Jun 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, it must work for somebody, or the program wouldn't exist.

Marshall

10:09 pm on Jun 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not sure of your end purpose for downloading pages and images from websites you do not own, but you could find yourself walking on thin ice.