Spidering a Big Site

Beyond Xenu?


rogerd

7:28 pm on Jun 2, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I'm a big fan of Xenu for spidering smaller sites. It's a great tool to evaluate spiderability, look at page titles, and get an accurate count of spiderable pages. Its site map is a bit rudimentary, but has some value too.

I'm working on a bigger project, though, and I'm wondering if there's a ready-to-use tool that will let me spider a site with at least 100K pages. My initial objective is to see if there are more pages than have been indexed by the major SEs. Some site mapping would be handy too, though less important.
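For readers who want a feel for what such a spider does, here is a minimal sketch of a breadth-first, single-host crawl in Python that returns the set of URLs it discovers, so the count can be compared with what the search engines report. It is an illustration under stated assumptions, not how Xenu or any tool in this thread works: the names `LinkParser`, `fetch`, and `crawl` are hypothetical, and it ignores robots.txt, politeness delays, and redirects, all of which matter on a real 100K-page site.

```python
import http.client
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Return the page body as text, or "" if it is not HTML or fails to load."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return ""
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return ""


def crawl(start_url, fetch_page=fetch, max_pages=100_000):
    """Breadth-first crawl restricted to the start URL's host.

    Returns the set of URLs discovered (capped at max_pages).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch_page(url))
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page counts once.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            if (
                parts.scheme in ("http", "https")
                and parts.netloc == host
                and link not in seen
                and len(seen) < max_pages
            ):
                seen.add(link)
                queue.append(link)
    return seen
```

Calling `len(crawl("https://www.example.com/"))` (a placeholder URL) gives a rough count of spiderable URLs. The page fetcher is passed in as a parameter so the crawl logic can be tried against canned HTML before pointing it at a live site.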

claus

7:37 pm on Jun 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think mnogo will do it, though I haven't tried it myself. It runs on Windows and *nix.

Easy_Coder

7:40 pm on Jun 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you know if mnogo or Xenu has the ability to crawl past NT Security if the credentials are supplied?

claus

7:57 pm on Jun 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm an Apache guy, so I don't know about NT Security, but mnogo can spider HTTPS as well as username/password-protected pages, so I think it might be able to do it. I just spent some time on the site, and yes, it can handle millions of pages, but you'll need some disk space and memory ;)

[edited by: claus at 7:59 pm (utc) on June 2, 2004]
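To illustrate what "supplying the credentials" can look like in a home-grown spider, here is a hedged sketch using Python's standard library. It covers HTTP Basic authentication only. NT Security usually means NTLM or integrated Windows authentication, which the standard library does not handle, so treat this as an analogy rather than an answer to the NT question. The function name `make_auth_opener` and the URL and credentials in the usage note are hypothetical.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)


def make_auth_opener(base_url, username, password):
    """Build a URL opener that answers HTTP Basic auth challenges under base_url."""
    passwords = HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under base_url".
    passwords.add_password(None, base_url, username, password)
    return build_opener(HTTPBasicAuthHandler(passwords))
```

Usage would look like `opener = make_auth_opener("https://intranet.example.test/", "crawler", "secret")` followed by `opener.open(url)` for each protected page, where the host and login are placeholders.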

TheWhippinpost

7:59 pm on Jun 2, 2004 (gmt 0)

10+ Year Member



wGet

If you're not comfortable with the command line, there is a GUI for it called wGetGUI.