Would Like My Own Spider

bumpaw

7:29 pm on Oct 31, 2005 (gmt 0)

10+ Year Member



I've been digging around trying to find a spider to use for checking my site's functionality. The simulators fall a little short. There is a nice simulator at SEW, but my meta tags are not closed in a way that suits it. The keywords show as missing there, but work everywhere else. I know it wants a closing </meta> instead of /> to make it happy. (XHTML)

I downloaded a trial that looked promising, but it wouldn't honor robots.txt and I couldn't set it to exclude certain files. There has to be something out there. I searched here but no luck so far.

asquithea

7:33 pm on Oct 31, 2005 (gmt 0)

10+ Year Member



You should technically be able to spider your site using wget, if you're not afraid of command line utilities.
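A rough sketch of what that could look like, assuming a reasonably recent GNU wget (older builds may not combine --spider with --recursive). The function name, the http://localhost/ URL, the rejected suffixes, and the /private directory are placeholders for illustration only. wget reads robots.txt on its own during recursive retrieval, and --reject / --exclude-directories cover the "exclude certain files" part.

```shell
#!/bin/sh
# spider_site URL [LOGFILE]
# Walk a site's links without saving any pages, logging every request.
# All names, patterns, and paths below are illustrative assumptions.
spider_site() {
    url=$1
    log=${2:-spider.log}
    # --spider               check pages instead of saving them
    # --recursive --level=5  follow links up to five levels deep
    # --no-parent            never climb above the starting directory
    # --reject               skip files with these suffixes
    # --exclude-directories  skip everything under these paths
    # --wait=1               pause one second between requests
    # --output-file          write the log here instead of the terminal
    wget --spider --recursive --level=5 --no-parent \
         --reject 'pdf,zip' \
         --exclude-directories=/private \
         --wait=1 \
         --output-file="$log" \
         "$url"
}
```

Calling spider_site http://localhost/ and then searching spider.log for 404 is one way to spot broken links.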

krod

7:57 pm on Oct 31, 2005 (gmt 0)

10+ Year Member



Yeah, just make a simple sh script and put it on cron, so it runs every few days and re-indexes the site...
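Something along these lines might do as a starting point. It is only a sketch: the script name recrawl.sh, its path in the crontab comment, the default URL, and the log directory are all made up for the example, and the wget options assume a GNU wget that supports --spider together with --recursive.

```shell
#!/bin/sh
# recrawl.sh: hypothetical wrapper meant to be started by cron.
# Example crontab entry (03:00 on every third day of the month);
# the path is a placeholder:
#   0 3 */3 * * /home/user/bin/recrawl.sh run

SITE_URL=${SITE_URL:-http://localhost/}   # placeholder site
LOG_DIR=${LOG_DIR:-/tmp/spider-logs}      # placeholder log directory

# Print a dated log file path, e.g. <LOG_DIR>/spider-2005-10-31.log
log_name() {
    printf '%s/spider-%s.log\n' "$LOG_DIR" "$(date +%Y-%m-%d)"
}

# Crawl the site once, writing the results to today's log file.
recrawl() {
    mkdir -p "$LOG_DIR" || return 1
    wget --spider --recursive --level=5 --no-parent --wait=1 \
         --output-file="$(log_name)" "$SITE_URL"
}

# Only crawl when called with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    recrawl
fi
```

Keeping one log per day makes it easy to compare runs and see when a link first broke.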

bobothecat

8:16 pm on Oct 31, 2005 (gmt 0)



Would Like My Own Spider

I've got quite a few around the house I can send you :)

... sorry couldn't resist.

bumpaw

8:34 pm on Oct 31, 2005 (gmt 0)

10+ Year Member



You should technically be able to spider your site using wget, if you're not afraid of command line utilities.

I'm not much of a command liner, but I seem to remember that SUSE has wget in its install. I have it set up with a testing server here and will fire it up and look.

physics

8:36 pm on Oct 31, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you're on a Mac there's one called DeepVacuum that is basically a GUI to run wget. There are probably similar ones for Windows.

bumpaw

10:01 pm on Oct 31, 2005 (gmt 0)

10+ Year Member



I downloaded a wget GUI for Windows and it's not working for me. It flashes what appears to be the CMD prompt and then nothing at all happens.

AlexMiles

2:06 am on Nov 1, 2005 (gmt 0)



I like Fluid Dynamics Search Engine. You can get up to a lot of spidery mischief with it.