Anyone know of crawler software that extracts web addresses?

     

internetheaven

1:33 pm on Mar 3, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm looking for something that will crawl a given website and return all the outbound URLs.

Either anything that is within a href tags or any text that starts with www.

Thanks

adder

6:00 pm on Mar 3, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Check this thread: [webmasterworld.com...]

User @affiliation has suggested a great free tool that does the job. You can set it to ignore the internal URLs - thus it will only return the outbound ones.
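For anyone who would rather script the "ignore internal URLs" step than use a desktop tool, here is a minimal sketch in Python using only the standard library. It is not how any tool mentioned in this thread works; the names `AnchorCollector`, `bare_host` and `outbound_links` are made up for illustration, it handles a single page of HTML that you have already fetched, and it assumes that a link counts as internal when its hostname matches the page's hostname after dropping a leading "www.".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def bare_host(url):
    """Hostname in lower case without a leading 'www.' (assumed rule)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def outbound_links(html, page_url):
    """Return the http(s) links on the page that point at another host."""
    collector = AnchorCollector(page_url)
    collector.feed(html)
    site = bare_host(page_url)
    found = []
    for link in collector.links:
        if urlparse(link).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if bare_host(link) != site and link not in found:
            found.append(link)
    return found
```

To cover a whole site you would still need to fetch each internal page and feed its HTML through `outbound_links`; that crawling loop is left out here. Links that only exist after JavaScript runs will not be seen by this approach either.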

internetheaven

8:46 pm on Mar 3, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I tried Xenu. It only scans one page. No matter what I do (even with depth set to 999), it only scans one page.

I figured it was broken?

adder

9:16 pm on Mar 3, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



That's very unusual. Maybe the links are hidden via JavaScript? Although I doubt it. You can drop me some screenshots via Sticky and I'll have a look. I've been using it for longer than I care to remember and it's always worked great.

phranque

11:20 pm on Mar 3, 2014 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



is xenu finding any links on that page?

Either anything that is within ahref tags or any text that starts www.

xenu won't show you the (unanchored) url citations on a page so you would need a separate tool to find those.

the www. pattern won't be sufficient to capture all uris on the page as even sites that are canonicalized to the www.example.com hostname often are referred to without the www.
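To illustrate the point about the www. pattern, here is a rough Python sketch of a matcher for unanchored URL citations in plain text. It is only one possible approach: the `TLDS` allowlist is a short, made-up sample rather than a complete list, and `find_url_citations` is a hypothetical helper name. The idea is to key on the ending of the hostname instead of a www. prefix, so that example.org/page is caught as well as www.example.com, while email addresses are skipped.

```python
import re

# Assumption: a short illustrative allowlist; a real list would be far longer.
TLDS = ("com", "net", "org", "info", "biz")

URL_CITATION = re.compile(
    r"(?<![\w@./-])"            # must not start mid-word or right after '@'
    r"(?:https?://)?"           # scheme is optional
    r"(?:[a-z0-9-]+\.)+"        # one or more labels, e.g. 'www.' 'example.'
    r"(?:" + "|".join(TLDS) + r")\b"
    r"(?:/[^\s<>]*)?",          # optional path
    re.IGNORECASE,
)

# Punctuation that usually belongs to the sentence, not the URL.
TRAILING = ".,;:!?)"


def find_url_citations(text):
    """Return URL-like strings found in text, with or without 'www.'."""
    return [m.group(0).rstrip(TRAILING) for m in URL_CITATION.finditer(text)]
```

Run it over the visible text of a page (with the tags stripped) rather than the raw HTML, otherwise the URLs inside href attributes will be matched too.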