Home / Forums Index / Marketing and Biz Dev / Link Development
Forum Library, Charter, Moderators: martinibuster

Link Development Forum

    
Anyone know of crawler software that extracts web addresses?
internetheaven




msg:4650717
 1:33 pm on Mar 3, 2014 (gmt 0)

I'm looking for something that will crawl a given website and return all the outbound URLs.

Either anything that is within a href tags or any text that starts with www.

Thanks

 

adder




msg:4650772
 6:00 pm on Mar 3, 2014 (gmt 0)

Check this thread: [webmasterworld.com...]

User @affiliation has suggested a great free tool that does the job. You can set it to ignore the internal URLs - thus it will only return the outbound ones.
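
If you would rather script it than rely on a tool, here is a rough Python sketch of the same idea: collect every a href on a page and drop the ones that point back to the same host. This is not how the tool mentioned above works internally; the names `LinkCollector` and `outbound_links` are made up for illustration, and the sketch assumes you have already fetched the page HTML yourself. It treats `example.com` and `www.example.com` as the same site, but any other subdomain counts as outbound.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _host(url):
    """Lowercased hostname with a leading 'www.' dropped."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def outbound_links(html, page_url):
    """Return absolute http(s) links in html that leave page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = _host(page_url)
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        scheme = urlparse(absolute).scheme
        # Skip mailto:, javascript: and similar; skip links to our own host.
        if scheme in ("http", "https") and _host(absolute) != own_host:
            if absolute not in result:  # de-duplicate, keep first-seen order
                result.append(absolute)
    return result
```

To cover a whole site you would call `outbound_links` on each page and feed the internal links back into a queue of pages still to visit; that crawling loop is left out here.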

internetheaven




msg:4650793
 8:46 pm on Mar 3, 2014 (gmt 0)

I tried Xenu. It only scans one page. No matter what I do (even with depth set to 999), it only scans one page.

I figured it was broken?

adder




msg:4650799
 9:16 pm on Mar 3, 2014 (gmt 0)

That's very unusual. Maybe the links are hidden via JavaScript? Although I doubt it. You can drop me some screenshots via Sticky and I'll have a look. I've been using it for longer than I care to remember and it's always worked great.

phranque




msg:4650820
 11:20 pm on Mar 3, 2014 (gmt 0)

is xenu finding any links on that page?

"Either anything that is within ahref tags or any text that starts www."

xenu won't show you the (unanchored) url citations on a page so you would need a separate tool to find those.

the www. pattern won't be sufficient to capture all URIs on the page, since even sites that are canonicalized to the www.example.com hostname are often referred to without the www.
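
to illustrate the point about unanchored citations, here is a minimal Python sketch that looks for URL-like strings in plain text whether or not they start with www. The TLD list is a deliberately tiny, hypothetical one and the function name `find_url_citations` is made up for this example; treat it as a starting point under those assumptions, not a complete URI matcher.

```python
import re

# Hypothetical, deliberately small TLD list; extend it for real use.
TLDS = "com|net|org|info|biz|co\\.uk"

URL_PATTERN = re.compile(
    r"\b(?:https?://)?(?:www\.)?"   # optional scheme, optional www.
    r"(?:[a-z0-9-]+\.)+"            # one or more host labels
    r"(?:" + TLDS + r")\b"          # a TLD from the list above
    r"(?:/[^\s<>\"']*)?",           # optional path, query, fragment
    re.IGNORECASE,
)


def find_url_citations(text):
    """Return URL-like strings found in text, with or without 'www.'."""
    # Trailing sentence punctuation is stripped from each match.
    # Limitation: the domain part of an email address also matches.
    return [m.group(0).rstrip(".,;:!?)") for m in URL_PATTERN.finditer(text)]
```

run it over the visible text of the page (tags stripped) rather than the raw HTML, otherwise the links inside href attributes will be picked up a second time.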


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved