Forum Library, Charter, Moderators: martinibuster

Link Development Forum

    
Anyone know of crawler software that extracts web addresses?
internetheaven

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4650715 posted 1:33 pm on Mar 3, 2014 (gmt 0)

I'm looking for something that will crawl a given website and return all the outbound URLs.

Either anything that is within a href tags or any text that starts with www.

Thanks

 

adder

5+ Year Member



 
Msg#: 4650715 posted 6:00 pm on Mar 3, 2014 (gmt 0)

Check this thread: [webmasterworld.com...]

User @affiliation has suggested a great free tool that does the job. You can set it to ignore the internal URLs - thus it will only return the outbound ones.
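
For anyone who would rather script the "ignore internal URLs" step, here is a minimal sketch of the idea in Python using only the standard library. It is not how the tool above works internally; `LinkCollector`, `bare_host` and `outbound_links` are made-up names for illustration, and it assumes you already have the page's HTML as a string. It treats a link as outbound when its hostname (ignoring a leading www.) differs from the page's hostname, so subdomains count as outbound.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def bare_host(url):
    """Return the lowercase hostname with any leading 'www.' removed."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def outbound_links(html, page_url):
    """Return the http(s) links in html that point away from page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    site = bare_host(page_url)
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if bare_host(absolute) != site and absolute not in found:
            found.append(absolute)
    return found
```

Run it over every page you crawl and merge the lists to get the outbound URLs for the whole site.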

internetheaven

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4650715 posted 8:46 pm on Mar 3, 2014 (gmt 0)

I tried Xenu. No matter what I do (even with the depth set to 999), it only scans one page.

I figured it was broken?

adder

5+ Year Member



 
Msg#: 4650715 posted 9:16 pm on Mar 3, 2014 (gmt 0)

That's very unusual. Maybe the links are hidden via JavaScript? Although I doubt it. You can drop me some screenshots via Sticky and I'll have a look. I've been using it for longer than I care to remember and it's always worked great.

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4650715 posted 11:20 pm on Mar 3, 2014 (gmt 0)

is xenu finding any links on that page?

Either anything that is within a href tags or any text that starts with www.

xenu won't show you the (unanchored) url citations on a page so you would need a separate tool to find those.

the www. pattern won't be sufficient to capture all uris on the page as even sites that are canonicalized to the www.example.com hostname often are referred to without the www.
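
As a rough illustration of that point, the sketch below looks for unanchored URL citations in plain text without relying on the www. prefix. It is only a heuristic, not a complete URI matcher: the pattern name `URL_CITATION`, the function `url_citations` and the short TLD list are assumptions made for this example, and the list would need extending for real use.

```python
import re

URL_CITATION = re.compile(
    r"""(?<![\w@.-])                 # not in the middle of a word or e-mail address
        (?:https?://)?               # optional scheme
        (?:[a-z0-9-]+\.)+            # one or more labels: www., blog., example.
        (?:com|net|org|co\.uk)       # hypothetical TLD whitelist; extend as needed
        (?![\w-])                    # the TLD must end here
        (?:/[^\s<>"']*)?             # optional path and query string
    """,
    re.IGNORECASE | re.VERBOSE,
)


def url_citations(text):
    """Return URL-like strings found in text, with or without 'www.'."""
    # Trailing sentence punctuation is stripped from each match.
    return [m.group(0).rstrip(".,;:!?)") for m in URL_CITATION.finditer(text)]
```

Matching on a known set of TLDs rather than on www. is what lets a citation such as example.org be picked up, at the cost of missing any TLD that is not in the list.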


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved