
Link Development Forum

    
Need Tool to Extract Anchor Text
Kpederson



 
Msg#: 4606390 posted 1:02 pm on Aug 31, 2013 (gmt 0)

Please help. I have the URLs of all the pages that link to my site.

I am looking for a tool that can extract the anchor text of each link and the corresponding landing page on my site. It is a huge amount of data, so tool suggestions please.

 

adder

5+ Year Member



 
Msg#: 4606390 posted 1:22 pm on Aug 31, 2013 (gmt 0)

Majestic SEO - but it won't work with your data. It only works from its own database (which should have the majority of your links indexed anyway).

Alternatively, if you insist on working with your own list, post a job on oDesk and get somebody to write you a script. If I understand you correctly, this is a very simple job for somebody who knows PHP and regular expressions.

Nichita



 
Msg#: 4606390 posted 10:21 pm on Sep 16, 2013 (gmt 0)

The best option is SeoSpyglass. You can import all your WT backlinks and analyze them.

brotherhood of LAN

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4606390 posted 10:38 pm on Sep 16, 2013 (gmt 0)

>PHP

Would be a relatively simple script.


$domain = 'example.com'; // Domain you're interested in
libxml_use_internal_errors(true); // Don't emit warnings for malformed HTML
$dom = new DOMDocument; // Use DOM Document class
$dom->loadHTMLFile('htmlfile.htm'); // Load HTML from a file; you didn't mention whether you have the pages fetched yet

$elements = $dom->getElementsByTagName('a'); // Get all <a> tags

// Iterate through <a> tags
foreach ($elements as $e) {

    if ($e->hasAttribute('href')) { // Only consider <a> tags with an href attribute

        $href = $e->getAttribute('href');
        $parsed = parse_url($href); // Parse URL into its components
        if (!isset($parsed['host'])) // Ignore relative URLs, badly formed URLs, javascript etc
            continue;

        // Match the domain and any subdomain; preg_quote() escapes the dots in $domain
        if (preg_match('/(^|\.)' . preg_quote($domain, '/') . '$/i', $parsed['host']))
            echo $href, "\t", $e->nodeValue, "\n"; // echo the href and anchor text
    }
}



It gets a little more complicated if you want to look for <area> tags and also parse different formats, like PDFs.
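
For the <area> case, a rough sketch might look like the following. It assumes you already have the page HTML as a string; linksToDomain() is a made-up helper name, and using the alt attribute as the "anchor text" of an <area> is my own assumption, since image map areas have no text content. PDFs are not covered here.

```php
<?php
// Sketch only: collect links to $domain from both <a> and <area> tags.
// linksToDomain() is a hypothetical helper, not part of any library.
function linksToDomain(string $html, string $domain): array
{
    libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Match the domain itself or any subdomain of it
    $pattern = '/(^|\.)' . preg_quote($domain, '/') . '$/i';
    $rows = [];

    foreach (['a', 'area'] as $tag) {
        foreach ($dom->getElementsByTagName($tag) as $e) {
            if (!$e->hasAttribute('href')) {
                continue; // named anchors etc.
            }
            $href = $e->getAttribute('href');
            $host = parse_url($href, PHP_URL_HOST); // null or false if there is no usable host
            if (!is_string($host) || !preg_match($pattern, $host)) {
                continue; // relative, malformed, or pointing at another site
            }
            // Assumption: treat alt as the anchor text for <area>
            $text = ($tag === 'area') ? $e->getAttribute('alt') : trim($e->nodeValue);
            $rows[] = [$href, $text];
        }
    }
    return $rows;
}
```

Each returned row is an array of [href, anchor text], with the <a> links listed before the <area> links.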

Fetching the actual pages should be relatively straightforward also.
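
If the pages are not fetched yet, something along these lines could work as a starting point. This is a sketch under a few assumptions: allow_url_fopen is enabled so file_get_contents() can read http URLs, the 10 second timeout and the user agent string are arbitrary, and fetchPage() and anchorsForUrls() are names I made up. For a really big list you would probably want curl with retries and some rate limiting instead.

```php
<?php
// Hypothetical default fetcher: plain HTTP GET (assumes allow_url_fopen is on).
function fetchPage(string $url): ?string
{
    $ctx = stream_context_create(['http' => [
        'timeout'    => 10,                      // seconds, arbitrary choice
        'user_agent' => 'anchor-text-check/0.1', // made-up UA string
    ]]);
    $html = @file_get_contents($url, false, $ctx);
    return $html === false ? null : $html;
}

// Returns rows of [linking page, href, anchor text] for links that point at $domain.
// $fetch can be swapped out, e.g. to read pages you have already saved to disk.
function anchorsForUrls(array $urls, string $domain, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'fetchPage';
    $pattern = '/(^|\.)' . preg_quote($domain, '/') . '$/i';
    $rows = [];
    libxml_use_internal_errors(true); // tolerate malformed HTML

    foreach ($urls as $url) {
        $html = $fetch($url);
        if ($html === null || trim($html) === '') {
            continue; // fetch failed or the page was empty
        }
        $dom = new DOMDocument;
        $dom->loadHTML($html);
        libxml_clear_errors();

        foreach ($dom->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href'); // '' when the attribute is missing
            $host = parse_url($href, PHP_URL_HOST);
            if (is_string($host) && preg_match($pattern, $host)) {
                $rows[] = [$url, $href, trim($a->nodeValue)];
            }
        }
    }
    return $rows;
}
```

With the URL list in a text file, one per line, you could read it with file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES), pass the result to anchorsForUrls(), and write each row out with fputcsv(). The file name is just an example.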
