Forum Moderators: martinibuster

Need Tool to Extract Anchor Text

1:02 pm on Aug 31, 2013 (gmt 0)

New User

joined:Sept 14, 2012
posts: 15
votes: 0

Please help. I have the URLs of all the pages where a link to my site is present.

I am looking for a tool that can extract the anchor text and the corresponding landing pages on my site from those URLs. It is a huge amount of data, so tool suggestions please.
1:22 pm on Aug 31, 2013 (gmt 0)

Preferred Member from GB 

joined:July 25, 2005
posts: 400
votes: 11

Majestic SEO - but it won't work with your data. It only works from its own database (which should have the majority of your links indexed anyway).

Alternatively, if you insist on working with your own list, post a job on oDesk and get somebody to write you a script. If I understand you correctly, this is a very simple job for somebody who knows PHP and regular expressions.
10:21 pm on Sept 16, 2013 (gmt 0)

New User from RO 

joined:June 12, 2012
votes: 0

The best option is SeoSpyglass. You can import all your WT backlinks and analyze them.
10:38 pm on Sept 16, 2013 (gmt 0)

Senior Member from GB 

brotherhood_of_lan

joined:Jan 30, 2002
votes: 6


Would be a relatively simple script.

$domain = 'example.com'; // Domain you're interested in
$dom = new DOMDocument; // Use the DOMDocument class
libxml_use_internal_errors(true); // Don't emit warnings for malformed HTML
$dom->loadHTMLFile('htmlfile.htm'); // Load the HTML from a file; you didn't mention whether you have the pages fetched yet

$elements = $dom->getElementsByTagName('a'); // Get all <a> tags

// Iterate through <a> tags
foreach ($elements as $e) {

    if ($e->hasAttribute('href')) { // Only consider <a> tags with an href attribute

        $href = $e->getAttribute('href');
        $parsed = parse_url($href); // Parse URL into its components
        if (!isset($parsed['host'])) // Ignore relative URLs, badly formed URLs, javascript etc
            continue;

        if (preg_match("'(^|\.)" . preg_quote($domain, "'") . "$'i", $parsed['host'])) // Match the domain and any subdomain
            echo $href, "\t", $e->nodeValue, "\n"; // echo the href and anchor text
    }
}

It gets a little more complicated if you want to look for <area> tags and also parse different formats, like PDFs.
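
For the <area> case, something along these lines might do as a starting point. This is only a rough sketch: extractLinks() is a made-up helper name, it takes the HTML as a string rather than a file, and using the alt attribute as the "anchor text" of an <area> is my assumption, since image-map areas have no text of their own. PDFs are not covered.

```php
<?php
// Sketch: collect matching links from both <a> and <area> tags in an HTML string.
// extractLinks() is a hypothetical helper, not part of any library.
function extractLinks(string $html, string $domain): array
{
    libxml_use_internal_errors(true); // Real-world HTML is rarely valid; keep warnings quiet
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Match the domain itself or any subdomain of it
    $pattern = '~(^|\.)' . preg_quote($domain, '~') . '$~i';
    $links = [];

    foreach (['a', 'area'] as $tag) {
        foreach ($dom->getElementsByTagName($tag) as $e) {
            if (!$e->hasAttribute('href')) {
                continue; // No target, nothing to report
            }
            $href = $e->getAttribute('href');
            $host = parse_url($href, PHP_URL_HOST);
            if (!is_string($host) || !preg_match($pattern, $host)) {
                continue; // Relative, malformed, or pointing at another domain
            }
            // Assumption: <area> has no text content, so use its alt attribute instead
            $text = $tag === 'area' ? $e->getAttribute('alt') : trim($e->textContent);
            $links[] = [$href, $text];
        }
    }
    return $links;
}
```

Each entry in the returned array is a pair of href and anchor text, so it can be written out tab-separated the same way as in the script above.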

Fetching the actual pages should be relatively straightforward also.
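
As a rough illustration of the fetching step, assuming your list is a plain text file with one URL per line (urls.txt is a made-up filename) and that allow_url_fopen is enabled on your PHP install. readUrlList() and fetchPage() are hypothetical helper names, and the timeout and user agent values are arbitrary. For a very large list you would probably want cURL and some rate limiting instead.

```php
<?php
// Sketch: turn a text list into clean URLs, then fetch each page's HTML.
// readUrlList() and fetchPage() are hypothetical helpers, not library functions.
function readUrlList(string $text): array
{
    $urls = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and anything that does not look like a URL
        if ($line !== '' && filter_var($line, FILTER_VALIDATE_URL)) {
            $urls[] = $line;
        }
    }
    return array_values(array_unique($urls)); // Drop duplicates, reindex from 0
}

function fetchPage(string $url, int $timeout = 10): ?string
{
    // Assumed settings: 10 second timeout, follow redirects, arbitrary user agent
    $context = stream_context_create([
        'http' => [
            'timeout' => $timeout,
            'follow_location' => 1,
            'user_agent' => 'anchor-check/0.1',
        ],
    ]);
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html; // null means the fetch failed
}

// Example usage (commented out so nothing is fetched just by loading this file):
// foreach (readUrlList(file_get_contents('urls.txt')) as $url) {
//     $html = fetchPage($url);
//     if ($html !== null) {
//         file_put_contents(md5($url) . '.htm', $html); // Save for the DOM script
//     }
// }
```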
