JavaScript and AJAX Forum

    
Matching Internal Page Links
ocon




msg:4592218
 7:58 pm on Jul 11, 2013 (gmt 0)

I am writing a script to identify and perform an action on internal page links. I've written my script below but I'm not comfortable with how it identifies internal links.

for (var i = 0; i < document.links.length; i++) {
if (document.links[i].href.indexOf(window.location.host + window.location.pathname + '#') !== -1) {...}
}


I need it to match the following urls:
#
#anchor
http://www.mysite.tld/file1.html
http://www.mysite.tld/file1.html#
http://www.mysite.tld/file1.html#anchor


But not:
file2.html
http://www.mysite.tld/file2.html
http://www.othersite.tld/file3.html
file2.html#anchor
http://www.mysite.tld/file2.html#anchor
http://www.othersite.tld/file3.html#anchor


Is there a better way to match internal links than:
document.links[i].href.indexOf(window.location.host+window.location.pathname+'#') !== -1

 

daveVk




msg:4592325
 1:45 am on Jul 12, 2013 (gmt 0)


But not:
file2.html
http://www.mysite.tld/file2.html
http://www.othersite.tld/file3.html
file2.html#anchor
http://www.mysite.tld/file2.html#anchor
http://www.othersite.tld/file3.html#anchor


What aspect of red entries do you consider makes them NOT "internal page links"? I do not see the pattern.

phranque




msg:4592355
 5:37 am on Jul 12, 2013 (gmt 0)

What aspect of red entries do you consider makes them NOT "internal page links"

i think ocon means on-page links - as in links to the url of the page itself or links to fragment identifiers on the current page.

daveVk




msg:4592381
 7:33 am on Jul 12, 2013 (gmt 0)

Sorry somehow I read it as same site links.

What you have looks fine but will not match links in the trivial case of no '#',

assuming you are on page
http://www.mysite.tld/file1.html

will not match

http://www.mysite.tld/file1.html
http://www.mysite.tld/file1.html?id=7


etc

If the site uses query strings, the query part of the URL needs to be added to the check.

It will also catch cases like

/file1.html#
file1.html#

and other relative URL notations
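
One way to cover both the '#' and the no-'#' case is to drop the fragment from each URL and compare what is left. This is only a rough sketch, not tested in a browser: isSamePageLink is a made-up name, and it assumes both arguments are already fully qualified URLs. The query string stays in the comparison, so file1.html?id=7 only matches when the current page is file1.html?id=7.

```javascript
// Hypothetical helper (name invented for this sketch).
// Returns true when linkHref points at the page pageHref itself,
// with or without a fragment. Both arguments must be absolute URLs.
function isSamePageLink(linkHref, pageHref) {
  // Everything before the first '#' is the URL without its fragment.
  var linkBase = linkHref.split('#')[0];
  var pageBase = pageHref.split('#')[0];
  // The query string is part of both bases, so it has to match too.
  return linkBase === pageBase;
}

// Possible use in a page (commented out so the sketch runs anywhere):
// for (var i = 0; i < document.links.length; i++) {
//   if (isSamePageLink(document.links[i].href, window.location.href)) {
//     // on-page link
//   }
// }
```

Unlike the indexOf test, this compares the whole URL, so a link on another host that merely contains the same host and path text somewhere inside it will not match.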

Readie




msg:4592581
 8:19 pm on Jul 12, 2013 (gmt 0)

You can regex check it - seems a bit more reliable than what you've got there (Assuming I've understood your request properly).


Disclaimer: Not tested this code, may not work out-the-box.
function regQuote(str) {
// Escape every regex metacharacter in str (the g flag covers all occurrences)
return str.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}

function checkLink(href) {
var regexp, check;

if(href.match(/^#/)) {
// Starts with a hash. Just an anchor on this page
return true;
}

if(href.match(/^https?:\/\//)) {
// If it starts with http(s)://, we need to check the domain
regexp = new RegExp('^https?:\\/\\/' + regQuote(window.location.host) + '(?=$|/|#|\\?)');
if(!href.match(regexp)) {
return false;
}
// Drop the scheme and host so the path checks below see "/path..."
href = href.replace(regexp, '') || '/';
}

if(href.match(/^\//)) {
// Starts with a slash. Need to validate against entire path
check = window.location.pathname;
} else {
// Does not start with a slash. Just want the end segment
check = window.location.pathname.split('/').pop();
}

// The path must be followed by the end of the string, a hash or a query
regexp = new RegExp('^' + regQuote(check) + '($|#|\\?)');
return regexp.test(href);
}

/* USAGE:

if(checkLink(document.links[i].href) === true) {
// Link to this page
}

*/

daveVk




msg:4592633
 12:11 am on Jul 13, 2013 (gmt 0)

My understanding is that document.links[ i ].href will return the fully qualified URL regardless of how the href is written. The original code should be reliable.

After some tests, it appears that my_a_dom_node.href always returns the fully qualified URI, including the [domaine.name,...] which should be okay for what I want.

[edited by: whoisgregg at 6:45 pm (utc) on Jul 29, 2013]
[edit reason] rv link, please see sticky mail :) [/edit]
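
To see what "fully qualified" means for the hrefs in this thread, here is a small sketch that resolves them against the page URL with the standard URL constructor. It is an illustration under assumptions, not a statement of how any particular browser builds the .href property: the page address and the written list are example values taken from the first post, and the URL constructor is not available in older browsers.

```javascript
// Example values only: the page we pretend to be on, and hrefs as
// they might be written in the HTML source.
var page = 'http://www.mysite.tld/file1.html';
var written = ['#', '#anchor', 'file1.html#anchor', '/file1.html#', 'file2.html'];

// Resolve each written href against the page URL, giving an absolute URL.
var resolved = written.map(function (href) {
  return new URL(href, page).href;
});

// Show the written form next to the resolved form.
for (var i = 0; i < written.length; i++) {
  console.log(written[i] + ' -> ' + resolved[i]);
}
```

In a page, reading link.getAttribute('href') gives the value as written in the source, while link.href gives the resolved form, which is why comparing against the host and pathname works even for relative links.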

Readie




msg:4592768
 2:27 pm on Jul 13, 2013 (gmt 0)

Ahh. I did not know that. Never mind then :)
