Issue w/ grabbing content from another page w/ AJAX + Jquery - JavaScript and AJAX forum at WebmasterWorld - WebmasterWorld

Directory structure gets in the way when grabbing links

dcaspian

10:26 pm on Dec 15, 2009 (gmt 0)

Hi there,

I'm using the following method to grab content from another page (in my case, to integrate CMSs).

From the source page (here called sourcepage.html) I have something like:

<ul>
<li><a href="link1.html">link1</a></li>
<li><a href="link2.html">link2</a></li>
</ul>

and so forth...

Then on the destination page I use the following code to pull that list of links and drop it into the page:

<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<title>Ajax and Jquery: Remote Pages</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js" type="text/javascript" charset="utf-8"></script>
<script type="text/javascript" charset="utf-8">
$(document).ready(function(){
$('#list').load("sourcepage.html ul");
});
</script>
</head>
<body>
<h2>Here is the list from the other page:</h2>
<div id="list"></div>
</body>
</html>

i.e., the AJAX call pulls the ul from the source page, then drops it into a div with the id "list".

The problem I am having is that if the destination page is in a sub-directory, a link pulled from the source page that should point to "domain/link1.html" ends up on the destination page pointing to "domain/sub_directory/link1.html".

How do I make it pull the content so that the links keep pointing to their original location, instead of being treated as if they were in the same directory as the destination page?

I am sorry if this message didn't make sense; I am new to AJAX and JavaScript and may have used some incorrect terminology.

I can link to the article where I found this method if that is permitted.

Best Regards,
Daniel

whoisgregg

8:48 pm on Dec 16, 2009 (gmt 0)

I'd recommend simply changing your links to be absolute links... In other words, these are bad:

link1.html
link2.html

These are better:

/link1.html
/sub_directory/link2.html

Some folks always include the domain even for internal links (for when scrapers grab their content):

http://www.example.com/link1.html
http://www.example.com/sub_directory/link2.html

That way, there's never any confusion about where a link points. Once you get in that habit, you'll find many linking problems just go away. :)

dcaspian

11:29 pm on Dec 18, 2009 (gmt 0)

Thanks for the response.

That's definitely the best way of doing it.

The problem is that I'm trying to grab links that have been generated by a proprietary CMS - so I unfortunately can't force it to publish complete links that include the domain.

Any ideas?

Thanks again,
Daniel

Fotiman

3:07 am on Dec 19, 2009 (gmt 0)

How about setting the <base href="">? It will apply to all links on the page, however, which may not be desirable. Alternatively, you could use jQuery to fetch those links after you've loaded them, and then get the href values and replace them with the correct path?
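
A minimal sketch of the second suggestion, assuming the source page lives at the hypothetical URL http://www.example.com/sourcepage.html and that the standard URL constructor is available (it is in current browsers and Node.js, but was not in browsers from 2009). The names SOURCE_URL and rebaseHref are made up for illustration and are not part of jQuery.

```javascript
// Hypothetical absolute URL of the page the list is pulled from.
var SOURCE_URL = "http://www.example.com/sourcepage.html";

// Resolve a raw href against the source page instead of the current page.
// Relies on the standard URL constructor (modern browsers and Node.js).
function rebaseHref(rawHref, sourceUrl) {
  return new URL(rawHref, sourceUrl).href;
}

// Browser-only wiring; skipped entirely when jQuery is not present.
if (typeof jQuery !== "undefined") {
  jQuery(function ($) {
    // The callback runs once the remote <ul> has been inserted into #list.
    $("#list").load("/sourcepage.html ul", function () {
      $("#list a").each(function () {
        // .attr("href") reads the attribute as written in the markup,
        // whereas the DOM .href property gives the already-resolved URL.
        var raw = $(this).attr("href");
        if (raw) {
          $(this).attr("href", rebaseHref(raw, SOURCE_URL));
        }
      });
    });
  });
}
```

Rewriting only the links inside #list leaves every other link on the destination page alone, which is the main drawback of the <base href=""> approach.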

jdMorgan

4:52 am on Dec 19, 2009 (gmt 0)

You need to parse those links instead of just grabbing them.

If the link is "http(s)://example.com/whatever", just use it.
If the link is "/whatever", add the domain name and use the result.
If the link is "./whatever", remove the "./" and handle it as "whatever" (next rule), since "./" means the directory of the page the link came from.
If the link is "whatever", take the URL of the page from which you are grabbing this link, remove everything from the end of that URL back to the last slash, add the link ("whatever") after that last slash, and use the result.
If the link is "../whatever", take the URL of the page from which you are grabbing this link, remove everything from the end of the URL back to the second-to-last slash, add "whatever" after that slash, and use the result.
If the link is "../../whatever", take the URL of the page from which you are grabbing this link, remove everything from the end of the URL back to the third-to-last slash, add "whatever" after that slash, and use the result.
... I trust you can see the pattern here. This describes how canonical, server-relative, and page-relative links are resolved by browsers, and there are only two additional loops involved in coding it -- the slash counter, and the "../" counter.
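
The steps above can be sketched as a small function. This is only an illustration under stated assumptions: resolveLink is a hypothetical name, it treats "./" as the directory of the source page (which is how browsers resolve it), and it does not handle protocol-relative links ("//host/path") or query strings and fragments on the link itself.

```javascript
// Hypothetical helper: resolve a raw link against the URL of the page it came from.
function resolveLink(link, sourcePageUrl) {
  // Canonical link: use it unchanged.
  if (/^https?:\/\//i.test(link)) {
    return link;
  }

  // Split the source URL into origin ("http://example.com") and path ("/a/b/page.html").
  var match = /^(https?:\/\/[^\/]+)(\/.*)?$/i.exec(sourcePageUrl);
  if (!match) {
    throw new Error("sourcePageUrl must be an absolute http(s) URL");
  }
  var origin = match[1];
  // Drop any query string or fragment from the source page's path.
  var path = (match[2] || "/").replace(/[?#].*$/, "");

  // Server-relative link: add the domain name.
  if (link.charAt(0) === "/") {
    return origin + link;
  }

  // Page-relative link: start from the source page's directory,
  // i.e. remove everything after the last slash.
  var segments = path.split("/"); // segments[0] is "" for the leading slash
  segments.pop();

  var parts = link.split("/");
  for (var i = 0; i < parts.length; i++) {
    if (parts[i] === ".") {
      continue; // "./" stays in the current directory
    }
    if (parts[i] === "..") {
      // Go up one directory, but never above the site root.
      if (segments.length > 1) {
        segments.pop();
      }
    } else {
      segments.push(parts[i]);
    }
  }
  return origin + segments.join("/");
}
```

For example, resolveLink("../whatever", "http://example.com/a/b/page.html") returns "http://example.com/a/whatever". In current browsers and Node.js the built-in URL constructor, new URL(link, sourcePageUrl).href, performs this resolution without hand-written parsing.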

Jim

Fotiman

4:07 pm on Dec 19, 2009 (gmt 0)

jdMorgan, sounds good in theory... however, when you read the href property of a link, it returns the computed full URL, not just the text that is in the href attribute. So for example:
<a href="page1.htm"> gives href = "currentprotocol://currentbase/page1.htm"

<a href="page1.htm">
href = http://www.example.com/path/to/current/page/page1.htm
<a href="/page2.htm">
href = http://www.example.com/page2.htm
<a href="../../page3.htm">
href = http://www.example.com/path/to/page3.htm

jdMorgan

4:24 pm on Dec 19, 2009 (gmt 0)

In that case, the "href property getter" function you're using is broken. Report the API problem for future users' benefit, and in the meantime you'll need to grab the raw link data and parse/resolve the linked URLs yourself.

The method I outlined above describes the way in which *all* HTTP user-agents must resolve links; there's no wiggle-room to it at all. So if the function you're calling doesn't work that way, then it's broken (assuming that it is the correct function to be using for this application).

Jim

Fotiman

3:01 pm on Dec 20, 2009 (gmt 0)

I'm referring to DOM methods... so I guess you're saying the DOM is broken?

In other words, if I do this:


<html>
<head>
<title>Test</title>
</head>
<body>
<a href="page1.htm">Page 1</a>
<a href="/page2.htm">Page 2</a>
<a href="../page3.htm">Page 3</a>
<script type="text/javascript">
var aList = document.getElementsByTagName("a");
for (var i = 0; i < aList.length; i++) {
  alert(aList[i].href);
}
</script>
</body>
</html>

It will show me the computed value of the href.