Forum Moderators: coopster

how to fetch Javascript content using PHP?

         

neoapproach

3:55 pm on Apr 15, 2005 (gmt 0)

10+ Year Member



$url_page = "http://www.somewebsite.com";
$user_agent = "Mozilla/4.0";
$string = curl_string($url_page,$user_agent);

now $string holds all of the page's HTML/JavaScript content

how do i use PHP to extract the JavaScript content?

for example, this JavaScript:
var topic_MGID = new Array("1567564", "1568929", "1568951", "1561037", "1568755", "1534196", "1561747", "1567650", "1568941", "1568950", "1566588", "1549010", "1565487", "1567621", "1568940", "1566540", "1568899", "1568844", "1557462", "1568945", "1568745", "1567921", "1567565", "1568839", "1568057", "1568912", "1568382", "1568375", "1568760", "1568867", "1568626", "1511621", "1567480", "1567207", "1566951", "1568739", "1567096", "1560900", "1567283", "1568833");

how do i use PHP to extract the contents of that JavaScript array?
thanks

[edited by: jatar_k at 4:22 pm (utc) on April 15, 2005]
[edit reason] fixed sidescroll [/edit]

neoapproach

4:00 pm on Apr 15, 2005 (gmt 0)

10+ Year Member



maybe using regex?

to match anything in between:
'var topic_MGID = new Array(' and ');'

?

jatar_k

4:25 pm on Apr 15, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld neoapproach,

Well, looking at that string, you could split it on commas and then chop the front off. You could probably use a regex as well.

Get everything between the ( and the ), then split it on commas and strip the double quotes afterwards if you like. It really depends on what you want to do with it.
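
A minimal sketch of the approach described above, applied to a whole page rather than a single line. The helper name extract_js_array is hypothetical, the sample $string is made up for illustration, and the code assumes PHP 7 or later and that no array item contains a closing parenthesis.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function extract_js_array(string $html, string $varName): array
{
    // Capture everything between "var NAME = new Array(" and the closing ");".
    // The "s" modifier lets "." match newlines in case the array spans lines.
    $pattern = '/var\s+' . preg_quote($varName, '/')
             . '\s*=\s*new\s+Array\s*\((.*?)\)\s*;/s';
    if (!preg_match($pattern, $html, $m) || trim($m[1]) === '') {
        return [];
    }
    // Split on commas, then strip whitespace and double quotes from each item.
    $items = explode(',', $m[1]);
    return array_map(function ($item) {
        return trim($item, " \t\r\n\"");
    }, $items);
}

// Made-up sample standing in for the fetched page.
$string = '<html><script>var topic_MGID = new Array("1567564", "1568929", "1568951");</script></html>';
$ids = extract_js_array($string, 'topic_MGID');
print_r($ids);
```

Because the pattern searches the full document, there is no need to trim $string down to the one line first.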

neoapproach

4:33 pm on Apr 15, 2005 (gmt 0)

10+ Year Member



the thing is
i have the entire HTML in $string

not just this line
how do i trim it down to this line?

can you give me some examples?

sorry
i'm very n00b on php

iceman22

5:07 pm on Apr 15, 2005 (gmt 0)

10+ Year Member



I'm not exactly sure what the scenario is, but if you are trying to get the JavaScript out of an HTML page in the variable $string, you could extract the data in the <script> tags like this:

preg_match("'<script[^>]*>(.*?)<\/script>'is",$string,$java);

print_r($java);

That will display the match: $java[0] will be the match with the tags included, and $java[1] will be the match inside the <script> tags.

If you have more than one occurrence of <script> tags, you could use preg_match_all instead of preg_match. It adds another level to the array, so look at the print_r($java) output.
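
A short sketch of the preg_match_all variant, assuming the page may contain several <script> blocks, some with attributes and some spanning multiple lines. The sample $string is made up for illustration.

```php
<?php
// Made-up sample page with two script blocks.
$string = "<html><script type=\"text/javascript\">\nvar a = 1;\n</script>"
        . "<p>x</p><script>var b = 2;</script></html>";

// "i" makes the tag match case-insensitive; "s" lets "." match newlines.
// [^>]* allows attributes such as type="text/javascript" on the opening tag.
preg_match_all('#<script\b[^>]*>(.*?)</script>#is', $string, $java);

// $java[0] holds each full match including the tags,
// $java[1] holds only the content between the tags.
$scripts = $java[1];
print_r($scripts);
```

A regex like this is only a rough tool for HTML; it is enough when the goal is to find one known variable in the page.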

neoapproach

5:53 pm on Apr 15, 2005 (gmt 0)

10+ Year Member



this is my original string: "1568032", "1569051", "1557462", "1568479", "1569061", "1567564", "1569064", "1569062", "1568617", "1568929", "1569057", "1566951", "1539172", "1568836", "1540625", "1567650", "1568987", "1561747", "1569058", "1568970", "1568988", "1567565", "1568755", "1567480", "1557378", "1568239", "1569054", "1567163", "1568976", "1477551", "1569025", "1567262", "1568968", "1568057", "1532183", "1569011", "1511621", "1561037", "1569041", "1566903"

and i use split() function to split them

$array_MGID1 = split ("\".\"", $topic_MGID1);

i got this:
Array
( [0] => "1568032 [1] => 1569051 [2] => 1557462 [3] => 1568479 [4] => 1569061 [5] => 1567564 [6] => 1569064 [7] => 1569062 [8] => 1568617 [9] => 1568929 [10] => 1569057 [11] => 1566951 [12] => 1539172 [13] => 1568836 [14] => 1540625 [15] => 1567650 [16] => 1568987 [17] => 1561747 [18] => 1569058 [19] => 1568970 [20] => 1568988 [21] => 1567565 [22] => 1568755 [23] => 1567480 [24] => 1557378 [25] => 1568239 [26] => 1569054 [27] => 1567163 [28] => 1568976 [29] => 1477551 [30] => 1569025 [31] => 1567262 [32] => 1568968 [33] => 1568057 [34] => 1532183 [35] => 1569011 [36] => 1511621 [37] => 1561037 [38] => 1569041 [39] => 1566903" )

you see on [0]... there's "1568032 and on [39]...there's 1566903"

how do i take those two quotation marks off as well?

split ("\".\"", $topic_MGID1); <-- that's how i split them

[edited by: jatar_k at 6:09 pm (utc) on April 15, 2005]
[edit reason] fixed sidescroll [/edit]

iceman22

6:15 pm on Apr 15, 2005 (gmt 0)

10+ Year Member



Without changing your current code much you could replace this line:

$array_MGID1 = split ("\".\"", $topic_MGID1);

with this:

$array_MGID1 = split ("\".\"", substr($topic_MGID1,1,-1));

That ignores the first and last characters of the original string, in this case the quotes.

You could also use:
$array_MGID1 = explode("\",\"", substr($topic_MGID1,1,-1));

It produces the same result, but it is quicker because explode() matches a literal string instead of a regular expression.
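
For anyone reading this on a current PHP version: split() was deprecated in PHP 5.3 and removed in PHP 7.0, so only the explode() form still runs. A small sketch with made-up sample values, plus an alternative that assumes every item is a quoted run of digits and so tolerates spaces after the commas:

```php
<?php
// Made-up sample in the same shape as the string in the thread.
$topic_MGID1 = '"1568032","1569051","1557462"';

// substr(..., 1, -1) drops the first and last characters (the outer quotes),
// then explode() splits on the literal three-character separator ",".
$array_MGID1 = explode('","', substr($topic_MGID1, 1, -1));

// Alternative: collect every quoted run of digits, whatever sits between them.
preg_match_all('/"(\d+)"/', '"1568032", "1569051", "1557462"', $m);
$array_MGID2 = $m[1];

print_r($array_MGID1);
print_r($array_MGID2);
```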

neoapproach

7:26 pm on Apr 15, 2005 (gmt 0)

10+ Year Member



thx
i got one more question

i'm writing RSS.. so it has to be in XML form

but i'm parsing a Chinese Big5 web site
it has all those funny symbols which are not compatible with XML

is there any way to take out all the incompatible characters and replace them with something else?

so it will be 100% XML form?
thanks

neoapproach

1:41 am on Apr 16, 2005 (gmt 0)

10+ Year Member



&#9733; <-- like that symbol
how do i use regex to replace them, so it will be okay in XML form?

neoapproach

3:02 am on Apr 16, 2005 (gmt 0)

10+ Year Member



&#9733; <-- like this?
how do i make it XML compatible?

dmmh

7:52 am on Apr 16, 2005 (gmt 0)

10+ Year Member



in a hurry are we?

jatar_k

4:09 pm on Apr 16, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



hehe, well, I have no clue but try some searches [google.com]
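
One possible approach to the Big5 question, offered as a sketch rather than a definitive answer. It assumes the fetched page really is Big5, that the RSS feed will be declared as UTF-8, that the iconv extension is available, and PHP 7 or later. The helper name to_xml_text is hypothetical. Numeric references such as &#9733; are already legal in XML; it is named HTML entities, raw Big5 bytes in a feed declared as another encoding, and unescaped & or < that usually break a feed.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function to_xml_text(string $raw, string $sourceEncoding = 'BIG5'): string
{
    // Convert the page's bytes to UTF-8. iconv() returns false on bad input.
    $utf8 = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        return '';
    }
    // Turn HTML entities (named or numeric) into real characters.
    $decoded = html_entity_decode($utf8, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Drop characters that XML 1.0 does not allow (most control characters).
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $decoded
    );

    // Escape the characters that are special in XML: & < > " '
    return htmlspecialchars((string) $clean, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// "\xA4\xA4\xA4\xE5" is a made-up sample: two Chinese characters in Big5 bytes.
$title = to_xml_text("\xA4\xA4\xA4\xE5 &#9733; A&B <b>");
echo $title, "\n";
```

The result can be placed inside an element such as <title> in a feed that starts with an XML declaration naming UTF-8 as its encoding.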