$contents = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href='/pb/resources/xsl/rss.xsl'?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:wp="http://www.washingtonpost.com/wp-namespace" version="2.0">
<channel>
<title>Politics</title>
<atom:link href="http://www.washingtonpost.com/pb/politics/?resType=rss" rel="self" type="application/rss+xml"/>
<link>http://www.washingtonpost.com/pb/politics/</link>
<description>Post Politics from The Washington Post is the source for political news headlines, in-depth politics coverage and political opinion, plus breaking news on the Obama administration and White House, Congress, the Supreme Court, elections and more.</description>
<language>en-US</language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item>
<title>Bannon wants a war on Washington. Now he’s part of one inside the White House.</title>
<link>https://www.washingtonpost.com/politics/bannon-wants-a-war-on-washington-now-hes-part-of-one-inside-the-white-house/2017/04/06/ec4a135a-1ada-11e7-9887-1a5314b56a08_story.html</link>
<dc:creator>Ashley Parker</dc:creator>
<description>The escalating fight pits the self-described nationalist against Trump’s son -in-law, Jared Kushner. <media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_90w/2010-2019/WashingtonPost/2017/03/31/National-Politics/Images/Botsford170331Trump13470.JPG" width="90"/> <media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_606w/2010-2019/WashingtonPost/2017/03/31/National-Politics/Images/Botsford170331Trump13470.JPG" width="606"/> <media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_1024w/2010-2019/WashingtonPost/2017/03/31/National-Politics/Images/Botsford170331Trump13470.JPG" width="1024"/> </description>
<media:thumbnail url="https://img.washingtonpost.com/rf/image_606w/2010-2019/WashingtonPost/2017/03/31/National-Politics/Images/Botsford170331Trump13470.JPG" width="606"/>
<media:group>
<media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_90w/2010-2019/WashingtonPost/2017/03/31/National-Politics/Images/Botsford170331Trump13470.JPG" width="90"/>
<media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_606w/2010-2019/WashingtonPost/2017/03/31/National-Politics/Images/Botsford170331Trump13470.JPG" width="606"/>
<media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_1024w/2010-2019/WashingtonPost/2017/03/31/National-Politics/Images/Botsford170331Trump13470.JPG" width="1024"/>
</media:group>
<guid>https://www.washingtonpost.com/politics/bannon-wants-a-war-on-washington-now-hes-part-of-one-inside-the-white-house/2017/04/06/ec4a135a-1ada-11e7-9887-1a5314b56a08_story.html</guid>
<wp:uuid>ec4a135a-1ada-11e7-9887-1a5314b56a08</wp:uuid>
</item>
<item>
<title>At Mar-a-Lago, Trump welcomes Chinas Xi in first summit</title>
<link>https://www.washingtonpost.com/politics/at-mar-a-lago-trump-to-welcome-chinas-xi-for-high-stakes-inaugural-summit/2017/04/06/0235cdd0-1ac2-11e7-bcc2-7d1a0973e7b2_story.html</link>
<dc:creator><![CDATA[David Nakamura]]></dc:creator>
<description><![CDATA[The meeting at the presidents winter estate will be dominated by talks on North Korea, trade, officials said.]]></description>
<media:thumbnail url="https://img.washingtonpost.com/rf/image_606w/2010-2019/WashingtonPost/2017/04/06/National-Politics/Images/Trump_US_China_15990-3ee27.jpg" width="606"/>
<media:group>
<media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_90w/2010-2019/WashingtonPost/2017/04/06/National-Politics/Images/Trump_US_China_15990-3ee27.jpg" width="90"/>
<media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_606w/2010-2019/WashingtonPost/2017/04/06/National-Politics/Images/Trump_US_China_15990-3ee27.jpg" width="606"/>
<media:content medium="image" type="image/jpeg" url="https://img.washingtonpost.com/rf/image_1024w/2010-2019/WashingtonPost/2017/04/06/National-Politics/Images/Trump_US_China_15990-3ee27.jpg" width="1024"/>
</media:group>
<guid><![CDATA[https://www.washingtonpost.com/politics/at-mar-a-lago-trump-to-welcome-chinas-xi-for-high-stakes-inaugural-summit/2017/04/06/0235cdd0-1ac2-11e7-bcc2-7d1a0973e7b2_story.html]]></guid>
<wp:uuid><![CDATA[0235cdd0-1ac2-11e7-bcc2-7d1a0973e7b2]]></wp:uuid>
</item>
</channel>
</rss>
EOF;
echo $contents;
// Yah, so far so good
$rss = xml2array($contents);
print_r($rss);
// prints a blank array?
// XML -> array
function xml2array($contents) {
if (!$contents) return array();
$xml = simplexml_load_string($contents);
$json = json_encode($xml);
$xml_array = json_decode($json, TRUE);
return $xml_array;
}

$contents = preg_replace('#<!\[CDATA\[(.+?)\]\]>#s', '$1', $contents);
$contents = htmlspecialchars($contents, ENT_IGNORE, 'UTF-8');
$rss = xml2array($contents);

$contents = preg_replace('#<!\[CDATA\[(.+?)\]\]>#s', '$1', $contents);
$contents = htmlspecialchars($contents, ENT_IGNORE, 'UTF-8');
$contents = str_replace('&lt;', '<', $contents);
$contents = str_replace('&gt;', '>', $contents);
//// Is this faster or better than two str_replace() commands?
// $contents = str_replace(array('&lt;', '&gt;'), array('<', '>'), $contents);
$rss = xml2array($contents);

<item>
<title>Trump confronts the contradictions of his foreign policy rhetoric - Washington Post</title>
<link>http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNHVD0Mmv0ZKMl4--9psCuv2sK1ASg&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779449477367&ei=KDboWMDAKYzM3gHZrYHoCw&url=https://www.washingtonpost.com/politics/trump-confronts-the-contradictions-of-his-foreign-policy-rhetoric/2017/04/07/c1a32dfe-1bc4-11e7-855e-4824bbb5d748_story.html</link>
<guid isPermaLink="false">tag:news.google.com,2005:cluster=52779449477367</guid>
<category>Top Stories</category>
<pubDate>Sat, 08 Apr 2017 00:09:14 GMT</pubDate>
<description><table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNHVD0Mmv0ZKMl4--9psCuv2sK1ASg&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=https://www.washingtonpost.com/politics/trump-confronts-the-contradictions-of-his-foreign-policy-rhetoric/2017/04/07/c1a32dfe-1bc4-11e7-855e-4824bbb5d748_story.html"><img src="//t0.gstatic.com/images?q=tbn:ANd9GcSE2FX3L6llZ5Skqf03IYv4UMvkvL1WUCJahyDywUJAZLtetigyLwQs5wynLNd0GPicAhfEcgwp" alt="" border="1" width="80" height="80"><br><font size="-2">Washington Post</font></a></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br><div style="padding-top:0.8em;"><img alt="" height="1" width="1"></div><div class="lh"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNHVD0Mmv0ZKMl4--9psCuv2sK1ASg&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=https://www.washingtonpost.com/politics/trump-confronts-the-contradictions-of-his-foreign-policy-rhetoric/2017/04/07/c1a32dfe-1bc4-11e7-855e-4824bbb5d748_story.html"><b>Trump confronts the contradictions of his foreign policy rhetoric</b></a><br><font size="-1"><b><font color="#6f6f6f">Washington Post</font></b></font><br><font size="-1">President Trump found himself in unfamiliar territory Friday, generally praised by members of the political and foreign policy establishments but attacked from some quarters of Trump nation for seeming to betray the “America First” pledges that carried <b>...</b></font><br><font size="-1"><a 
href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNEn2tZ9HTnHOtA4m2-mF3m2iIzK3A&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=https://www.nytimes.com/2017/04/07/us/politics/syria-bombing-republicans-trump.html">GOP Lawmakers, Once Skeptical of Obama Plan to Hit Syria, Back Trump</a><font size="-1" color="#6f6f6f"><nobr>New York Times</nobr></font></font><br><font size="-1"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNHHWWBs7v-JtysRr6W89XdMiXvxgw&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=http://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN1782S0?il%3D0">Russia warns of serious consequences from US strike in Syria</a><font size="-1" color="#6f6f6f"><nobr>Reuters</nobr></font></font><br><font size="-1"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNGrX8kv5-VxMR2cgPlX-H9sM-0W6w&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=http://www.politico.com/story/2017/04/trump-syria-strikes-debate-237025">Inside Trump&#39;s three days of debate on Syria</a><font size="-1" color="#6f6f6f"><nobr>Politico</nobr></font></font><br><font size="-1" class="p"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNHirqwm4oRjERX3T6sqIIM9nclc3g&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=http://www.foxnews.com/us/2017/04/07/ghastly-mages-syrian-attack-led-to-trump-about-face.html"><nobr>Fox News</nobr></a>&nbsp;-<a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNF6l61PY93HX1-LuWqL7sP0n2eIuw&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=http://www.huffingtonpost.com/entry/syria-assad-supporters_us_58e7ff42e4b05413bfe316df"><nobr>Huffington 
Post</nobr></a>&nbsp;-<a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNEJuRv6-lxPgwcU1Ip0x99SB3Xy9w&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=http://www.nydailynews.com/news/national/assad-u-s-missile-strike-fuel-syria-civil-war-article-1.3029925"><nobr>New York Daily News</nobr></a>&nbsp;-<a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNFWRWxiqiySOcf4fg0M2bB2LtB81A&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52779449477367&amp;ei=KDboWMDAKYzM3gHZrYHoCw&amp;url=http://www.latimes.com/politics/la-fg-pol-syria-analysis-20170407-story.html"><nobr>Los Angeles Times</nobr></a></font><br><font class="p" size="-1"><a class="p" href="http://news.google.com/news/more?ncl=d1izKv1bgpL3uTM_0aja7DTaWKFHM&amp;authuser=0&amp;ned=us&amp;topic=h"><nobr><b>all 10,241 news articles&nbsp;&raquo;</b></nobr></a></font></div></font></td></tr></table></description>
</item>

<description>&lt;table border=&quot;0&quot; cellpadding=&quot;2&quot; cellspacing=&quot;7&quot; style=&quot;vertical-align:top;&quot;&gt;&lt;tr&gt;&lt;td width=&quot;80&quot; align=&quot;center&quot; valign=&quot;top&quot;&gt;&lt;font style=&quot;font-size:85%;...

$contents = preg_replace('#<!\[CDATA\[(.+?)\]\]>#s', '$1', $contents);
$contents = htmlspecialchars_decode($contents);
$contents = htmlspecialchars($contents, ENT_IGNORE, 'UTF-8');
$contents = str_replace('&lt;', '<', $contents);
$contents = str_replace('&gt;', '>', $contents);

I invite you to strongly consider upgrading to PHP 5.6, or better yet PHP 7.0. Any version earlier than 5.6 has reached end of life and no longer receives security fixes.
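Back to the blank print_r() output in the first post: simplexml_load_string() returns false when the string is not well-formed XML, json_decode(json_encode(false), true) just hands that false back, and print_r(false) prints an empty string. Separately, the json_encode() round trip loses text that sits inside CDATA sections unless the parser is told to merge CDATA into ordinary text with the LIBXML_NOCDATA option, which may make the preg_replace() CDATA stripping unnecessary. Below is a minimal sketch assuming PHP 5.6 or later with SimpleXML and JSON enabled; the name xml2array_checked() and the choice to throw a RuntimeException are mine, not from the thread.

```php
<?php
// Hypothetical variant of the thread's xml2array(): same JSON round trip,
// but CDATA text is kept and a parse failure says *why* it failed
// instead of quietly handing back false.
function xml2array_checked($contents)
{
    if (!$contents) {
        return array();
    }

    // Collect libxml errors ourselves instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    // LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
    $xml = simplexml_load_string($contents, 'SimpleXMLElement', LIBXML_NOCDATA);

    $errors = array();
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        throw new RuntimeException('XML parse failed: ' . implode('; ', $errors));
    }

    // Namespaced children (dc:creator, media:content, ...) are not part of
    // this conversion; read them via SimpleXMLElement::children($namespaceUri).
    return json_decode(json_encode($xml), true);
}
```

Wrapping the call in try { $rss = xml2array_checked($contents); } catch (RuntimeException $e) { echo $e->getMessage(); } should then show the libxml line number and message for a feed that fails to parse, rather than an empty array.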
he’s (instead of he's), so for some reason it's coming from the source weird.
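One thing worth ruling out when characters go missing or come out oddly: the ENT_IGNORE flag used in the preprocessing snippets silently discards any byte sequence that is not valid in the charset given to htmlspecialchars() (here 'UTF-8'). ENT_SUBSTITUTE (PHP 5.4 and later) replaces such a sequence with U+FFFD instead, and with neither flag the function returns an empty string for the entire input. A small demonstration, assuming an input with one stray 0xFF byte:

```php
<?php
// "a", then 0xFF (a byte that never occurs in valid UTF-8), then "b".
$input = "a\xFFb";

$ignored     = htmlspecialchars($input, ENT_IGNORE, 'UTF-8');     // drops the bad byte
$substituted = htmlspecialchars($input, ENT_SUBSTITUTE, 'UTF-8'); // replaces it with U+FFFD
$strict      = htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8');   // neither flag

var_dump($ignored);               // string(2) "ab"
var_dump(bin2hex($substituted));  // string(10) "61efbfbd62"
var_dump($strict);                // string(0) ""
```

With ENT_SUBSTITUTE the damage at least stays visible as a replacement character, which makes it easier to tell whether the source bytes or the preprocessing is at fault.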
A feed from Fox News included a link with .html#&_whatever, and the & broke the function, returning blank. I changed it to &amp; and it worked fine... but what's weird is the exact same feed had a separate & in it that didn't cause a problem!
By separate, do you mean free-standing? That's correct HTML behavior: if the & is immediately followed by other stuff, it's interpreted as an entity; if it's sitting by itself, it remains a literal ampersand.
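Worth adding for XML specifically: XML is stricter than HTML here. Outside CDATA sections, every & has to begin a complete reference, so a free-standing & stops simplexml_load_string() just as .html#&_whatever does; if a & elsewhere in that feed caused no trouble, it was most likely already escaped or inside a CDATA section. Also, only five named entities (&amp; &lt; &gt; &quot; &apos;) are predefined in XML, so HTML ones such as &nbsp; are rejected unless a DTD declares them. As an alternative to escaping everything and restoring < and >, here is a sketch that escapes only ampersands that do not already start a predefined entity or a numeric reference; the helper name escape_bare_ampersands() and the pattern are my own, not from the thread.

```php
<?php
// Hypothetical helper: escape every "&" that does not already start one of
// XML's five predefined entities or a numeric character reference.
// HTML-only entities such as &nbsp; are escaped too and come through as text.
function escape_bare_ampersands($xml)
{
    return preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );
}

libxml_use_internal_errors(true); // keep parse warnings out of the output

$feed = '<item><link>http://example.com/a.html#&_whatever</link>'
      . '<title>Q&amp;A &#8217; &nbsp;</title></item>';

var_dump(simplexml_load_string($feed)); // bool(false): the bare "&" is a fatal error

$item = simplexml_load_string(escape_bare_ampersands($feed));
echo $item->link, "\n";                 // http://example.com/a.html#&_whatever
```

One caveat: the pattern also rewrites & inside CDATA sections, where no escaping is needed, so it fits best after CDATA has been stripped or when the feed has none.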
<?xml version="1.0" encoding="iso-8859-1" ?>
<?xml version="1.0" encoding="UTF-8"?>

$contents = preg_replace('#<!\[CDATA\[(.+?)\]\]>#s', '$1', $contents);
//// When I update to PHP 5.6.x
//$contents = htmlspecialchars($contents, ENT_SUBSTITUTE, 'UTF-8');
// For now
$contents = htmlspecialchars($contents, ENT_IGNORE, 'UTF-8');
$contents = str_replace('&lt;', '<', $contents);
$contents = str_replace('&gt;', '>', $contents);
$rss = xml2array($contents);
// Later in the script, when I'm reading the results of the XML feed
if (!empty($rss['channel']['item'])) {
    $items = $rss['channel']['item'];
    // A feed with only one <item> decodes to that item's array, not a list of items
    if (!isset($items[0])) {
        $items = array($items);
    }
    foreach ($items as $item) {
        $item['title'] = htmlspecialchars_decode($item['title']);
        $item['link'] = htmlspecialchars_decode($item['link']);
        $item['description'] = htmlspecialchars_decode($item['description']);
        // do whatever
    }
}

Google doesn't specify an encoding.
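On encodings: an XML document with no encoding declaration and no byte-order mark is read as UTF-8, so a feed that omits the declaration is fine as long as its bytes really are UTF-8. A feed that declares encoding="iso-8859-1" is converted by libxml, and SimpleXML hands back UTF-8 strings either way. The catch is the preprocessing: it runs htmlspecialchars(..., 'UTF-8') on the raw bytes before libxml has converted anything, so with ENT_IGNORE any non-ASCII ISO-8859-1 character is dropped. A minimal check, assuming a one-element document containing "café":

```php
<?php
libxml_use_internal_errors(true);

// "café" with é as the single ISO-8859-1 byte 0xE9.
$latin1 = '<?xml version="1.0" encoding="iso-8859-1"?><t>caf' . "\xE9" . '</t>';

// Parsed as-is: libxml honours the declaration and SimpleXML returns UTF-8.
$t = simplexml_load_string($latin1);
echo bin2hex((string) $t), "\n";   // 636166c3a9 ("café" in UTF-8)

// Run through the thread's preprocessing first: 0xE9 is not valid UTF-8,
// so ENT_IGNORE removes it before the parser ever sees it.
$pre = htmlspecialchars($latin1, ENT_IGNORE, 'UTF-8');
$pre = str_replace(array('&lt;', '&gt;'), array('<', '>'), $pre);
$t2 = simplexml_load_string($pre);
echo (string) $t2, "\n";           // caf
```

If preprocessing is still needed for such a feed, one option (untested against these particular feeds) is to convert the bytes with iconv('ISO-8859-1', 'UTF-8', $contents) and change the declaration to match before escaping.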
My site is UTF-8 encoded, but oddly, Washington Post's feed includes "weird" characters like “deceptive”.
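On the str_replace() question from earlier in the thread: one call with arrays applies the pairs in order over the whole string, so it gives the same result as the two separate calls here (restoring &lt; cannot create a new &gt;, or vice versa); whether it is measurably faster would need a benchmark on real feeds. The sketch below also shows why the reading loop needs htmlspecialchars_decode(): every & becomes &amp;, including the & of an existing &amp;, so parsed values come back one level too escaped. Passing ENT_IGNORE on its own also leaves quotes untouched, which is what keeps attribute values intact. The helper name preprocess_feed() and the sample item are made up for illustration.

```php
<?php
// Hypothetical wrapper around the thread's preprocessing: escape everything,
// then restore only < and > so the markup parses while stray "&" are escaped.
function preprocess_feed($contents)
{
    // ENT_IGNORE alone implies no quote escaping, so attribute quotes survive.
    $contents = htmlspecialchars($contents, ENT_IGNORE, 'UTF-8');
    return str_replace(array('&lt;', '&gt;'), array('<', '>'), $contents);
}

// "AT&T" would make the raw string fail to parse.
$feed = '<item><title>Q&amp;A: AT&T</title>'
      . '<description>&lt;b&gt;bold&lt;/b&gt;</description></item>';

$item = simplexml_load_string(preprocess_feed($feed));

// Values come back one level too escaped, hence the decode step.
$title       = htmlspecialchars_decode((string) $item->title);
$description = htmlspecialchars_decode((string) $item->description);

echo $title, "\n";        // Q&A: AT&T
echo $description, "\n";  // <b>bold</b>

// Two calls versus one call with arrays, on the same escaped string.
$escaped = htmlspecialchars($feed, ENT_IGNORE, 'UTF-8');
$two = str_replace('&gt;', '>', str_replace('&lt;', '<', $escaped));
$one = str_replace(array('&lt;', '&gt;'), array('<', '>'), $escaped);
var_dump($one === $two);  // bool(true)
```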