Forum Moderators: coopster


chunk_split mangling the "•" character

         

whoisgregg

9:20 pm on Nov 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<?php
echo chunk_split('•'); // outputs garbled characters instead of •
?>

I'm assuming this is because chunk_split() isn't multibyte-safe, but I can't find an mb_chunk_split function anywhere.

Can anyone confirm this is a multibyte issue and/or point me toward an mb_chunk_split function? Thanks!
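chunk_split() does count bytes rather than characters, and "•" is three bytes in UTF-8 (E2 80 A2). With the default chunk length of 76 a lone "•" is not cut, but as soon as a chunk boundary falls inside a character the result is no longer valid UTF-8. A minimal check, assuming this file is saved as UTF-8 and the mbstring extension is loaded:

```php
<?php
// chunk_split() works on bytes; "•" is 3 bytes in UTF-8 (E2 80 A2).
// Assumes this file is saved as UTF-8 and mbstring is available.
$bullet = '•';

echo strlen($bullet), "\n";             // 3 (bytes)
echo mb_strlen($bullet, 'UTF-8'), "\n"; // 1 (character)

// With a chunk length of 2, the first boundary falls inside the first "•",
// so the delimiter is inserted mid-character and the result is invalid UTF-8.
$split = chunk_split('••', 2, '|');
var_dump(mb_check_encoding($split, 'UTF-8')); // bool(false)
```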

whoisgregg

10:05 pm on Nov 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe I have a working mb_chunk_split function. I would really appreciate it if someone could look it over to see whether I've missed anything. :)

if (!function_exists("mb_chunk_split")) {
    function mb_chunk_split($str, $chunklen = 76, $end = "\r\n", $encoding = null) {
        if (!function_exists("mb_strlen")) return false;
        // a $chunklen below 1 would never advance $i (infinite loop)
        if ($chunklen < 1) return false;
        if ($encoding == null) $encoding = mb_internal_encoding();
        $mb_length = mb_strlen($str, $encoding);
        $output = '';
        // take $chunklen characters at a time; $end follows every chunk, including the last
        for ($i = 0; $i < $mb_length; $i += $chunklen) {
            $output .= mb_substr($str, $i, $chunklen, $encoding) . $end;
        }
        return $output;
    }
}
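For readers on newer PHP: PHP 7.4 added mb_str_split(), which returns an array of character chunks, so the loop above could be replaced by joining that array. A sketch, assuming PHP 7.4 or later; the name mb_chunk_split74 is hypothetical, chosen so it won't clash with the function above:

```php
<?php
// Sketch for PHP 7.4+: mb_str_split() does the character-aware chunking.
// The name mb_chunk_split74 is hypothetical.
function mb_chunk_split74($str, $chunklen = 76, $end = "\r\n", $encoding = null)
{
    if ($chunklen < 1) {
        return false; // mb_str_split() rejects lengths below 1
    }
    if ($encoding === null) {
        $encoding = mb_internal_encoding();
    }
    $chunks = mb_str_split($str, $chunklen, $encoding);
    // Like the loop version, put $end after every chunk, including the last.
    return $chunks ? implode($end, $chunks) . $end : '';
}

echo mb_chunk_split74('Split this •string• into equal chunks', 6, "\n", 'UTF-8');
```

The guard on $chunklen matters here because mb_str_split() does not accept a length below 1.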

coopster

12:58 pm on Nov 5, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



You may want to check this first ...

if($encoding == null) $encoding = mb_internal_encoding();

Although PHP currently appears to set an internal default, that could change in a future release. I would check the value returned by mb_internal_encoding() first and, if for some odd reason it returns a false value, set a default. Maybe something like this ...

if(!function_exists("mb_strlen")) return false;
if (!mb_internal_encoding()) {
    mb_internal_encoding("UTF-8");
}
if($encoding == null) $encoding = mb_internal_encoding();

Also, are you validating the encoding? You could check the value passed against the array returned by mb_list_encodings() (PHP 5).
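Putting both suggestions together might look roughly like the sketch below. It differs from the snippet above in one deliberate way (an assumption, not what was posted): instead of calling mb_internal_encoding("UTF-8"), which changes the setting for the whole script, it falls back to UTF-8 only inside the function. The name mb_chunk_split_checked is hypothetical, and since mb_list_encodings() reports canonical names, aliases are not covered by this check.

```php
<?php
// Hypothetical variant of mb_chunk_split() with an encoding fallback and
// validation against mb_list_encodings(). Returns false on bad input.
function mb_chunk_split_checked($str, $chunklen = 76, $end = "\r\n", $encoding = null)
{
    if (!function_exists('mb_strlen') || $chunklen < 1) {
        return false;
    }
    if ($encoding === null) {
        // Fall back locally instead of changing the global setting.
        $encoding = mb_internal_encoding() ?: 'UTF-8';
    }
    // Case-insensitive match against the canonical names mbstring reports;
    // aliases are not covered by this check.
    $valid = false;
    foreach (mb_list_encodings() as $known) {
        if (strcasecmp($known, $encoding) === 0) {
            $valid = true;
            break;
        }
    }
    if (!$valid) {
        return false;
    }
    $output = '';
    $length = mb_strlen($str, $encoding);
    for ($i = 0; $i < $length; $i += $chunklen) {
        $output .= mb_substr($str, $i, $chunklen, $encoding) . $end;
    }
    return $output;
}
```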

coopster

1:04 pm on Nov 5, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Also, did you consider using mb_split [php.net]?

whoisgregg

4:56 pm on Nov 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good call with the mb_internal_encoding check. Added. :)

I can't see how to pass a chunk length to mb_split... It looks like it would be a good multibyte substitute for explode() and split(), and it could be used as part of a multibyte-safe wordwrap(), but I don't think it helps with chunk_split.

coopster

3:04 pm on Nov 7, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I can't see with mb_split how to pass a length for the chunks

The optional 3rd parameter ... no?

whoisgregg

4:15 pm on Nov 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The 3rd parameter limits how many pieces the string is broken into, not the length of each piece. So, given the following example:

<?php
$string = 'Split this •sentence• by spaces';
$mb_split = mb_split("[ ]+", $string, 3);
print_r($mb_split);
/* Output:
Array
(
[0] => Split
[1] => this
[2] => •sentence• by spaces
)
*/
?>

Whereas I want output like this:

<?php
$string = 'Split this •string• into equal chunks';
$mb_chunk_split = mb_chunk_split($string, 6);
echo $mb_chunk_split;
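/* Output (each chunk is followed by "\r\n"; "Split " keeps its trailing
   space, and " equal" and " chunk" start with a space):
Split 
this •
string
• into
 equal
 chunk
s
*/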
?>

coopster

3:43 pm on Nov 15, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Understood. I had read the third parameter as a limit on bytes/characters rather than on "words" ...

limit
If optional parameter limit is specified, it will be split in limit elements as maximum.

The emphasis on "elements" is mine.

Other than that, the function looks pretty sound. Let us know if you run into any issues with it. BTW, have you mentioned your function to the PHP developers and filed a feature request? Something to consider.

henry0

6:43 pm on Nov 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



whoisgregg, very nice, the masked scavenger will arise :)

Out of curiosity: if I understand correctly, you set the cut at 76. Is that a clean cut, or can it split mid-string/mid-word? Or did I not read you loud and clear?

Couldn't you work something like this into it (I know it's heavy, but I'm happy with it):
<<<<
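function strtrim($str, $maxlen=125, $elli=NULL, $maxoverflow=15) {
    global $CONF;

    // note: strlen()/substr() count bytes, not characters
    if (strlen($str) > $maxlen) {

        // optional hard cut (ignores word boundaries), switched by a config flag;
        // !empty() avoids a notice when $CONF or the key is not set
        if (!empty($CONF["BODY_TRIM_METHOD_STRLEN"])) {
            return substr($str, 0, $maxlen);
        }

        $output = NULL;
        $body = explode(" ", $str);
        $body_count = count($body);

        $i=0;

        // add whole words until $maxlen is reached, or until the next word would
        // bring the length to $maxlen + $maxoverflow or beyond
        do {
            $output .= $body[$i]." ";
            $thisLen = strlen($output);
            $cycle = ($thisLen < $maxlen && $i < $body_count-1 && ($thisLen+strlen($body[$i+1])) < $maxlen+$maxoverflow);
            $i++;
        } while ($cycle);
        return $output.$elli;
    }
    else return $str;
}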

>>>>
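On the clean-cut question: mb_chunk_split() makes a hard cut every $chunklen characters, so it can split mid-word (the "chunk" / "s" lines in the earlier output show this). strtrim() avoids that by cutting at spaces, but strlen() counts bytes, so text containing "•" is measured as longer than it looks. A multibyte-aware sketch of the same word-boundary rule, assuming UTF-8 input; the name mb_strtrim is hypothetical and the $CONF switch is left out:

```php
<?php
// Hypothetical multibyte-aware variant of strtrim(): cut at word boundaries,
// measuring length in characters rather than bytes. The $CONF switch is omitted.
function mb_strtrim($str, $maxlen = 125, $elli = '', $maxoverflow = 15, $encoding = 'UTF-8')
{
    if (mb_strlen($str, $encoding) <= $maxlen) {
        return $str;
    }
    // Splitting on a single space is safe for UTF-8: byte 0x20 never
    // appears inside a multibyte sequence.
    $words = explode(' ', $str);
    $output = '';
    foreach ($words as $i => $word) {
        $output .= $word . ' ';
        $len = mb_strlen($output, $encoding);
        if (!isset($words[$i + 1])) {
            break; // no more words
        }
        $next = mb_strlen($words[$i + 1], $encoding);
        // Same stop rule as strtrim(): stop once $maxlen is reached, or when the
        // next word would bring the length to $maxlen + $maxoverflow or beyond.
        if ($len >= $maxlen || $len + $next >= $maxlen + $maxoverflow) {
            break;
        }
    }
    return $output . $elli;
}
```

As in strtrim(), the returned text keeps a trailing space before $elli.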

vincevincevince

4:05 am on Nov 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your functions look good - but they are rather long. I suggest you use something like this:

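function mb_chunk_split($str, $chunklen = 76, $end = "\r\n"){
    // /u = UTF-8 mode, so "." matches whole characters; /s = "." matches newlines too
    preg_match_all('/.{1,' . (int)$chunklen . '}/su', $str, $m);
    // glue comes first in implode(); also end the last chunk with $end, as the loop version does
    return empty($m[0]) ? '' : implode($end, $m[0]) . $end;
}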
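If you want to check that the regex approach and the loop approach agree, a quick comparison might look like this. Both functions are renamed (mb_chunk_split_loop and mb_chunk_split_preg are hypothetical names) so they can live in one file. The regex version assumes UTF-8 input, because the /u modifier is tied to UTF-8, and in this sketch it appends $end after the last chunk so its output matches the loop version.

```php
<?php
// Loop version (as posted earlier in the thread, renamed; encoding fixed to UTF-8).
function mb_chunk_split_loop($str, $chunklen = 76, $end = "\r\n", $encoding = 'UTF-8')
{
    $output = '';
    $length = mb_strlen($str, $encoding);
    for ($i = 0; $i < $length; $i += $chunklen) {
        $output .= mb_substr($str, $i, $chunklen, $encoding) . $end;
    }
    return $output;
}

// Regex version (renamed). /u: UTF-8 mode; /s: "." also matches newlines.
function mb_chunk_split_preg($str, $chunklen = 76, $end = "\r\n")
{
    preg_match_all('/.{1,' . (int) $chunklen . '}/su', $str, $m);
    return empty($m[0]) ? '' : implode($end, $m[0]) . $end;
}

// Report any input/length pair where the two versions differ.
$samples = array('Split this •string• into equal chunks', "line•one\nline•two", '•', '');
foreach ($samples as $s) {
    foreach (array(1, 2, 6, 76) as $len) {
        if (mb_chunk_split_loop($s, $len, '|') !== mb_chunk_split_preg($s, $len, '|')) {
            echo "Mismatch for length $len: ", var_export($s, true), "\n";
        }
    }
}
```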