Forum Moderators: coopster

how to safely display unsafe userinput as normal text?

         

Dunjohn19

9:37 am on Sep 6, 2020 (gmt 0)

5+ Year Member



I am trying to clean user input so that it can be displayed safely, but as unencoded text. htmlspecialchars produces safe, albeit ugly, output. I want to prevent scripts from executing while producing prettier output. I tried HTML entity functions like the following:

html_entity_decode($variable, ENT_QUOTES, 'UTF-8');
htmlentities(html_entity_decode($variable, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');


The above code works: it lets me display a script alert as code without executing it. Is this the correct way to accomplish this task?

I am asking because it seems that decoding should convert encoded characters back to normal characters, thus allowing the code to execute(?). However, this isn't happening on my XAMPP installation. The code is displayed as text and not executed. That is quite nice, but I worry that it isn't the correct method of safely displaying user input.

Example: Google search accepts searches for script code and displays the code as text, not as ugly htmlspecialchars output, so is the above method how Google is doing it? I'd really like to display any code unencoded without executing it.

Thank you and Best wishes.

Dunjohn19

9:51 am on Sep 6, 2020 (gmt 0)


Hello again,

I was just thinking about this problem on a cigarette break. I think I have a solution if the above code is not a good method:

Use an array built with str_split (or explode), because individual characters cannot be executed and have no meaning on their own, right? So a less-than sign is just a less-than sign without the rest of the line. Then echo each array value in a foreach loop. Thus, an entire line of code can be displayed but not executed.

Is this right?

Edit, example code:

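$d = 'phpinfo();';
$e = str_split($d); // split the string into an array of single characters
unset($d);
print_r($e);
echo '<br>';
foreach ($e as $char) {
    echo $char; // prints each character as text; nothing is executed
}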


The above code works: phpinfo() is not executed and the ugly htmlspecialchars output is avoided.

Best wishes.

robzilla

10:37 am on Sep 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"use an array with explode because individual characters cannot be executed/have no meaning. right? so less than is just less than without the whole line."

It doesn't matter how you prepare the text in PHP, it's the output that's embedded into the HTML page that's going to be interpreted by the browser. PHP does not execute HTML code, or vice versa. Whether you do something complicated (and unnecessary) like the above to output <script>alert('Hello')</script> or you simply echo the whole line makes no difference because the HTML code produced is identical.
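The point above can be checked with a minimal sketch: build the same string once directly and once character by character, then compare the two. The variable names are illustrative only.

```php
<?php
// Illustrative input; any string behaves the same way.
$input = "<script>alert('Hello')</script>";

// Output prepared the simple way: the whole string at once.
$whole = $input;

// Output prepared character by character, as in the str_split() idea.
$pieced = '';
foreach (str_split($input) as $char) {
    $pieced .= $char;
}

// Both are byte-for-byte identical, so the browser receives the same HTML.
var_dump($whole === $pieced); // bool(true)
```

Since the two strings are identical, splitting the text adds nothing in terms of safety; only encoding the output changes what the browser sees.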

I'm not sure I understand what the problem is with htmlspecialchars(). It converts characters that have special meaning in HTML, such as < and >, into character entities like &lt; and &gt;, so that the browser shows <script> to the user instead of interpreting it as a part of the HTML page (and whatever's between the <script></script> tags as JavaScript).

Entities like &lt; being "ugly" seems irrelevant considering that your users will not see them?

Google also uses these HTML entities to show you code snippets in the search results:

The &lt;script&gt; <b>tag is used to embed a client-side script</b>...

...shown in the browser as...

The <script> tag is used to embed a client-side script...
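As a small sketch of the same idea in PHP, the snippet below encodes a sentence containing a tag. The explicit ENT_QUOTES flag and 'UTF-8' charset are choices made here for illustration, not something the thread prescribes.

```php
<?php
$text = 'The <script> tag is used to embed a client-side script';

// Convert HTML-special characters to entities before sending to the browser.
$html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $html, "\n";
// Prints: The &lt;script&gt; tag is used to embed a client-side script
```

The entities only appear in the page source; the browser renders them back as < and > for the reader.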

[edited by: robzilla at 10:43 am (utc) on Sep 6, 2020]

robzilla

10:40 am on Sep 6, 2020 (gmt 0)


"phpinfo() is not executed and ugly htmlspecialchars is avoided."

You're confusing HTML and PHP. phpinfo() is a PHP function that cannot be executed by the browser. The server interprets the PHP code, sends the HTML output to the browser, which in turn interprets the HTML to display the web page.

The only way the user-supplied string "phpinfo()" could be executed by the server as PHP is if you explicitly (and dangerously) run it through the eval() function [php.net] (note the cautionary message).

Dunjohn19

10:58 am on Sep 6, 2020 (gmt 0)


The problem with htmlspecialchars: I use it to encode input before inserting that input into the database. Thus, the output is entity encoded (&lt;script&gt;), so even if the text is unsafe code, I'd like to reverse the encoding without the code being executed.

The str_split array is meant to handle code within the confines of PHP while being certain that phpinfo() is not executed. I'm not sure whether a string will be executed in certain usages. In other words, does PHP execute PHP commands in a string comparison or not? I want to be certain that no code is executed.

robzilla

11:33 am on Sep 6, 2020 (gmt 0)


"does php execute php commands in a string comparison or not?"

No. A string is just data held in a variable, and PHP never executes a variable's contents as code unless you explicitly feed them to something like eval().
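A minimal sketch of this, using a hypothetical $userInput variable: comparing, measuring and concatenating a string only read its characters.

```php
<?php
$userInput = 'phpinfo();'; // hypothetical user-supplied text

// None of these operations run the text as PHP; they only read it.
$isSame  = ($userInput === 'phpinfo();');
$length  = strlen($userInput);
$message = 'You typed: ' . $userInput;

var_dump($isSame);   // bool(true)
var_dump($length);   // int(10)
echo $message, "\n"; // You typed: phpinfo();
```

No phpinfo() page is generated at any point; the text stays text. When echoing such a string into an HTML page, it should still go through htmlspecialchars() first.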

"the problem with htmlspecialchars: i use it to encode input before inserting that input into the database. Thus, the output is entity encoded: &lt;script&gt; so even if the text is unsafe code, i'd like to reverse the encoding without it being executed."

You don't necessarily need to encode the input before storing it in the database; you just need to make sure the string doesn't break the SQL query (using mysqli_real_escape_string(), for example). An encoded string takes up more space, after all. Anyway, the output produced by htmlspecialchars() can be decoded using htmlspecialchars_decode(). Of course, a decoded string containing HTML elements will be interpreted as such by the browser if you output it to the HTML page.
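A short sketch of that round trip, with the flags chosen here for illustration (ENT_QUOTES on both calls so they mirror each other):

```php
<?php
$original = '<script>alert("see?")</script>';

// Encode, as one would before output to an HTML page.
$encoded = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');

// Decode again; this restores the raw, browser-interpretable markup.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);

echo $encoded, "\n";              // &lt;script&gt;alert(&quot;see?&quot;)&lt;/script&gt;
var_dump($decoded === $original); // bool(true)
```

Because $decoded is the raw markup again, it would need to be re-encoded before being echoed into a page.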

Dunjohn19

11:41 am on Sep 6, 2020 (gmt 0)


Hello robzilla,

I am not a programmer, or I'm a novice at best. I am certain that there are hackers much smarter than me, so I dare not challenge them or help them through my own mistakes. I have read that input should be validated and output should be escaped, but I am not even sure what a hacker can accomplish with simple unescaped text inserted into a db. So I started encoding the data on its way into and out of the db. Thus, my text or string fields are encoded twice, which is quite ugly.


Actually, I use htmlspecialchars on all text inserted into my db, and htmlspecialchars again on all data returned from the db before I use it.
I don't even trust data coming from my db. Maybe I didn't put it there and it isn't escaped.

//Imagine $a is the value inserted into my database.
//I also use htmlspecialchars to encode results from a SELECT query, as in $aa.
$a = '&lt;script&gt;alert("see?")&lt;/script&gt;';
$aa = htmlspecialchars($a); //thus, $aa is the encoded $result column data ($a) from a query
//Now the code is HTML-escaped twice, which is quite ugly. I need it cleaned and displayed without execution.
$char = trim(html_entity_decode($aa, ENT_QUOTES, 'UTF-8'));
$char = html_entity_decode($char, ENT_QUOTES, 'UTF-8');
$char = htmlentities(html_entity_decode($char, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
echo $char;
echo '<br><br>';

The above code works in every browser that I've tested: IE11, Edge 44, Edge Chromium, Firefox 50-80, Chrome, etc.
The script is displayed but not executed. Is this method safe and correct?


So you suggest that I do not encode the data for insertion into a db, and just use htmlspecialchars when I output the data?

Best wishes.

robzilla

12:32 pm on Sep 6, 2020 (gmt 0)


The code you posted produces the following output:
&lt;script&gt;alert(&quot;see?&quot;)&lt;/script&gt;

Which is practically the same as the original string stored in $a, so your code doesn't really do anything helpful. You're just encoding and decoding the same string over and over. It's easier to make mistakes when your code is unnecessarily complicated, so there's some risk involved in doing it this way.

Given an input string of <script>alert("see?")</script>, if we wish to store that as-is in the database, we'll want to escape the string first, because in this case the quotation marks can have special meaning in SQL (you risk SQL injection [en.wikipedia.org] if you don't escape certain characters). So you run the string through a function like mysqli_real_escape_string(), or the PDO equivalent if that's what you're using, before you insert it into the database. This will temporarily produce a string like <script>alert(\"see?\")</script>, which tells MySQL to not treat those quotation marks as SQL. The backslashes won't be stored in the database field, so what you'll actually store is the original input string.

When you pull that string from the database to show on a web page, you'll first have to feed the string through htmlspecialchars() so that <script>alert("see?")</script> is converted to &lt;script&gt;alert(&quot;see?&quot;)&lt;/script&gt;. This way, the browser won't recognize the script element and won't execute its contents as JavaScript.

In summary, we take the following input string:
<script>alert("see?")</script>

...escape it to avoid SQL injection (don't do this manually, use a specialized function like mysqli_real_escape_string)...
<script>alert(\"see?\")</script>

...which is stored in the database as...
<script>alert("see?")</script>

...so when we output it we run it through htmlspecialchars() to produce...
&lt;script&gt;alert(&quot;see?&quot;)&lt;/script&gt;

...which the user (not the browser) will see as...
<script>alert("see?")</script>
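The flow summarized above can be sketched end to end. Note the assumptions: this uses PDO with a prepared statement in place of mysqli_real_escape_string() (a different technique with the same goal of keeping the input out of the SQL syntax), an in-memory SQLite database so the example is self-contained (this requires the pdo_sqlite extension), and a hypothetical comments table.

```php
<?php
// Assumption: pdo_sqlite is available; the in-memory database is for illustration only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)'); // hypothetical table

$input = '<script>alert("see?")</script>';

// 1. Store the raw input; the bound parameter keeps it out of the SQL syntax.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $input]);

// 2. The database holds the original string, unchanged.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
var_dump($stored === $input); // bool(true)

// 3. Encode only at output time.
$safe = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $safe, "\n"; // &lt;script&gt;alert(&quot;see?&quot;)&lt;/script&gt;
```

Each layer handles its own concern: parameter binding protects the query, and htmlspecialchars() protects the page.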


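Whereas in your code, the string goes through this process:
<script>alert("see?")</script>

...encoded before it is inserted into the database (your $a)...
&lt;script&gt;alert("see?")&lt;/script&gt;

...encoded again with htmlspecialchars() after the select query (your $aa)...
&amp;lt;script&amp;gt;alert(&quot;see?&quot;)&amp;lt;/script&amp;gt;

...decoded once with html_entity_decode()...
&lt;script&gt;alert("see?")&lt;/script&gt;

...decoded a second time...
<script>alert("see?")</script>

...and finally re-encoded with htmlentities()...
&lt;script&gt;alert(&quot;see?&quot;)&lt;/script&gt;

...ultimately also seen by the user (not browser) as...
<script>alert("see?")</script>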


The end result is the same, the code just doesn't make much sense :-)
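To see this concretely, here is a sketch that replays the steps from the posted code with each intermediate value kept in its own variable (the $step names are illustrative), and then compares the final result with a single htmlspecialchars() call on the raw string.

```php
<?php
$a  = '&lt;script&gt;alert("see?")&lt;/script&gt;'; // value stored in the database
$aa = htmlspecialchars($a);                          // encoded a second time after the query

// The three lines from the posted code, one variable per step.
$step1 = trim(html_entity_decode($aa, ENT_QUOTES, 'UTF-8'));  // back to $a
$step2 = html_entity_decode($step1, ENT_QUOTES, 'UTF-8');     // raw <script> markup
$step3 = htmlentities(html_entity_decode($step2, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');

// A single encoding pass over the raw string gives the same result.
$direct = htmlspecialchars($step2, ENT_QUOTES, 'UTF-8');

var_dump($step1 === $a);      // bool(true)
var_dump($step3 === $direct); // bool(true)
```

Storing the raw string and encoding once at output reaches the same place with far fewer steps.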

I'd recommend learning a bit more about PHP, SQL and HTML before putting your code live. It should only take about an hour or so to learn the basic security fundamentals involved in processing user input safely.