Are .php files "scrape-proof" by default?

I can't find a definitive answer anywhere...

         

ronin

5:09 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is probably going to sound like a very dumb question to anyone who knows a thing or two about PHP, but I really want cast-iron clarification and, despite searching the web numerous times, have yet to find it.

Here goes:

I have some data saved as a .php file:

mydata.php


<?php

$title = 'This is a page title';
$paragraph = 'This is a paragraph that will go somewhere below the page title, probably right beneath it.';

$dataSetPrimary = array('Data-Item-10','Data-Item-11','Data-Item-12');
$dataSetSecondary = array('Data-Item-20','Data-Item-21','Data-Item-22');

function shuffleDataSet ($dataSet) {
    shuffle($dataSet);
    // array_map() returns a new array, so the result has to be assigned back
    $dataSet = array_map('strtolower', $dataSet);
    return $dataSet;
}

?>


Now... I know that if I upload any .html / .css / .js / .gif / .jpg file etc. to my website's webserver, anyone who knows the address can use a browser to access that file.

And I know that if I try to use a browser to access

http://www.mysite.com/mydata.php


then I'll just see a blank white screen.

But... is it nevertheless possible for someone to use a browser, a headless browser, a crawler, any other type of UA to access and read mydata.php?

Or is mydata.php forever invisible to everyone in the world who doesn't have FTP access to my server?

penders

6:25 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



... is mydata.php forever invisible to everyone


Yes. Provided your server and PHP engine are working correctly, someone sitting at their browser and requesting mydata.php over HTTP will not see your PHP code. The browser only sees the response that your PHP outputs, not the PHP itself.

Note that I say "working correctly", which is why it is a good idea to store critical data, such as database connection information (username, password, etc.), outside of the web root altogether. That way, even if there is a minor catastrophe (for example, the PHP handler stops working and the server starts sending .php files as plain text), your data is still hidden.
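As a rough illustration of the "outside the web root" advice, here is a minimal sketch. The directory layout, the "private" folder name, the file name db-config.php and the helper privateConfigPath() are all assumptions made up for this example; nothing in the thread prescribes them.

```php
<?php
// Hypothetical layout (an assumption, not from the thread):
//   /home/example/private/db-config.php   <- no URL maps to this file
//   /home/example/public_html/index.php   <- web root

// Build the path to a file in a "private" directory that sits
// next to the web root rather than inside it.
function privateConfigPath(string $webRoot, string $name): string
{
    // dirname() moves one level up from the web root.
    return dirname(rtrim($webRoot, '/')) . '/private/' . $name;
}

// Typical use inside a page served from the web root, assuming
// db-config.php ends with "return array(...);":
// $config = require privateConfigPath($_SERVER['DOCUMENT_ROOT'], 'db-config.php');
```

Because no URL maps to the private directory, the file stays unreachable over HTTP even if PHP processing breaks and the server starts sending source files as text.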

not2easy

7:31 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I would be concerned about other means by which an individual file could be exposed. Something anyone with a browser can do is save files.

ronin

8:14 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Something anyone with a browser can do is save files.


Ah. That doesn't sound so good.

But... given what penders wrote, they can only save the output of the .php file (above) - which in this case is a blank, white page - right?

Currently, when the browser is pointed at:

http://www.mysite.com/mydata.php


it doesn't appear to fetch and retrieve the .php file in any meaningful way, it just displays nothing - which is the .php file's output.

So does that mean (I need to clarify this) that when the browser is pointed at an .html file or a .css file, it does actually retrieve and display the file, (which can then be copied, cut, pasted, saved etc.)...

...but when the browser is pointed at a .php file, it never retrieves it. It only retrieves and displays the text output of the .php file but (confusingly) still calls it mydata.php (in the browser address bar), when it is, in fact, more accurately labelled as something like mydata.php.output?

penders

9:38 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Something anyone with a browser can do is save files.


If a user is able to download the raw PHP file then the server is misconfigured/broken (as mentioned above). Or there is some other vulnerability with a script on your server.

...it doesn't appear to fetch and retrieve the .php file in any meaningful way


When you request a PHP or HTML/CSS/JS file, the "same thing" happens. The file is requested, retrieved and the server builds a response which is returned to the browser. The exception with a PHP file is that the PHP interpreter on the server first processes all the <?php ... ?> blocks in the file before returning the response.
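To see why a file like mydata.php comes back as a blank page, it may help to capture what such code actually outputs. The sketch below uses output buffering as a stand-in for the HTTP response body; the variable $responseBody and the helper describeDataSet() are names invented for this example.

```php
<?php
// Start capturing everything that would be sent to the browser.
ob_start();

// The same kind of code as in mydata.php: assignments and a
// function definition, but no echo or print anywhere.
$title = 'This is a page title';
$dataSetPrimary = array('Data-Item-10', 'Data-Item-11', 'Data-Item-12');

function describeDataSet(array $dataSet): string
{
    return implode(', ', $dataSet);
}

// Stop capturing and keep whatever was produced.
$responseBody = ob_get_clean();

// Nothing was echoed, so the captured "response" is an empty string,
// which is what the browser renders as a blank white page.
```

The source text of the script never enters the buffer; only what the script chooses to output does.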

[edited by: penders at 9:47 pm (utc) on Aug 19, 2014]

not2easy

9:44 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'm using Firefox and if I go to the menu and click "Save Page As..." I get some options. You might want to view the page where the output is showing and try that, choosing the "Web Page, complete" option.

Years ago in IE I could do the same thing and it did save every file needed to create the page. I can't try your page, but you can. You can also view any link to the page in question and "Save Link as.." then view the location where you saved the file and see what's there. There may be differences between platforms and browsers, I don't go around trying to save websites to see what I get so I'm no expert about this, but I've seen it give me all the files needed to recreate the html page, even when images were in a separate folder. That's why I suggest testing if your file contains sensitive information.

A php file is server preprocessed, so you can't view its contents if you navigate to the .php page. If you view source code on the page which includes the .php file, you can only view the output.

phranque

5:59 am on Aug 20, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



assuming it's apache, there's an AddHandler directive in your configuration that specifies how .php files are "handled".

the php script handler (in server filespace) prevents the raw text file from being served.

lucy24

5:40 pm on Aug 20, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Option C (or possibly option W, because I've lost count) is a line I actually use in some htaccess files:

RewriteCond %{THE_REQUEST} \.php
RewriteCond %{REQUEST_URI} !(name1|name2|dir3/name3)
RewriteRule \.php - [F,NS]

Anyone who asks for a php file by name is barred at the gate unless it's one of the rare pages whose actual URL ends in .php. The elements [NS] and {THE_REQUEST} may seem redundant; it's so the server doesn't have to stop and evaluate conditions every time it processes an SSI (currently once or twice on all pages).

As a bonus, this slams the door on any passing robot who asks for nonexistent wp-admin and similar files, even if I've never heard of them before and they're using a humanoid UA.
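A different way to get a similar effect, without touching .htaccess, is a guard inside the PHP file itself. This is not lucy24's method, just a common alternative sketched under assumptions: the helper name isDirectRequest() is invented here, and the commented usage assumes the file is only ever meant to be included by other scripts.

```php
<?php
// Return true when the file being checked is the script the web
// server was asked to run, rather than a file pulled in via include.
function isDirectRequest(string $thisFile, string $entryScript): bool
{
    $self = realpath($thisFile);
    // realpath() returns false for paths that do not exist.
    return $self !== false && $self === realpath($entryScript);
}

// At the top of an include-only file such as mydata.php:
// if (isDirectRequest(__FILE__, $_SERVER['SCRIPT_FILENAME'])) {
//     http_response_code(403);
//     exit;
// }
```

Unlike the rewrite rules, this only protects files that carry the guard, and it still depends on PHP being processed correctly in the first place.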

aspdaddy

3:54 pm on Aug 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Gosh that sounds complicated :)

Yes - PHP is a server-side tech so your files are safe from being downloaded over HTTP, only the processed output (HTML, jQuery etc) is available

londrum

4:57 pm on Aug 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



if you really want to be safe then you should prevent the errors from appearing too. if the script fails for some reason, then it can put an explanatory error up on the screen, maybe with some of the variable names and script paths, which you don't want the user to see

(put a deliberate error into the script... just to see what the output shows)
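One way to keep error messages off the page is to switch off display and log them instead. This is a minimal sketch using the standard display_errors and log_errors settings; whether to set them at runtime like this or in php.ini is a judgment call, and where the log ends up depends on the server's error_log setting.

```php
<?php
// Do not print error messages into the response.
ini_set('display_errors', '0');

// Write them to the server's error log instead.
ini_set('log_errors', '1');

// Keep reporting everything, so the log stays useful.
error_reporting(E_ALL);
```

Note that a parse error in the same file happens before these lines run, so setting the same values in php.ini is the more thorough option.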

creeking

6:49 pm on Aug 21, 2014 (gmt 0)

10+ Year Member



(not a coder)

let's say I stuff some text onto a page by using a "php include".

can anyone or anyBOT tell the difference between that, and a page with standard/static text?

lucy24

8:00 pm on Aug 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



can anyone or anyBOT tell the difference between that, and a page with standard/static text?

Quick answer: No.

Longer answer: Not unless (a) you've done something seriously wrong or (b) it's a clever robot (or a snoopy human) who can recognize php-generated code on sight because it's all run-in and minified and simply looks different from any hand-rolled html on the same page.
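The "quick answer" can be checked with a small experiment: capture the output of an included fragment and compare it with the same text written out statically. The temporary file, the advert text and the variable names below are invented for this sketch.

```php
<?php
// Write a small fragment to a temporary file, standing in for
// the file that a real page would pull in with include.
$fragment = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($fragment, "<p>Advert text <?php echo 'with a link'; ?></p>\n");

// Capture what the include sends to the output.
ob_start();
include $fragment;
$included = ob_get_clean();
unlink($fragment);

// The same markup as it would appear in a hand-written static page.
$static = "<p>Advert text with a link</p>\n";

// $included and $static are byte-for-byte identical, so the body
// alone gives a visitor or a bot nothing to tell them apart.
```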

And that's why my php includes explicit echo "\n" and the like, even though there's absolutely no functional reason for it ;)

If a bunch of pages all have exactly the same footer, one might reasonably conclude that it's php-- or, at least, server-side something. But then again, it may just be an old-fashioned coder.

penders

9:49 pm on Aug 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



can anyone or anyBOT tell the difference between that, and a page with standard/static text?


There's not really any difference in the body of the document, except for what lucy24 mentions. However, by default, expose_php is On in php.ini and this sets the "X-Powered-By" HTTP response header with the value of the PHP version, like: "X-Powered-By: PHP/5.3.26". So, unless this is Off, or you manually override this header, it is easy to see that the page at least came through the PHP engine.
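If editing php.ini is not an option, the header can also be dropped per request from within a script. The sketch below uses the built-in header_remove(); the wrapper function hidePhpVersionHeader() is just a name chosen for this example.

```php
<?php
// Remove the X-Powered-By header from the current response.
// Returns false when it is too late because output has already
// started and the headers are on their way to the client.
function hidePhpVersionHeader(): bool
{
    if (headers_sent()) {
        return false;
    }
    header_remove('X-Powered-By');
    return true;
}

// Call this before any output is produced.
$removed = hidePhpVersionHeader();
```

Setting expose_php = Off in php.ini covers every script at once; this only affects pages that make the call.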

creeking

12:53 am on Aug 22, 2014 (gmt 0)

10+ Year Member



thanks for the responses.

I was not thinking of hiding the fact that it is a php page. (pagenames.php)

I just want it to NOT be obvious to anyBOT that a paragraph of text (with a link or two) is an advert that I change regularly.

And that's why my php includes explicit echo "\n"


newline? what is the usefulness of that?

also, thanks for the tips about similar locations and formatting.

penders

1:34 am on Aug 22, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



newline? what is the usefulness of that?


To make it "look like" (to the casual source-code reader) it has been written by hand and is perhaps static html, not generated from script. Although some do like to pretty print their script generated HTML anyway - simply because it "looks nicer". If you are generating your HTML from script it is generally easier to omit the newlines. (In fact I have seen code, in these very forums [webmasterworld.com], that goes over the top trying to pretty print the output - this is very silly IMO, primarily because it unnecessarily complicates the source code, and this DOES matter.)
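To make the difference concrete, here is a small sketch that builds the same list with and without newlines. The function renderList() and its $pretty flag are invented for this example and are not taken from anyone's actual code in the thread.

```php
<?php
// Build an HTML list, optionally putting each tag on its own line.
function renderList(array $items, bool $pretty): string
{
    $eol = $pretty ? "\n" : '';
    $html = '<ul>' . $eol;
    foreach ($items as $item) {
        // Escape the text so it is safe to place inside HTML.
        $html .= '<li>' . htmlspecialchars($item) . '</li>' . $eol;
    }
    return $html . '</ul>' . $eol;
}

// Run-in output, typical of script-generated markup:
// <ul><li>a</li><li>b</li></ul>
$compact = renderList(array('a', 'b'), false);

// One tag per line, closer to what a hand-written page looks like.
$readable = renderList(array('a', 'b'), true);
```

Both strings render identically in a browser; the difference only shows up when someone views the source.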