|Spam bot PHP question|
| 4:17 pm on Apr 16, 2004 (gmt 0)|
New to PHP question. Can spammers harvest email addresses from PHP files? (Formmail, etc.) I have just got my first real hosted domain and have finally got email at my domain name and am concerned about giving spam bots access to my email addresses.
I am also running a PHPBB forum and am concerned for my users too, although the data from that is in a MySQL DB. Can they get to that as well?
| 4:33 pm on Apr 16, 2004 (gmt 0)|
A spider is simply a user agent just like a browser. Therefore, as long as your email addresses don't show up in the page source it is impossible to get them.
If you want to allow people to send you email from your website, use web forms and send the form contents to a PHP file that emails the content to you (you can do this with the mail() function). This way a spider cannot harvest your email address from the page, and people can still send you email.
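A minimal sketch of that idea, assuming PHP 7.1 or later. The function name build_contact_mail, the field names (name, email, message) and the example.com addresses are all made up for illustration. The point is that the recipient address exists only in the server-side file.

```php
<?php
// Hypothetical helper: turns submitted form fields into arguments for mail().
// The recipient lives only in this server-side file, never in the HTML.
function build_contact_mail(array $post, string $to = 'you@example.com'): array
{
    // Remove CR/LF so a visitor cannot smuggle extra lines into single-line fields.
    $clean = function ($value) {
        return trim(str_replace(["\r", "\n"], ' ', (string) $value));
    };
    $name    = $clean($post['name'] ?? '');
    $from    = $clean($post['email'] ?? '');
    $message = trim((string) ($post['message'] ?? ''));

    $subject = 'Website contact form';
    $body    = "From: $name <$from>\r\n\r\n$message";
    // Fixed sender header; the visitor's address goes in the body only.
    $headers = 'From: webform@example.com';

    return [$to, $subject, $body, $headers];
}

// Usage in the form's target script:
// [$to, $subject, $body, $headers] = build_contact_mail($_POST);
// mail($to, $subject, $body, $headers);
```

Keeping the visitor's input out of the headers is a design choice here: it avoids header injection at the cost of not being able to hit "reply" directly.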
If you really want your email address to be visible on the page I would place your address in a picture and have that picture link to an email form or something.
This has been a real issue for me since I received TONS of spam.
Edit: concerning PHPBB. PHPBB places email addresses as text in your member profile. It would be nice to have PHP's GD (graphical) library turn them into pictures. Perhaps there is a mod available at phpbb.com for this kind of function?
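For what it's worth, here is a rough sketch of what turning an address into a picture with GD could look like. It assumes the GD extension is loaded; email_to_png is a made-up name and this is not a phpBB mod.

```php
<?php
// Hypothetical helper: draws an address as a PNG using one of GD's built-in fonts.
// Assumes the GD extension is available.
function email_to_png(string $email): string
{
    $font   = 3;                                   // built-in GD font, 1 to 5
    $width  = imagefontwidth($font) * strlen($email) + 4;
    $height = imagefontheight($font) + 4;

    $img   = imagecreatetruecolor($width, $height);
    $white = imagecolorallocate($img, 255, 255, 255);
    $black = imagecolorallocate($img, 0, 0, 0);
    imagefilledrectangle($img, 0, 0, $width - 1, $height - 1, $white);
    imagestring($img, $font, 2, 2, $email, $black);

    // Capture the PNG bytes instead of sending them straight to the browser.
    ob_start();
    imagepng($img);
    return ob_get_clean();
}

// Usage in a script of its own, referenced from an <img> tag:
// header('Content-Type: image/png');
// echo email_to_png('user@example.com');
```

The address still has to come from somewhere on the server (the database, in a forum's case), and the image only helps if the page doesn't also carry a mailto: link.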
| 3:23 am on Apr 17, 2004 (gmt 0)|
So am I right in thinking that if you put the address of a PHP file into the web browser, whatever shows up in the browser (source code) is what a spider can see? Not the PHP code that makes up the file?
I am using a php mailer like the one you described which contains the email address and emails out a web forms output. I chose it for its spam protection benefits. I was just wanting to be sure that the mailer.php itself was spam resistant. I am also having trouble trying to make some of the multiple selection fields mandatory.
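On the mandatory multiple selection fields, one possible approach is sketched below. It assumes PHP 7 or later and that the select in the HTML is named with square brackets (for example topics[]), so PHP delivers it as an array. The function and field names are invented for the example and are not from any particular formmail script.

```php
<?php
// Hypothetical helper: returns the names of required fields that were left empty.
// A multiple-selection field named "topics[]" in the HTML arrives as an array.
function missing_fields(array $post, array $required): array
{
    $missing = [];
    foreach ($required as $field) {
        $value = $post[$field] ?? null;
        if (is_array($value)) {
            // Drop blank options; the field counts only if something remains.
            $value = array_filter($value, function ($v) {
                return trim((string) $v) !== '';
            });
            $empty = count($value) === 0;
        } else {
            $empty = trim((string) $value) === '';
        }
        if ($empty) {
            $missing[] = $field;
        }
    }
    return $missing;
}

// Usage:
// $missing = missing_fields($_POST, ['name', 'topics']);
// if ($missing) { /* show the form again and name the missing fields */ }
```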
I do have a lot to learn!
| 4:03 am on Apr 17, 2004 (gmt 0)|
Yes. PHP files are not seen by the browser, bots or anything but the server under normal conditions.
The exception is if your server is not set up to parse PHP and then it shows the raw code.
Either way, what your browser sees is what is available to robots. Nothing more, nothing less. (Note that your browser sees more than you do, because it gets the HTTP headers, but in terms of code, no, the bots can't see it.)
| 5:41 am on Apr 17, 2004 (gmt 0)|
I feel safer with my users and my email details now.
| 8:11 pm on Apr 17, 2004 (gmt 0)|
|Edit: concerning PHPBB. PHPBB places email addresses as text in your member profile. It would be nice to have PHP's GD (graphical) library turn them into pictures. Perhaps there is a mod available at phpbb.com for this kind of function? |
PHPBB has an option to change this so that only images that say "email" appear.
| 2:01 am on Apr 18, 2004 (gmt 0)|
You are right about the email being displayed as an image. The source code for the image contains:
<img src="templates/subSilver/images/lang_english/icon_email.gif" alt="Send e-mail" title="Send e-mail" border="0" /></a> <a href="http://www.mydomain.com.au" target="_userwww">.
This seems to be OK but in the link for private message beside it there is this code:
<img src="templates/subSilver/images/lang_english/icon_pm.gif" alt="Send private message" title="Send private message" border="0" /></a> <a href="mailto:email@example.com">
This seems to defeat the purpose of the picture links idea?
I would assume that the spam bots are smart enough to find this email address.
Maybe I should disable private messaging to protect against spam bots, although I don't think the posting HTML pages are accessible from my public_html folder anyway. That would also mean that my forum is not being indexed by SE spiders, which is another issue I need to investigate.
| 8:37 pm on Apr 19, 2004 (gmt 0)|
Probably the best way to do it otherwise is use the "RIDDLER" provided at Dynamic Drive. It will turn your e-mail address and a subject line into code that spam harvesters have still yet to crack.
You can find the e-mail riddler at [dynamicdrive.com...]
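To give an idea of the general technique (this is not the Riddler's actual algorithm, just a guess at the simplest version of it), the address can be written out as numeric HTML entities. Browsers decode them and show a normal link; a harvester that scans raw source for the @ sign finds nothing, though one that decodes entities first will still read the address. entity_encode is a made-up name.

```php
<?php
// Hypothetical helper: writes each byte of an ASCII address as a numeric HTML entity.
function entity_encode(string $text): string
{
    if ($text === '') {
        return '';
    }
    $out = '';
    foreach (str_split($text) as $char) {
        $out .= '&#' . ord($char) . ';';
    }
    return $out;
}

// Usage:
// $enc = entity_encode('user@example.com');
// echo '<a href="mailto:' . $enc . '">' . $enc . '</a>';
```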
I still say that a PHP E-mail form is the absolute best method of protection. Plenty of free ones about too.
| 3:35 am on Apr 20, 2004 (gmt 0)|
Thanks heaps Kami,
I have been using a similar java encoder here <snip> and it seems good too. I'll check out yours though. The PHP formmail I'm using is limited. I would like one that is more configurable, with more control over compulsory fields and the ability to send automated replies of my choice. Probably a bit much to ask of a free open source offering!
[edited by: jatar_k at 5:40 am (utc) on April 20, 2004]
[edit reason] removed url [/edit]
| 4:34 am on Apr 20, 2004 (gmt 0)|
Take a look at this PHP spider trap [webmasterworld.com] option. Catch them before they get to the email address, it's pretty easy to implement.
<edit> I left the bad link in case someone feels like figuring out why it isn't working.
[edited by: isitreal at 5:17 am (utc) on April 20, 2004]
| 4:53 am on Apr 20, 2004 (gmt 0)|
That link doesn't work for me, I think it goes via the subscribed members area
| 5:10 am on Apr 20, 2004 (gmt 0)|
That's weird, when I clicked that link it takes me to the subscribed members area page too, but it's not from that section, it's in php, last entry march 18. I reposted the link not using the [ url = http: and so on] and it works fine, seems like a bug or something in the linking system. Then I put it back into [ url =....] form and it works. Go figure. Anyway, to be on the safe side, here's the link in plain text:
The only thing I'd be careful about on that script is to make sure to upload the updated robots.txt a week before you implement the script to be on the safe side.
The .htaccess file only needs 606 (rw----rw-) permissions to work.
This script works, by the way, I've been testing it. As usual Birdman did a slick job of programming.
[edited by: isitreal at 5:18 am (utc) on April 20, 2004]
| 5:17 am on Apr 20, 2004 (gmt 0)|
no worries found it with google. I posted there!
| 5:22 am on Apr 20, 2004 (gmt 0)|
In answer to your .htaccess question, it's easy to check whether you have .htaccess support: make a plain text file, type some random characters into it, save it as .htaccess, and upload it in ASCII mode. If your site or folder stops loading, you have .htaccess enabled. Obviously, if it's a big site you want to do this in a test folder first.
With the gibberish in place, when you try loading yoursite.com/test/index.htm the server should give a 500 error.
| 5:40 am on Apr 20, 2004 (gmt 0)|
OK, I'll try it just for a laugh, but my web host says that it isn't enabled, so does that mean that no .htaccess file will work? There are some in a site search script that I've installed. They have this in them:
# NOTE: No one should be able to access this directory from the web.
AuthName "DGS Search"
| 5:49 am on Apr 20, 2004 (gmt 0)|
OK, just tested it and got the internal server error like you said, so this means that .htaccess is configured and working, right?
| 3:52 pm on Apr 20, 2004 (gmt 0)|
Yes, that means it's working.
Add this to robots.txt, changing 'path' and 'file' to the folder and filename of your PHP trap page:
User-agent: *
Disallow: /path/file.php
This needs to be first in your .htaccess file; put the rest of the .htaccess contents below it. The script will prepend the blocked IP addresses to the .htaccess file while preserving everything that comes after. Make sure to give write permission to 'other': permissions on the .htaccess file need to be 606 (rw----rw-) or more permissive.
.htaccess file, above all current contents
SetEnvIf Request_URI "^(/site/403\.htm¦/robots\.txt)$" allowsome
deny from env=getout
allow from env=allowsome
PHP trap page. This assumes the .htaccess file is in your site root folder; otherwise replace $_SERVER["DOCUMENT_ROOT"] with the full server path to the folder that holds your .htaccess file. I changed Birdman's version slightly so it automatically puts in the path to the primary .htaccess file at your site root.
Link to the trap page from all pages on your site, using the path in the robots.txt file. Use something like a transparent 1px gif, or a link with the CSS property display:none; so only spiders will see it. Before adding these links make sure your robots.txt has been up for at least a few days; a week is better.
Before adding the links, test the script by going to it and seeing if you get blocked with a 403. The first visit should give the text below, the second visit the generic 403 error page. If you also set a 403 error page in the .htaccess file (like: ErrorDocument 403 /site/403.htm) you can give a more precise blocked message. The .htaccess file will allow access to only /site/403.htm at that point.
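<?php
// Spider trap page. Lines marked "assumed" were missing from the post and are a best guess.
$filename = $_SERVER["DOCUMENT_ROOT"] . "/.htaccess";
// Ban line for the visitor's IP; it goes above the current .htaccess contents.
$content = "SetEnvIf Remote_Addr ^" .
    str_replace(".", "\\.", $_SERVER["REMOTE_ADDR"]) . "$ getout\r\n"; // assumed
$handle = fopen($filename, 'r');
$content .= fread($handle, filesize($filename));
fclose($handle); // assumed
$handle = fopen($filename, 'w+');
fwrite($handle, $content); // assumed
fclose($handle); // assumed
// change youremail@yourdomain.com and trap@yourdomain.com to your real
// address and real domain name, leave 'trap@' so you know it's from the spider trap
$agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
$referer = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";
mail("youremail@yourdomain.com", "Spider trap", // assumed
    "The following ip just got banned because it accessed the spider trap.\r\n\r\n" .
    $_SERVER["REMOTE_ADDR"] . "\r\n" . $agent . "\r\n" .
    $referer, "From: trap@yourdomain.com");
$page = '';
// note: some site downloaders will also trigger the script
$page .= "<h1>You have been permanently blocked from the site</h1>";
$page .= '<p>We don\'t allow site downloads or email spiders ' .
    'of any kind, sorry. If you feel this is a mistake, ' .
    'please send us an email with your IP address and we\'ll ' .
    'remove your IP address from the blocked list.</p>';
echo $page; // assumed
?>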
add email constructor [webmasterworld.com] here if desired (post 4 in thread)
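If you want to tighten up the part of the trap that writes to .htaccess, something along these lines might do. It is only a sketch, assuming PHP 7.1 or later; ban_line and prepend_ban are made-up names and not part of Birdman's script. It checks that the value really is an IP address, skips addresses that are already listed, and takes a lock while writing.

```php
<?php
// Hypothetical helper: builds the SetEnvIf line for one IP, or null if the
// value is not a valid IP address.
function ban_line(string $ip): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return null;
    }
    // preg_quote escapes the dots so they match literally in the regex.
    return 'SetEnvIf Remote_Addr ^' . preg_quote($ip) . '$ getout';
}

// Hypothetical helper: puts the ban line at the top of the given file,
// keeping everything that was already there.
function prepend_ban(string $file, string $ip): bool
{
    $line = ban_line($ip);
    if ($line === null) {
        return false;
    }
    $existing = is_file($file) ? (string) file_get_contents($file) : '';
    if (strpos($existing, $line) !== false) {
        return true;                 // already banned, nothing to write
    }
    // LOCK_EX holds an exclusive lock on the file for the duration of the write.
    return file_put_contents($file, $line . "\n" . $existing, LOCK_EX) !== false;
}

// Usage in the trap page:
// prepend_ban($_SERVER['DOCUMENT_ROOT'] . '/.htaccess', $_SERVER['REMOTE_ADDR']);
```

The read and the write are still two separate steps, so two hits at the same instant could in theory lose one ban line; for a spider trap that is probably acceptable.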
| 1:51 am on Apr 21, 2004 (gmt 0)|
Thanks heaps for that full on tutorial.
I've added this in the 2nd line because I think you guys deserve credit for this masterpiece.
// Written by Birdman and Isitreal from webmaster world
A couple of final questions and comments.
1. Birdman mentioned this:
--- Replace the broken pipe(¦) with a solid one in .htaccess snippet.---
Do we need to do this?
2. I think it would be wise to redirect to an error page with the same message as the 1st occurrence for the average user who might not be able to get back to the message and won't understand the server error in the subsequent visits. (in case the spider uses a non static IP and then a real user comes in on the same IP).
Could you help with the code for this?
4. Most average users won't know how to find their IP address to email to the webmaster so I added
--- If you feel this is a mistakes, ' .
' please send us an email with your IP address or date and time of occurrence and we\'ll ' .
'remove your IP address from the blocked list.---
so that it will be easier to figure this from the email headers (maybe not so important since most users will be on random IPs and they will have access on their next session, but worth a thought).
5. And last but not least does the htaccess ban blocked IPs from the entire site or just the home page?
| 1:52 am on Apr 21, 2004 (gmt 0)|
Wouldn't leave out the email constructor for the world!
| 4:13 am on Apr 21, 2004 (gmt 0)|
It's birdman's code, he deserves full credit for it, all I did was add a little tidbit to the text output.
|1. Birdman mentioned this: |
--- Replace the broken pipe(¦) with a solid one in .htaccess snippet.---
Do we need to do this?
yes, absolutely, it's the 'or' operator, the unbroken pipe that is, I forgot to say that, thanks for catching that.
|2. I think it would be wise to redirect to an error page with the same message as the 1st occurrence for the average user who might not be able to get back to the message and won't understand the server error in the subsequent visits. (in case the spider uses a non static IP and then a real user comes in on the same IP). |
Could you help with the code for this?
Add this line under the .htaccess code given:
ErrorDocument 403 [yourdomain.com...]
It's important that it is /site/403.htm because this is the only file that a blocked user will be permitted to see. That's the 'allowsome' part of the .htaccess code: if the request URI is exactly '/site/403.htm' or '/robots.txt', it's OK for Apache to serve that file to that IP address; otherwise it's blocked.
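You can check how that pattern behaves, and why the solid pipe matters, with a few lines of PHP. This is just a test harness: is_allowed_uri is a made-up name, and it uses PHP's preg_match as a stand-in for Apache's regex matching.

```php
<?php
// Hypothetical helper: applies the same regex the SetEnvIf line uses.
function is_allowed_uri(string $uri, string $pattern): bool
{
    return preg_match('#' . $pattern . '#', $uri) === 1;
}

$solid  = '^(/site/403\.htm|/robots\.txt)$';   // solid pipe: "or"
$broken = '^(/site/403\.htm¦/robots\.txt)$';   // broken pipe: just a literal character

var_dump(is_allowed_uri('/robots.txt', $solid));     // bool(true)
var_dump(is_allowed_uri('/site/403.htm', $solid));   // bool(true)
var_dump(is_allowed_uri('/index.htm', $solid));      // bool(false)
var_dump(is_allowed_uri('/robots.txt', $broken));    // bool(false)
```

With the broken pipe left in, neither the 403 page nor robots.txt matches, so a banned visitor would not even get the custom error page.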
Make a page called 403.htm. Put the same error message you got on the initial warning page.
If you are really worried about it, you can add a form to that page that blocked visitors can use to email you. The script behind the form can automatically detect the IP address on submission and send it along with the email. Since it's a form, your email address stays invisible. However, the only people who will get caught are spammers and people trying to download your site with bad software, so you don't really have to worry that much about it.
I'll add the code for that form maybe tomorrow, your question helped me figure out a problem I'd been having, I'd like to offer that option too, especially for more friendly, non commercial sites I do where maybe somebody just tried downloading it.
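In the meantime, here is a rough idea of what the script behind such a form might do, assuming PHP 7.1 or later. build_unblock_request, the comment field and the example.com addresses are placeholders, not anyone's finished code.

```php
<?php
// Hypothetical helper: builds the mail for an unblock request.
// The IP comes from the server, so the visitor does not have to know it.
function build_unblock_request(array $server, array $post): array
{
    $ip = $server['REMOTE_ADDR'] ?? 'unknown';
    // Keep the visitor's comment to a bounded length.
    $comment = substr(trim((string) ($post['comment'] ?? '')), 0, 1000);

    $subject = 'Unblock request from ' . $ip;
    $body    = "IP address: $ip\r\n"
             . 'Time (UTC): ' . gmdate('Y-m-d H:i:s') . "\r\n\r\n"
             . $comment;

    return [$subject, $body];
}

// Usage in the script the form posts to:
// [$subject, $body] = build_unblock_request($_SERVER, $_POST);
// mail('you@example.com', $subject, $body, 'From: trap@example.com');
```

One thing to watch: the script the form posts to is blocked like everything else, so its URI would have to be added to the 'allowsome' pattern alongside /site/403.htm.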
|so that it will be easier to figure this from the email headers (maybe not so important since most users will be on random IPs and they will have access on their next session, but worth a thought). |
Email headers won't necessarily tell you anything; only getting their IP through a script will. But again, this script almost always catches only the guilty, except on very rare occasions.
|5. And last but not least does the htaccess ban blocked IPs from the entire site or just the home page? |
it blocks all accesses to the directory it is in and everything below that directory. If it's in your root directory, it blocks all access to your site.
If it's one level above your root directory, in a server folder, it blocks all access to all sites under it, such as in a shared web directory, also to any 403 pages unless you set the 403 path including the domain name, like [yoursite.com...]
Test this very carefully before trying it on commercial sites, or on virtual directory type things, since it could affect a lot of sites.