Forum Moderators: phranque


Policy Breach Notice: Block Email in GET Request


dolcevita

9:15 pm on Feb 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello guys,

This is really urgent because I have about 2 weeks to fix this issue.
So let's start from the beginning.

Ten days ago I received a policy breach notice from Google about passing PII (personally identifiable information) through my website. It gave a couple of examples, each with a URL sample like ?email=redacted@example.com and a Record sample like:
GET /pagead/ads...............%3Femail%3Dredacted@example.com&.......................HTTP/1.1


So what I did was block requests containing an email address, in .htaccess:
RewriteCond %{THE_REQUEST} \?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/forbidden.php? [R=302,L]


The code above works: any HTTP request that contains @ is blocked.
But it seems not to be enough for GET requests, because today I received another email saying my issue is still not fixed.

To be honest, I thought that this rule, which works:
RewriteCond %{THE_REQUEST} \?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/forbidden.php? [R=302,L]


would be enough, but it isn't....

I guess there should probably be something with RewriteCond %{THE_REQUEST} ^GET that blocks passing an email through a GET request, but I have no idea how to block an email in a GET request.

Please help. It is really, really urgent.

Thank you.

dolcevita

9:59 pm on Feb 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Searching the web, I found this:
RewriteCond %{THE_REQUEST} ^(GET|POST) /.*?(s|search)=(.+) HTTP/ [NC]
RewriteRule .* /search/%3/? [R=302,L,NE]


as a possible solution. Can I implement it as:

RewriteCond %{THE_REQUEST} ^(GET) \?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/forbidden.php? [R=302,L]



so that any GET request containing an email is blocked?

Is the rule above right for blocking GET requests that contain an email?

Thank you

lucy24

11:34 pm on Feb 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: cough-cough ::

RewriteCond %{THE_REQUEST} @

and that's all. Unless you've got URLs whose "path" component contains literal @ symbols (legal but wildly unlikely), you don't need to get any fancier. Alternatively,
RewriteCond %{QUERY_STRING} @

Either way there is no need for [NC] since @ is a non-alphabetic character. You don't need to say anything about request methods-- and in any case those are always CAPITALIZED.

:: detour to test site to confirm that @ does not form part of legitimate POST request to a Contact page, so you would never see it anywhere ::

But you should try to constrain rules to situations where they would actually apply. For example, I doubt you could inject anything nasty by appending an email address to a jpg request. So instead of expressing the pattern in the rule as .? where the server has to evaluate the condition on every single request, ever, try something like
(^|/|html)$
or even
^[^.]*$
depending on your URL structure.

A more serious question is: why on earth are you responding with a 302 redirect to some other page? The request doesn't deserve anything more than a flat 403.
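Putting these suggestions together, a complete rule might look something like the sketch below. It assumes page URLs end in "html" or a slash (adjust the pattern to your own URL structure) and that mod_rewrite is available for the directory; it answers with a flat 403 instead of redirecting.

```apache
# Sketch only: assumes page URLs end in "html" or "/".
RewriteEngine On

# Condition: the query string contains a literal "@" anywhere.
# No [NC] needed, since "@" has no upper or lower case.
RewriteCond %{QUERY_STRING} @

# Apply only to page requests, not images etc.
# "-" means no substitution; [F] sends 403 Forbidden and implies [L].
RewriteRule (^|/|html)$ - [F]
```

Because [F] refuses the request outright, the page itself is never served for a URL carrying an email address.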

phranque

11:48 pm on Feb 11, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



why not use the [F] flag instead of a temporary redirect?

are you still linking to urls with PII?

dolcevita

11:48 pm on Feb 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



lucy,

thank you for the reply, but can you please post the full rule?

Thanks

btw

Do you understand how it is possible that, after this:

RewriteCond %{THE_REQUEST} \?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/forbidden.php? [R=302,L]


I still got a message that there is an email in a GET request, which breaches the PII policy:

URL sample is: ?email=redacted@example.com and
Record sample: GET /pagead/ads...............%3Femail%3Dredacted@example.com&.......................HTTP/1.1


I really do not understand.


Then what is the difference between:

RewriteCond %{THE_REQUEST} ^GET\ ?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/not-allowed.php? [R=302,L]


and

RewriteCond %{THE_REQUEST} \?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/not-allowed.php? [R=302,L]


btw

The 302 is just to tell regular users what kind of issue they have. Anyway, the 302 target page is nofollow, noarchive, noindex and blocked in robots.txt.

[edited by: dolcevita at 12:33 am (utc) on Feb 12, 2015]
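The difference between the two conditions comes down to one character. In the host's version, `\ ?` is an escaped space followed by `?`, so it means "an optional space", not a literal question mark. The annotated sketch below (the comments are an editor's reading of the two regexes) shows both next to the simpler test suggested in the thread; the "php" rule pattern is an assumption based on the .php URL samples.

```apache
# Host's version: "^GET", then "\ ?" = an OPTIONAL SPACE (not a literal "?"),
# then "@" anywhere before " HTTP". Matches GET requests only.
#   RewriteCond %{THE_REQUEST} ^GET\ ?(.*)@(.*)?\ HTTP
#
# Original version: "\?" = a LITERAL question mark, so "@" must follow
# the "?" that starts the query string. Matches any request method.
#   RewriteCond %{THE_REQUEST} \?(.*)@(.*)?\ HTTP
#
# Both match a request line such as
#   GET /whatever.php?whatever=redacted@example.com HTTP/1.1
#
# Simpler test with the same practical effect (assumes no "@" in URL paths):
RewriteCond %{THE_REQUEST} @
RewriteRule (^|/|php)$ - [F]
```

Since both versions match the sample URL, switching to the host's rule would not change anything for a request that actually reaches the server.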

dolcevita

11:50 pm on Feb 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



phranque,

There are no links to the PII URL. They reached that URL in their testing.
But I have already blocked it, in my opinion, with:

RewriteCond %{THE_REQUEST} \?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/forbidden.php? [R=302,L]


After that they still tell me that there is an email in a GET request, and I really do not understand it.

How is it possible that no URL containing an email can be fetched over HTTP, yet they still got one in a GET request?

dolcevita

12:07 am on Feb 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My host suggested this:

RewriteCond %{THE_REQUEST} ^GET\ ?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/stop.php? [R=302,L]


It was tested with many cases and verified to block any email in a request.

I have not implemented the rule above; instead I have:

RewriteCond %{THE_REQUEST} \?(.*)@(.*)?\ HTTP [NC]
RewriteRule .? http://www.mydomain/not-allowed.php? [R=302,L]


Lucy says there is no difference ('You don't need to say anything about request methods').


But then how did Google still get an email in a GET request?

lucy24

4:40 am on Feb 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But I have already blocked it

No. You haven't. You have issued a temporary redirect to "forbidden.php", presumably your custom 403 page. A human will probably not know the difference, because they'll see the page with their eyeballs. But to a robot-- including search engines-- there is a huge difference.

The one thing you have got right is the ? at the end of the rule target. That gets rid of the query string in any redirect.

:: uneasily expecting the worst ::

Does your htaccess have any ErrorDocument directives?

But then how did Google still get an email in a GET request?

I don't understand exactly what you are saying here. What did google get-- and why were they even trying? I've never seen the googlebot asking for anything unkosher. Silly, yes. Trayf, no.

I would make the rule something like this (replacing 'html' with whatever you really use for your page URLs):
RewriteCond %{THE_REQUEST}@
RewriteRule (^|/|html)$ - [F]

Group it together with any other RewriteRules that lead to the [F] flag.

To prevent infinite loops, make sure you also have this rule if you haven't already got it, again replacing "not-allowed.php" with the actual name of your 403 page:
RewriteRule not-allowed\.php - [L]

It goes before any rules with [F] flag.
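A sketch of the whole block as described, assuming the custom 403 page is /not-allowed.php and page URLs end in "php" (as in this thread's URL samples). Note the space between %{THE_REQUEST} and @: RewriteCond needs both a test string and a pattern. The exemption for the error page matters with a "php" pattern, because the error page's own URL would match the rule while THE_REQUEST still holds the original request line with the @ in it.

```apache
# Sketch: assumes the custom 403 page is /not-allowed.php.
# ("ErrorDocument 403 default" keeps Apache's built-in page instead.)
ErrorDocument 403 /not-allowed.php

RewriteEngine On

# Let the 403 page itself through first, so serving it is never refused.
RewriteRule not-allowed\.php - [L]

# Group the [F] rules together. Test string, a space, then the pattern "@".
RewriteCond %{THE_REQUEST} @
RewriteRule (^|/|php)$ - [F]
```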

dolcevita

6:14 am on Feb 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




But I have already blocked it


No. You haven't. You have issued a temporary redirect to "forbidden.php", presumably your custom 403 page. A human will probably not know the difference, because they'll see the page with their eyeballs. But to a robot-- including search engines-- there is a huge difference.

The one thing you have got right is the ? at the end of the rule target. That gets rid of the query string in any redirect.

:: uneasily expecting the worst ::

Does your htaccess have any ErrorDocument directives?


You have right, I have a temporary redirect (302) to "forbidden.php", and that page has no Google code, including AdSense. I guess both robots and humans will see the redirect and then the "forbidden.php" page without any Google service code.

Regarding ErrorDocument directives, I have this:

ErrorDocument 403 default



But then how did Google still get an email in a GET request?

I don't understand exactly what you are saying here. What did google get-- and why were they even trying? I've never seen the googlebot asking for anything unkosher. Silly, yes. Trayf, no.


What I'm trying to say is that they (the AdSense policy team at Google) sent me an email with a URL sample and a Record sample, and I do not understand how that is possible, because any request like the sample with redacted@example.com was already redirected (302) to a page without any such code. So they cannot reach and GET what they show in the Record sample.

Url sample: http://www.mydomain/whatever.php?whatever=redacted@example.com

Record sample: GET /pagead/ads?client=ca-pub-*****myid******2&output=html&h=280&slotname=3277054996&adk=3441799868&w=336&lmt=1423334957&flash=16.0.0&url=http%3A%2F%2Fwww.mydomain%2whatever.php%3Fwhatever%3Dredacted@example.com&dt=1423334957786&bpp=4&bdt=126&shv=r20150203&cbv=r20141212&saldr=sa&prev_slotnames=3335930592&correlator=5132476212332&frm=20&ga_vid=*&ga_sid=*&ga_hid=*&ga_fc=0&u_tz=-300&u_his=11&u_java=0&u_h=768&u_w=1366&u_ah=728&u_aw=1366&u_cd=24&u_nplug=6&u_nmime=14&dff=trebuchet%20ms&dfs=13&adx=272&ady=207&biw=1349&bih=655&eid=317150304&oid=3&ref=http%3A%2F%2Fwww.mydomain.org%2Fwhatever.php%3Fip%3Dwhateverever.net.&rx=0&eae=0&fc=24&brdim=%2C%2C-8%2C-8%2C1366%2C0%2C1382%&vis=1&rsz=0%7C0%7C%7C&abl=CS&ppjl=f&fu=0&bc=1&ifi=2&xpc=tzTzMuLIXX&p=http%3A//www.mydomain.org&dtd=234 HTTP/1.1



I would make the rule something like this (replacing 'html' with whatever you really use for your page URLs):
RewriteCond %{THE_REQUEST}@
RewriteRule (^|/|html)$ - [F]
Group it together with any other RewriteRules that lead to the [F] flag.

To prevent infinite loops, make sure you also have this rule if you haven't already got it, again replacing "not-allowed.php" with the actual name of your 403 page:
RewriteRule not-allowed\.php - [L]
It goes before any rules with [F] flag.


So I made my not-allowed.php page and then added:

RewriteRule not-allowed\.php - [L] 

RewriteCond %{THE_REQUEST}@
RewriteRule (^|/|php)$ - [F]


Thank you.

Still, I'm afraid I will again get an email from Google like the example above, showing me a URL sample and a GET record, and that is what I do not understand: how, if it is not possible to get it via an HTTP request? My rules were not good, but they worked and redirected to the right page, preventing anyone from getting the URLs in their examples.

btw

I got an internal server error with this code:
RewriteCond %{THE_REQUEST}@
RewriteRule (^|/|php)$ - [F]


If I do this, then it works:
RewriteCond %{THE_REQUEST}@ [NC]
RewriteRule (^|/|php)$ - [F]
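The internal server error has a simple cause: in `RewriteCond %{THE_REQUEST}@` there is no space before the @, so mod_rewrite sees a single argument and rejects the line, which in .htaccess shows up as a 500 error. Adding [NC] only hides the problem. An annotated reading of the three forms (the "php" rule pattern is carried over from the post above):

```apache
# 1) One argument only ("%{THE_REQUEST}@"): RewriteCond needs a test
#    string AND a pattern, so the line is rejected -> 500 error.
#   RewriteCond %{THE_REQUEST}@
#
# 2) Parses, but wrongly: the test string is "%{THE_REQUEST}@" and the
#    pattern is "[NC]", a regex character class matching one "N" or "C".
#    It is not a flag here: the condition fires whenever the request line
#    contains an uppercase N or C, with or without an "@".
#   RewriteCond %{THE_REQUEST}@ [NC]
#
# 3) Intended: test string, a space, then the pattern "@" (no [NC] needed).
RewriteCond %{THE_REQUEST} @
RewriteRule (^|/|php)$ - [F]
```

So the version that "works" is not actually testing for @ at all.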

dolcevita

11:02 am on Feb 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



lucy24,

I have finally implemented these rules:


RewriteCond %{THE_REQUEST} ^.*(\\r|\\n|%0A|%0D|@|%3D|%40).*
RewriteRule .* - [F]


Is it right?

I thought it would be better to block similar things too, such as %40 (the escaped form of @) and %3D.

I have set just the standard ErrorDocument:
ErrorDocument 403 default

I tested every symbol from the rule above and everything works. If I now get another email from their team telling me that they still see

Url sample: http://www.mydomain/whatever.php?whatever=redacted@example.com

Record sample: GET /pagead/ads?client=ca-pub-*****myid******2&output=html&h=280&slotname=3277054996&adk=3441799868&w=336&lmt=1423334957&flash=16.0.0&url=http%3A%2F%2Fwww.mydomain%2whatever.php%3Fwhatever%3Dredacted@example.com&dt=1423334957786&bpp=4&bdt=126&shv=r20150203&cbv=r20141212&saldr=sa&prev_slotnames=3335930592&correlator=5132476212332&frm=20&ga_vid=*&ga_sid=*&ga_hid=*&ga_fc=0&u_tz=-300&u_his=11&u_java=0&u_h=768&u_w=1366&u_ah=728&u_aw=1366&u_cd=24&u_nplug=6&u_nmime=14&dff=trebuchet%20ms&dfs=13&adx=272&ady=207&biw=1349&bih=655&eid=317150304&oid=3&ref=http%3A%2F%2Fwww.mydomain.org%2Fwhatever.php%3Fip%3Dwhateverever.net.&rx=0&eae=0&fc=24&brdim=%2C%2C-8%2C-8%2C1366%2C0%2C1382%&vis=1&rsz=0%7C0%7C%7C&abl=CS&ppjl=f&fu=0&bc=1&ifi=2&xpc=tzTzMuLIXX&p=http%3A//www.mydomain.org&dtd=234 HTTP/1.1



then I really have no idea what to do.

Because it is simply not possible to request anything containing @, %40 or %3D with the GET method. Their example shows an email in the URL, passing PII.

Thanks

lucy24

5:28 pm on Feb 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have right

Memo to self: Non-native speaker*; use short sentences.

RewriteCond %{THE_REQUEST} ^.*(\\r|\\n|%0A|%0D|@|%3D|%40).*

Well, that should cover it :)

Note however that you don't need the initial
^.*
part-- or, for that matter, the final
.*
The condition only needs to say "if the request contains any of these characters anywhere".

Are robots really putting nasty things like
%0A|%0D|%3D|%40
in their query strings? Ugh. But, wait, %3D? That's just an escaped = equals sign. You might legitimately get those if there's a second-order query, where the query string contains something like a URL which would then have to be escaped. (Analytics is the likeliest place.)

If you're much plagued by robots asking for bad things, you might express the Request part as
(\\r|\\n|%[01]|@|%3D|%40)
because everything from 00 through 1F is a control character and doesn't belong in a request. I understand \\r and \\n though the \\r may be redundant: How often do you meet a malign robot who assumes you're on a Mac?

This is the rare case where the [NC] flag is appropriate: it will cover \\R \\N and %3d.
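With these refinements applied (no leading ^.* or trailing .*, control characters caught with %[01], and [NC] so that \\R, \\N and %3d also match), the condition might read as below. The "php" rule pattern is an assumption carried over from earlier in the thread; keep %3D only if an escaped = never appears in your legitimate query strings.

```apache
# Unanchored: matches if any alternative appears anywhere in the request line.
#   \\r, \\n  a literal backslash-r / backslash-n typed into the request
#   %[01]     "%" followed by 0 or 1, i.e. escaped control chars %00-%1F
#   @, %40    the at sign, raw or escaped
#   %3D       an escaped "=", which may occur in legitimate second-order queries
# [NC] makes \\R, \\N and %3d match as well.
RewriteCond %{THE_REQUEST} (\\r|\\n|%[01]|@|%3D|%40) [NC]
RewriteRule (^|/|php)$ - [F]
```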

Incidentally, some sites might benefit from applying a similar exclusion to the user agent, since it's the mark of an inept robot.
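To apply a similar exclusion to the user agent, the two conditions can be joined with the [OR] flag so that either one is enough to refuse the request. The user-agent pattern here is an assumption (control-character escapes only); adjust it to what your own logs show.

```apache
# Either condition is enough: [OR] joins this one with the next.
RewriteCond %{THE_REQUEST} (\\r|\\n|%[01]|@|%3D|%40) [NC,OR]
# Assumed pattern: control-character escapes in the User-Agent header.
RewriteCond %{HTTP_USER_AGENT} (\\r|\\n|%[01]) [NC]
RewriteRule (^|/|php)$ - [F]
```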

I once got a request that really did have \r\n (or maybe just \n) at the end. Hahaha. It was a new robot.


* Best guess: hai ragione ("you're right"). My fingers initially wanted to type "hai torto" ("you're wrong"), which is all too often applicable.