Forum Moderators: phranque
It's big enough that when My Site Name (with or without quotes) is searched in Google, the proxy's pages come up and my site is nowhere to be found. I contacted Google but have had no response yet, and the proxy's host isn't responding either (I've waited 4 days so far).
So, I decided to try to at least protect my own sites, here's the info:
Their url looks like http://www.example.com/o.php?logid=http%3A%2F%2Fmysite.com%2F and they do leave a trackable IP in my server logs.
So with that I tried blocking by IP:
<Limit GET HEAD POST>
deny from ###.###.###.##
allow from all
</Limit>
The usual code I have for removing query strings has no effect at all:
RewriteCond %{THE_REQUEST} [?]
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
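(For reference, the usual query-string-stripping rule is also often written against %{QUERY_STRING} instead of %{THE_REQUEST}; a minimal sketch, assuming example.com is the canonical host:

```apache
# Redirect any request arriving with a query string to the clean URL.
# The trailing "?" on the substitution discards the query string.
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
```

Either form only fires on requests that actually reach your own server with a query string; the ?logid=... part lives on the proxy's URL, so the proxy's fetches of your pages arrive clean and neither rule matches.)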
Here's my very beginner attempt at combining rules, it gave a 500 error (ouch):
RewriteCond %{REMOTE_ADDR} ###\.###\.###\.## [?]
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
And a simpler attempt to redirect them away:
RewriteCond %{REMOTE_ADDR} ###\.###\.###\.##
RewriteRule (.*) http://example.com/ [R=301,L]
That left my URL in the address bar and still inside the proxy (links altered, ads, etc.) but showed the contents of example.com. The headers are still returning a 200 response (not the 301 I'd expected).
I've been searching all day for a way to escape this thing, but I've gone as far as I can with almost no knowledge of .htaccess (I am trying) and without knowing what else to search for. So again I have to ask for help :S
I'd fix that problem first. Make sure your custom error document is defined with a local URL-path, and not as a canonical URL.
ErrorDocument 403 http://www.example.com
Also, defining Deny from x / Allow from y without specifying an Order is dangerous (see mod_access).
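A minimal sketch of both fixes together (the error page path and the IP are placeholders for your own values):

```apache
# Local URL-path, so Apache serves the page itself instead of redirecting
ErrorDocument 403 /403.html

<Limit GET HEAD POST>
Order Allow,Deny
Allow from all
Deny from ###.###.###.##
</Limit>
```

With Order Allow,Deny, the Allow directives are evaluated first and a matching Deny then overrides them, so everyone gets through except the listed address.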
Jim
[edited by: jdMorgan at 9:26 pm (utc) on Aug. 23, 2006]
I tried the code you just wrote, same thing is happening.
- Their url stays in the address bar
- The 403 page does show up, but a link I added to test is altered to stay within their proxy
- Their ads are still there.
- Still returns a 200 as well (all other 403's are returning fine)
(I've cleared all private data and restarted the browser between tests, so that isn't it either.)
It seems nothing I try can escape this thing... serving up error pages, feeding them their own url to chew on.. no luck.
I've tried these variations separately as well:
RewriteCond %{REMOTE_ADDR} ###\.###\.###\.##
RewriteRule (.*) http://example.com/ [R=301,L]
#
RewriteCond %{REMOTE_ADDR} ###\.###\.###\.##
RewriteRule (.*) http://example.com/$1? [R=301,L]
#
RewriteCond %{REQUEST_URI} ^logid$ [NC,OR]
RewriteRule (.*) /$1? [F,L]
Same results.
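(For what it's worth, the third variation can't match as written: logid is a query-string parameter, not part of %{REQUEST_URI}, and the [OR] flag needs a following condition. A syntactically valid sketch against the query string would be:

```apache
# Forbid any request whose query string starts with logid=
RewriteCond %{QUERY_STRING} ^logid= [NC]
RewriteRule .* - [F]
```

Though if the proxy fetches the pages with clean URLs, as the logs suggest, this still won't fire; matching on %{REMOTE_ADDR} remains the reliable hook.)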
Base hrefs do nothing as well. When caught in the proxy they are rewritten as:
<base href="http://scraper.tld/o.php?logid=http%3A%2F%2Fmysite.com%2F">
[edited by: LunaC at 11:39 pm (utc) on Aug. 23, 2006]
<form method="get" action="http://www.scraper.tld/o.php">
<input type="hidden" name="logid" value="http://mysite.com/cgi-bin/search.cgi">
the original is:
<form method="get" action="http://mysite.com/cgi-bin/search.cgi">
Contact forms are altered the same as well.
What I'm most worried about is this spreading and personal information being stolen. They are proxying wikipedia and the ODP (and altering all links etc. to keep everything in the proxy).
I'm far from the only one this could be affecting :S
Still no real response from their host other than that I'm in the queue to be looked after.
Either way, the answer is to serve 403s, and not worry about the address bar. Once you are successfully blocking them, then you can possibly cloak your site so that they get useless pages from your site. If you serve them pages with no links at all, they can't very well modify them or impersonate your real site.
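The "serve 403s" approach can be sketched as a mod_rewrite rule keyed on the proxy's address (the IP is a placeholder):

```apache
# Answer 403 Forbidden to the proxy's server, without redirecting anywhere
RewriteCond %{REMOTE_ADDR} ^###\.###\.###\.##$
RewriteRule .* - [F]
```

The "-" substitution means "don't rewrite", and [F] returns 403 directly; unlike an [R=301] pointing at another URL, there is nothing for the proxy to follow and re-wrap.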
If they are actually grabbing and keeping copies of your pages, DMCA them. Otherwise, things get a bit more tricky, but you can still report this to search engines through their webmaster contact addresses (if you can find them).
Jim
That all sounds like they're trying to get hold of usernames and passwords so they can hack into other sites.
As for cloaking, I'll look into that tomorrow as well. That's an area I've completely avoided... scares me almost as much as this :S
I've already contacted Google, but they are known for canned (if any) responses.
I'm out for tonight, thanks for your help and have a good night.
All you need to do is serve them blank or alternate pages, so they will no longer rank for your keywords, and so that "your" visitors don't get scammed on their site.
That's your first priority, and it's really the only major concern here.
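Serving an alternate page to the proxy alone can be sketched the same way (/blank.html is a placeholder for whatever stub page you create):

```apache
# Internally serve a stub page (no links, nothing worth scraping)
RewriteCond %{REMOTE_ADDR} ^###\.###\.###\.##$
RewriteRule !^blank\.html$ /blank.html [L]
```

The negated pattern stops the rule from looping on the stub itself; everything else the proxy requests gets the blank page with a normal 200, so there is nothing useful for it to rewrite.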
I wouldn't worry too much about the cloaking aspect of it. The cloaked pages are only served to the proxy site, and you are making no attempt to fool search engines or your visitors.
If you think cloaking is always seen as bad, you should check out major sites like CNN, The Washington Post, etc. They all serve alternate content to search engines for different reasons, but make no attempt to deceive anyone. And that is OK. It is cloaking with intent to deceive search engines or visitors that is frowned upon by search engines.
Jim