|Proxy Websites & Google|
I recently came across a proxy website that has copied every webpage on my site, 100k+ pages. The site states that its service is for getting around content filters at work, school, etc.
All the pages they have copied are word for word, image for image, with one exception: all of the links have been changed to their own link structure. It's like looking at a mirror of my website.
My question is: are the search engines catching on to this and banning them? And what kind of damage is being done to our own websites through duplicate content, etc.?
I have tried contacting this one proxy website in particular, but from the looks of it their whois information is bogus. Is Google going to protect us from this, or are we going to go down in flames? It's not just this one proxy website; these types of sites seem to be springing up all over the place.
Hmmm, no feedback on this one. I guess people are just not concerned about proxies copying their entire websites?
There has been tons of comment on WebmasterWorld about indexed open proxies and the like.
Is the site you are talking about indexed?
For sites already hurt by duplicate content, adding a proxy to the mix could hurt them even more.
In order for some proxy pages to even get spidered by Google someone has to alert Google that those pages exist.
Therefore, if the proxy uses a "Recent Searches" script to show recently proxied search requests, it could potentially hurt you, because Google gets a 200 "OK" response containing your site's content, duplicating it. If the proxy does not have a recent-searches feature, then someone intentionally pointed it at your site to help their own efforts.
It's no secret that Google is overly aggressive with this filter. We need to keep in mind that search engines are paranoid and feel everyone is out to game them. So think of them as a "Paranoid Grandma" and assume that a duplicate proxy page has a 90% chance of hurting your website.
Check the DNS of the site and ban the IP of the domain and the IPs of its name servers through .htaccess or IIS administration. This should cause a 403 Forbidden response when you access the proxy site's copy of your pages. If the proxy is not using any kind of frame, use the Google removal tool to remove the site. If it uses a frame, you may want to add a link to the newly created page to try to get it spidered again without your content, as Google will NOT remove the site if it uses a frame or iframe on the proxy results page.
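As a rough illustration of the ban step above, here is a minimal Python sketch that turns a list of addresses into .htaccess deny rules. It assumes the older Apache 2.2 `Order`/`Allow`/`Deny` syntax (newer Apache versions use `Require` instead), and the function names and the 192.0.2.x addresses are made up for the example. `resolve_ips` needs network access and is only one possible way to look up the addresses.

```python
import ipaddress
import socket


def resolve_ips(hostname):
    """Return the sorted IPv4 addresses a hostname resolves to (needs network)."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def htaccess_deny_rules(ips):
    """Build Apache 2.2-style deny rules for the given addresses."""
    lines = ["Order Allow,Deny", "Allow from all"]
    for ip in ips:
        # Validate each entry so a typo does not end up in the config file.
        ipaddress.ip_address(ip)
        lines.append(f"Deny from {ip}")
    return "\n".join(lines)


# Placeholder addresses from the documentation range, not a real proxy.
print(htaccess_deny_rules(["192.0.2.10", "192.0.2.53"]))
```

The printed lines can be pasted into the site's .htaccess file. Keep in mind that the address the proxy uses to fetch your pages may not be the same one its domain resolves to.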
Maybe Matt or GoogleGuy can clarify this better for us. But until they do, always assume that it can hurt you.
The best solution for dealing with proxy sites like this is to figure out the IP address they use to request pages from your site and then ban that IP. One trick I sometimes use is to place a hidden comment in my pages and have PHP write the IP address of the requesting client into that hidden comment. This allows me to view the source of the proxied pages and quickly add the offending IP address to my banned list. It works like a charm.
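The post above does this in PHP. The same idea can be sketched in Python for a CGI/WSGI-style setup, where the client address is available as `REMOTE_ADDR` in the request environment. The function name and the `served-to` label are invented for this example.

```python
import html


def tag_page_with_client_ip(page_html, environ):
    """Append an HTML comment recording the requesting client's IP address."""
    client_ip = environ.get("REMOTE_ADDR", "unknown")
    # Escape the value and break up "--" so it cannot close the comment early.
    safe_ip = html.escape(client_ip).replace("--", "- -")
    return f"{page_html}\n<!-- served-to: {safe_ip} -->"


# Example with a placeholder address from the documentation range.
print(tag_page_with_client_ip("<p>hi</p>", {"REMOTE_ADDR": "203.0.113.7"}))
```

Viewing the source of the proxied copy should then show the address the proxy used to fetch the page, which is the one to add to the ban list.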
Yes, this has me very worried. This particular proxy website isn't showing my website through a frame or anything like that. Imagine Google cache, but for every page. They have crawled everything & recreated it on their server. Except for 3 things:
(1) They replaced all the links with their own link structure
(2) They have replaced all of my Google Adsense blocks
(3) They have included pop up & pop under ads, which my site is set dead against doing.
I did contact their ISP, but was told they would only do something about it if I had a lawyer send a cease & desist letter. These letters can run $1000 or more. I am a small-time programmer with a very small budget due to the fact that I offer a free service! Why is it that when someone steals your hard work, the law only protects Fortune 500 companies & those with a nice wad of cash?
"They have crawled everything & recreated it on their server. Except for 3 things"
Maybe, maybe not.
You don't need a lawyer to send them a cease and desist letter. If the server is in the U.S., you can draft a DMCA (Digital Millennium Copyright Act) takedown notice, and the hosting provider has to take down the offending material or risk being culpable in any future legal action (e.g. being a co-defendant in a lawsuit resulting in fines of around $150,000 plus lawyer fees). I have yet to find a U.S.-based web hosting firm that didn't immediately jump when sent a signed DMCA takedown notice. It really scares the doo doo out of them.
Still, since most proxies don't really cache pages and only request them dynamically from your server, it is much easier to track down their IP address and block their access to your pages at your server. Blocking their IP address usually resolves the issue immediately.
[edited by: KenB at 8:47 pm (utc) on Nov. 28, 2006]
I like feeding them their own home page if they are actually indexed.
Those URLs tend to get mighty long and sometimes crash the "proxy".
Not to mention sometimes making them show up as duplicate content spam to Google.
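One way to read "feeding them their own home page" is to redirect requests from the proxy's IP back to the proxy's own front page. Here is a minimal WSGI sketch of that reading. `PROXY_IPS` and `PROXY_HOME` are hypothetical placeholders, and whether the proxy actually follows the redirect depends on how it is built.

```python
# Hypothetical values: replace with the proxy's real requesting IP and home page.
PROXY_IPS = {"203.0.113.7"}
PROXY_HOME = "http://proxy.example/"


def app(environ, start_response):
    """Minimal WSGI app: redirect known proxy IPs to the proxy's own home page."""
    if environ.get("REMOTE_ADDR") in PROXY_IPS:
        start_response("302 Found", [("Location", PROXY_HOME)])
        return [b""]
    # Everyone else gets the normal page.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Normal page content</p>"]
```

Regular visitors are unaffected, since only addresses in `PROXY_IPS` are redirected.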