I contacted the owner, who told me that the site was a proxy server that did not store any data from other websites. The owner said they would block my site from users from now on.
My material is still accessible at their URL (complete with my ads).
I have a couple of questions. First, if the proxy site doesn't store data, how come (a) I found my own material on their site through Google, and (b) it was all still there when I visited their site? A Google "allinurl" search showed a ton of material from all around the web on their site.
Second, is this something I should worry about? On the one hand, they had my AdSense ads showing. On the other hand, this is not a content-sharing arrangement that I was told about, nor did I agree to it.
Is/was this a storm in a teacup? Did I do the right thing asking them to take my material down?
A proxy server is like a pipe -- The user connects to the proxy server, the proxy server connects to your site, and then 'pipes' your content back through to the user. That's why the owner said he didn't store any of your content on his server.
Proxies can be bad or good, and there's quite a bit of grey between those black and white distinctions.
An example of a 'good' proxy is that you can use a proxy in a foreign country to check localized search results for that country, bypassing the search engines' feature that detects your location and localizes results for your country. By using a proxy, you appear to be a user in the proxy's country, not your own.
Another 'good' example is that of a user in a country which restricts access to global information sources, Myanmar being a current example. By using a proxy, the user can possibly hide his/her own identity, and bypass the security wall around the country.
Proxies can also be used for 'bad' -- For example, a site-scraper 'bot crawling your site through one or more proxies to hide its identity and activity from you, the Webmaster. Or a proxy with a script added on top of the pure 'data piping' function, so that the proxy connects to your site to get your content as described above, but substitutes its own ads for yours.
However, in all but the last example --good and bad-- the function of the proxy hasn't changed at all -- A pure proxy server just acts as an intermediary between the user and your site -- as implied by the definition of the word "proxy."
Jim
Have you reviewed this recent thread?
That's why the owner said he didn't store any of your content on his server.
So yes, it "appears" that your site is on the proxy server, but it's not -- The proxy simply requests stuff from your site and shows it to the client (browser or SE robot) accessing the proxy.
Some of these proxies may temporarily cache the content they fetch, but if they get high traffic they won't be able to cache it for long, and you should see the proxy server's accesses to your site in your raw server access logs. Using that logged information, you can then block the proxy -- again by IP address, hostname, or proxy-specific HTTP request headers -- using mod_rewrite, ISAPI Rewrite, or even a common PHP header script included on your pages.
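To make the three blocking criteria concrete, here is a minimal sketch of the decision logic in Python rather than mod_rewrite or PHP. The address range, hostname suffix, and header value are all hypothetical placeholders, not values from this thread; you would substitute whatever your own access logs show for the proxy in question.

```python
import ipaddress

# Hypothetical values for illustration only; replace with what your logs show.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_HOST_SUFFIXES = (".proxy.example",)
# (header name, substring) pairs that identify this particular proxy.
BLOCKED_HEADER_MARKERS = (("via", "exampleproxy"),)


def should_block(remote_ip, remote_host="", headers=None):
    """Return True if the request matches any of the three criteria."""
    # 1. Block by IP address (or address range).
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True

    # 2. Block by reverse-DNS hostname, if the server resolved one.
    host = remote_host.lower().rstrip(".")
    if host and any(host.endswith(suffix) for suffix in BLOCKED_HOST_SUFFIXES):
        return True

    # 3. Block by a proxy-specific request header value.
    lowered = {name.lower(): value.lower() for name, value in (headers or {}).items()}
    return any(marker in lowered.get(name, "") for name, marker in BLOCKED_HEADER_MARKERS)
```

Matching on a specific header value, rather than on the mere presence of a header such as Via, is a deliberate choice here: plenty of legitimate corporate and ISP proxies add those headers too, and you probably don't want to lock those users out.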
Jim
Although the descriptions above of a proxy server are accurate, this doesn't sound like one.
Proxy servers don't normally make your content appear to be on another site. As stated above, they are a "conduit".
Traditionally, proxy servers are used by configuring your web browser (in its setup pages) to connect to the proxy.
The browser bar will show YOUR site, not the proxy site.
There are "web 2.0" type proxies now as well, essentially a "browser within a browser". The proxy server shows its own URL bar. Again, though, it should show your site name, only in the "browser within a browser".
Proxy sites don't normally show up in Google or other search engines, either.
Can you example-ize the URLs on this "proxy" site where your content is found and post here?
The point being, the content is available only when it's requested, so it isn't stored in any traditional sense. When you check to see if it's available (or Google does), then it is, because you requested it...
Can you example-ize the URLs on this "proxy" site where your content is found and post here?
example.com is a proxy server, including a translation service.
A further Google search showed that every page from my site was cached in the Google index as being from example.com, e.g.,
www.example.com/folder/www.mysite.com/page1.html
www.example.com/folder/www.mysite.com/page2.html
etc.
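For anyone who wants to sort a list of such search results by the site the content really came from, here is a small sketch that recovers the origin address from a path-style proxy URL. It assumes the layout shown above (proxy host, a fixed "/folder/" prefix, then the original host and path) and assumes the origin was served over plain http; the function name is made up for this example.

```python
from urllib.parse import urlsplit


def original_url(proxied_url, prefix="/folder/"):
    """Recover the origin URL from a path-style proxy URL (assumed layout)."""
    # Search results are often listed without a scheme, so add one if needed.
    if "://" not in proxied_url:
        proxied_url = "http://" + proxied_url
    path = urlsplit(proxied_url).path
    if not path.startswith(prefix):
        return None  # Not in the expected proxy layout.
    rest = path[len(prefix):]
    if not rest:
        return None  # Prefix only, no embedded origin.
    # Assumption: the origin site was fetched over plain http.
    return "http://" + rest


print(original_url("www.example.com/folder/www.mysite.com/page1.html"))
```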
This wasn't a disaster, since my own site is also indexed, so in almost all cases, those copied pages of my content were supplemental.
I looked into the site a little further with the "allinurl" operator in a Google search.
The allinurl search showed that Google has cached, under www.example.com, hundreds of thousands of pages owned by sites all over the internet.
Yahoo Site Explorer shows similar results.
The search engines have indexed a massive amount of content from many, many sites as coming from www.example.com.
http://www.example.com/index.php?q=aGR0cEovL427b38uY28ueWs%3D
This is an exact copy of my site; all of the links are changed so that they go through similar URLs to get exact copies of other pages on my site.
I have no problem with people using proxies, but in this case the proxy is indexed in Google and, by default, it turns JavaScript off. So my bandwidth is being used up but none of my JavaScript is working, i.e., Analytics is not recording the traffic, AdSense ads are not being displayed, etc.
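The q value in a URL of this shape looks like URL-encoded Base64 (the trailing %3D is an encoded "=" padding character), which is how some PHP web proxy scripts carry the target address. That is an assumption about this particular proxy, and the value posted above has been obfuscated, so it won't decode cleanly. The sketch below uses its own round trip instead; the function names and the mysite.com address are placeholders.

```python
import base64
from urllib.parse import parse_qs, quote, urlsplit


def encode_target(target_url, proxy_base="http://www.example.com/index.php"):
    """Build a proxy-style URL carrying target_url as Base64 in ?q= (assumed scheme)."""
    token = base64.b64encode(target_url.encode("utf-8")).decode("ascii")
    # Percent-encode "=", "+" and "/" so the token survives as a query value.
    return proxy_base + "?q=" + quote(token, safe="")


def decode_target(proxy_url):
    """Extract and decode the q parameter; raises KeyError if it is absent."""
    query = parse_qs(urlsplit(proxy_url).query)
    token = query["q"][0]  # parse_qs has already undone the percent-encoding.
    return base64.b64decode(token).decode("utf-8")


proxied = encode_target("http://www.mysite.com/page1.html")
print(proxied)
print(decode_target(proxied))
```

Decoding the q values from your own referrer or search listings this way is a quick check on which of your pages are being served through the proxy.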
I have blocked the IP of this proxy for the time being but do wonder if there is a better solution.