I have a site with (currently) more than 6000 backlinks. A year or so ago it had a PR7 (for a month or so), but it went down to PR6.
It has a picture gallery, a forum and a blog.
I didn't pay much attention to the site, and it built up a lot of spam comments (in the gallery). I also had DP Co-op links on it...
After my brother (his pic is in the gallery) kindly reported the spam, I removed those comments and looked at the rankings and PR of the site. I also removed the DP Co-op.
I then saw that the visible PR on (only) some pages had gone to zero.
And the search term I used to rank high for had been taken by a proxy site listing my content. Even if one searches for my family name, which is the domain name, the proxy site shows up.
This proxy site lists about 7000 pages from other sites in Google.
The proxy site doesn't seem to do any 302s at the moment.
I blocked the IP addresses belonging to the proxy site, filed spam reports, added a sitemap to my site and filed a reinclusion request explaining that I had removed the porn spam and the DP Co-op.
So far, no luck.
The current PR update seems to have zeroed the PR for most pages (but not all) on my site.
I wonder if this proxy site is doing this on purpose (it looks like they have, or had, some form of 302 in effect at a certain point).
Or maybe there is a nasty "competitor" using a hole in both the proxy and Google (other sites of mine are also listed in the proxy / Google, but with little effect).
I am lost....
[edited by: tedster at 3:23 am (utc) on Sep. 30, 2006]
In my case, what made it very odd is that search history results in the program, generated by real user searches, created a string that was indexed by Google. I have no idea how (maybe some members can pitch in here).
Effectively, the cache date, design and content shown for this website were all mine.
I contacted Google, and did notice that a search for my content two weeks later showed 0 results in Google, so I am assuming something was done.
Regards,
Todd
The proxy has effectively taken my place in the SERPs.
Another strange thing: I used one of the sites that checks PR over various datacenters, and this PR-DC site also returns a URL with its reported PR, which I assume is a reply from the datacenter.
For all my sites, the URL I enter in the PR checker is also returned, but for the particular site that has its content listed in the proxy, the PR checker returns the page "caught" by the proxy...
I asked the webmaster how the PR checker works, but have had no reply yet...
regards,
Bert Vierstra
[edited by: Vienix at 4:03 am (utc) on Sep. 30, 2006]
It has worked for me on many proxy sites. You need to do this before the listing goes supplemental or you will have a duplicate page indexed in Google for a year or so.
Otherwise there is nothing that can be done if the server is using a URL-based application, something that can look like this:
pico/cache.php?domain=
Google cannot protect you from some of these sites unless you get lucky and they respond to a request. This is something they will hopefully fix, but seeing as how we have waited since Google's inception for fixes to things like 302 redirects and canonical issues... You can figure the rest out for yourself. We call this Legacy Code.
Personally I hate proxies for this hijack reason. I am curious to see if the proxy passes a 302 redirect or not.
Another possibility is to use the URL removal tool, but with 7000 pages this would be painful. I have found that it will remove individual pages if they return 403 Forbidden.
[edited by: tedster at 3:04 pm (utc) on Sep. 30, 2006]
Banning the proxy may actually cause you to lose traffic from your search results in this case, because their site serves your content even though the listing is on their server. Cutting it off also cuts you off. It's a difficult decision.
You need to contact Google. I would try, and you may get lucky; you have no other choice now. If that fails, the only way to recover is by blocking them and re-submitting your site.
You are definitely 302 hijacked! I tested this and it passed a 302 redirect.
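For anyone who wants to repeat that kind of check, here is a minimal Python sketch that makes a single request without following redirects and reports what comes back. The function names are made up for illustration, and the URL you pass in would be whatever proxied URL shows up in the index; none of this is the actual test that was run above.

```python
import http.client
from urllib.parse import urlsplit


def first_response(url, timeout=10):
    """Make one GET request and return (status, Location header).

    http.client never follows redirects on its own, so this shows
    exactly what a crawler sees on its first request.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location):
    """Turn a status code and Location header into a one-line summary."""
    if status == 302:
        return f"302 Found -> {location}"
    if status in (301, 303, 307, 308):
        return f"{status} redirect -> {location}"
    return f"{status} (no redirect)"
```

Calling `describe(*first_response(url))` on a proxied URL prints a line starting with "302 Found" if the proxy answers with a temporary redirect, along with the target it points to.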
This is one of those unfortunate cases where you have been hurt by a 302. Time may resolve this if the URLs time out and cause the Invalid.php page, but the Google cache still has this page, so you need to try to get in touch with them.
It may also have been intentionally done if someone submitted the URL to Google via a text link or their add url submission form. I do not see any way for Google to find it from their home page. So how did it get there?
NOTE: I have also had luck submitting sites with issues through Google's Add URL form, where I was able to add comments that are actually read.
Use PHP to output robots.txt depending upon the requesting IP: serve your normal robots.txt to everyone apart from the offending proxy, and serve a disallow-everything robots.txt when the file is requested from the proxy.
You can then use the Google removal tool without problem or risk and remove all those proxy pages.
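The decision logic behind that robots.txt trick is small. The post suggests PHP; the sketch below uses Python only to show the idea, and the address range is a documentation placeholder, not the proxy's real IPs (you would take those from your own logs). All names here are hypothetical.

```python
import ipaddress

# What every normal visitor and crawler gets: nothing disallowed.
NORMAL_ROBOTS = "User-agent: *\nDisallow:\n"

# What the proxy gets when it fetches robots.txt: everything disallowed.
BLOCK_ALL_ROBOTS = "User-agent: *\nDisallow: /\n"

# Assumption: placeholder range (RFC 5737 documentation block),
# to be replaced with the addresses the proxy really uses.
PROXY_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def robots_for(remote_addr):
    """Return the robots.txt body to serve to the given client IP."""
    addr = ipaddress.ip_address(remote_addr)
    if any(addr in net for net in PROXY_NETWORKS):
        return BLOCK_ALL_ROBOTS
    return NORMAL_ROBOTS
```

A request handler for /robots.txt would call `robots_for()` with the client's address and send the result as text/plain. Using networks rather than single addresses covers the case where the proxy fetches from more than one IP.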
It's a matter of time (short, I hope) before Google bans the proxy domain.
The domain lists a lot of "stolen" content, most of it listed as supplemental (like the Adbrite page), but some of their "stolen" stuff, like my site, is not...
This is considered by some a Black Hat approach to delisting a URL. First replace the site (duplicate it and overpower it) and get it supplemental, then dump the listings and point it to an error page, rather than a 404, using another redirect.
It does not redirect to the home page; it goes to Invalid.jsp. Google may respider the URL and then see Invalid URL. But here's the kicker: it gets to the Invalid URL from the indexed URL, which is now passing a 302 to the Invalid.jsp page.
The 302 first says Found, so Google thinks the URL is still there. You are already punished and may no longer be considered the original owner of the content, and it's hard to get out of supplemental hell.
So the circle continues via the 302 directive that Google has never resolved, most likely due to Legacy Code in the programming, or so that people still have something to talk about on Webmasterworld.
:)
Your best bet is to simply contact Google. It may clear up on its own, but with the session IDs timing out and causing additional redirects, you may be hurt even more. If Google disregards your URL and does not properly reassess or find it, you could be hurt by never returning.
IMHO, it's better safe than sorry. If your site is clean, just send in an email. I think they will vindicate you on this. It's definitely a 302 hijack, and seeing as how there is no way to find any proxied searches when we go to the proxy site, it may be an intentional move to try and hurt you since you rank #1 for your terms.
Good Luck, please keep us informed.
I'm the kproxy administrator.
There was no intention to hurt you or your sites. I had no idea that you had this problem until someone wrote to me this weekend. If you have problems, you can write to me in the kproxy forum or email me, support[AT]kproxy.com
All direct kproxy requests to your pages will redirect to invalid.jsp. I made this change some weeks ago. People have to go to the KProxy main page to surf. I had some problems with some sites because I had KProxy open to any request.
That means that Google will now never find your sites through kproxy requests.
Sorry if you had some problems.
Best regards.
Or maybe there is a nasty "competitor" using a hole in both the proxy and Google (other sites from me are also listed in the proxy / Google, but with little effect)
They are pretty arrogant about it. Try searching your site name and see who is using your site or company name in the title of their advert.
Also they "solved" my problem with the spammers.
While the spam sites used to show up when you did a related: query, they have also disappeared from there....
I think they sort of did a "reset" for my site.... Hope I don't get sandboxed because of that :)
This looks like a straight duplicate content "problem", one that shouldn't cause as much trouble as it appears to.
The base code is available online and looks like a simple pass-through operation with URL substitution. It is late for me to be parsing Java, and maybe I missed something.
I'm cruising for examples of current duplicate content issues. Now I'll butt out of your thread.
[edited by: theBear at 3:26 am (utc) on Oct. 2, 2006]
First you have to identify how they are accessing your site by using the proxy to access your site and then locating those accesses in your raw server logs.
Then a variation of:
RewriteCond %{REMOTE_ADDR} ^WW\.XX\.YY\.ZZ$
RewriteRule .* - [G,L]
would do the trick, where WW.XX.YY.ZZ is the IP address that the proxy used to access your site (note this could be a range of addresses).
My favorite would be to reflect the proxy's home page back at them using the RewriteRule.
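For the first step, finding the proxy's accesses in the raw logs, one approach is to request a page through the proxy with a unique marker in the URL and then look for that marker. The Python sketch below assumes the logs are in Apache common or combined log format; the function name and the `probe=` marker are invented for illustration.

```python
import re

# Common/combined log format: client IP first, request line in quotes.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*"')


def ips_requesting(log_lines, marker):
    """Return the sorted client IPs whose requested URL contains marker."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and marker in match.group(2):
            ips.add(match.group(1))
    return sorted(ips)
```

Usage would be something like `ips_requesting(open("access.log"), "probe=abc123")` after fetching a URL containing `probe=abc123` through the proxy. The addresses it returns are the candidates for the WW.XX.YY.ZZ in the rewrite condition.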
[edited by: theBear at 12:01 am (utc) on Oct. 3, 2006]