|Blocked hijacking proxy IP - Now what?|
Losing over 2500 visits a day from block - need advice
I am in dire need of some advice.
About three weeks ago I noticed that one specific proxy site had hijacked over 25 of my site's more popular key-phrase placements on Google. I immediately began a mad search for ways to address the problem and have since read through loads of great info on this forum and others. Unfortunately for me, most of the advice I read dealt with scripting and .htaccess to address multiple proxy attacks. Since my situation involves a single proxy site with one static IP address, my solution was to block that specific IP from my site. This has worked in the sense that the proxy has not been able to hijack any more of my SE placements. I am losing over 2,500 unique visits a day (not to mention loads of revenue), but I feel it was an action I had to take to stop the hijackings from spreading through my placements like a virus.
My problem is I don't know where to go from here, and I was hoping someone could offer advice on the following questions:
1. Since the IP is blocked (essentially blocking the hijacked Google placements) and returning a 403 error, how will Googlebot react to this? e.g., will it just weed out the incorrect URLs/placements over time?
2. Is there any way to speed up this process? e.g., more frequent updates of the affected pages.
3. Could I be penalized by Google for the incorrect placements and/or for blocking the IP addresses and returning a 403?
4. Does anyone think I did the wrong thing in blocking this proxy site and in the process blocking traffic to my site?
5. Would a custom 403 page with info explaining the block & links to the correct pages make any sense here and if so, how would it be viewed by Google?
Any help on the above would be greatly appreciated. Thanks!
[edited by: tedster at 7:30 am (utc) on Oct. 3, 2007]
I would check to make sure you are not inadvertently blocking a range of IP addresses.
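One quick way to sanity-check this is to count how many addresses a deny rule actually covers. The snippet below is only a sketch: the address 203.0.113.7 is a made-up placeholder from the documentation range, and the helper names are hypothetical. It handles a full IPv4 address or a CIDR range, not the partial-address shorthand some server configs accept.

```python
import ipaddress


def addresses_covered(rule: str) -> int:
    """Return how many addresses a deny rule (single IP or CIDR range) matches."""
    # strict=False also accepts a host address written with a prefix,
    # e.g. "203.0.113.7/24", and treats it as the enclosing network.
    return ipaddress.ip_network(rule, strict=False).num_addresses


def is_single_host(rule: str) -> bool:
    """True when the rule blocks exactly one address."""
    return addresses_covered(rule) == 1


if __name__ == "__main__":
    # Placeholder rules for illustration only.
    for rule in ("203.0.113.7", "203.0.113.0/24"):
        print(rule, "covers", addresses_covered(rule), "address(es)")
```

If the count for your rule is anything other than 1, the block is wider than the single proxy IP.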
No proxy should allow itself to be crawled by any SE spider. Blocking them was a good idea, and since you know the specific IP involved there's little chance of collateral damage.
I've done the same thing a half-dozen times. I'm fairly confident Google will handle things properly, but it's an interesting question whether it would be faster or better to return a custom "request blocked" page with status 200.
Did your sites lose placements in SERPs? If so, you should file a spam report with Google letting them know.
I went through this last month myself. This has been occurring a lot more this year.
Sorry for the delay in responding to the posts regarding my request for advice. I got tied up with another problem yesterday.
First, thanks to all who responded.
Jomaxx - Glad to hear from someone that I did the right thing here. Thanx. Your status 200 comments were of great interest to me. It's the first time (as I recall) it's been mentioned in relation to this issue. I'm wondering what your thinking was? Did you mean a custom 403 that returns a status 200 with modified GET HEAD POST and/or TRACE info to the spider? Don't know enough about modifying status codes to even know if this is feasible. However, just the thought of being able to send correct, modified info to the bot is very intriguing.
SEOold - My placements were totally replaced by the offending proxy, e.g., my URL was listed with the proxy site's URL in front of it like this: www.mysite/mypage.html was replaced with www.offendingproxyurl/nph-page.pl/000000A/http://www.mysite/mypage.html. My listings were not duplicated in the SERPs. In all instances my placements were totally replaced by the proxy listing example above and still are.
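For anyone trying to work out which of their own pages sit behind a batch of hijacked listings, the pattern above can be unpicked mechanically. This is a rough sketch that assumes every proxied listing simply has the original URL tacked on after a slash, as in the example; the function names are hypothetical and other proxies may encode the target differently.

```python
import re


def original_url(listing: str):
    """Return the original URL embedded in a proxied listing, or None."""
    # Look for a slash followed by a full http(s) URL running to the end.
    match = re.search(r"/(https?://.+)$", listing)
    return match.group(1) if match else None


def unique_pages(listings):
    """Collapse many hijacked listings down to the distinct original pages."""
    return sorted({url for url in map(original_url, listings) if url})


if __name__ == "__main__":
    hijacked = [
        "www.offendingproxyurl/nph-page.pl/000000A/http://www.mysite/mypage.html",
    ]
    print(unique_pages(hijacked))
```

A listing that is not proxied at all (no embedded URL) returns None and is dropped from the list.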
As for filing a spam report with Google, the other threads mentioned that this would not do any good due to the fact that the proxy is not really stealing my content or my placements. The posts I read stated that this is a glitch on G's end in spidering the proxy and most likely thinking it is a redirect of some kind.
Have you had success in filing a spam report with Google on this? If so, did G act and return your placements to normal, and in what time frame?
I was suggesting using mod_rewrite to show certain IP addresses a valid "request denied" page, that could also incorporate a META NOINDEX tag.
The reason for this is that I've noticed that "error" pages sometimes hang around for quite a while in Google's index. (404's, server not responding, domain does not resolve, etc.) I was thinking this might flush the duplicate content out faster than serving up error codes. It's just speculative, though. I just mentioned it in case someone knew why this might be or not be a good idea.
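To make the idea concrete without getting into mod_rewrite syntax, here is the same logic sketched as a small Python WSGI wrapper instead. It is an illustration under assumptions, not a tested fix: the blocked address is a placeholder, `site_app` is a stand-in for the real site, and whether Google drops the pages faster this way is still the open question.

```python
# Placeholder address from the documentation range; substitute the real proxy IP.
BLOCKED_IPS = {"203.0.113.7"}

DENIED_PAGE = b"""<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex">
<title>Request denied</title>
</head><body><p>Requests from this address are not served.</p></body></html>
"""


def site_app(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    body = b"<html><body>Real page content</body></html>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def deny_with_noindex(app):
    """Wrap a WSGI app so blocked IPs get a 200 'request denied' page marked noindex."""
    def wrapper(environ, start_response):
        if environ.get("REMOTE_ADDR") in BLOCKED_IPS:
            start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                                      ("Content-Length", str(len(DENIED_PAGE)))])
            return [DENIED_PAGE]
        # Everyone else gets the normal site.
        return app(environ, start_response)
    return wrapper


application = deny_with_noindex(site_app)
```

The blocked address sees a valid page with a robots noindex tag, while all other visitors are passed through untouched.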
|SEOold - My placements were totally replaced by the offending proxy, e.g., my URL was listed with the proxy site's URL in front of it like this: www.mysite/mypage.html was replaced with www.offendingproxyurl/nph-page.pl/000000A/http://www.mysite/mypage.html. My listings were not duplicated in the SERPs. In all instances my placements were totally replaced by the proxy listing example above and still are. |
Hi Rep, the reason I asked is that I wanted to see if we had the same issue, and from your answer this is exactly what happened with my site at the beginning of September. We lost all our placements and found that we had been hijacked by a proxy. My placements were also replaced by a proxy in this manner: www.offendingproxyurl/nph-page.pl/000000A/http://www.mysite/mypage.html.
I blocked the proxy, but you must also file a Google spam report so the proxy can be removed from the index and your results can return in the SERPs. The reason you were dropped is duplicate content. This only happens with Google for some reason. They can't figure out that the original site is the main content site.
So anyways after filing the report, Google removed the proxy site from the index and within a week they returned us back in the SERPs. When you fill out the form there is a drop-down where you can select that your site disappeared. At the same time you will need to file a re-inclusion request explaining exactly what happened, and let them know you also filed a spam report. I recommend you do this a day after filing the spam report.
"So anyways after filing the report, Google removed the proxy site from the index and within a week they returned us back in the SERPs."
SEOold - Thanks for the info. Couple of questions: Did you do this for each individual placement/key phrase or was it just a single report for the proxy IP/url? I have over 30 on Google as of this post. How much detail did you give when explaining what happened? If individual, do you think I can do the 30 all in one day, or should I break them up in say 10, 10, 10 etc.
Again, thanks much for the help!
rep i'm not sure what u mean by 30? Are you saying 30 keyword phrases or 30 URL's that are dropped?
"rep i'm not sure what u mean by 30? Are you saying 30 keyword phrases or 30 URL's that are dropped?"
SEOold- Sorry if I sound like an idiot here. This is the only way I know how to explain. G is now showing over 30 of our keyword phrases that are hijacked/appended with the proxy URL (effectively dropping us out of the SERP's for that keyword phrase).
The 30 keyword phrases point to 6 unique page URL's on our site. (keyword phrase examples: widgets, green widgets, black widgets = 1 unique page URL. kid widgets, teen widgets, baby widgets = 1 unique page URL and so on.) Hope that makes sense.
I guess what I was really asking was: Do I have to file a spam report for each keyword phrase placement that was hijacked or just one for each unique page URL?
Also, how much info do I give in the comment box in the spam report? Do I explain that the page is being served through a proxy, that I've 403'd the IP, etc.? Any tips would be great as I've never filed one before.
Now I know I'm an idiot lol. Anyway, thanks for all the help.
Hi rep, you're not an idiot, it's just that you haven't experienced this before.
Ok, this is what you should do: When you file a spam report, explain in as much detail as possible what happened. Don't mention specific keywords. Provide only the page URLs that have fallen out of Google's index.
Let them know exactly what happened, with specific dates for when this occurred. Tell them a bit about your site and that's it. When you file for re-inclusion you have to select the site URL. Tell them exactly what happened and that you also filed a spam report. Also give the specific URLs that have been dropped.
Make sure that you are not doing unethical things on the site before you file the re-inclusion request. If you still have questions, let me know.
SEOold- Thanks! That's exactly what I wanted to know. I'll follow your instructions and let you know how it all turns out.
You can easily get the proxy's copy of your content out of Google by redirecting anything requested from the proxy IP address to a page of nonsense text unrelated to your site. Give them a nice 200 response even, who cares. The concept here is to replace the text previously crawled with something else so the proxy URLs no longer rank for your content, as Google seems to take longer to dump content with 403 errors and such. A simple content replacement works fast: as soon as it gets crawled, it is indexed almost immediately.
Now that you have the redirected content page in place, attempt to get Google to crawl those pages again by building a new page and dropping links on it to your hijacked pages via the proxy, as such:
Add your new page that contains those links to the proxy in your Google sitemap and see what happens.
This may take a while to fix or it may resolve itself quickly. It's hard to say, as it all depends on how quickly Google re-crawls the proxy links.
Another trick to try, if that doesn't work quickly, is to just MOVE your hijacked pages to a new page name, 301 redirect the old page to the new page, and see if it ranks again, as this sometimes does the job in a pinch.
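For what the move-and-301 step might look like in practice, here is a rough Python WSGI sketch. The old and new page names are made up for illustration, `site_app` is a stand-in for the real site, and on most shared hosting you would more likely do the equivalent in your server config.

```python
# Hypothetical mapping of old (hijacked) page paths to their new names.
MOVED = {"/mypage.html": "/mypage-new.html"}


def site_app(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    body = b"<html><body>Real page content</body></html>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def redirect_moved(app):
    """Wrap a WSGI app so requests for moved pages get a 301 to the new name."""
    def wrapper(environ, start_response):
        target = MOVED.get(environ.get("PATH_INFO", ""))
        if target is not None:
            start_response("301 Moved Permanently",
                           [("Location", target), ("Content-Length", "0")])
            return [b""]
        # Paths that were not moved are served normally.
        return app(environ, start_response)
    return wrapper


application = redirect_moved(site_app)
```

Only the paths listed in the mapping are redirected, so the rest of the site keeps its existing URLs.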
[edited by: incrediBILL at 12:41 am (utc) on Oct. 17, 2007]