I've used the manual removal tool to remove redirect URLs, but only after using robots exclusion for the redirect URL (which requires control of the URL or cooperation from its owner).
An interesting aspect is that, according to your experience, Google is removing the submitted URL and not the destination URL. This does make sense, given that Google fixed the "remove competitor's home page" exploit last summer.
The next question is how quickly the benefit of the backlinks will be applied to the rightful URL.
Yesterday I removed a hijacker using the method you described. Since the hijacker was redirecting to a slightly different URL than the URL I have indexed (e.g. mysite.com/index.html vs. mysite.com/), it was a low-risk move. I probably wouldn't have done it if the hijacker had been redirecting to a URL I needed to keep in the index.
Can you confirm that you were able to remove the hijacker without inflicting any collateral damage on your own URL? If so that's great news and we finally have a decent way to fight this.
Yes...you can remove a url that redirects to any page on your site without causing harm to the intended url for that page.
To do this, you use the removal tool and set the meta robots tag to "noindex" just long enough to get the url submitted. Then, instantly return the metatag to "index". If you forget to change the tag back, you obviously risk having the intended url removed next time Googlebot checks your site. When you submit a url for removal via the removal tool, the program will instantly check to make sure the tag is set to "noindex" (for your protection), but it will not check again. That is why you are able to immediately return the tag to index after you get the submission "success".
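Since the whole trick depends on the tag being "noindex" at the moment of submission and back to "index" right afterwards, it may help to check the live page both times rather than trust memory. Here is a minimal sketch using only the Python standard library; the function name `has_noindex` is made up for illustration, and it only inspects HTML you pass in (fetching the page is left to you).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Before submitting: should be True. After the "success" message: should be False.
page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(has_noindex(page))
```

Run it once before you submit and once after you restore the tag; if the second run still reports noindex, your own URL is at risk the next time Googlebot visits.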
The only thing I am not sure about is whether Google still knows about the url(s) that are removed and uses them in ranking calculations. Does Google only remove the url from the visible index? If Google removes the urls from the visible index but retains the url/information somewhere else for its own purposes, then our efforts to remove the urls and help Google clean up its horrific mess are in vain. This would not surprise me in the least.
Please update this thread periodically to let us know how your site is doing in the search results; have you lost ground? And how long does it take you to get back to where you should be?
This is a *very* interesting post. It's the first thing I've seen about 302's that seems like it actually would work. Here's hoping.
Well I have a bit of information regarding the success of the url removal.
In November 2004, there were as many as 20 urls that were NOT mine showing in a site:mysite.com search. These urls were mostly tracker2's. But, after Google had associated those tracker2s with my site, it then began associating all redirects it found with my site. Incidentally, the site: search is supposed to show only urls that are truly part of your site. If it shows unrelated urls (and certainly 20 unrelated urls) then there is a problem.
So, I began submitting those urls to the Google removal tool. Since the redirects ultimately landed on my page (the destination page), I had control of its removal using the robots metatag. The last one was removed in late January. Unfortunately, nothing has changed. My site is still MIA. There are still some unrelated urls showing in the site: search because their redirects were removed before I learned about the removal tool. Those remaining urls were last cached on Nov 2, and until Googlebot revisits them it will never know they no longer redirect to me. I am convinced that there is nothing we can do. Google is just broken and they don't care. When I search for my company name, my home page is nowhere to be found. Rather, dozens of scraper/directory style sites with 0 pagerank are listed. Very pathetic and sad.
Chris: Have you tried prompting Googlebot to visit those no-longer-existent redirects by submitting their URLs to [google.com...] and/or linking to them from a frequently-spidered page?
Been lurking here for years, but with this 302 fun, I just gotta join in.
By searching for text unique to my sites, I found 5 URLs in Google (not mine) using my page title and having my info in the cache. I was able to use the methods detailed above to initiate removal for 3 of these URLs. However, one of these URLs is not just a 302, but also uses a meta refresh set to 0, so the Google removal tool is not seeing the noindex meta tags which I temporarily added to my page (Google sees a page that has nothing but the hijacker's meta refresh tag on it). The second URL I could not remove goes to a page that says "This Account Disabled..."; the 302 no longer redirects, but the URL is in the SERPs with my cache.
Is there a Google e-mail or other method that can help rectify these problems?
I have submitted those urls AND I have linked to them from various pages. But, Googlebot has yet to revisit and update its cache. I linked to them because when the webmasters removed the redirect to my page, they simply redirected the url back to their own homepages. But, as I said, Googlebot is not interested in visiting, and Google would rather have old, outdated cache in its database.
My situation is almost exactly the same - redirects from old pages with old caches.
On one of them, the removal tool wouldn't work because it couldn't recognize the characters. I think it got hung up on the %2F. I don't know why Google can index and display a URL but can't remove it.
macdave, that seems like a good suggestion. I would be interested in knowing if anyone has had success with this.
The URL I removed had the %2 in it also but this wasn't the problem. I initially got an error message from Google saying not valid because of the character " "
This turned out to be a hard-to-see space in the offending URL. When I removed it and resubmitted, it worked.
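A stray space like that is easy to miss when copying a long redirect URL out of the search results. As a rough sketch (the function name `find_whitespace` and the example URL are invented for illustration), you could scan the string for any whitespace character before pasting it into the removal tool:

```python
def find_whitespace(url):
    """Return (position, repr of character) for every whitespace character in url."""
    return [(i, repr(ch)) for i, ch in enumerate(url) if ch.isspace()]


# Hypothetical redirect URL with a trailing space picked up during copy and paste.
suspect = "http://example.com/go.php?u=http%3A%2F%2Fmysite.com%2F "
for position, char in find_whitespace(suspect):
    print(f"whitespace {char} at position {position}")
```

An empty result means the URL contains no spaces, tabs, line breaks or non-breaking spaces; anything else tells you where to trim.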
The last one was removed in late January. Unfortunately, nothing has changed.
I read somewhere that the duplicate content penalty ranges from 30-90 days depending on how long the dupe content exists. You should be getting close to the 90 day mark. I've got my fingers crossed for you.
In addition to removing all those urls, I rewrote the content. You are right, we are approaching the 90 day mark. But, it is not right for Google to penalize an innocent site. It is not my fault that webmasters copied my content, or set up malicious redirects to my homepage. Google should have a way of manually removing penalties such as these.
I still have my doubts that anything will change in the next 6 months.
When I do an "inurl:mysite.com" search on Google it will only show me 1,000 of 9,900 results! (My site only has 1,700 pages, so I know that there are more hijackers than the few I see in the first 1,000 results.)
Does anybody know how to get to the other 8900 results?
I can't use this remove feature if I can't see all of the results!
Don't know how to see them, but I agree that there are probably a lot more bogus URLs affecting our rankings than those that can be found by the methods posted here.
I think that this problem is far bigger than it appears to be on the surface.
|Can you confirm that you were able to remove the hijacker without inflicting any collateral damage on your own URL? |
I just got confirmation from the Google control panel that the offending URL removal is "complete."
An "allinurl:www.mydomain.com" search shows the hijacker gone and it shows my index page alive and well.
Does anybody know how to get to the other 8900 results?
inurl:yoursite.com "unique text from your home page"
That should narrow it down to your home page and anyone else 302'ing to it.
>> I added the "noindex" meta tag
Just dropped by to say that you can also serve a 404 or 410 code; that works just as well. (No need to serve it to everyone: just do a little .htaccess magic and serve it to Google for a few minutes, then take it down again.)
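For anyone not on Apache, the same idea can be sketched in application code instead of .htaccess. What follows is only an illustration under stated assumptions: the names `status_for`, `app` and `REMOVAL_WINDOW_OPEN` are made up, and matching on the substring "googlebot" in the User-Agent is my assumption about how to recognise the checker (one poster in this thread reports seeing "googlebot-urlconsole").

```python
# Flip to True only for the few minutes around the removal submission.
REMOVAL_WINDOW_OPEN = False


def status_for(user_agent, window_open):
    """Return 410 for Google user agents while the window is open, else 200."""
    is_google = "googlebot" in (user_agent or "").lower()
    return 410 if (window_open and is_google) else 200


def app(environ, start_response):
    """Minimal WSGI application serving 410 Gone to Google during the window."""
    status = status_for(environ.get("HTTP_USER_AGENT"), REMOVAL_WINDOW_OPEN)
    if status == 410:
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"Gone"]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Normal page</body></html>"]
```

Ordinary visitors keep getting the normal page throughout. As with the meta tag approach, the risk is forgetting to close the window before the regular Googlebot crawl comes by.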
Q: What was the IP and/or User-Agent of the script that checked your page (ie. the URL removal tool)?
Although I'm glad that you managed to get some redirect scripts removed, I'm also a bit worried, as this should really not be the responsibility of the hijacked webmaster. This does not fix the problem. Google should plainly fix this, so that we can get on with building and maintaining our sites instead of fixing their errors for them.
[edited by: claus at 11:26 pm (utc) on Mar. 17, 2005]
Idaho, it didn't work in the past for crobb305 but it worked for you just now. Could this mean that Google did at least a partial fix in the meantime?
Pay attention to the case of the template you are using on your site; this might be your issue.
Template.html and tempLATE.HtML are 2 pages in G's index.
[www:mydomain.tld...] count as 2 pages.
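To make the case point concrete: if you have a list of URLs for your site (from your logs or from search results), you can look for entries that differ only in letter case, since each spelling can end up indexed as a separate page. A small sketch, with `case_variants` and the sample URLs invented for illustration:

```python
from collections import defaultdict


def case_variants(urls):
    """Group URLs that are identical except for letter case."""
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    # Keep only the groups with more than one distinct spelling.
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


urls = [
    "http://mydomain.tld/Template.html",
    "http://mydomain.tld/tempLATE.HtML",
    "http://mydomain.tld/about.html",
]
for key, spellings in case_variants(urls).items():
    print(key, "->", spellings)
```

Any group it prints is a candidate for duplicate entries in the index; fixing your internal links to use one spelling is the usual remedy.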
Trying to understand...
Not following "case of template.." I use SSI headers and footers but not getting your meaning.
I typed https://example.com into my browser and my web host's front page comes up with that. G shows no listing for https://example.com, but does list [example.com...]
[edited by: ciml at 8:07 am (utc) on Mar. 18, 2005]
[edit reason] Examplified [/edit]
|Although I'm glad that you managed to get some redirect scripts removed, I'm also a bit worried, as this should really not be the responsibility of the hijacked webmaster |
While I too am happy to see the Google removal tool working to an extent, it simply is impractical.
I have over 20 sites which have all been hijacked by over 100 web sites all using the same template. Each site has been hijacked 100 times, so I would have to submit 2000 pages to the Google removal tool and temporarily remove my sites or place the noindex tag in there and then put it all back to normal again afterwards.
Let me rephrase that.
What I mean is that
That would take a good deal of time.
I'm curious how Google lets someone who does not own an offending domain remove it from their index?
What's to stop me from getting them to take all of my competitor's pages out of their index? How are you being authorized to remove a page from their index?
It doesn't sound like there is any authorization going on at all ... just requests for removal of someone else's page being granted. Worrisome.
|Idaho, it didn't work in the past for crobb305 but it worked for you just now. |
I'm not saying my site has come back in the SERPs. Judging from what Crobb305 has said, this probably won't happen until:
1. Google re-indexes my page;
2. The next update; or
3. The next update after some duplicate content penalty expires.
All I'm saying is that I successfully removed the offending url without removing my own page from Google's index.
You submit a page to be removed, G checks to see if it has a noindex tag and if it does, it removes the page. So, yes, you could submit URLs not your own, but only pages with a noindex tag get dropped -- so you couldn't do it maliciously to anyone who hasn't included that tag on their page.
It works in this case, because G thinks the page is yours -- G fetches YOUR page with the noindex tag, when it checks the "hijacking" URL.
|What was the IP and/or User-Agent of the script that checked your page (ie. the URL removal tool)? |
220.127.116.11 "googlebot-urlconsole" ...in one case.
This is a case of removing a page that is using a 302 redirect to one of YOUR pages, stealing your content and pretending that it is theirs.
By temporarily removing your page and signaling to Google that you'd like to remove the hijacker's page, Google mistakenly believes that the hijacker's page is no longer online and wants it to be removed.
The same flaw in the algorithm which is responsible for indexing the hijacker's page (which is stealing your content) also can remove it.
The problem here is that first off it's difficult to discover all the pages that are hijacking your pages, and secondly it requires a substantial amount of time directly proportional to the number of total pages to be removed.
StupidScript; if you follow the original post you'll understand.
It works like this:
Google has indexed my own page as belonging to some other site. This is called a hijack. It's really just a Google glitch because it's really my page, it resides on my site; I control the content, etc, but Google thinks it belongs to some other site.
So what you do is put a meta tag on the page that tells Google not to index the page. Then you have Google go have another look at the page through the offending url. When it does, it sees the "noindex" meta and removes the offending url. It doesn't remove the page because you told it to, it removes the page because the author of the page has a meta tag on the page that says to remove it.
After Google looks at the page, it removes the URL from its cache of pages belonging to the other site. The trick is to remove the meta tag before Google comes along through your URL and notices the tag. If this happens it will also remove it from your site.
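If it helps to see the logic laid out, here is a toy model of the behaviour described in this thread. It is not Google's actual implementation, just a sketch of what posters report: the tool follows the submitted URL to whatever page it resolves to, looks for noindex there, and drops only the submitted URL. All names and URLs below are hypothetical.

```python
def submit_for_removal(index, redirects, pages, submitted_url):
    """Toy model: drop submitted_url if the page it resolves to carries noindex."""
    # A 302 hijack means the submitted URL resolves to someone else's page.
    target = redirects.get(submitted_url, submitted_url)
    if submitted_url in index and pages.get(target, {}).get("noindex", False):
        index.discard(submitted_url)
        return True
    return False


mine = "http://mysite.example/"
hijacker = "http://hijacker.example/go?id=1"
competitor = "http://competitor.example/"

index = {mine, hijacker, competitor}
redirects = {hijacker: mine}
pages = {mine: {"noindex": True}, competitor: {"noindex": False}}

print(submit_for_removal(index, redirects, pages, hijacker))    # hijacker URL is dropped
pages[mine]["noindex"] = False                                  # restore the tag at once
print(submit_for_removal(index, redirects, pages, competitor))  # refused: no noindex tag
print(sorted(index))
```

In this model your own URL stays in the index because only the submitted URL is ever removed, and a competitor's page cannot be removed because you have no way to put the tag on it.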
Yes, I see what you are saying, and I understand the "trick": You put the "noindex" instruction on YOUR page, not the offender's page. G looks at your page via the redirect page ... and removes the redirect page only? But ... YOUR page is the one with the "noindex" on it.
The 302 is resolving to YOUR page, hence either G has the 302 page AS BEING your page or it doesn't. In either case, the site is NOT within a domain you are authorized to manage, and G doesn't ask you for any authorization ... does it?
In the latter case, G sees BOTH pages ... and you are authorized to manage only YOUR page. In the former case, G sees only the 302 page ... which you are not authorized to manage.
How can G remove a page at your request when you are not authorized to manage that domain? Are we in agreement that G is too stupid to realize what it has indexed and who is asking it to take a page out of that index?
The "trick" described above only works if G does not validate authorization to manage the offending page's domain. If they go ahead and remove a page from someone else's domain from their index because you ask them to, that just doesn't sound right.
The ends do not justify the means, and this leaves a lot of issues on the table ... issues far more serious than what the 302 perpetrator did in the first place.
Yes, it is a glitch with Google. We all agree that it's a huge problem that Google needs to fix. It is looking at one page and indexing it as two pages; one for me and one for the hijacker.
All the tool does is tell Google to look at the page and read the meta tag. If the meta tag is there, it removes the page. If it isn't there, it won't remove the page. You couldn't possibly use the tool on a competitor's page to remove his content unless you could somehow get your competitor to also include the meta tag.
It does seem interesting to think that maybe you could use the tool to reindex your page back into Google's index by setting the metas to "index."