Forum Moderators: Robert Charlton & goodroi
My site was doing very well in the SERPs. For over 2 years it had been on the first page for a competitive term (1.2 million listings). Then during the first week in January my site disappeared and traffic tanked for no obvious reason.
When searching for "site:www.mydomain.com" I noticed that my index page often wasn't listed, or it appeared on about page 3 or 4 of the results, after all my supplemental pages.
A search for "allinurl:mysite.com" often didn't show my index page at all, but instead showed somebody else's domain (located in Turkey). When I clicked on this link, my site came up. When I clicked on the cached version of the site, it showed a very old cache of the page. This same site also showed up after all my results when doing a "site:www.mydomain.com" search.
Using a header checker tool on the site's URL, I was able to see it was serving a 302 redirect to my site.
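For anyone who wants to reproduce that check, here's a minimal sketch of what a header checker does, in Python's standard library (the URLs and helper names below are invented for illustration): make one request without following redirects and report the status code and Location header.

```python
# Sketch of a header checker: one HTTP request, redirects NOT followed,
# so a 302 shows up as a 302 instead of being silently chased.
from http.client import HTTPConnection
from urllib.parse import urlparse

def check_redirect(url):
    """Return (status, location) for a single request; redirects not followed."""
    parts = urlparse(url)
    conn = HTTPConnection(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location

def is_hijack_redirect(status, location, my_domain):
    """A 302 whose Location header points at your own domain is the tell-tale sign."""
    return status == 302 and location is not None and my_domain in location

# Usage (hypothetical hijacker URL):
# status, loc = check_redirect("http://hijacker.example/go?id=123")
# print(is_hijack_redirect(status, loc, "mydomain.com"))
```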
Last night, after reading some posts by crobb305 and others, I went to Google.com and clicked on "About Google," then "Webmaster Info," then "I need my site information removed," then "remove individual pages," where I found instructions on how to remove the page.
(Here's the exact page where I ended up. If mod needs to remove then snip away:) [google.com...]
I then clicked on the "urgent" link.
Then:
1. I signed up for an account with Google and replied back to them from an email they sent me;
2. I added the "noindex" meta tag according to their instructions and uploaded it to my site;
3. Using the instructions to remove a single page from the Google index, I added the hijacker's URL that was pointing to my site. (copy and paste from the result found on "allinurl" search)
This didn't work the first time because I had to remove a space from the URL to get it to work.
4. I got a message back saying that the request would be taken care of within 24 hours. The URL that I entered showed on the upper right-hand part of the screen, saying "removal of (hijacker's url) pending."
5. I then removed the "noindex" meta tag from my page and re-uploaded it to my site.
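Before filing the removal request in step 3, it's worth confirming that the "noindex" tag from step 2 actually made it onto the live page. A rough check in Python (the regex is simplistic and only handles the name-attribute-before-content order; this is a hypothetical helper, not anything Google provides):

```python
import re

def has_noindex(html):
    """Rough check for a robots noindex meta tag.
    Only matches the name-before-content attribute order."""
    pattern = r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex'
    return re.search(pattern, html, re.IGNORECASE) is not None
```

In practice you'd fetch your own index page and run its HTML through this before asking Google's tool to act on it.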
This morning the google account still shows the url removal as "pending" but when I do "site:" and "allinurl" searches the offending URL is gone and my index URL is back.
Conclusions and Speculations:
At some point last September, Google cached the hijack page's URL pointing to my site. In January, Google penalized my site for duplicate content because it found both URLs and compared them. Mine got penalized because it was the only page that really existed. The hijacker's page didn't get penalized because it only existed as a redirect to my site.
Because my index page was now penalized, it dropped almost completely from the SERPs. (Some of my supplemental pages showed up for obscure searches, but none of my money terms.)
Because I haven't been able to get a response from the hijacker's webmaster, the 302 is still in place but it is buried deep in his site and the last Google cache of the page was sometime in September. Therefore with some luck Google won't re-index it any time soon.
Will my site return to the SERPs? I don't know. Any thoughts?
BTW, a new redirect just showed up for my site in the form of a subdomain like mysite.jm8.net
Using the removal tool, it is now gone. But, for a while, it was ranking #1 for my company name and for snippets of text from my home page. As I have said before, absolutely pathetic.
What is really funny (and still pathetic) is that Google is listing a bunch of bad urls in the form http/http://example.com/%20
And even though they don't exist, you can't submit them to Google for removal because the removal tool doesn't recognize the format. If the removal tool knows that the URL is not correct, how does it get indexed in the first place? Why is Google loaded down with crap like this, fake URLs, etc.?
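For what it's worth, a few lines of Python show how easily those URLs fail even a crude well-formedness test. Presumably the removal tool applies something like this at submission time while the indexer does not; the filter below is my guess at such a check, not Google's actual code:

```python
from urllib.parse import urlparse

def looks_well_formed(url):
    """Crude sanity check: real scheme, a host, exactly one scheme
    separator, and no trailing encoded space."""
    parts = urlparse(url)
    return (parts.scheme in ("http", "https")
            and bool(parts.netloc)
            and url.count("://") == 1
            and not url.rstrip("/").endswith("%20"))
```

The "http/http://example.com/%20" form fails immediately because the text before the first colon isn't a valid scheme.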
Sad.
The action was refused because Google could not fetch the page to read the meta tags, nor could it fetch any robots.txt file either.
How can I add a temporary robots.txt file to a domain that does not exist? Google hangs onto those old cached pages like glue.
Databases get all kinds of trash in them if you don't use some real edits right up front.
The devil is in the details, plenty of folks can dream up things.
It takes someone very picky to write good edits.
Unfortunately, a lot of wetware malfunctions when doing this. You see examples of the failure all the time. That is why there are things like buffer overflow exploits, executable code insertion exploits, etc., etc.
Just look at IE, XP, Moz, Firefox, Linux, or you name it.
There are failure modes because of not checking validity of data prior to acting on the data.
These are separate from doing the wrong thing with the data.
(He only gets 20-30 uniques per day and the site was only down for a minute, so he was happy the link is gone.)
Could there be a rational purpose for this that isn't obvious? Nah, probably not. Most likely, it's just screwed up and nobody cares. At this point Alta Vista is looking pretty sophisticated.
When I tried removal of the URL in Google (w/o the Yahoo mess), it said page not found or something; when I copied the actual URL link with the Yahoo stuff in it, it seemed to work. However, of the 4 hijackers, this page still shows up while the others are gone.
Dang, I thought I might finally get these penalties off of me. Also, one site that is now a 404 did a 302 hijack to me; no site exists, so Google now gives me a "no URL found" error but still shows it as a supplemental result.
I love you google.
I don't understand why Googlebot stubbornly refuses to crawl all of these old pages that haven't been cached since September or November. I've tried submitting them and linking to them. It's almost as if Google is purposely refusing to solve this problem.
Vec_One,
I have a theory. When I submitted some split-site URLs to Google, no matter what, it would never update the cache of the pages. Then it dawned on me: every time Googlebot visited, my server was returning a "not modified" code, so Google wouldn't see a need to re-fetch the pages and update the index. I went through and did a global search-and-replace on those pages to re-save them with a new date. Now I've noticed double entries in my logs where the files get hit, then 301'd to the proper location.
Now I'm thinking that's why Google isn't updating those php redirects that used to go to my site but are now directed to their homepage. Anybody know if a php script like that would return "not modified"?
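For context on that question: a 304 "Not Modified" is only sent when the server compares the client's If-Modified-Since request header against the resource's Last-Modified date. Sketched in Python as generic server-side logic (not any particular script):

```python
from email.utils import parsedate_to_datetime

def should_send_304(if_modified_since, last_modified):
    """Conditional-GET decision: reply 304 only when the resource has not
    changed since the date the client sent in If-Modified-Since."""
    if not if_modified_since or not last_modified:
        return False  # no conditional header: send a full 200 response
    try:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return False  # garbled date: fall back to a full response
    return modified <= since
```

A redirect script that just emits a Location header and exits never runs this comparison, so by default it won't return 304; it answers with its redirect on every fetch.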
they need 8 billion pages :)
I doubt they matter, though... unless Google can index them, I don't think that they cause problems. I hope so, anyway.
It worked for my first 4. One guy couldn't care less; sent him 3 emails. Maybe I should return the favor and link to him ;)
The problem is removing inactive links, or if someone has a copy of your page on another server.
I suspect that these guys like having these dead links lying around; maybe because the cache is never updated, they still count somehow. I don't know.
I have one in my site: results that was removed from the directory that hosted it, but Google's URL tool won't remove it because the php file is still there; it does not recognize the id# for that link and returns an empty header. They won't talk to me either; I got banned for poking around investigating my link (and others). The joke is on them, though, because they banned my home IP (not my site).
I am currently trying to have two of these links removed. If anyone knows a better way, *please* let me know!
"Emmett, although other people have tried submitting and linking to such a page, I haven't heard any success stories. AFAIK, the only solution is to convince the webmaster to completely remove the link, and then submit it to the Google URL removal tool."
Since there have been NO comments on this topic in the press or forums from Google, they are not interested.
"What is really funny (and still pathetic) is that Google is listing a bunch of bad urls in the form http/http://example.com/%20"they need 8 billion pages :)
I agree. This whole mess exists because if Google fixes it, then they lose a lot of indexed pages, and the amount they lose will tell us how many pages were affected, too.
re the example.com%20
I've seen it under the site: command, so Google is indexing them.
To see for myself, I did a site: search on the top-level country domain. Even in a couple of thousand results, I was amazed at the amount of malformed rubbish that has got into Google's database. Those sorts of URLs shouldn't be in their database at all, never mind actually turning up in search results.
The only problem with removing the link is that their go.php script will return their home page for any invalid id #. I doubt I could get them to change the programming of their script.
How about a DoS attack to bring their server down? That way you'd get a 404 so you could remove your link.
Just a thought...
Can't Google just ask Yahoo if they can use Yahoo results until this is fixed? Or perhaps ask Ask Jeeves to use Teoma results (might be a little less embarrassing, and they have a contract re AdWords).
Well, Google has certainly confused SEOs and webmasters if that was the plan: confused us into asking why so many problems are not getting fixed.
What if I have a PR 6 site, then created a blank directory with an htaccess 301 redirect to the most popular file/page on a site with a PR 2?
eg:
Redirect 301 /mydir/mysite.html h**p://somesite.com/somefile.html
Googlebot would also see this as duplicate content, and might also think that the content it redirects to belongs to my site.
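A toy tracer makes the ambiguity concrete. Given the redirect chain modeled below (all URLs invented), the crawler's view of the PR 6 URL is whatever content sits at the end of the chain; how the engine then attributes that content is exactly the open question.

```python
def resolve_chain(url, redirects, limit=10):
    """Follow redirects (modeled as a {url: (status, target)} map standing in
    for live HTTP responses) and return the list of hops, capped at `limit`."""
    hops = [url]
    while url in redirects and len(hops) <= limit:
        _status, url = redirects[url]
        hops.append(url)
    return hops

# Hypothetical chain matching the htaccess example above:
redirects = {
    "http://pr6-site.example/mydir/mysite.html":
        (301, "http://somesite.com/somefile.html"),
}
```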