Forum Moderators: Robert Charlton & goodroi
Many site owners and webmasters who do not know why their website has dropped from Google's index may find that their website has been duplicated by Google itself and penalized.
Don't think that the 302 hijack problem is over.
In fact, it is worse than ever before.
An inurl: command along with an inanchor: command may indeed reveal that your website has been duplicated by Google's algorithm.
Google's patented duplicate content filter may have imposed a penalty on your website.
How to check:
In Google's search box, do this:
inurl:www.siverwidget.com
then do:
inurl:siverwidget.com
Do the same again, but this time:
inanchor:www.siverwidget.com
then do:
inanchor:siverwidget.com
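To save retyping, here is a minimal sketch (my own addition, in Python) that builds those four checks as ready-to-paste Google search URLs; siverwidget.com is just the placeholder domain from above, so substitute your own.

# Builds the four hijack-check queries as Google search URLs.
from urllib.parse import quote_plus

domain = "siverwidget.com"  # placeholder domain from the post; use your own
for q in (f"inurl:www.{domain}", f"inurl:{domain}",
          f"inanchor:www.{domain}", f"inanchor:{domain}"):
    print("https://www.google.com/search?q=" + quote_plus(q))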
There are a few more complex methods, but the two above can reveal anomalies affecting your website.
Your Uniform Resource Locator is 100% unique.
Only your website should show in the results. Don't listen to GoogleGuy saying that other sites can show up. That is absolute crap.
A URL that does show up alongside your listing can only be a potentially unhealthy link that points towards your website via a temporary server-side redirect.
Only your website should show. NO OTHER. If a URL does show and its Location header leads to your website and your contents, then Google has hijacked your website by creating duplicate content and summarily penalizing it. This can answer why so many sites just disappear off the radar.
The only safe listing of another URL that contains your URL is one where your URL is a harmless text insertion, not an actual server-side redirect command.
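If you want to see for yourself what a suspect URL sends, here is a minimal sketch (my addition, Python standard library only) that requests it without following redirects and prints the status line and Location header; the suspect URL at the bottom is hypothetical.

# Fetches a suspect URL WITHOUT following redirects and reports the
# status and Location header, so you can see whether it is a
# temporary (302) redirect pointing at your site.
import http.client
from urllib.parse import urlparse

def check_redirect(url):
    parts = urlparse(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn.request("HEAD", target)
    resp = conn.getresponse()
    print(url, "->", resp.status, resp.reason)
    print("Location:", resp.getheader("Location"))  # a 302 here naming your domain is the red flag
    conn.close()

check_redirect("http://example.com/out?to=www.siverwidget.com")  # hypothetical suspect URL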
Good luck.
P.S. I can assure you that many hundreds of thousands of sites are in this predicament. It is worse than ever before.
Many site owners who have seen their websites dropped by Google will find duplications of their websites under unauthorised URLs and redirect links.
and if the premise (of G) is that the site must be designed for the user and not for the engines (presumably including G) then they shoot their logic in the super ego ..
makes one wonder if they are all PhDs and some are not in fact ENArcs ..
I 100000000% agree that the problem is all over the place - however, there are reasons why websites that are not your own can be returned in the above results - they are not all hijacks.
Dayo_UK,
Thank you.
But I think you will find that this duplicate content is duplicate content created by Google and no other.
It is not possible for a unique Uniform Resource Locator to get mixed up. Google is actually creating pages itself.
And if I remember correctly, GoogleGuy said somewhere that should Google deem a diminishing site's contents to belong to a higher-PageRank site that points to it, it will allocate your contents wherever Google wants. Well, I can't take that lying down. Can you?
I mean, what sort of a world do we live in?
"inurl:www.mysite.com -site:www.mysite.com"
without the quotes. Now your site is eliminated from the results and only the garbage is left.
Or
"inanchor:www.mysite.com -site:www.mysite.com"
More Time Savings:
Put a unique, long-winded acronym on all your pages and search for that. That way you can find all your copied pages, if they're verbatim copies.
An acronym like WDGTSCMPG would never show up in the search results on its own.
Search for "WDGTSCMPG -site:www.widgets.com" (no quotes) and, poof, all your copied pages show up. I've done the same for all my TITLES, tacked a unique long acronym on the end, and can find all the directories and scraper sites using my TITLES with one search.
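The same trick, scripted: a small sketch (my addition) that builds the exclusion query from your planted token. WDGTSCMPG and www.widgets.com are the placeholders from the post above.

from urllib.parse import quote_plus

token = "WDGTSCMPG"          # the unique acronym you planted on every page
my_site = "www.widgets.com"  # your own site, excluded from the results

# Every hit for this query is a verbatim copy of your page hosted elsewhere.
print("https://www.google.com/search?q=" + quote_plus(f"{token} -site:{my_site}"))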
When you review the results, if they are "Supplemental Result" and your pages are not, I doubt you truly have problems. BUT, then, I'm still suspicious about one page in my results!
This doesn't parse. There's no reason to associate DMOZ clones with scrapers -- they are two completely different things (actually three completely different things, since there are a couple of different techniques used by the DC's, and scraping isn't either of them. In fact, scraping bots are a problem that dmoz.org techies have to handle, just as you do.)
Now, you could both scrape other sites and clone the ODP. Nothing prevents you. But you could both read Shakespeare and commit axe murders -- but "Shakespeare-reading axe murderers" is not a particularly meaningful sociological classification. They are different skills, even when possessed by the same person.
[edited by: hutcheson at 1:59 am (utc) on Nov. 28, 2005]
This first bot may be a harvester? Yes? No?
Let us assume it is. It may go no further, and it may or may not follow the directive it picked up? Yes? No?
The bot takes that script to a nursery at Google because it is new and Google wants to control what each type of bot does. Yes? No?
The script may actually be pointing to your website via a temporary "302 Found" at that website's server. Possible? Yes or no?
At this point Google sends out the big bots that follow links, deep crawlers or whatever you like to call them. This bot is now going to follow a redirect link that points to your website. It is not going to your website first; it is going to the site that has the script link pointing to you.
The bot is out to get information about your website. But the thing is not going to your site. Google thinks that canonical priority should be given to the website that hosts the script link.
The bot follows the script link to a server-side command that tells the bot the contents of the script are temporarily elsewhere.
There is the problem.
Clean and simple.
No need to elaborate any further. Google has messed up. It thinks your contents belong to the canonical URL.
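To make that concrete, here is a minimal illustration (mine, not anyone's actual script) of the kind of server-side command being described: a tiny server that answers every request with "302 Found" and a Location header naming the target site, which is exactly what a crawler following the script link would receive.

# Minimal illustration of a "302 Found" server-side redirect.
# Run it and request http://localhost:8000/anything to see the
# status line and Location header a crawler would be handed.
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "http://www.siverwidget.com/"  # placeholder for the hijacked site

class TempRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        # "302 Found" says the content temporarily lives at TARGET, so the
        # crawler keeps THIS url and fetches YOUR content for it.
        self.send_response(302)
        self.send_header("Location", TARGET)
        self.end_headers()

HTTPServer(("localhost", 8000), TempRedirect).serve_forever()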
[edited by: lawman at 10:55 pm (utc) on Nov. 28, 2005]
What GG has suggested is self-serving, yeah, but it is mostly accurate nevertheless - the sites that seem to suffer from it have other problems to begin with (like canonicalization). Some of us have those techniques being used on our pages with no ill effects (believe me, I check my own stuff regularly).
If you have any great solutions available to those who are suffering from it, they'd be welcome, but just focussing on the symptoms isn't going to get any of us anywhere.
sincere apologies ..
like your similes tho .. Macbeth was my favourite too ..
Stefan .. in spite of the "yes" .. "no's" (and the appeals to calm), the OP isn't looking for discussion or offering solutions .. just trawling for disciples ..
can't wait for his link drop to his ebook on the subject ..
now I will try to be good .. it ain't easy tho ..
nearly 04:00 am where I am, so .. I'll have to leave you all to be converted to the one true explanation ..
ciao .. see you tomorrow
What GG has suggested is self-serving, yeah, but it is mostly accurate nevertheless - the sites that seem to suffer from it have other problems to begin with (like canonicalization). Some of us have those techniques being used on our pages with no ill effects (believe me, I check my own stuff regularly).
If you have any great solutions available to those who are suffering from it, they'd be welcome, but just focussing on the symptoms isn't going to get any of us anywhere.
Stephan,
Thank you. Noted.
But that is not what a good friend of mine wants to hear. Not after he spent many thousands of dollars on his website over the past 6 years. Blood, sweat and tears. He knows that I know his problem. I know exactly how Google has removed his website from its index. I know too that Google has no responsibility or any commitment to rank a website in its index.
I also know that Google is a monopoly and not answerable to anybody. But just maybe, if enough of us come to know the exact mechanism that Google uses to dump websites from its index, and the news gets out in a way that its stockholders understand, then we would achieve something.
In the meantime, I promised my webmaster friend that I will pursue this problem tooth and nail.
Nobody has yet had the courage to file a complaint against Google itself for copyright infringement.
This is what I am out to prove. Not the already-known issue of the 302 problem.
If Google takes the contents that you spent 6 years creating and assigns them to other random websites, then Google has to answer to somebody for it.
I bet you that if I disclose the near-exact procedure by which Google's bots and algorithm hijack websites, many webmasters will support my cause. At the moment I am only interested in making it known that Google is to blame.
Many webmasters that have this problem will be interested. Those that rank high for the moment will not. I am aware of this.
I bet you that if I disclose the near-exact procedure by which Google's bots and algorithm hijack websites, many webmasters will support my cause. At the moment I am only interested in making it known that Google is to blame.
Well... spill the beans then. I think we all have the idea that Google has a problem with 302 redirects, as you can see if you do a search on... err, well, suffice it to say that there are several threads on that subject.
Can we also blame G for not letting us search WW anymore?;)
Leosghost:
You must be a bloodhound, sniffing out droppings from miles away..
Until then, if you're looking for a good summary of where we were last time:
[webmasterworld.com...]
But many people complain that their sites have tanked in Jagger.
Is it their fault? Is it canonicalization issues? Bad links? All of these reasons, and many more, have been suggested in the lengthy Jagger threads. Nary a word has been said about the possibility of a hijack.
According to GG, the only thing that should be unique to your website is the site query, that is: site:www.foo.com
If you've got other websites showing up for this command, you've got a problem.
After Google removed all hijackers from the site query, we had to find them by other means, i.e., the allinurl/inurl or other searches. Just because they no longer show up in the site command doesn't mean Google eliminated the problem. I believe it just made them harder to find.
Also, before, we could do a site:mydomain.com search and see all those 302 links, but then Google removed that way of seeing the problem.
At the heart of the problem is this quote from Dayo:
I am pretty sure that Google is having trouble working out the root page of many many many sites - due to either 302 hijack and/or Canonical problems.
Readers here need to be quite clear that there are two issues involved: 302 hijack and/or canonical problems.
I would suggest that they read the various long threads on both issues and use the advice given to improve their SERPs in Google. <snip>
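For the canonical side of that advice, the standard cure is a permanent (301) redirect from every alternate hostname onto the one you want indexed. A minimal sketch of the idea (my addition, assuming a single canonical host; www.widgets.com is a placeholder):

# Redirects any request for a non-canonical Host header permanently
# (301) to the canonical one, the usual fix for www/non-www splits.
from http.server import BaseHTTPRequestHandler, HTTPServer

CANONICAL = "www.widgets.com"  # placeholder: the one hostname you want indexed

class CanonicalHost(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Host", "") != CANONICAL:
            self.send_response(301)  # permanent, unlike the hijacker's 302
            self.send_header("Location", "http://" + CANONICAL + self.path)
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"served from the canonical host\n")

HTTPServer(("", 8000), CanonicalHost).serve_forever()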
[edited by: lawman at 10:54 pm (utc) on Nov. 28, 2005]
What GG has suggested is self-serving, yeah, but it is mostly accurate nevertheless - the sites that seem to suffer from it have other problems to begin with (like canonicalization). Some of us have those techniques being used on our pages with no ill effects (believe me, I check my own stuff regularly).
If you have any great solutions available to those who are suffering from it, they'd be welcome, but just focussing on the symptoms isn't going to get any of us anywhere.
Unfortunately this isn't true, despite what G says. The proper script pointed at a site, either intentionally or unintentionally, can cause a hijack, whether there are other issues or not.
And Google certainly would be one to talk about proper problems, seeing as it cannot even determine ownership of a domain in its own SERPs...
The combination of the sitemap with the verification file tells Google unambiguously which site the page belongs to.
It may be pure coincidence, but PR on one of my pages was restored after I added the sitemap plus the verification file. I believe that my PR dropped because a download site 302-redirected to me and I had, by mistake, a non-canonical (non-absolute) URL on the affected page.
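For reference, a minimal sketch (my addition) of that setup: a one-URL sitemap using an absolute, canonical URL, plus the verification file. The verification file name below is made up; Google assigns you the real one when you verify the site.

# Writes a minimal one-URL sitemap with an absolute, canonical URL.
SITEMAP = '''<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.widgets.com/</loc></url>
</urlset>
'''

with open("sitemap.xml", "w") as f:
    f.write(SITEMAP)

# Hypothetical verification file name; Google gives you the actual one.
open("googleffffffffffffffff.html", "w").close()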
Vadim.
regards
viggen