After Allegra I used robots.txt and the URL removal console to remove duplicate content. This was in March. After that I continuously had a robots.txt with
Google states that the content removed by the console will stay removed for six months.
My site came back with Bourbon in May. After that I made a mistake. I added two lines
These two lines were a time bomb.
As far as I now know, the entry "User-agent: Googlebot" stops Googlebot from reading the lines below "User-agent: *".
Google states: "When deciding which pages to crawl on a particular host, Googlebot will obey the first record in the robots.txt file with a User-agent starting with "Googlebot." If no such entry exists, it will obey the first entry with a User-agent of "*"."
To put it another way, from Googlebot's point of view: if there is an entry "User-agent: Googlebot", I will never read "User-agent: *".
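For anyone who wants to see the effect for themselves, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the paths (/cgi-bin/, /print/, /mail/) and the example.com URL are all made up for illustration; they are not the actual lines from my file. Python's parser is also not Googlebot, but it handles this case in the same way: a bot that finds a record with its own name ignores the "*" record.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a Googlebot record followed by a "*" record.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: *
Disallow: /print/
Disallow: /mail/
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/print/article1.html"
    # Googlebot matches its own record, so the "*" rules are never consulted.
    print("Googlebot:", allowed("Googlebot", url))
    # A bot without its own record falls back to the "*" record.
    print("OtherBot:", allowed("OtherBot", url))
```

With this file the print page is allowed for Googlebot but blocked for OtherBot, which is exactly the trap described above.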
And thus my duplicate files (for printing and mailing articles) were not excluded anymore from being read by Googlebot.
I have now copied the complete "User-agent: *" section to "User-agent: Googlebot". And I hope my site will return soon.
I encourage everyone to check their robots.txt for the same possible problem. I had to learn that the hard way.
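A rough way to run that check, assuming a simple robots.txt that only uses User-agent and Disallow lines: the sketch below lists every Disallow path from the "*" record that is missing from a bot-specific record. The function names (parse_groups, missing_for_agent) and the sample paths are my own inventions for illustration, and the parser is deliberately simplistic (no wildcards, no Allow handling).

```python
def parse_groups(text):
    """Map each user-agent (lowercased) to its list of Disallow paths."""
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:
                # A User-agent line after a rule starts a new record.
                current_agents = []
                seen_rule = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif current_agents:
            seen_rule = True
            if field == "disallow" and value:
                for agent in current_agents:
                    groups[agent].append(value)
    return groups


def missing_for_agent(text, agent="googlebot"):
    """Return "*" Disallow paths that the agent's own record lacks."""
    groups = parse_groups(text)
    if agent not in groups:
        return []  # no own record, so the bot falls back to "*"
    return [path for path in groups.get("*", []) if path not in groups[agent]]


if __name__ == "__main__":
    # Hypothetical file with the same mistake as described above.
    sample = (
        "User-agent: Googlebot\n"
        "Disallow: /cgi-bin/\n"
        "\n"
        "User-agent: *\n"
        "Disallow: /print/\n"
        "Disallow: /mail/\n"
    )
    print(missing_for_agent(sample))
```

If the list comes back empty, the bot-specific record already repeats everything in the "*" record (or there is no bot-specific record at all).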
We do have tons of duplicate content, since we are a news site and so use agencies like Reuters, AP etc. But we run a huge amount of unique content as well. I don't think this has to do with duplicate content...very odd.
There are cases where established site homepages and subpages are holding their ranking for one phrase, but dropping out of the SERPs for another closely related phrase (when the site previously ranked for both) ... and where there is no evidence of dup content filters playing a role where pages dropped out.
They've tweaked something else IMO. Possibly related to linking/anchor text/kw patterns.
IMHO: It's related to one of 2 things.
1. Links
2. Duplicate content
For me, I have a few duplicators, but they are of such low quality that it's unbelievable to me that Google can't construct an algo that can tell who is legit and who is not.
The other might be related to links. However, I build theme related links very, very slowly. Fewer than 5 a month.
Although it might be one of those 2, neither one is really a "glaringly obvious" problem.
Whatever it is, it better get rolled back.
[edited by: Freedom at 6:49 pm (utc) on Sep. 24, 2005]
soapystar, that looks to be the case with this site of mine... Same template throughout the site. It's possible.
GG - why not request examples from webmasters.. just mention a code to add to those feedback forms!
[edited by: nutsandbolts at 6:52 pm (utc) on Sep. 24, 2005]
We do have thousands of links, since we often break news or media so sites link to us in the hundreds each week...often in a very short space of time. Plus as I said we do have thousands of pages of duplicate stories, but that is the only way you can cover certain world events. Plus although we run a lot of original content as well we sometimes license that out too...
I do hope it changes though or we will be in some trouble. You just don't realise how dependent you are on one company. Guess this is a case of sit and see.
I also had a look on Alexa (I know it's flaky, but it gives a rough idea). I noted all our peers and similar sites have followed us in a big drop in traffic over the last few days.
[edited by: FattyB at 6:54 pm (utc) on Sep. 24, 2005]
Does Google think I am a Link spammer from scrapers?
I don't think links, or templates or anything has the slightest to do with this.
As mentioned above, sites seem to manage to hold onto (or at least not drop much for) some searches, while being dropped hundreds of spots for most things (and seldom ever gone completely out of the top 1000). Also, pages on a domain that have not been copied in any way (like those built a few days ago) also have a mega-drop in rankings, from #1 when not filtered to down hundreds in the regular search.
This is domain related. Specific pages don't have to be copied to be filtered. At the same time, the ridiculously inflated page counts seem to always exist, and it appears (I'd like to hear of any exceptions) that you always have to be over 1000 pages, meaning you can never check to see what any of these phantom pages are supposed to be.
It seems awfully advanced for Google to recognize that a domain has some high threshold of copying by other domains, and thus gets filtered for almost all searches -- although this could be the same sort of ill-conceived notion as the establishment of the Supplemental index.
In any case, I don't think people should go too far afield with this, or read too much into it in tin hat ways. &filter=0 corrects the problem... in my experience, it *always* corrects it. That one bit of information should tell Google how they massively screwed up, and tell them what they need to do to fix it. If it is an overall domain level of content theft that triggers it, it is doubtful that we as webmasters can do much of anything about it, since by definition the content theft will be widespread, and more importantly, in most cases HAVING THE STOLEN CONTENT REMOVED WILL HAVE NO EFFECT, because it is in the supplemental index (in most cases) and deleting supplemental pages does not get them deleted from the supplemental index.
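For anyone who wants to compare the two result sets side by side, here is a small sketch that builds both search URLs, one normal and one with &filter=0 appended. The function name search_urls and the sample query are just illustrations; the only thing taken from the discussion above is the filter=0 parameter itself.

```python
from urllib.parse import urlencode


def search_urls(query):
    """Return (filtered_url, unfiltered_url) for a Google search query."""
    base = "https://www.google.com/search?"
    filtered = base + urlencode({"q": query})
    # filter=0 asks Google not to collapse results it considers duplicates.
    unfiltered = base + urlencode({"q": query, "filter": "0"})
    return filtered, unfiltered


if __name__ == "__main__":
    # Hypothetical query; open both URLs and compare where your site ranks.
    for url in search_urls("blue widgets"):
        print(url)
```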
Google Guy(s) and Google Gal(s), you know what you did. Stop doing it. It accomplished nothing positive. The results are virtually unchanged... except you are filtering out many of the most respected (and stolen from) domains in every niche.
My site, scraped and now missing, was registered by me in 98. Hard to believe the 250 sites that are listed instead of me were registered before then. I'll bet not one of them was.
Can it be so difficult to sort this out?
What does Google expect us to do, rewrite the site for every update?