| This 268 message thread spans 9 pages |
|Scraper Site Clearout Collateral Damage?|
It seems like Google has purged many scraper sites from the Google SERPs, as per this thread:
I'm sure many people, including myself, are very, very pleased about this as it stops scumbag sites from stealing our content.
However, it also appears that some non-scraper sites have been included in this purge (including my own). My site has been active for 5 years and is based on unique content.
Has anyone else been affected by this, and does Google intend to refine the algorithm to stop valid, unique content sites from falling victim?
"However, it also appears that some non-scraper sites have been included in this purge (including my own). My site has been active for 5 years and is based on unique content.
Has anyone else been affected by this, and does Google intend to refine the algorithm to stop valid, unique content sites from falling victim?"
Yes, I have been affected and I am devastated. For 4 years in the top ten for all keywords. Also in DMOZ. A site where articles are written every day about a specific topic. Now I have no PageRank, no cache and no backlinks. I am nowhere!
Very soon Google will have nothing in their index... with a long list of AdWords advertisers on the right and top.
I was affected by this purge of scrapers. I too have only original content, and have been around and on top of the SERPs since 1998.
As of this afternoon, one of my main sites was completely purged from the index by Google.
Have your sites been completely removed, or have they just lost rankings?
Would it be better just to upload to a new domain name, as they say that it is impossible to recover from a total ban?
Collateral Damage is a good description of the carnage we have seen as good sites were inadvertently destroyed today.
It's a pity Google is banning such informative sites while so much of the junk that they hate is still in their index.
Google should realize that not only are they damaging good content sites, but they're also not solving the problem, because the scrapers are still getting a huge amount of traffic from MSN and Yahoo. I know because I scrape a scraper and am hoping for them to write to me in regards to copyright infringement.
For every scraper's AdSense account they terminate, they probably remove along with it all the scraper sites that the scraper owns.
I feel like General Custer must have felt when the Indians were on their massacre
Google sucks. I hope my sentiment is sweet and to the point.
As usual, they totally missed the mark and caused untold psychic damage to thousands of innocent webmasters worldwide.
My main site got hit again too. Different time frame, though. I had lost something like 80-90% of my Google traffic with the Bourbon update.
Traffic came back on 6/30 for two weeks (and my site was back in 1st position when searching for the name of the site). Then I lost 90% of Google traffic again on 7/16 - and most of the rest of it on 7/22.
And now when I search for the name of the site, it's no longer on the first page. Haven't bothered to check how far down it is.
This is an educational site with about 5% "commercial" pages.
They got one of my legit sites too!
Heck, I love the scrapers --- hundreds of them scrape my unique content, provide links to me, boost my rankings at Yahoo and MSN, increase my income AND they hardly ever rank for any decent keywords and do not really create competition for the sites they scrape!
I hate them in Google (now that Google is throwing out a few babies with the bath water), but they are actually automated assistants for good content sites at Yahoo and MSN...
If any of you out there are scraper site builders, please scrape my sites!
My main site got punished yesterday by a complete removal from the index (old news - I have noted this on a couple of other threads). I had been e-mailing Google back and forth for the last week asking them to look at my website and see that it was the most relevant site in my niche - and to please restore my rankings to where they were before July 22 - when I was getting 10000 Google refs a day.
Obviously Google got tired of dealing with me, and kicked me off the index altogether today. I saw on another thread that GoogleGuy said that they were manually removing scrapers today - and I guess I drew attention to myself and got banned along with the rest of them. I AM NOT A SCRAPER - IT IS ALL MY ORIGINAL CONTENT. My AdSense placements were set up like a scraper site - the scrapers know how to maximize clicks - but the content was mine!
Is it too late to do anything - should I reload my site to another domain that I've kept around just in case something like this happens? Is my domain now dead - should I abandon it altogether? I still get around 4000 refs a day from Yahoo and MSN, but my Google refs are now a big fat GOOSE EGG!
My largest and highest quality 5 yr old website was also banned on 7/28, apparently because of the new scraper filter.
I have access to a lot of search data and have a few observations about newly banned websites:
- All have a link directory
- Most have adsense
- Filter is indiscriminate to site age, size, or quality (Many are old legit established sites)
- Algorithmically Banned on 7/28 (Unless Google has army of eval.google trainees that weren't properly trained)
Do your websites follow my observations? Do you have any data that contradicts them?
Fits my site, it had a DMOZ directory (computing section) as it was relevant to the site. Around 6 years old and plenty of highly unique content and adsense.
Do you guys have link exchange directories on your sites? The kind where people can submit a site as long as they add your link? Do you have lots of unrelated categories?
I do know of one site that was nuked but has come back, pr, listing, links and all.
No link exchanges here.
I do have an absolute ton of related links pointing to my site with clean promotion and on-site optimisation. Something that has been worked on for the past several years.
I've also got a forum running on the domain with approx. 100,000 posts. I have the whole site set up so that minimal to no duplicate pages will be shown. As an example, non-www pages are redirected to www, and the forum's post pages (in phpBB) are disallowed in robots.txt.
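For anyone wondering what the non-www to www part of that setup amounts to, here is a minimal sketch in Python. It is only an illustration of the idea, not the poster's actual configuration: the domain www.example.com and the function name canonical_redirect are made up for the example, and on a real server this would normally be a permanent (301) redirect rule in the web server config.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; substitute your own domain.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url):
    """Return the www URL a non-www request should be redirected to.

    Returns None when the URL is already on the canonical host or
    belongs to some other host entirely.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == CANONICAL_HOST:
        return None  # already canonical, nothing to do
    if "www." + host == CANONICAL_HOST:
        # Same path and query string, only the host changes, so each
        # page ends up reachable under exactly one URL.
        return urlunsplit(
            (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
        )
    return None


if __name__ == "__main__":
    print(canonical_redirect("http://example.com/forum/index.php?f=2"))
```

The point of keeping the path and query untouched is that a crawler following the redirect lands on the one www version of the same page instead of finding two copies.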
All pages are gone from Google now. I have seen the effects of it in stats, and it's NOT pretty at all.
Nope, not reciprocal links, just a DMOZ script that featured the computing section (as it is a computing site).
|Google should realize that not only are they damaging good content sites, but they're also not solving the problem, because the scrapers are still getting a huge amount of traffic from MSN and Yahoo. |
Do you suggest that Google do something to stop MSN and Yahoo sending traffic to scrapers?
Remember... Google has been throwing out the baby with the bath water since Dec. 2004 or more likely since they went to 8 billion web pages back in Nov. 2004. The Allegra and Bourbon updates saw many quality sites disappear. Do a search on Allegra and Bourbon - you will see much of the same sentiment. I forget what type of sites they were trying to get rid of during Allegra and Bourbon...
Here are some ideas that have been thrown around in the past:
- Over optimization penalties...
- Hi-Jacked sites...
- WWW Vs. Non-WWW causing dup penalties...
- Something about hilltop algo being put in place...
The cycle continues.
Tomseys, did the site that came back file a reinclusion request? Or did they remove their directory?
When I say link directory, I mean pages that have many outgoing links to other sites. (similar to scrapers)
My data shows it doesn't matter if the directory is thematic or related.
From Google's POV, any site that could algorithmically fit the characteristics of a scraper could get mislabeled as a scraper site and banned.
It's obvious this scraper filter is not finely tuned YET, b/c I still see some scrapers.
The question now is will they turn down the knob a bit to algorithmically reverse bans on quality sites? And where is the exact threshold of being labeled as a scraper?
In my dataset, there are many sites that are very similar to quality sites that were banned, yet are ok.
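To make the "threshold" question above concrete, here is a rough sketch in Python of how one might measure the thing described, i.e. pages made up mostly of outgoing links to other sites. To be clear, this is NOT Google's filter, which nobody outside Google knows. The class and function names and the 0.8 cutoff are all invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Invented cutoff for illustration only; the real threshold (if there
# even is a single one) is unknown.
HYPOTHETICAL_THRESHOLD = 0.8


class LinkCounter(HTMLParser):
    """Counts internal vs. outbound <a href> links on one page."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host.lower()
        self.internal = 0
        self.outbound = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        host = urlsplit(href).netloc.lower()
        # Relative links have no host and count as internal.
        if host and host != self.site_host:
            self.outbound += 1
        else:
            self.internal += 1


def outbound_share(html, site_host):
    """Fraction of the page's links that point to other sites."""
    counter = LinkCounter(site_host)
    counter.feed(html)
    total = counter.internal + counter.outbound
    return counter.outbound / total if total else 0.0


def looks_like_link_directory(html, site_host):
    return outbound_share(html, site_host) >= HYPOTHETICAL_THRESHOLD
```

A page with four external links and one internal link scores 0.8 and gets flagged, while an article with one external link out of four scores 0.25 and does not. It also shows why a crude measure like this would catch legit directories along with scrapers: it looks only at link structure and never at whether the content is original.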
It seems that the only people who believe Google has lost its quality search results are webmasters who've lost their ranking in the SERPs. Not all webmasters run websites with "quality content" and the whole concept of "quality content" is completely subjective. To say Google has lost its quality is a bit extreme, especially coming from a webmaster's point of view, which is obviously going to have some bias. My websites have been completely unranked by Google for almost a year now, but despite this, I believe Google's results are amazingly accurate and better than ever.
Surely Google.com should also be banned - they have a DMOZ clone directory
twinsrul- I agree 100%. So sick of all the webmasters who 'objectively' rate the SE's on forums. Don't care if you manage 100 sites or none. No one is objective about SE's just like no one is objective about politics. The masses decide who wins in both cases.
A few of my story writers, who sometimes write political stories on my site, have caught wind of Google's ban and see it as censoring given the current political situation in the US.
The thing is, I can't tell them they're wrong in making these assumptions because I myself don't know why the site was banned/removed/filtered.
RE: Quality. I believe several years of proof-written articles, news stories and commentaries should be regarded as quality content. Yes, the site does include a directory (almost every portal does), but there's years worth of unique content and daily-updated information such as news articles and weather forecasts and 100,000 forum posts which should not be censored/banned/penalised.
|Yes, the site does include a directory (almost every portal does), but there's years worth of unique content and daily-updated information such as news articles and weather forecasts and 100,000 forum posts which should not be censored/banned/penalised. |
The problem with the above is if you let your entire site be crawled you cannot pick and choose what Google should penalize or not penalize. For those trying to rank by using feeds, dmoz clones, pseudo directories, multiple sites with very similar content – I think those days are coming to an end.
If you want to put up a dmoz clone or other feeds/duplicate content for your users' benefit then you should block it from being crawled.
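As a rough illustration of blocking a directory section from being crawled, the sketch below uses Python's standard urllib.robotparser to check what a robots.txt rule would allow. The /directory/ path and the www.example.com host are placeholders, not anyone's real site layout, and robots.txt only works for crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all bots out of the directory clone,
# leave the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(path, agent="Googlebot"):
    """True if the given user agent may fetch this path under the rules above."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/directory/Computers/", "/articles/news.html"):
        print(path, crawl_allowed(path))
```

With these rules the directory pages are reported as off limits and the article pages stay crawlable, which is the split being suggested: keep the feed for visitors, keep it out of the index.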
Regarding the person whose site came back, yes they emailed google and very quickly came back.
One thing I've noticed on a few dropped sites is that they have a kind of "doorway" page - a page listing a long list of articles that are obviously optimized for keywords.
Perhaps it is penalizing manipulative link architecture as well - where you omit page links in your homepage footer on other pages or use huge link directories with unrelated topics.
I for one will not miss sites that show dmoz content or scrape any type of content off other sites beyond a small ticker or use manipulative linking schemes all for the real purpose of increasing content/rank/income.
[Contractor - You would mean block it from being crawled by bots from SE's that would penalize you for it, yes?]
Some report that directories are not an issue on Yahoo or elsewhere, where having a directory is not a sin. (Not that we know if it is a sin on Google, and I assume it will not be).
Are we talking about sites that have thousands of html/asp/etc. pages, or sites that have thousands of virtual pages fed in real time using scripts, i.e. www.sitename.com/cgi/odp/?cat=business+finance? Or a blend using url-rewrites to eliminate the "?" from dynamic pages?
[Edited for clarity of initial sentence, and ODP usage]
My odp-fed sites via a "?" script all still listed. I added the script a long time ago to try it out, and found that people used it, so I've left it.
Google is likely getting caught more in the situation described in an old book (either "The Mythical Man-Month" or "The Soul of a New Machine"; I can't remember). At some point, the book described that an operating system was riddled with potential side effects, and that the team's management had to decide on which side effects to live with. Why? Because if you fix one bug of the OS, you get some new side effects that, if fixed, create a different set of problems, but worse than the one being fixed.
Same with any SE algorithm, or for a project to change the algo. With more time and engineering, you get a chance to stop more side effects. But if you've run out of time for engineering....
Depends on the risk you want to take ... turn the "Quality" knob or turn the "Time" knob, but they are linked and turning one will move the other in the opposite direction. But it's not determined how they interplay until after you turn them. The other knob is the "Scope" (or "Complexity") knob for a project. The running logic is that you can turn 1 or 2 in a favorable direction, but not all three.
So, in the case of eliminating scrapers, if you turn the "Scope" knob too much you impact some other area negatively ("Quality" goes down); if you turn it too little, then you have gotten too little quality as well. It seems people are reporting that they turned the knob both "too much" and "too little" in eliminating scrapers. A delicate balancing game.
As far as telling you, me, or the man in the moon what they are doing, banning, or thinking... Google is a publicity-shy company. Any publicity-shy company will communicate less than some people would want, and in the case of their algorithm, it is also a trade secret. And as you may be aware, Google and MSN are battling just on the issue of whether an ex-MSN employee can work at Google.
Back to the issue at hand - I've had sites drop out of the index and back in.. seems way more frequent in 2005. And searching for just the name of my domain, "Yowzawidges", I sometimes see that I'm closer to #100 than to the rightful (!) #1 slot that I held the month before. It's just that people - my users - use SE's as a shortcut to find the site again, or to refer people to me, and when it instead turns up a high-PR site that simply has a sentence using that invented-by-me Yowzawidges word, it does no one any good.
After reading many of your posts it looks to me like many have tried either scraping content, rewrote someone else's content (for those claiming not to scrape) or are involved in link directories.
For those who have truly done the work the natural way with true original content...you can thank the others who don't know how to build a business on the Internet for ruining the good things that were once acceptable.
It should also be noted that cleaning the Internet of these sites is everyone's job and not just Google's, so if people don't help Google there is little chance Google can help in return..
There is even someone here asking for his site to be scraped; in my eyes this is the type of webmaster Google doesn't want building sites. Why would they want to re-index text written once already? Waste of resources and time!
Others have stated their forum posts of 100,000 posts were dumped...could it be because they had people come in and fill out posts continually to bolster their rankings?
Or perhaps they are one of many webmasters on the freelance job boards posting for 100's of people to make up posts for them or have someone write 100 articles of optimized keywords..
These tactics are what is meant by attempting to exploit the SERPs yet many webmasters think that because something works temporarily for someone else, they should hop on and do the same...WRONG!
Most sites over time may have added a page a week, so when someone comes along and adds 100 pages in a week, Googlebot kicks them out...as dumb as people think Google is, they are very very intelligent.
If it took your site a month to gain 100,000 forum posts, and it took true forums like WebmasterWorld, SEW, etc. a year to get to 100,000 posts, what makes you think you don't deserve to be banned?
Link directories...most were built over weeks and months, so if you added one with 1000s of outbound links in a day... then you deserve what you get..links are natural and take months and years to develop, when you take short cuts, you get cut short!
Also one thing to be learned about Google..you may drop out of the results and reappear several days later so do not panic..doing so only will hurt you more.
Another thing: if you make a major change to the way your site operates and then get dropped, wait a week; if no reinclusion, dump the change and then build a one-way link from a site Google is crawling daily. Reinclusion should take less than two to three weeks.
And yes you can get any site unbanned, though there may be a few hardcore spammers who can't ...again your history shows you as someone not wanted around.
In the past three years I have helped 8 banned sites get back in. The latest was <snip>
100,000 pages dropped in one day..gone vanished...
After 3 months of ripping out all link exchanges, and every other thing that may have been looked at by Google and about 10 e-mails, the site was put back in this past week, and googlebot is indexing 10K of pages daily.
Maybe Googles new slogan should be
"You take short cuts...Your site gets cut short"
What's everybody think?
[edited by: lawman at 6:07 pm (utc) on July 31, 2005]
[edit reason] I Think You Need To Comply With TOS Re URL Drops [/edit]
>>>Others have stated their forum posts of 100,000 posts were dumped...could it be because they had people come in and fill out posts continually to bolster their rankings?
>>>Or perhaps they are one of many webmasters on the freelance job boards posting for 100's of people to make up posts for them or have someone write 100 articles of optimized keywords..
Not at all. Just a lot of political debate, news discussion and general chat. The forum has a large following of loyal users.
One thing I missed was this post
A few of my story writers, who sometimes write political stories on my site, have caught wind of Google's ban and ""see it as censoring given the current political situation in the US.""
A few things your writers and others forget.
1. Google is a business...and just like any other business can refuse to serve anyone as they see fit.
2. The free results listings are just that free, nobody pays Google a fee to be included...so you really have no merit or ground to complain as there was no arm twisting done to you to have your results show there.
3. As a free service, they will police it as they see fit, you don't pay to be there, you are not forced to have your site there, but if you choose to then you must follow their Terms of Service..
4. You complain when Google protects themselves from spammers and such... but yet if someone came to your site and did the same thing.. you would be quick to ban them as well.....kind of being a hypocrite....
5. At the end of most wooden pencils are erasers...
Do you know why there are erasers?
Humans make mistakes all the time!
Food for thought!
While it may seem I directed the question at you, it is not meant that way...
It was my way of covering a subject brought into the thread, and hopefully read by many others, some of whom do try this type of thing.
I know of forums that add 100,000 posts or more per day.
One must be careful not to paint everyone and every site with the same wide brush.
Google isn't smart or stupid; it is a public corporation.
Google's software has holes large enough to drive 18 wheelers through. It can't even keep sites straight.
[webmasterworld.com...] messages #6 and #7, and there are at least 5 others.
You couldn't make duplicate home page content and interlink sites as fast as Google did. It looks like a mass classical hijack.
The book you are thinking of is The Mythical Man-Month by Brooks, and it deals with IBM's OS/360.
If Google ever settles down I might get a chance to play with that OS again. I have a copy on this system somewhere.
Don't have a clue about what is on your site, however Google says they look for value adds (How does an automated system determine that?).
This is akin to design for the visitor, a lot of things you do for the visitor can result in other problems.