
Google SEO News and Discussion Forum

This 268-message thread spans 9 pages (this is page 2).
Scraper Site Clearout Collateral Damage?
Ian Cunningham
msg:707472
10:18 am on Jul 28, 2005 (gmt 0)

It seems like Google has purged many scraper sites from the Google SERPs, as per this thread:

[webmasterworld.com...]

I'm sure many people, including myself, are very, very pleased about this, as it stops scumbag sites from stealing our content.

However, it also appears that some non-scraper sites have been included in this purge (including my own). My site has been active for 5 years and is based on unique content.

Has anyone else been affected by this, and does Google intend to refine the algorithm to stop valid, unique-content sites from falling victim?

Seo1
msg:707502
5:37 pm on Jul 31, 2005 (gmt 0)

Hi Bear

I am sure there may be a few forums that can generate 100,000 posts a day.

I was not painting with a broad brush. What I said was:

if you add 100,000 posts to a forum in one day, that is compared to your site's history as well as other websites' history.

As most forums did not start out adding 100K posts per day, their history is used to determine spam.

As for how a machine can determine value, one need only look at the amount of traffic to a website; this can be estimated from the links pointing to a site and from "probability analysis" and "trend analysis".

For instance, if a search for "webmaster forum" continually brings up WebmasterWorld as the #1 site, then the probability that the user will go to this site is very high; analysis of traffic levels then tells them that WebmasterWorld is indeed a destination site and offers more value than, say

<snip>

[edited by: lawman at 9:12 pm (utc) on July 31, 2005]

reli
msg:707503
5:40 pm on Jul 31, 2005 (gmt 0)

Seo1: "After reading many of your posts it looks to me like many have either tried scraping content, rewritten someone else's content (for those claiming not to scrape), or are involved in link directories."

Since you posted right after my post, it at first looks like it was directed at me. A careful reader can see it's not. Also, I don't have many posts, and my prior post was over a year earlier. I usually post about protecting your own original ideas. I am on the has-gotten-scraped side, not the scraper side.

Seo1
msg:707504
5:54 pm on Jul 31, 2005 (gmt 0)

reli

"I read many of your posts" refers to the posts in this thread.

It was not aimed at anyone in particular, and my posts should not be taken as directed toward anyone unless I put their name in the post, as above.

Where my post ended up in the thread was chance.

We could use this, though, to figure out ways to protect content. Maybe in a new thread?

theBear
msg:707505
6:04 pm on Jul 31, 2005 (gmt 0)

Seo1,

Please note I said value add, not value; they are two entirely different things.

In fact, SERP placement does not actually measure either value or value add.

Anything placed at positions one through three normally gets visited by close to 100% of searchers.

Somewhere out there are the top-to-bottom results of a study done using, IIRC, Google SERPs.

I can think of reasons why a site might add 100,000 posts to a forum in one day that would go against "history" but still be perfectly legit. Try unblocking a forum that was previously closed, or changing out forum software by nuking the old directory and installing the new software in a new directory with the old data reformatted.

We are trying to do black-box analysis of what is not only a complex system, but one with intermittent errors in its processing. As the old IBM manuals say, results may be unpredictable.

Seo1
msg:707506
6:58 pm on Jul 31, 2005 (gmt 0)

Bear

I agree with your assessment that there may be times a forum would add 100K posts in one day. However, this would be a rarity, typically brought about by a major news event, which Google can and does adjust for.

There are also the very rare adjustments to the forums themselves, but this is only reposting what was already posted; again, not an issue for Google.

My point is that those who think they have a bright idea in adding hundreds of pages or forum posts a day to gain higher rankings should think again: such ideas are not very bright, and usually end in disaster rather than the effect hoped for.

theBear
msg:707507
8:38 pm on Jul 31, 2005 (gmt 0)

Seo1,

Actually, what was posted was that there was a forum with over 100,000 posts, and the only reference to a timespan in that post was "years".

What Google can do and what Google actually does may not always be the same.

I've spent a long while on the bit-twiddling, data-mangling end of things.

I've seen simple errors result in lost money, jobs, and even companies.

I am interested in finding out as much about each site as I can.

I couldn't care less about the ways Google could do something, and until I see the sites in question I'll reserve judgement.

Even then I'll only comment on what I see. If what I see flies in the face of what I've been told or read, then what I've been told or read will be placed in the scrap heap.

Now I invoke rule 4 and go back to model railroading.

Andem
msg:707508
12:39 pm on Aug 1, 2005 (gmt 0)

Okay Seo1,

Just remember that those 100,000 posts were added over the past couple of years, so the point about adding 100K pages a day does not apply to my particular scenario.

Seo1
msg:707509
1:22 pm on Aug 1, 2005 (gmt 0)

Andem

Again, these are general posts, not meant to single anyone out.

They are meant to make people think.

The numbers could be as small as 100 pages: if a webmaster usually added 1 page a day and then jumped to, say, 100 pages per day, Google could look at this as a spam attempt.

Again, nothing personal is meant.

I don't even know your website so I cannot comment on it
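Seo1's idea of judging a burst of new pages against a site's own history can be sketched as a simple rate check. This is a minimal Python illustration under stated assumptions, not Google's method: nothing in the thread confirms Google works this way, and the function name `is_rate_spike`, the 10x threshold, and the sample histories are hypothetical.

```python
from statistics import mean


def is_rate_spike(daily_history: list[int], today: int, factor: float = 10.0) -> bool:
    """Flag `today` if it exceeds `factor` times the site's historical daily average.

    Hypothetical heuristic for illustration only; the default factor of 10
    is an assumption, not a known Google value.
    """
    if not daily_history:
        # No history to compare against, so this check cannot judge a new site.
        return False
    baseline = mean(daily_history)
    # Treat very quiet sites as averaging at least 1 page a day.
    return today > factor * max(baseline, 1.0)


# Seo1's example: a site that usually adds 1 page a day suddenly adds 100.
steady_site = [1] * 365
print(is_rate_spike(steady_site, 100))  # True

# Andem's case: roughly 100,000 posts spread over two years (about 137 a day).
busy_forum = [137] * 730
print(is_rate_spike(busy_forum, 140))  # False
```

As theBear points out, the same check would also flag legitimate events such as unblocking an old forum or migrating forum software, which is one reason a raw rate comparison on its own would be a crude spam signal.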

jcmiras
msg:707510
3:00 pm on Aug 1, 2005 (gmt 0)

Does Google also purge the Google Directory? As far as I know, all of its content came from DMOZ; they just add something new like PR, but its purpose is still the same? An additional question: what are the symptoms if your website has been purged? Does your website lose its cache in Google, or does it just not appear in search results? Thanks.

europeforvisitors
msg:707511
3:14 pm on Aug 1, 2005 (gmt 0)

Does Google also purge the Google Directory?

No, but Google probably doesn't expect other search engines to index its directory pages, either.

Seo1
msg:707512
3:58 pm on Aug 1, 2005 (gmt 0)

For jcmiras

You wrote

Does Google also purge the Google Directory? As far as I know, all of its content came from DMOZ; they just add something new like PR, but its purpose is still the same? An additional question: what are the symptoms if your website has been purged? Does your website lose its cache in Google, or does it just not appear in search results? Thanks.

No, Google doesn't purge their directory.

Nor are all directories bad. Most are very good; it's the type used for the wrong purposes that Google has a problem with.

Your question about Google purging your site is not clear. Simply type your full URL into the Google search box to see if you are in the index:

[yoursite.com...]

or

site:www.yoursite.com will work as well.

If you find your site, then you are in the index. If it returns a reply that your site was not found, it would appear they have dropped you.

However, do not panic. There could have been a server glitch when Googlebot came to visit that caused it. Wait a few days, and if nothing is resolved, you may need to contact them directly.

Clint

sunflower12
msg:707513
4:28 pm on Aug 1, 2005 (gmt 0)

"My largest and highest-quality 5-year-old website was also banned on 7/28, apparently because of the new scraper filter.
I have access to a lot of search data and have a few observations about newly banned websites:

- All have a link directory
- Most have adsense
- Filter is indiscriminate to site age, size, or quality (Many are old legit established sites)
- Algorithmically banned on 7/28 (unless Google has an army of eval.google trainees who weren't properly trained)

Do your websites fit my observations? Do you have any data that contradicts them?"

I do not have a link directory and I do not have AdSense. I have a site that has been in the top 10 for the main keyword and many others for five years. I work on it every day by hand with FrontPage. Authors submit articles to me for free, and companies submit their press releases on my topic to me for free! It took me years to build up my reputation. I have a few advertisers on my site from publicly traded companies. Since this happened, one has written to me wanting to break his contract. Of course I will send him his money back. It's humiliating not to be indexed in Google and to have zero rankings with no backlinks. At this point, I would not want to advertise on my site either. I really don't see any light at the end of the tunnel. I don't see Google answering my e-mails or listening to me. It's amazing how Google can just drop someone for no reason and not care about their livelihood. They just treat you like trash.
And I realize that this has happened to others, not just me. What is so sad is that there is no one at Google who will talk to you, help you, or listen to you. I am still in the top ten at Yahoo and MSN, so go figure. But as we all know, most of our traffic comes from Google.

asamuel
msg:707514
4:43 pm on Aug 1, 2005 (gmt 0)

I have also fallen prey to this filter. After reading this thread, I am beginning to see that my problem is duplicate content.

I run a real estate site and work on behalf of many other people to promote their products. I am sent information and place it on the site. I can see that Google may look at this and see me as a scraper site.

So the big question: how am I going to promote other people's products without falling into this trap? Surely this only leaves room for a small number of big sites to take the market.

If I am reading this thread correctly, I may have to rewrite the descriptions of every product I promote. Does anyone think this is correct?

Any other real estate webmasters have the same problem?

hdpt00
msg:707515
5:05 pm on Aug 1, 2005 (gmt 0)

I run about 16 completely whitehat sites, with very minimal interlinking and only between separate C-class IPs. The sites are somewhat related anyhow.

A couple of weeks ago, after the PR update, all my new sites got PR (4-5). A few days later all of my sites, yes all 16, got PR0'd and banned, with nothing showing via site: for any of them. I DOMINATE Yahoo, and some of my sites are the most recognized/popular sites for the niche. I emailed Google saying I spend a crap load on AdWords, almost 5 figures a month, and got nothing. Called my rep, nothing. Sent out another email yesterday.

This is complete crap, and was obviously a hand ban. I searched for some of my content via Google, and guess what? I found a site that completely copied me, fully indexed. Sweet, I get punished and they stay alive. I told Google about this; luckily nothing has been done. Goodnight, Google. I've never really cared about you, but now I just don't like you.

Buddha
msg:707516
5:27 pm on Aug 1, 2005 (gmt 0)

sunflower12,

Do you have lots of pages with outgoing links? Do you post a lot of content that can look scraped?

That is, if you post articles and press releases, those could look duplicate or "scraped" if Googlebot happens to find that same content first elsewhere.

Scraped or Duplicate content + lots of pages with outgoing links + ads could possibly trigger the filter.

sunflower12
msg:707517
6:03 pm on Aug 1, 2005 (gmt 0)

"sunflower12,
Do you have lots of pages with outgoing links? Do you post a lot of content that can look scraped?

That is, if you post articles and press releases, those could look duplicate or "scraped" if Googlebot happens to find that same content first elsewhere.

Scraped or Duplicate content + lots of pages with outgoing links + ads could possibly trigger the filter."

Hi Buddha,
Thanks for taking the time to reply to me. What's amazing is that there are new sites that try to copy my format and have not been banned. What am I supposed to do? I do not do this to try to gain rankings for keywords. I do it because I try to keep as much information as possible on my site for my readers and subscribers about the topic my site deals with. I love posting new articles and press releases on my site. I really love what I do.

europeforvisitors
msg:707518
6:13 pm on Aug 1, 2005 (gmt 0)

If you post articles and press releases, those could look duplicate or "scraped" if googlebot happens to find that same content first elsewhere.

Google is unlikely to ban a site for using press releases or syndicated material. If that happened, every newspaper that runs AP stories or Ann Landers columns would disappear from the index.

On the other hand, it wouldn't be unreasonable for Google to exclude secondhand press releases, AP stories, and Ann Landers columns from the SERPs if it could identify them as duplicate content.
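How might a search engine "identify them as duplicate content"? One textbook approach is w-shingling: split each document into overlapping runs of k words and compare the two sets with Jaccard similarity (shared runs divided by all distinct runs). The sketch below is a minimal Python illustration of that general technique, not a description of Google's filter; the function names, k = 5, and the sample texts are illustrative assumptions.

```python
def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in `text`."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by total distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Illustrative texts: a press release, a verbatim repost, and an original write-up.
release = "Acme Corp today announced record quarterly earnings driven by strong demand for its widgets"
reposted = "Acme Corp today announced record quarterly earnings driven by strong demand for its widgets"
original = "Our review of the widget market looks at demand trends and which vendors are worth watching"

print(jaccard(shingles(release), shingles(reposted)))  # 1.0
print(jaccard(shingles(release), shingles(original)))  # 0.0
```

For a long article with only a few words changed, most shingles survive, so the score stays high; where a filter would draw the line between "syndicated copy" and "original" is an assumption this sketch does not answer.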

JKMitchell
msg:707519
7:34 pm on Aug 1, 2005 (gmt 0)

Does Google also purge the Google Directory? As far as I know, all of its content came from DMOZ; they just add something new like PR, but its purpose is still the same? An additional question: what are the symptoms if your website has been purged? Does your website lose its cache in Google, or does it just not appear in search results? Thanks.

No, Google doesn't purge their directory.

But... I have an 8-year-old site in DMOZ, recently banned by Google (apparently for having an on-target directory), and it is no longer listed in the Google Directory. I think this means that they do purge their directory.

andrea99
msg:707520
7:39 pm on Aug 1, 2005 (gmt 0)

I think this means that they do purge their directory.

Yes, I have had a similar experience. My site remains in DMOZ, but it disappeared from the Google Directory about 10 days before the 27/28 ban. I noticed this at the time: the little green book logo next to PR disappeared. I thought this might be ominous; I had no idea how much so...

sunflower12
msg:707521
7:40 pm on Aug 1, 2005 (gmt 0)

"But... I have an 8-year-old site in DMOZ, recently banned by Google (apparently for having an on-target directory), and it is no longer listed in the Google Directory. I think this means that they do purge their directory."

Yes, I agree. I am still in DMOZ but not in the Google Directory, and I was there before I was banned last week.

sunflower12
msg:707522
7:41 pm on Aug 1, 2005 (gmt 0)

"Google is unlikely to ban a site for using press releases or syndicated material. If that happened, every newspaper that runs AP stories or Ann Landers columns would disappear from the index."

Then I wonder why I was banned?

andrea99
msg:707523
8:29 pm on Aug 1, 2005 (gmt 0)

Then I wonder why I was banned?

Carve this into stone for me, please. If there were no guessing and no mystery, the scammers would game the system until it collapsed. Google is being heartless and, yes, evil.

We need to promote Google's competition relentlessly; it is the only way out of this hole. Badmouthing Google is a bad strategy; just promote Y, M, and the metasearch of your choice. I like Dogpile and Ixquick.

It's abg (anybody but goog)--just don't say that out loud (bad form).

reli
msg:707524
11:11 pm on Aug 1, 2005 (gmt 0)

hdpt00 - guess the old definition of "whitehat" is out the door?

Webdetective
msg:707525
12:08 am on Aug 2, 2005 (gmt 0)

And yes, you can get any site unbanned, though there may be a few hardcore spammers who can't... again, your history shows you as someone not wanted around.

In the past three years I have helped 8 banned sites get back in.

Seo1,
I had 5 sites suddenly dropped completely out of the Google index on July 29, but it's a real mystery trying to figure out whether this is because of something I did or just part of the collateral damage. 2 of my sites were also dropped from Yahoo in June, but I don't know if there's any relationship.

Since we're not allowed to post URLs here, I would like to send you a few of mine in a sticky note, if that's OK with you, so you could take a quick look at them.

I use a small reciprocal links directory that I have to manage manually, and it does contain its share of unrelated links (gambling, pills, etc.), but the links are subdivided so that all related links are on their own page. I also have a few related and unrelated homepage link exchanges. I am using a page generation software, "Rankingpower", but nobody, including Google, seems to have an answer for that.

walkman
msg:707526
12:22 am on Aug 2, 2005 (gmt 0)

>> I use a small reciprocal links directory that I have to manually manage, and it does contain its share of unrelated links, gambling, pills, etc... but the links are subdivided into their own pages, so all related links are on their own page. I also have a few related and unrelated homepage link exchanges. I am using a page generation software "Rankingpower" but nobody, including Google, seems to have an answer for that.

I think you have the answers.

Webdetective
msg:707527
12:37 am on Aug 2, 2005 (gmt 0)

Should I drop just my unrelated homepage links, or my unrelated links directory links too? What if some of my software-generated pages are doing well on MSN? Maybe I could exclude Slurp and Googlebot from those pages with robots.txt.
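Webdetective's robots.txt idea would look roughly like the file below, checked here with Python's standard `urllib.robotparser`. This is a minimal sketch under assumptions: the `/generated/` directory and the `yoursite.com` URL are hypothetical stand-ins for wherever the generated pages live. `Googlebot`, `Slurp`, and `msnbot` are the user-agent names of Google's, Yahoo's, and MSN's crawlers. robots.txt only asks compliant crawlers not to fetch those URLs; whether that would help with a ban is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Google's and Yahoo's crawlers out of an assumed
# /generated/ directory while leaving every other crawler (e.g. MSN's) unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /generated/

User-agent: Slurp
Disallow: /generated/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.yoursite.com/generated/page1.html"
for agent in ("Googlebot", "Slurp", "msnbot"):
    print(agent, parser.can_fetch(agent, url))
# Googlebot False
# Slurp False
# msnbot True
```

Python's parser applies the first named `User-agent` group that matches the crawler and falls back to the `*` group only when none does, so msnbot is governed by the empty `Disallow:` line, which allows everything. Pages outside `/generated/` stay crawlable for all three.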

WebFusion
msg:707528
1:20 am on Aug 2, 2005 (gmt 0)

...or perhaps you should stop trying to trick the engines and build sites for the users. That's what the engines want anyway.

Rollo
msg:707529
2:08 am on Aug 2, 2005 (gmt 0)

I don't think the message could be clearer: if your reason for being is selling links or pitching ads, and you don't provide some useful, perhaps unique, info, goods, or service, then you won't survive in Google.

I agree that scraper sites and directories are about as useful as panhandlers who attach themselves to you on the pretext of giving you a tour, but that doesn't mean they should be rounded up and shot.

I don't have a directory, but I don't know why Google doesn't just warn people and give them time to clean up before taking everyone down. I'm sure a lot of folks, many innocent ones too, were badly hurt. One post from GoogleGuy stating that Google is considering penalties for such and such would spread like wildfire and bring about change in an instant.

ning
msg:707530
2:57 am on Aug 2, 2005 (gmt 0)

If Google doesn't like DMOZ clones, why is DMOZ encouraging webmasters to make DMOZ clones?

walkman
msg:707531
3:10 am on Aug 2, 2005 (gmt 0)

>> If Google doesn't like DMOZ clones, why is DMOZ encouraging webmasters to make DMOZ clones?

Maybe because DMOZ is not Google? Plenty of companies and people promote things; it's up to the webmaster to choose wisely.

reli
msg:707532
3:24 am on Aug 2, 2005 (gmt 0)

Not all clones are created equal. (This defies the definition of clone, tho.)

This could be a work in progress. My DMOZ sites have not been affected. Either they will be affected, or G will pop DMOZ-fed sites back in, or other factors/issues will get noticed by us as the probable cause.

I've asked whether these dropped sites have 10,000 or 20,000 indexable DMOZ pages, or just a script with a /cgi-bin/?odp_cat=motorcycles type of dynamic feed. Can anyone say? It might help people avoid over-reacting with a correction aimed at the wrong symptom or cause.

If you created a site with thousands of cloned DMOZ pages (I lacked the tools to do it, and figured I wouldn't refresh it often enough), then maybe the penalty is due to introducing a static "historical" version of DMOZ? G's own copy is static, so it doesn't seem like a logical conclusion, I know.

I'm not sure why non-DMOZ hand-edited directories would be banned. But it could be other sins, or a huge complaint-catch-up rolled out at once.

[edited by: reli at 3:29 am (utc) on Aug. 2, 2005]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved