
Google SEO News and Discussion Forum

This 44 message thread spans 2 pages.
My First Ever Whiner Thread - Why Does G Still Punish Quality Sites
Google punished education sites and rewards spam sites that copy content
devitnow




msg:764460
 9:06 pm on Nov 4, 2005 (gmt 0)

People come here whining all the time about how their 'worthy' site has suddenly fallen on hard times. My thoughts are usually, 'wait it out', 'develop better content for visitors', 'things will correct', 'don't think or act short-term', that sort of hardliner attitude. Now it's time for me to do some whining of my own. Because after almost a year of my own site having fallen on hard times, I'm tired of seeing spammy sites that link to mine ranking number 1 while my site is buried in oblivion.

I'm in the group of quality sites that were all destroyed in the SERPS around December 22 of last year. It looked like in late September things were corrected but I'm back in exile once again.

WHY MY SITE IS QUALITY AND NOT SOME POSER
I feel I must explain that my site is actual quality and I'm not some poser who throws up a few pages, waits 3 weeks and comes here complaining to you.

1. My content. I provide content that is absolutely unique to the Internet. It can be found nowhere else. My content is absolutely free, and there are no banner ads or solicitations running on my pages. This content, because it's unique, is very expensive to develop. I'm not talking about my time but about hard costs. My content is so good I get emails daily thanking me for such a valuable resource. Yes, I have an ego and I do it for the kudos.

2. Who links to me? The very best sites in my area. I have links from all ivy league universities, education sites around the world, personal education sites, U.S. Library of Congress, etc. These are all high value links from solid authority sites like mine with long established time on the net.

3. My past SERPS. Number 1 for searches that produce results in the 5 to 20 million range. About 20th for single-word searches that produce around 45 million results. Site visitors: about 300,000 uniques per month. Hey, I was happy; people searched for things that matched my site content and I was found.

WHY I WHINE BEFORE YOU TODAY

1. I'm tired of doing a search for my homepage site title (which is unique; I have a non-dictionary site title) and seeing my site running under another domain. I'm not an SEO expert so I don't know how that's done, but I don't like it.

2. I'm tired of doing a G search on absolutely unique text content and seeing not my site, but spam sites that pump in MSN search results.

3. I'm tired of doing G searches for my site name and seeing all the sites that link to me. How ironic: I have a 100% unique site name and what comes up is not my site but all those who link to me.

4. I'm tired of seeing all the sites that deep-link into my content, and that steal my page text because they're too lazy to write it themselves, returned in the results while my site is nowhere.

5. I'm tired of searching for unique text content on my site and seeing spammy sites that cloak my DMOZ site description and pump out pay per click ads.

6. Finally, I'm tired of searchers looking for my world class content not finding me.

WHAT WILL I DO
I'm still of the do-nothing position, but how long is long enough?

WHAT I WOULD LIKE TO HEAR FROM YOU
If you made it this far, thank you, I would like to know if you think I am a whiner and should just shut up and go away or is something broken with G that you think they will fix?

As I said, this IS a whiner thread, but with some merit, I believe. I would feel a bit better whining further with other education/quality sites that have experienced a similar history :-)

thank you,

Devvy

 

tigertom




msg:764461
 10:12 pm on Nov 5, 2005 (gmt 0)

It seems the key is to stop other sites copying your content verbatim.

You should:

1. Set up a bad-bot-banning script on your site. There's one offered in this forum.
2. Forbid visitors from Russia, China, Romania etc. using .htaccess.
3. Use absolute URLs in your internal links.
4. Have a script generate a small amount of random content in the HTML of each page.
5. If copyists use Adsense, report them to Google via the 'Ads by Google' link in each ad.
6. Insert a frame-breaking JavaScript in each page.

1. is to stop leechers;
2. is to stop countries that might try leeching;
3. is to have lots of links to your main site in any HTML copied from you;
4. is so yours is the most recently updated version of the page;
5. is to rob them of the reason to rip off your content;
6. is to break a stolen page out of any page framing it.

Individually these tricks wouldn't do much. Together, however, they will 'harden' your site.
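
Tip 3 above (absolute URLs in internal links) can be sketched in a few lines. This is only a rough illustration, assuming the pages are static HTML that can be post-processed; the function name `absolutize_links` and the `example.com` addresses are hypothetical and not taken from any script mentioned in this forum.

```python
import re
from urllib.parse import urljoin

# Matches the href attribute of an <a> tag; a simple regex sketch, not a full HTML parser.
HREF_RE = re.compile(r'(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)


def absolutize_links(html: str, page_url: str) -> str:
    """Rewrite relative <a href> targets as absolute URLs based on page_url."""
    def fix(match):
        prefix, quote, target = match.groups()
        # Leave in-page anchors and non-HTTP schemes untouched.
        if target.startswith(("#", "mailto:", "javascript:")):
            return match.group(0)
        return f"{prefix}{quote}{urljoin(page_url, target)}{quote}"

    return HREF_RE.sub(fix, html)


if __name__ == "__main__":
    page = "http://www.example.com/articles/page.html"  # hypothetical page address
    print(absolutize_links('<a href="../about.html">About</a>', page))
```

The idea, as stated in the list, is that any HTML copied wholesale then still carries links pointing back to the original site.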

Kimkia




msg:764462
 2:48 am on Nov 6, 2005 (gmt 0)

Devvy,

I am sympathetic to your plight, and feel much of your angst myself. I can't claim such highbrow incoming links as you have, but I do get natural links all the time from people recommending my content. In fact, 50% or more of my hits come from links left by people in forums, recommending a page on my site. I also get daily "thank you" emails.

The problem is that high quality, original content becomes the prime target for substandard directories, scraper sites and content thieves. The result is the disheartening experience that you have had, and I have had as well. Nothing is worse than seeing a snippet of your original text appear as a description for a page in Google SERPs, only to get on that page and discover it has ripped off your content, word for word; manipulated the keywords to rank higher in Google than you do; or simply scraped off the keywords in a jumbled mess and thrown up a pure spam site which will make them a huge amount of money even if it has only limited life in the search results.

TigerTom has some good advice... although I'd appreciate some more detail on his number one item. Bad bot ban? I've followed all the rest of his suggestions, but I'm not sure how to proceed with this one. And what do "bad bots" do, exactly?

Fryman




msg:764463
 6:32 am on Nov 6, 2005 (gmt 0)

lol... as always, people use spam meaning "sites positioned above me"...

tedster




msg:764464
 6:45 am on Nov 6, 2005 (gmt 0)

Sites positioned above mine WITH MY CONTENT are a horse of a different color, I'd say.

Iwrite




msg:764465
 6:49 am on Nov 6, 2005 (gmt 0)

I hear you. Just one experience of the internet taking and using an article I had written and had published offline as well as on, and finding that it was used by sites set up just to make money from Google, upset me. I want people to come to MY site to read; though I don't think it was actually picked up off the site. One thing I did think, though, is how, having read what you wrote, I was dying to know what your unique niche was, and wanted to find it just because what you wrote really hooked me in! Maybe you could use that as a hint for an advertising campaign!

Iwrite

pincher34




msg:764466
 7:46 am on Nov 6, 2005 (gmt 0)

The simple answer to your question - because Google's algo is far from perfect.

If you want a little elaboration - in an effort to improve the OVERALL quality of their index, Google is not afraid to alienate a relatively small percentage of sites.

That said...

In addition to the algo being imperfect, there's the sandbox, which can remove relevant sites from the SERPs.

Also, their lack of ability to remove 410/Gone sites in a timely manner (to keep their over inflated index count up, IMO) and their inability to distinguish the difference between www and non-www (dup penalty) pages also can remove relevant pages.

IMO, that means that Google removes quite a few relevant pages from their SERPS (but conveniently still counts them in their billions of pages indexed).

I wonder how big their index would be if you take out the sandboxed, 410, and supplemental pages...

tigertom




msg:764467
 11:18 am on Nov 6, 2005 (gmt 0)

Google-bashing is of no help to this man. What he and I are interested in are tricks to stop your site going down the SERPs, or disappearing, due to plagiarism or antagonistic black-hat techniques.

If the mods don't leave in the link below, perhaps they could point readers to the latest, definitive, versions of the 'bad bot' scripts here; there are Perl and PHP versions.

[google.co.uk...]

Anyway, type in 'site:www.webmasterworld.com bad bot script' in Google.

I'm writing my own version. You use 'robots.txt' to disallow a subdirectory. After two weeks, you set up the bad-bot script. The idea is that a search engine bot that indexes a subdirectory which you have disallowed, is a bad bot. It's just hoovering up your data. It's not from a legit search engine. So it's probably a competitor, or a leech.

The script rewrites your .htaccess file to forbid it access to your site altogether.

Webmasters are keen to ban some bots to save bandwidth also.
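
The trap described above could look roughly like the sketch below. It is not the forum's Perl or PHP script, just an illustration in Python under several assumptions: the trap directory is a hypothetical `/trap/` that robots.txt already disallows, the server is Apache with the older `Order`/`Deny from` access directives already set up in .htaccess, and something (a CGI handler, for example) calls `handle_request` with the visitor's IP and requested path. The two-week wait mentioned above presumably gives legitimate crawlers time to pick up the new robots.txt before the trap is armed.

```python
import ipaddress
from pathlib import Path

# Hypothetical directory that robots.txt disallows; only rule-breaking bots should request it.
TRAP_PREFIX = "/trap/"


def ban_ip(htaccess: Path, ip: str) -> bool:
    """Append a 'Deny from <ip>' rule to .htaccess; return True if a rule was added."""
    ipaddress.ip_address(ip)  # raises ValueError on anything that is not an IP address
    rule = f"Deny from {ip}"
    text = htaccess.read_text() if htaccess.exists() else ""
    if rule in text.splitlines():
        return False  # already banned
    # Make sure the new rule starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with htaccess.open("a") as fh:
        fh.write(f"{separator}{rule}\n")
    return True


def handle_request(htaccess: Path, ip: str, request_path: str) -> bool:
    """Ban the requester if it fetched anything under the disallowed trap directory."""
    if request_path.startswith(TRAP_PREFIX):
        return ban_ip(htaccess, ip)
    return False
```

A real deployment would also need file locking and a whitelist for the major search engine bots, since a mistaken ban here removes a visitor from the whole site.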

HomeSurfer




msg:764468
 12:29 pm on Nov 6, 2005 (gmt 0)

Devitnow, I'm in sympathy with you.

I could spend eight hours a day for a year or more seeking out web sites using my original content without permission. Fortunately, our site is the authority, we do well with search engines, and the others basically have duplicate content.

However, we make a modest income from licensing content and it burns me up when I continuously run across people who are using the content without providing compensation. I don't "go looking" for it, but constantly run across it when reviewing web sites in our industry.

Unfortunately, though our web site is big, I'm the writer and we're just a few people and don't have the time or money to chase everyone down. When we do find someone, they always ask, "which pages are you referring to?" and we're supposed to go through their pages and tell them which ones have our content and tell them which pages need to be deleted.

Once we identify the pages, the web site owners say "I didn't do it. I got the content from my web site developer." Then the web site developer says, "I didn't do it. I paid someone to develop some content for us." That alleged person can never be tracked down, of course.

Truthfully, I don't think there is a solution. People should be ethical, but all of them are not.

But I do empathize with you.

BillyS




msg:764469
 12:39 pm on Nov 6, 2005 (gmt 0)

Welcome to the club. Other sites steal my content all the time and outrank me.

I don't know why Google hasn't cracked this one yet. To me this problem proves Google cannot detect duplicate content from website to website - only within a website.

The fact they cannot detect where something appeared first encourages others to steal articles. This is a major shortcoming of all search engines - not just Google.
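
For checking a suspected copy against your own page by hand, a crude overlap measure is easy to sketch. This is only an illustration using word "shingles" (runs of consecutive words); it is an assumption made for the example, not a description of how Google or any copy-detection service works, and the function names are hypothetical.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of all runs of `size` consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's word runs that also appear in the suspect text."""
    orig = shingles(original, size)
    if not orig:
        return 0.0  # original is shorter than one shingle
    return len(orig & shingles(suspect, size)) / len(orig)
```

A value near 1.0 suggests the suspect page reproduces the original nearly word for word. Note that it says nothing about which page appeared first, which is the shortcoming complained about above.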

Johan007




msg:764470
 12:59 pm on Nov 6, 2005 (gmt 0)

HomeSurfer, the free-to-use website CopyScape can do much of the job for you.

tigertom




msg:764471
 1:17 pm on Nov 6, 2005 (gmt 0)

I wouldn't debate the matter with them. Give them 48 hours to remove it, then inform their ISP.

"My web developer did it". Ha! Oldest cheap-crook get-out whine in the book.

PS: That CopyScape is a good site. Most interesting.

twebdonny




msg:764472
 3:00 pm on Nov 6, 2005 (gmt 0)

"Have a script generate a small amount of random content in the HTML of each page"

any ideas on this one?

I would like to place such a script into my pages
but cannot find any such script.

Thanks

tigertom




msg:764473
 5:33 pm on Nov 6, 2005 (gmt 0)

Find a random quote script at hotscripts.com that can be called using Server Side Includes for static HTML pages, or a PHP one that you can insert in the footer file of your PHP site.

Replace their quotes with some of your own. The greater number of ON-THEME 'quotes' (your blocks of text or HTML) the merrier.
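
A rough idea of what such a script does, sketched in Python rather than as an SSI or PHP include: the snippet texts and the `footer-note` class name are made up for the example, and the snippets are assumed to be plain text rather than HTML blocks.

```python
import html
import random

# Hypothetical on-theme snippets; in practice these would be your own blocks of text.
SNIPPETS = [
    "Did you know? The first lesson in this series covers fractions.",
    "Tip: work through the examples before trying the exercises.",
    "New readers may want to start with the glossary.",
]


def random_footer_snippet(snippets, rng=random) -> str:
    """Pick one snippet at random and wrap it in a paragraph for the page footer."""
    chosen = rng.choice(snippets)
    return f'<p class="footer-note">{html.escape(chosen)}</p>'


if __name__ == "__main__":
    print(random_footer_snippet(SNIPPETS))
```

Each page load then carries a slightly different block of text, which is all the tip asks for.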

LostOne




msg:764474
 9:22 pm on Nov 6, 2005 (gmt 0)

I really feel for the guy. I hope you overcome it!

devitnow




msg:764475
 12:55 pm on Nov 7, 2005 (gmt 0)

hi everyone,

thanks for all the help and suggestions.

Question - how will random text help fight this problem?

devvy

tigertom




msg:764476
 3:25 pm on Nov 7, 2005 (gmt 0)

I understand it will make your page seem to be the latest, updated version of the content.

As it won't change the 'file last modified' environmental variable, I'm not sure how that works, but the tips I gave were all the ones I remember reading in this fine forum.

stapel




msg:764477
 3:36 pm on Nov 7, 2005 (gmt 0)

devitnow said:
I have links from all ivy league universities, education sites around the world, personal education sites, U.S. Library of Congress, etc.

I thought part of Google's ranking system involved weighing the relative "value" of in-links. As such, how could scrapers with in-links from link-farms rank above a clearly authoritative site with in-links from "valuable" sites?

This doesn't seem reasonable -- unless Google has completely changed the basis of its algorithm...?

Eliz.

Note: I have an educational site of my own, with similar types of in-links and a good ranking in the major search engines. I encounter almost constant copying -- though Copyscape has been a wonderful tool for cutting down the theft to almost nothing, of late. But the experience recounted here, if accurate, augurs poorly for my own site, so I would appreciate further information, if possible. Thank you.

devitnow




msg:764478
 3:59 pm on Nov 7, 2005 (gmt 0)

that's a great question, I wish I knew. I have links from PR 8, 7 and 6 pages and most links go into my subpages.

I wonder to myself why my site has been affected while all of those who I have historically competed with in the SERPs look largely unchanged.

My only observations on why perhaps G sees my site differently is that:

1. I use a template/database driven site structure
2. my site is 5 years old while those I compete with (because they are education) are typically older and with more hand edited html pages
3. I have a large number of pages indexed in G because of a very active forum

As I speculate, I imagine that G takes into account some of the above characteristics and perhaps they play against me in determining my site's authority and authenticity. But this is just speculation because there are always plenty of sites out there which are exceptions to my three points.

From what I can tell G has completely devalued my site. Important subpages with quality incoming links show no cache and my homepage has not been cached in almost 2 weeks when in the past it was updated every 24 to 48 hours.

I'm going to explore the use of these scripts you all have suggested for keeping the scrapers out. Perhaps it will help.

wanderingmind




msg:764479
 4:32 pm on Nov 7, 2005 (gmt 0)

It could also be a professional 302 hijack. Some of what the poster said are symptoms of that.

I don't mean 302 hijacks towards a few pages. I mean massive attacks, thousands of cloaked pages that display your content only for Googlebot, thousands of 302 and meta refresh redirects.

I have a site - reputed, gets 20,000 visitors a day, has links from universities and even Amnesty International, and has unique content generated by journalists - and it still got hit by those. I suggest that the original poster read up on 302 hijacks (the basic symptoms) in this forum itself and confirm he is not under attack.

(Jagger 3 has made 5 % of my pages come alive back.)
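
As a first check on a suspicious URL, it helps to look at the raw response it gives: a 302 status whose Location header points at your own pages fits the redirect pattern described above. The sketch below only classifies a response you have already fetched (for instance from the headers shown by a header-checker tool); the function name and the `example.com` host are hypothetical, and a match is a hint worth investigating, not proof of a hijack.

```python
from typing import Optional
from urllib.parse import urlparse


def redirects_to_my_site(status: int, location: Optional[str], my_host: str) -> bool:
    """True if the response is a 302 whose Location header points at my_host or a subdomain."""
    if status != 302 or not location:
        return False
    host = (urlparse(location).hostname or "").lower()
    my_host = my_host.lower()
    return host == my_host or host.endswith("." + my_host)
```

Cloaked pages that show your content only to Googlebot will not be caught this way, since they answer ordinary visitors differently.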

devitnow




msg:764480
 4:36 pm on Nov 7, 2005 (gmt 0)

thanks, i'll read up and learn more about what hijacking is.

I do know that I see my site running under other domains, but my page under their domain looks slightly older.

tigertom




msg:764481
 4:46 pm on Nov 8, 2005 (gmt 0)

For the record, other tips might be:

7. Use Google alerts to email you when your domain name turns up on a site they spider.
8. DANGEROUS: Make the sub-directory/page denied by robots.txt to be chock full of poison words and image names**.

Reasons:

7. Today I got an email about a site containing a URL of mine. Went to the site. Nothing but Adsense; no other text at all. The Google Alert email showed text I recognised as being mine. No sign of it in the HTML of the 'linking' site. Ratted them out to Google. Bye-bye their Adsense account (hopefully).

8. Just an idea I had. Have not tried this myself, and probably won't. Could easily get your site banned if you do it wrong. Do it if you really don't care any more, and you're an experienced webmaster.

**-> You should also forbid access to the subdirectory, using .htaccess, to important SE bots, to be on the safe side.

comicsrus




msg:764482
 5:59 pm on Nov 8, 2005 (gmt 0)

devitnow has described my situation about 90%.
I still have Top placement for my main two Keywords (out of 100/85 million respectively), but 70% of my hits come from a few hundred pages that are in sub-directories.
I have unique content, no popups, no popunders, and try to list only relevant ads from Amazon and a few other affiliate programs on the individual pages.
I've been working on my family-friendly, unique content site since 1999.
I had a very consistent growth of 15-20% since day two.
The May 21 update killed me. I came back October 15, and died again yesterday (Nov 7).
The work and research I've put into the site have made me a mini-enterprise.
I guess in the big picture, I'm still a fairly large "small" site (currently 7,000 daily visitors), but following the rules since day one, without spamming, offering easy and free content, it became a full time job.
Losing 12,000 visitors a day (thanks to Bourbon then Jagger) has brought me back to newspaper route type income.
I guess I could just sign up for some popups and unders to replace the $$$, but it kinda sucks for the average internet user when that happens to more and more sites, or is that just me?
I can bag the whole-family (and school friendly) thing, and go for some gambling, girlie and dating ads.
I have hundreds of schools linked my way, but what the heck? A buck'$ a buck, right?

I'm hoping things come back during the upcoming holiday/Christmas season, but I'm losing faith in that.
We'll just have to wait and see.

DaveN




msg:764483
 6:05 pm on Nov 8, 2005 (gmt 0)

pro hijackers are using a google.com/302 page to weaken the site before hitting it... if you can remember the MSN listings and Google hijack thing?

Dave Naylor

Trisha




msg:764484
 7:04 pm on Nov 8, 2005 (gmt 0)

I would like to know if you think I am a whiner and should just shut up and go away

No, it's ok - and if you don't mind I'd like to whine a bit with you.

My main site got lost in Bourbon, came back on 9/22 and since yesterday, seems to have lost all (or nearly all) google traffic again. No blackhat stuff. I wouldn't mind it all so much if there was just some way to get some sort of feedback from them so I would know what it is they don't like so I could change it. As it is I'm completely lost and I don't think I am alone.

tigertom




msg:764485
 8:18 pm on Nov 8, 2005 (gmt 0)

Instead of whining, put your thinking cap on. What's needed in this forum is a 'Remedy' thread for each update: no whining, no 'I was top and now I'm not', just a list of things to look at.

It'd save having to scroll through screeds of whinging, Google-bashing and data centre twitching. It's tiresome.

The algorithm changes, some sites get dumped. Tough.

From memory, here's a list of things that might do you down, and more I thought up myself:

1. Off-theme reciprocal linking;

2. (Excessive) linking between sites on the same IP or web space (your own mini-network);

3. Unnatural link growth (lots of links in a short space of time);

4. Unnatural link loss (a link network drops you);

5. Duplicate content: 20000 pages with little unique content in each;
5a. A site run from a database, with header, footer, and sidebars the same on each page;
5b. (Possibly) In a competitive niche, where all the so-called 'unique' sites are full of mediocre, marketing, template-driven drivel;
5c: (Possibly) Run from a database software like Mambo, and nothing done to change the HTML it outputs;

6. Another site doing 302 redirects to yours;

7. Another site copying your content.

8. (Possibly) 8000 low PR links, and very few decent, authority sites linking to you (unnatural).

9. The addition of 1000's of pages suddenly;
9a. Use of automatic page generation software;

10. Linking to bad neighbourhoods;

11. Lots of affiliate links (use rel="nofollow" in these)
11a. Using a redirect script for your own links.

12. Running a directory, especially with entries drawn from DMOZ or other SE results.

Plus the usual grey SEO tricks done to excess: e.g. excessive use of <H1> tags modified by changing the font size so it's not actually H1 size, that sort of thing.

In other words:

'Authority' sites that get links from other quality sites, from diverse sources, will triumph over tricksy, 'me-too' upstarts every time, in the long run.

They've got a USP, the world and his mother link to them, it's gravy all the way.

Trisha




msg:764486
 8:46 pm on Nov 8, 2005 (gmt 0)

Instead of whining, put your thinking cap on.

I agree, but I've thought of everything I can think of, I have no idea what the problem might be at this point.

1. Off-theme reciprocal linking; Never did off topic linking

2. (Excessive) linking between sites on the same IP ... haven't done this either

3. Unnatural link growth (lots of links in a short space of time); nope

4. Unnatural link loss (a link network drops you);
don't think this is it

5. Duplicate content: 20000 pages with little unique content in each;
I did have some datafeeds on a couple of sites - but I took those down months ago. Not nearly as many as 20000 though

6. Another site doing 302 redirects to yours;
this I don't know, and wouldn't know what to do about it if it was happening.

7. Another site copying your content.
could be, but I wouldn't know how to stop it either

8. (Possibly) 8000 low PR links, and very few decent, authority sites linking to you (unnatural).
I've got an ODP link

9. The addition of 1000's of pages suddenly;
no, all pages are hand written

10. Linking to bad neighbourhoods;
I took down any suspicious looking links a while back on one site

11. Lots of affiliate links (use rel="nofollow" in these)
11a. Using a redirect script for your own links.
I had quite a few with the affiliate links, but again those were taken down a long time ago. I don't think the affiliate links that are left would be a problem

12. Running a directory, especially with entries drawn from DMOZ or other SE results.
no directory

Plus the usual grey SEO tricks done to excess: e.g. excessive use of <H1> tags modified by changing the font size so it's not actually H1 size, that sort of thing.
I use one H1 per page - according to proper html standards, and I keep them reasonably big. Although interestingly I heard someone else mention css being a problem with this update.

In other words:

'Authority' sites that get links from other quality sites, from diverse sources, will triumph over tricksy, 'me-too' upstarts every time, in the long run.

How do you naturally attract links to your good content if no one can find your site in the serps to know it exists to be able to link to it? That is a big problem I've had.

some others to add:

13) too many link exchanges of any type - I stopped doing these a year or so ago - I don't think what is left could do that much harm

14) paid links or those that look like them - I've never had the money to buy paid links. Do I have links that look like they may have been paid for? I don't know, since I don't know what SE's think a paid link looks like.

JuniorOptimizer




msg:764487
 9:11 pm on Nov 8, 2005 (gmt 0)

It's natural to somehow think of yourself as "penalized" when in fact, you may not be. A "weakened" website might be down in the rankings because of a number of factors that are not at all related to the website.

tigertom




msg:764488
 9:58 pm on Nov 8, 2005 (gmt 0)

6. and 7.:

- Use CopyScape, then contact the ISP of offending sites.

- Use a DMCA order (I don't know what that involves).

See my previous post in this thread about stopping site leeching (I doubt many copyists do it by hand).

- Try a solicitor's letter, copied to their ISP.

- If a plagiarising site is using Adsense, use the little 'ads by Google' link to rat them out.

- Maybe you could contact their other advertisers also e.g. "Your ad is displayed on this page here blahblah.com/copypage.htm, check out my page here www.mysite.com/original.htm. Notice any similarities?"

Nothing libellous, just point out an odd co-incidence [grin].

- Save any email correspondence as a template for future use.

302 redirects:
[google.co.uk...]

Use this to search on any other terms you don't understand. This forum doesn't have an internal search engine.

[edited by: tigertom at 10:06 pm (utc) on Nov. 8, 2005]

ap_Rhys




msg:764489
 10:05 pm on Nov 8, 2005 (gmt 0)

How do you naturally attract links to your good content if no one can find your site in the serps to know it exists to be able to link to it?

I wonder how GG and Matt Cutts would answer that?
But they won't...

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved