Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 268 message thread spans 9 pages; this is page 6.
Scraper Site Clearout Collateral Damage?
Ian Cunningham




msg:707472
 10:18 am on Jul 28, 2005 (gmt 0)

It seems like Google has purged many scraper sites from the Google SERPs, as per this thread:

[webmasterworld.com...]

I'm sure many people, including myself, are very, very pleased about this, as it stops scumbag sites from stealing our content.

However, it also appears that some non-scraper sites have been included in this purge (including my own). My site has been active for 5 years and is based on unique content.

Has anyone else been affected by this, and does Google intend to refine the algorithm to stop valid, unique-content sites from falling victim?

 

Webdetective




msg:707622
 4:56 pm on Aug 19, 2005 (gmt 0)

That definition could probably include many keyword page generators like Rankingpower, since that particular program pulls its "content" from the Altavista search results.

andrea99




msg:707623
 5:22 pm on Aug 19, 2005 (gmt 0)

As it happens, my 7-28 banned site has a lot of original content, both on pages full of text written entirely by me and as commentary about various listed links. I was, however, also using excerpts and meta descriptions from the linked sites to describe them, and must have passed some threshold for that.

There are far too many pages on my site like this for me to change them legitimately in a timely fashion and using text-scrambling software would degrade the descriptions and make my site more like the scrapers themselves. But I may write a script to swap out certain synonyms wholesale.

I am disallowing googlebot from those pages for now (they still do well with Y and M but I fear these may have their own scraper crack-down soon as well). I am keeping my fingers crossed.

In the recent recovery process I accidentally used too many 301 redirects for the server load and probably confused slurp and msnbot too...

But I have almost four years work invested in the database and am working day and night to recover this. If you stop hearing from me you'll know I 'died trying.'

girish




msg:707624
 10:40 pm on Aug 19, 2005 (gmt 0)

Does the following code create a 404 issue?

ErrorDocument 404 /site/missing.html

girish




msg:707625
 10:52 pm on Aug 19, 2005 (gmt 0)

Sequence of my successful multi-path reinclusion request-

Following the July 28 tweak - I took the following five actions on July 31.

1. I made a request here:
http://www.google.com/support/bin/request.py

2. and resubmitted here:
http://www.google.com/addurl/?continue=/addurl

3. and did the following -- I conducted an "appropriate keyword search" on Google (you must conduct a similar Google search for your site to generate the appropriate link). At the bottom of the page I clicked the "Dissatisfied? Help us improve" link and filled out the form, commenting that my site USED to be in results such as this and others, but was completely deleted recently - why? And then I provided a link to my site that would have been appropriate for that search & WAS the #1 result in Google.

4. and contacted google from the adwords page for this domain. I asked adwords support to forward request to the proper department for consideration.

5. This did not work:
help@google.com?subject=Reinclusion Request

-----------------
RESULTS:

Two days later, on Aug 2, I received the following first reply: This is an automated reply to your message about adding your site to Google. blah blah

On Aug 6 I got my first reply from Adwords support: "Thank you for your message. This auto-generated email is to confirm that we received your inquiry."

On Aug 8 I got a personalized reply from Adwords signed by a Google Adwords Team rep: I will forward your inquiry to our User Support team, who will provide you with the assistance you've requested.

On Aug 9 I got a reply from the User Support team (although it didn't say so); they referenced the Adwords note: Thank you for your note. We understand that you're concerned about your site, www.------.com. Unfortunately, we are unable to send personal responses to all of the requests we receive to review individual website content.

Certain actions such as buying or selling links to increase a site's PageRank value or cloaking - writing text in such a way that it can be seen by search engines but not by users - can result in penalization.

I replied and told them what I had done and how I resolved the issues they raised.

On Aug 10 I got the following note: Thank you for your reply. We understand your concern and have passed your message on to our engineering team for further investigation. We appreciate your patience.

On Aug 11, I found my home page cached again but not indexed. Then on Aug 17th I was back at the top of the SERPs.

Webdetective




msg:707626
 11:14 pm on Aug 19, 2005 (gmt 0)

girish,
How did you word your messages to Google? Did you admit to wrongdoing, apologize, promise not to do it again, etc..?

Today, I made sure my site was cleaned up of any more possible violations and replied back to Google's autoreply, but I haven't submitted my URL to Google's addurl page yet.

Please explain "appropriate keyword search." Do I choose the best keywords I want to rank high for? Does this go into the "Comments:" box? I didn't see a "Dissatisfied" link, but I guess that comes up after I submit.
Thanks

girish




msg:707627
 12:06 am on Aug 20, 2005 (gmt 0)

1. I wrote them what I thought was in violation of the guidelines (site wide linking in my case). I told them I had removed the links.

2. Go to a Google search page - www.google.com

type in any of your keyphrases and click "search". Now scroll to the bottom of the SERPS page where you should see the following links:

Search within results ¦ Language Tools ¦ Search Tips ¦ Dissatisfied? Help us improve

Click the "Dissatisfied..." link. Then write your message.

texasville




msg:707628
 4:14 am on Aug 20, 2005 (gmt 0)

I'm just curious...where does google state "no site wide linking" and what is their purpose in this? I know what google may THINK this is a sign of...but what does it really possibly signal?

Marcia




msg:707629
 4:17 am on Aug 20, 2005 (gmt 0)

They don't tell many specifics, else some would figure out how to get around them - just my estimation, not based on actual fact. Certain things do show intent - and they are specific about not doing things to artificially boost rankings.

For example, does anyone actually PAY for sitewide links on a grocery site selling vegetables or on a household widget site for a travel site to 199 different cities thinking it's *advertising* and they'll get targeted traffic looking for hotels that will convert to sales?

Mountain View isn't too far from agricultural areas, but they did not just fall off the turnip truck.

Fighting Falcon




msg:707630
 8:50 am on Aug 20, 2005 (gmt 0)

Our site was doing very well until earlier today.

I was rather surprised to see visits from XXXXXX.google.com ...it was NOT GOOGLEBOT. It seemed as if someone from Google 'Mountain View' was visiting our site during the last week or so. Our ranking went up slightly during the course of the week.

Then this morning ...it's gone entirely from the index.

The site's been up for the last four or five years and does not do any black hat stuff - although we did add a link exchange script recently which links with related sites like web designers or hosting companies.

Webdetective




msg:707631
 1:54 pm on Aug 21, 2005 (gmt 0)

What if a site once had known violations, i.e. doorway pages, that put it at risk of getting banned from Google or Yahoo, but it wasn't - however, those pages are still in the search engine's index long after the offending pages were removed?

If I removed all offending pages, and have a 404 redirect back to my homepage, should the search engine eventually remove the missing pages from its index without penalizing the site for violations that were removed long ago?

WebFusion




msg:707632
 3:01 pm on Aug 21, 2005 (gmt 0)

If I removed all offending pages, and have a 404 redirect back to my homepage, should the search engine eventually remove the missing pages from its index without penalizing the site for violations that were removed long ago?

If you got caught using one of the "scraper systems" (RP, TE, etc.), then I wouldn't use a 404 redirect - I would serve the engines a straight 404 page.

(....and you might want to send the creators of those "tools" a lovely thank-you note for failing to disclose in their expertly written sales pitches how their tools can and will kill any site that uses them.)

jd01




msg:707633
 6:34 pm on Aug 21, 2005 (gmt 0)

Does the following code create a 404 issue?

ErrorDocument 404 /site/missing.html

No, it is the correct way to define a custom 404 document.

Justin

Seo1




msg:707634
 7:26 pm on Aug 21, 2005 (gmt 0)

Write to the owners that their programs would destroy sites?

Didn't anyone else read Google's Terms of Service?

Webmaster guidelines?

Did you think Google would be different from any other company and not protect its assets?

Do you realize Google is an engineer's heaven, and those people love stupid challenges such as THP, Articlebot, Nichebot and others... The Google engineers could write these simplistic scripts in their sleep and most certainly know how to weed them out.

Ummm, where did anyone become dumb enough not to understand one of life's simple rules:

Cheaters never win...

They may have some victories for a brief period of time, but eventually they usually all get caught.

Adding 100s of pages of gibberish content stolen from other websites, no matter how well conceived, still trips spam filters, just as adding 100s of links does.

Anything that sounds too good to be true usually is...

To blame crooks for not telling you beforehand that they are stealing from you seems a bit odd... As a former investigator, I don't think I've met one yet who warned me he was going to steal from me or anyone else.

Webdetective




msg:707635
 8:04 pm on Aug 21, 2005 (gmt 0)

Webfusion,
Should I remove my 404 redirect altogether so that everybody gets the standard 404 Not Found page, or create a custom 404 page and redirect to that?

Some search engine indexes still think I have 500 "RP" pages on my site, even after I removed them a few weeks ago.

At one time programs like RP were OK with Google and other search engines, and many of the top web marketing gurus were actually recommending it. Like any other quick-fix solution to web traffic, it has its useful life and then is one day frowned upon. The only real long-term solution is real content.
Fred

WebFusion




msg:707636
 8:18 pm on Aug 21, 2005 (gmt 0)

At one time programs like RP were ok with Google and other search engines

The fact that the engines took awhile to "catch on" to these spam machines never meant they were "OK" with them. Cloakers can (and in some cases, still do) also rank well in many engines - but they were never considered "OK".

and many of the top web marketing gurus were actually recommending it

Of course they were/are. They stand to make a bundle by selling these things (or are simply doing another "guru" a favor by endorsing it, which they will return by endorsing THAT person's next "great thing").

The only real long term solution is real content.

The only thing that has really stood the test of time for me. Aside from occasional blips due to some kind of glitch in the algo/crawling activity, etc., my content-based sites maintain stable rankings for years.

Webdetective




msg:707637
 8:29 pm on Aug 21, 2005 (gmt 0)

ErrorDocument 404 /site/missing.html

Do I create a directory /site/ and create my own page missing.html to put in there, with a link back to my homepage?

Should I not be using my own custom 404 missing page, and just let the server give the standard error page until the search engines have removed all traces of my now missing RP pages?
Fred

g1smd




msg:707638
 8:52 pm on Aug 21, 2005 (gmt 0)

Yes. Put error pages for each of 401, 403 and 404 in that folder.

Make sure you also upload a BLANK file (completely blank - nothing at all in it) called "index.html" to that folder to stop people seeing the directory index.

Make sure you have a "Disallow" command in your robots.txt file to stop that directory being spidered too.
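Taken together, that setup might look like the following - a sketch only; the /site/ directory and the filenames follow girish's earlier ErrorDocument example and are assumptions, not requirements:

```apache
# .htaccess — custom error pages kept in one directory.
# Paths are server-relative, so Apache serves each page
# internally and the client still receives the error status.
ErrorDocument 401 /site/401.html
ErrorDocument 403 /site/403.html
ErrorDocument 404 /site/missing.html
```

Alongside this, a completely blank /site/index.html stops the directory index from being shown, and a `Disallow: /site/` line under `User-agent: *` in robots.txt keeps the error pages themselves from being spidered.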

Seo1




msg:707639
 9:34 pm on Aug 21, 2005 (gmt 0)

Stands and applauds WebFusion

Nice work!

Erku




msg:707640
 10:54 pm on Aug 21, 2005 (gmt 0)

Guys forgive my question here,

But can someone explain how we can check if Google's new changes have affected our sites?

If you publish another's content with permission, can you be penalized?

Thank you.

Seo1




msg:707641
 11:14 pm on Aug 21, 2005 (gmt 0)

Hi Erku

Google doesn't penalize for Dupe Content used with permission.
If they did many of the leading news websites would not be around.

What they do is give the first-found publishing of the document the most weight, while everyone else's copy gets filtered down to lower results listings, or the page may be found under the supplemental results.

Hope this helps

Erku




msg:707642
 12:05 am on Aug 22, 2005 (gmt 0)

Thank you Seo,

Could anyone please explain - my traffic has declined significantly since early August. Is there any search engine update or anything like that? We have thousands of pages and don't know what may have caused this.

jd01




msg:707643
 4:11 am on Aug 22, 2005 (gmt 0)

ErrorDocument 404 /site/missing.html

Do I create a directory /site/ create my own page missing.html to put in there, with a link back to my homepage?

Should I not be using my own custom 404 missing page, and just let the server give the standard error page until the search engines have removed all traces of my now missing RP pages?

You can put the file where you like - it does not matter which directory it is in, only that the path you give is a server-relative path. IOW, include only the path after the base http://yourdomain.com, e.g. /whateverdirectory/whateverfile.html OR /whateverfile.html OR / OR /directory/

I personally prefer a standard 404 page, but I do extensive checks on my internal links and rewrite any incorrect inbound links to the correct location if I cannot (or while I am waiting to) get them changed.

The only thing to be sure of is that you do not use the canonical location in defining where the file is ( http://yoursite.com ). The reason is that this is, by default, an external redirect, and could be going to any site anywhere, so the server *correctly* serves a 302 redirect code, not a 404. (302 is usually referred to as a 'temporary' redirect, but technically it is an 'undefined' redirect - a 307 is a true temporary redirect.)

Custom error pages are fine; the problem is if they are incorrectly defined by the server header as anything but a 404, the page will be assumed to have been moved to the new location for an undefined period of time. Unfortunately, most major SEs are HTTP compliant (or close), and they correctly request a page with a temporary or undefined redirect code from the original location (the one you would like to be removed), not the new location - IOW, your pages will never be removed from the SEs, because they are following the standard and requesting the page from the original (now empty) location and being redirected to the new location.

Hope this helps.

Justin
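Justin's distinction between a server-relative path and a full URL can be made concrete with a sketch (the /site/missing.html path reuses the example from earlier in the thread; yoursite.com is a placeholder):

```apache
# Correct: server-relative path. Apache serves the error page
# internally and the client still receives a 404 status code.
ErrorDocument 404 /site/missing.html

# Problematic: a full URL is treated as an external redirect,
# so Apache answers 302 instead of 404 — engines will keep
# re-requesting the dead URL instead of dropping it.
#ErrorDocument 404 http://yoursite.com/site/missing.html
```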

nzmatt




msg:707644
 8:45 am on Aug 22, 2005 (gmt 0)

Very soon google will have nothing in their index... with a long list of adword advertisers on the right and top.

Just thought I'd repeat that…

Yes, it is one thing to filter duplicate content, but it's quite another to correctly ascertain what is duplicate and what is the actual original, or what is just about a similar topic but not duplicate or copied.

I don’t think Google is able to do this properly or accurately, and once again many innocents suffer. It’s like the sandbox thing again. Did Google care about all the great new legitimate sites they white washed with that broad brush stroke? No.

I have a site that advertises widgets. We find ourselves going out of our way to write ‘Google content’ instead of what was just original content, because the duplicate filter picks up anything remotely the same as other advertisers. It has doubled the time it takes to write articles and we have to use ‘unnatural language’.

Just who is being duopolistic here Google?

Are you a pioneering search engine or an unconscionable money maker? Can you be both?

Webdetective




msg:707645
 3:56 pm on Aug 22, 2005 (gmt 0)

If my site showed up here [groups-beta.google.com...] does that mean maybe it's not banned?

Small Website Guy




msg:707646
 10:01 pm on Aug 22, 2005 (gmt 0)

I don’t think Google is able to do this properly or accurately, and once again many innocents suffer. It’s like the sandbox thing again. Did Google care about all the great new legitimate sites they white washed with that broad brush stroke? No.

Let's look at this from the perspective of the web surfer.

Web surfer sees a lot of spammy sites in SERPs, he thinks (correctly) that Google is doing a bad job.

So Google creates an algorithm that gets rid of all the spammy SEOed sites but also throws out a few babies with the bathwater. From the web surfer's perspective, the results are a lot better now, no more spammy sites.

Watcher of the Skies




msg:707647
 5:20 am on Aug 23, 2005 (gmt 0)

My site - about 2 years old - is indexed and ranked in Google, albeit after a one year hibernation in the you-know-what. It consists of a home page and about 40 subdomains - each containing unique content, but still structured around a loose template. The site was set up as ht*p/sitename.com with no "www" included, and all links were requested to that structure, though, yes, one or two have surreptitiously added the www. Now, each subdomain is slowly showing only a URL (no title, desc., etc.) in the SERPs.

Is there something inherently wrong (vis-a-vis Google) in setting up this way? (I understand from what I've read that I may want to contact all those who've added "www" to the link and ask them to remove it. Understandably, an intentional campaign would be harder to fight.) Is there something else I'm missing? Am I supposed to introduce some type of redirect or something?

If it's not glaringly apparent already, I know enough technical things to get a site up and running but not TOO much beyond that. Any help? How exactly does this set-up lead to a dupe-content penalty?

JuniorOptimizer




msg:707648
 10:13 am on Aug 23, 2005 (gmt 0)

Just do a header check and make sure your custom document is actually responding as a 404.

dataguy




msg:707649
 1:15 am on Aug 26, 2005 (gmt 0)

ht*p/sitename.com with no "www"

I have 6 sites with about half a million listings in google (combined) which don't use 'www.' I have the DNS with enom, and I use their DNS system to do a site-wide 301 from www to non-www.

I've never had a problem with this, but I know many people just can't understand how a site can even function this way, and many people have linked to these sites using www.
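For anyone whose DNS host doesn't offer a registrar-level redirect like enom's, the same site-wide 301 from www to non-www is commonly done with mod_rewrite in .htaccess - a sketch only, with example.com standing in for the real domain:

```apache
RewriteEngine On
# If the request came in on the www hostname...
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# ...permanently (301) redirect to the bare domain, keeping the path.
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```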

Watcher of the Skies




msg:707650
 4:35 am on Aug 26, 2005 (gmt 0)

Thanks, guys.

JuniorOptimizer




msg:707651
 11:09 am on Aug 26, 2005 (gmt 0)

A win for the good guys: I'm back in today. Took just under one month total and it probably would have been faster had I cleaned up my "spam" a bit earlier.

Seo1




msg:707652
 11:46 am on Aug 26, 2005 (gmt 0)

Good job, Junior

S = Slow (Websites need to ripen)
E = Easy (Everything in moderation)
O = Ongoing (Add 1 page + 1 link per day)
-----------------------------------------------
Front Page 3 major serps + most secondaries

Peace

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved