Forum Moderators: Robert Charlton & goodroi
Up came most or all of my 135 pages. FIVE of those were on OTHER sites!
When I checked the source code of those, all 5 had phony .php links to my pages.
They even had mouseover directives to display MY URLs, masking their obvious intent.
I clicked on the Google Cached version of each one.
Each and every one was "cached" as of 31 December 1969.
All my legitimate pages are cached December 2004 or January 2005.
Questions:
1) Does this mean that Google has fixed the phony redirect issue?
2) If NOT, just what do the 1969 cache dates indicate?
3) Does this mean that the scrapers no longer gain PR, placement or other credit for my content?
4) Does this mean that any PR etc. is now passed through to my pages?
5) Why do ANY pages from scraper.com show on a query for site:www.mysite.net at all?
Very curious. - Larry
I did an inurl search inurl:www.mysite.net.
The 5 scraping sites were way way back at the very end of 135 listings, all mine being first.
Each and every one of those had 'Supplemental Result' clearly indicated.
I missed that earlier. Each one had the 1969 cached date as well, a 1 to 1 correspondence.
Again:
1) Does this mean that those 5 pages are flagged as duplicate content?
2) Does this indicate some other possible penalty upon the scrapers?
3) Do the scrapers still get PR or other benefit from my work? -or-
4) Is PR etc. possibly passed thru to my rightful original pages?
5) Are "supplemental results" always given the bogus 1969 cache date?
Sorry for all the questions. I'm trying to make sense of all this. -Larry
I like the humor in this 31 Dec 1969 23:59:59 GMT date, and I like that someone at Google seems to be solving at least one of the many problems in their core business. :-)
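For what it's worth, 31 Dec 1969 23:59:59 GMT is exactly one second before the Unix epoch (1 Jan 1970 00:00:00 GMT), which is how a timestamp of -1 renders. A minimal sketch, assuming the cache date is stored as a Unix timestamp and that -1 is being used as a "no date" placeholder. That is only my guess, nothing Google has confirmed, and render_cache_date is a made-up name:

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: timestamps count seconds from this moment.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def render_cache_date(unix_seconds):
    """Format a Unix timestamp as a GMT date string (hypothetical helper)."""
    moment = EPOCH + timedelta(seconds=unix_seconds)
    return moment.strftime("%d %b %Y %H:%M:%S GMT")

print(render_cache_date(-1))  # 31 Dec 1969 23:59:59 GMT
print(render_cache_date(0))   # 01 Jan 1970 00:00:00 GMT
```

A timestamp of 0 displayed in a US time zone would also land on 31 December 1969, so either an unset or a zeroed date field could explain what we are seeing.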
I think as long as they're there you'll still be hurt, regardless of the cache date.
I have been noticing the same thing for about a week. I have a set of keywords that gives me a few hundred results, and I check these results almost every day to see how my site is performing. For about a week I have seen many scraper sites jumping up and down the list, and many have their cache marked 1969. Old-fashioned bookmark pages from 1997 and 1998 are sometimes marked with this date too. So I guess they have a filter which compares the amount of unique content with the number of links on a page.
The strange thing is that the order of the list seems to change every day. I even have the impression that some pages are jumping from the state "to be deleted" back to "accepted", but I may be wrong. Maybe these are just newly indexed files from scraper sites that were not fully indexed yet.
I don't know 100% what is going on, but I think Google has finally implemented a massive filter that will wipe out many sites in the next large update, but they are still fine-tuning the parameters. About 50% of the pages for my keyword set are marked as 1969. So if my assumption is correct, this would indicate that 50% of the pages about these keywords will be deleted soon. If this percentage applies to Google's whole index, this is the largest clean-up ever.
Does anyone have details about the percentage of 1969-marked caches for their own set of keywords?
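To make the guess about a content-versus-links filter concrete, here is a rough sketch of how such a ratio could be computed. It is purely illustrative: nobody outside Google knows what they actually measure, and the class and function names (LinkTextCounter, words_per_link) and the sample pages are all made up.

```python
from html.parser import HTMLParser

class LinkTextCounter(HTMLParser):
    """Counts links and the words that appear outside of <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.words_outside_links = 0
        self._anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1
            # Only anchors with an href count as links.
            if any(name == "href" for name, _ in attrs):
                self.links += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1

    def handle_data(self, data):
        # Text inside an anchor is link text, not page content.
        if self._anchor_depth == 0:
            self.words_outside_links += len(data.split())

def words_per_link(html):
    """Return words of non-link text per link (higher means more content)."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()
    return counter.words_outside_links / max(counter.links, 1)

# Made-up sample pages: a link farm and an ordinary article.
scraper_like = "<p>Top sites</p>" + '<a href="/go.php?id=1">Widgets</a>' * 20
article_like = "<p>" + "word " * 200 + '</p><a href="/about.html">About</a>'

print(words_per_link(scraper_like))  # 0.1
print(words_per_link(article_like))  # 200.0
```

A real filter would also have to judge whether the text is unique, which this sketch does not attempt. It also counts script and style text as words.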
After all it is so easy to create such a money making site:
Why didn't I do it myself?
One good thing is that the filter catches almost all illegal copies of my content, so the filter fortunately recognizes which content is original, and which was created later.
As a programmer I really would like to know what kind of tool they built and how they tune it, but I guess GoogleGuy won't post his knowledge about this issue to this thread ;)
The scraped site is fully self-supporting - all the links have been changed to direct to within the scraped site. They've maintained 100% of the content (cheeky buggers didn't even delete the copyright notice on the pages) - with one simple addition, a link at the bottom of every page to a "mother" site, which is just a scumwad directory site. (My own site is notably absent from their directory.)
I'm based in Canada, using a Canadian hosting service. The dirtbag site is registered through GoDaddy, and all the contact information goes to a post office box in Scottsdale Arizona.
How do I get these guys to clear my site out of their caches? I put a lot of work into re-designing it for good placement in SERPS, and it's been working a charm. I don't want to lose that work for a duplication penalty, and I'm very concerned about copyright issues as well.
If this is gonna cost me $$ for lawyers, my site will fold. period.
So maybe your first step should be to contact GoDaddy and file an official complaint. If you have enough proof that you own the copyright, they must take action under the DMCA.
For an official complaint according to the DMCA, it must contain the following info:
There is also a special forum dealing with copyright at [webmasterworld.com...]
I'll take the "notify GoDaddy" route. I'm also in luck because, after poking through the site in question, I discovered that they scraped another, much bigger and more commercially oriented site whose owner I happen to know. I'm letting him know so we can both lodge complaints.
Do you suppose there is some consensus that the 1969 cache date marks a real or future penalty?
I can say this: NO site in the 1st five pages for my keywords has a 1969 cache date.
I take this to mean that they are WAY down in the SERPs, a penalty in itself.
If so, then duplicate penalties on the original or rightful pages seem less likely.
It wouldn't be too hard to decide which page was bogus.
The phony ones all seem to have complicated arcane .php redirects,
while the genuine sites tend to have straight html links, internal and outbound.
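A rough way to spot such links in a scraper's page source, sketched under assumptions: the script name out.php, the url parameter and the helper names are invented for illustration, and real scrapers may hide the target differently (an ID lookup, for instance, would not be caught).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote

class PhpRedirectFinder(HTMLParser):
    """Collects links that go through a .php script and mention my host."""

    def __init__(self, my_host):
        super().__init__()
        self.my_host = my_host.lower()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parts = urlsplit(href)
        goes_through_php = parts.path.lower().endswith(".php")
        # The real destination is often URL-encoded in the query string.
        carries_my_url = self.my_host in unquote(parts.query).lower()
        if goes_through_php and carries_my_url:
            self.suspects.append(href)

def find_php_redirects(html, my_host):
    """Return hrefs that route through a .php script toward my_host."""
    finder = PhpRedirectFinder(my_host)
    finder.feed(html)
    finder.close()
    return finder.suspects

# Made-up page source in the style described in this thread.
page = """
<a href="http://scraper.com/out.php?url=http%3A%2F%2Fwww.mysite.net%2Fpage1.html"
   onmouseover="window.status='http://www.mysite.net/page1.html'">My page</a>
<a href="http://www.mysite.net/page2.html">Straight link</a>
"""

for href in find_php_redirects(page, "www.mysite.net"):
    print(href)
```

Only the first link is reported. The straight html link to page2.html is left alone.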
Here's a great way for Google to improve their rankings:
What better vote for an original site, than for some scraper to scrape it!
All they need to do is pass PR thru the phony php redirects, multiplied by 2 or 3. - Larry
1969 marked pages are penalized in some way. I.e. almost all 1969 pages are in the last 50% of the SERP for my keyword set. Although PR, Adsense etc. may be normal for these pages, no normal searcher on Google will ever find them. So they may get normal treatment, but no visitors.
Interestingly, there are two recent threads on WW that show effects that might be related to the 1969 <-> 2004 marking action. One is in the Adsense forum [webmasterworld.com], where people see that the earnings are less stable than in the past. The second is in the Google News forum [webmasterworld.com] about rapidly changing SERPs.
Why? Because in my case (at least), it looks as if the scraper has already been penalized, with a pr0 on the "scraped" pages, and the "mother" domain has a much lower PR than I'm used to seeing on directory sites.
It's hard to say for sure, though. I'm playing whack-a-mole trying to keep track of where my scraped content is residing, because this site seems to have a way of floating the content between a number of subdomains. (not really a major technological breakthrough). I just wonder why they would bother. If they keep moving the content around, how the heck is G-Bot supposed to assign any PR value to it?
As for keyword issues... I confess I'm at a loss as to the advantage of the technique. Again, if it keeps moving around, how does that benefit the scraper? A few days in the SERPS with my keywords, then the content moves, and what, he replaces the content with an AdSense clickthrough page? Sure, if the entire process is automated, but you'd have to be moving huge chunks of content around for it to be viable even if it is automated.
Or is my brain just seizing up at 3 a.m.?
There is certainly no intent to be malicious in doing this, but I have been reading recently that due to some Google bug, Google handles this badly.
I think you should be aware that a lot of people running these PHP link things do it with no ill will and with no idea that it might hurt your Google ranking. They do it because they want to count how many clicks they send you, that's it.
Of course, since this Google bug exists, there are no doubt some nasties out there who do it to cheat... I think it should be easy to tell the honest people from the cheats though. Honest links will still redirect straight to your site. The cheats will intercept the click and send the traffic back to their own page...
Don't just assume the guy with the PHP links is up to no good though, this is really Google's bug to fix.
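The honest-versus-cheat test can be sketched like this, assuming you have already requested the PHP link without following redirects and copied the Location header from the response. The function name, the labels and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit

def classify_redirect(link_url, location_header, my_host):
    """Label a click-counting link by where its redirect points."""
    link_host = (urlsplit(link_url).hostname or "").lower()
    # A relative Location has no host, so it stays on the linking site.
    target_host = (urlsplit(location_header).hostname or "").lower()
    if target_host == my_host.lower():
        return "honest: redirects straight to the original site"
    if target_host == link_host or target_host == "":
        return "suspect: traffic stays on the linking site"
    return "unknown: redirects to a third party"

# A click counter that passes the visitor on to my page.
print(classify_redirect(
    "http://directory.example/out.php?id=7",
    "http://www.mysite.net/page1.html",
    "www.mysite.net",
))

# A link that shows my URL on mouseover but lands on the scraper's copy.
print(classify_redirect(
    "http://scraper.com/out.php?id=7",
    "http://scraper.com/copy-of-page1.html",
    "www.mysite.net",
))
```

This only looks at the first hop. A chain of redirects, or a redirect done in JavaScript or a meta refresh, would need more checking.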
"...supplemental sites are part of Google's auxiliary index. We're able to place fewer restraints on sites that we crawl for this auxiliary or supplemental index than sites that are crawled for our main index. For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index.
The index in which a site is included is completely automated; there's no way you can select or change the index in which your site appears. Please be assured that the index in which a site is included does not affect its PageRank."
I had an issue on a site of mine with domain-com and www.domain-com, and the opposite happened. I think the problem is that Google is penalizing BOTH sites, the supplemental and the "original" page. That's what I have noticed. Maybe if you have a PR8 or 20,000 backlinks you might overcome it, but normal sites are hit hard.
In my case, however, I found a bunch of Supplemental Result pages. I couldn't find anyone with data similar to my site's, yet lots of my pages became Supplemental. Don't know why.