
Google SEO News and Discussion Forum

Google Products Used for Negative SEO
turbocharged




msg:4567897
 11:52 am on Apr 25, 2013 (gmt 0)

A website I am working on had previously suffered in the SERPs for its homepage. Duplicate content, created externally, resulted in hundreds of copies of its homepage and internal pages. Most of the copies reside on Google-owned properties (Google Apps and Blogspot). To combat the problem, we did a complete homepage rewrite: new images, lots of new text and new functionality. We also modified the .htaccess to prevent this from happening again (or so we thought).

A single character in the htaccess file left only the homepage open to Appspot. Within two days, four Appspot URLs were indexed and our client's homepage was in the "omitted results."
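For the curious, this is not our exact rule, but a hypothetical reconstruction of how a single character can leave just the homepage exposed. In a per-directory .htaccess the path that RewriteRule matches has its leading slash stripped, so the homepage is an empty string:

RewriteEngine on
# Hypothetical illustration only: "+" requires at least one character,
# so a request for the homepage (an empty path here) never matches and
# is never blocked. Changing ".+" to ".*" (or ".?") closes the gap.
RewriteCond %{REMOTE_HOST} \.appspot\.com
RewriteRule ^.+$ - [F]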

I am here to say that Google's preference for its own brands is harming the internet in more ways than just limiting consumer choice. Webmasters (like me) and SEO professionals are spending countless hours defending themselves against Google's own products. I discovered this when I did a search and found many complaints from others who have been proxy hijacked via Appspot.

Since I am not an .htaccess pro, or an SEO pro by any means, we submitted a request for additional funding from our client to bring an SEO pro on for a limited consulting basis. Other proxies that have cached our content, ones not owned by Google, appear to be placed in the omitted results where they belong. But we mostly develop webpages, and have no idea how many other Google products/techniques are being used as negative SEO weapons.

Yesterday I sent out 4 DMCA notices to Google on behalf of our client. Today I will work on assembling the rest of the list, which reaches well beyond one hundred different domains, and hand it off to someone else in my department to file DMCA notices. Once I get client approval for the SEO pro, he/she will get the list for review.

A good portion of the wasted time/money could be saved if Google used noindex on Appspot proxy pages. Does anyone have any idea why they would legitimately not do so? At the present time, I estimate this problem is going to cost our client $1,500. In addition to submitting the DMCA notices, we will have to rewrite some pages that probably won't meet Google's threshold for removal. Multiply this by thousands or tens of thousands of small businesses, and you have a lot of financial damage occurring. What a nightmare!

 

getcooking




msg:4567993
 4:39 pm on Apr 25, 2013 (gmt 0)

This is interesting. About 2 weeks ago I blocked the user agent "AppEngine-Google" because I found a proxy had scraped one of our pages and it was in the "omitted" results when doing a routine string search for text on that page. A little more digging and I found more of these and was able to trace it back to that user agent.

Now, two weeks later, the traffic on our Panda-hit site is surprisingly up by 10%. I'm not sure if there is a correlation or not, but given what you found now I'm wondering if that has been affecting us. The site in question has had real problems with ranking the homepage. It ranks on our brand name and that is about it.

aristotle




msg:4568063
 9:11 pm on Apr 25, 2013 (gmt 0)

If scraped copies outrank you, it could be a sign that your site has been penalized by either Panda or Penguin.

Robert Charlton




msg:4568074
 9:54 pm on Apr 25, 2013 (gmt 0)

For a proxy to outrank you requires some coordinated effort. Your page needs to be requested through the proxy (as, unless it's requested, it's not normally just sitting there), and then the resulting url needs to be linked to.

Probably the reason that Google products are being used in this case isn't that they are favored by Google... it's that they are free. I've got to confess, though, that I haven't checked out the idiosyncrasies of Appspot.

With regard to Blogspot... quite a few years back, around the time Microsoft launched MSN as I remember, a common technique that spammers used to achieve rankings was to put scraped content on Blogspot and then point a bunch of Blogspot links at the page. At the time, MSN was much more vulnerable to this kind of spam than Google was... so if Google is favoring Blogspot sites, it has changed policy. Since then, MSN has morphed into Bing, which is much more sophisticated than MSN ever was... and the current Google, while also much more sophisticated than the old Google, likes to index fresh content quickly, which may or may not be providing spammers with a window of opportunity.

Additionally, spammers and hijackers tend to target Google (and to exploit weaknesses in the Google algo) because it's got much more traffic than any of the other engines. This is not the equivalent, though, of Google favoring its own properties.

That so many pages are able to outrank you, though, does suggest other weaknesses in the site.

Andem




msg:4568087
 11:18 pm on Apr 25, 2013 (gmt 0)

It's actually rather amazing how many of these Blogspot sites are nothing more than scrapers. I own a rather popular content site, and we're actually getting a fair amount of traffic from scrapers that leave our in-content links or image links intact.

What's amazing is the fact that these links provide real traffic. I would imagine sites like this have high bounce rates, which should mean most visitors never click through to the links and we'd see little to no referral traffic. The fact that we do see it suggests, in the end, that they are getting some traffic.

In my case, I've already filed several successful DMCA requests with Google, but no accounts have been closed. Since these Blogspot blogs (all thus far carrying AdSense) are actually getting some traffic as described above, I think there is a conflict of interest on Google's part.

I've never seen them rank above me for any of the keywords I (loosely, if at all) monitor; I suspect they might just be getting some juicy long-tail traffic by including additional keywords on-page.

In the end, I believe Blogspot et al. should be consigned to the dustbin of history.

As a side note and from what I can gather in my experience, all of the Blogspot scrapers seem to be originating in either South Asia or Southeast Asia.

TheOptimizationIdiot




msg:4568099
 12:54 am on Apr 26, 2013 (gmt 0)

<rant>

This is a really good thread and good info, imo, but I don't think we should call it negative SEO, because to me theft is theft and copyright infringement is copyright infringement, so let's not "candy coat" what it is and call a spade a spade.

It's not SEO to steal someone else's content.
It's theft / copyright infringement.
Period.

There's not one bit of true SEO in stealing from someone else. SEO, imo, is being able to make a site/page rank based on its own merits without needing to steal or take someone else's work and pretend it's yours.

SEO is SEO
Theft is Theft

No real SEO I know, or know of, needs to steal content to make a site/page rank. That's not how true SEOs do things.

</rant>

turbocharged




msg:4568101
 1:10 am on Apr 26, 2013 (gmt 0)

If scraped copies outrank you, it could be a sign that your site has been penalized by either Panda or Penguin.

We just changed the content and pinged Google immediately after posting it, and the Appspot proxy copy was cached first. Searching for a snippet of the new page's text, the client site appears in Google's omitted results, but its cache is that of the old page. I'm not sure why this is, but what Google is displaying for the client page in the SERPs does not match the cache. The Appspot proxy pages now return a 405 header response when they are used to access any page of the client's site.

For a proxy to outrank you requires some coordinated effort.

I completely agree. To substantiate this claim, the app's name, which is passed to the log files, specifically references the client's company name in a derogatory manner on a fair number of these Appspot proxy URLs. There is no doubt in my mind that the creation of this duplicate content is intentional and malicious. None of the Appspot proxy URLs I checked have any backlinks.

Probably the reason that Google products are being used in this case isn't that they are favored by Google... it's that they are free.


Indeed, free is appealing to the unethical, and there are few tracks for the victims to trace. But in doing my research into the Appspot and Blogspot problems, I found black hatters claiming that they can rank Blogspot blogs with just a few link "blasts." The black hatters appear split between Blogspot and Tumblr as to which is the easier to rank with spam. I don't know where Appspot falls in this mix, but from the complaints I have seen from others with similar problems, I can only assume that anything residing on the Appspot domain is quite strong even without any backlinking.

When I get time, I will have to learn more about Appspot and how these proxies are created. If nothing else, I really want to learn what could possibly justify fetching an external page without appending a noindex tag to the result. People have been complaining about Appspot proxy hijacking for years, and it's a vulnerability that has harmed many webmasters and could harm many more. Google could/should append a noindex meta tag and the problem would be solved.
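From what I've read, they wouldn't even have to touch the page source; a single response header on the proxy front end would do the same job. Purely as an illustration of what Google could send (assuming an Apache front end with mod_headers):

# Illustration only: a header the proxy could attach to every response
# to keep its copies out of the index and out of the cache
Header set X-Robots-Tag "noindex, nofollow, noarchive"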

The client rejected our request to bring a 100% dedicated SEO on at this point. :( Because our hourly rate is cheaper, he expects us to provide the same level of service/care for half the price. While I don't mind the work, it takes us away from the design aspects of this project and will have a notable impact on the tasks we were originally hired to do.

turbocharged




msg:4568102
 1:14 am on Apr 26, 2013 (gmt 0)

@ TOI

The Appspot apps apparently can be named by their creators. The app names appear in the log files and provide enough information to describe the intent of why the proxies exist.

Yes, it is theft and copyright infringement. However, at least in this case, the app names make it clear that the proxy creator wants to obliterate our client's homepage. And thus far he/she has been quite successful in doing so.

TheOptimizationIdiot




msg:4568114
 1:38 am on Apr 26, 2013 (gmt 0)

In your .htaccess file, put this at the top:
(Right after RewriteEngine on if it's already there.)

RewriteEngine on
RewriteCond %{HTTP_REFERER} PutTheDomainNameOfTheThiefHere
RewriteCond %{HTTP_USER_AGENT} PutTheAppNameHere-EscapeSpacesLikeThis\ WhenYouDo
RewriteRule .? /403-bad-domain-scraper-thief.php [L]

If that doesn't do the job and block them from scraping the site, then let me know and please post the actual user-agent string shown in the logs and/or the IP address of the AppSpot application doing the scraping, or any other information you have from the logs, all within the TOS of course.



On 403-bad-domain-scraper-thief.php put:

<?php header('HTTP/1.1 403 Forbidden'); ?>

At the top of the page.

Then put your regular HTML with a notice in the <body> section of the page:

If you are visiting us from [stupid-thief.com], please understand they are stealing our content and if you came from Google it would be appreciated if you clicked back to the search page and clicked "block all results from this domain", then revisited us. [link]Click Here to Bookmark This Page[/link] (It will display correctly when you visit again, as long as you do not click a link from [stupid-thief.com])

If you would like to view this page without clicking back and blocking the site stealing our content, simply [a href="<?php echo 'http://www.yoursite.com' . $_SERVER['REQUEST_URI']; ?>"]click here[/a] to reload this page and view the content.

turbocharged




msg:4568122
 3:05 am on Apr 26, 2013 (gmt 0)

We are currently using in our .htaccess file:

RewriteCond %{REMOTE_HOST} \.appspot\.com
RewriteRule ^.*$ - [F]

Since all of the proxies appear on sub-domains, I don't want a user agent name change to occur in the future and allow the client site to be copied again. I have yet to see a legitimate reason to allow any Appspot app to retrieve data from this client's site.

TheOptimizationIdiot




msg:4568125
 3:25 am on Apr 26, 2013 (gmt 0)

I don't want a user agent name change to occur in the future and allow the client site to be copied again.

I'd do all 3 and .? is much more efficient than .* in this situation since you don't need to back-reference anything.

RewriteEngine on
RewriteCond %{REMOTE_HOST} \.appspot\.com
RewriteCond %{HTTP_REFERER} PutTheDomainNameOfTheThiefHere
RewriteCond %{HTTP_USER_AGENT} PutTheAppNameHere-EscapeSpacesLikeThis\ WhenYouDo
RewriteRule .? /403-bad-domain-scraper-thief.php [L]

I'd also still use the custom 403 page just for their UA/Referrer string, and you might even have some fun with a custom page, just for them ;)

RewriteEngine on
RewriteCond %{REMOTE_HOST} \.appspot\.com
RewriteRule .? /just-for-appspot-thieves.html [L]

RewriteCond %{HTTP_REFERER} PutTheDomainNameOfTheThiefHere
RewriteCond %{HTTP_USER_AGENT} PutTheAppNameHere-EscapeSpacesLikeThis\ WhenYouDo
RewriteRule .? /403-bad-domain-scraper-thief.php [L]

Same as above on the 403-bad-domain-scraper-thief.php page, but if they really want to "proxy serve" a page from you, why not let them and then put the following on just-for-appspot-thieves.html?

In the <body> section of just-for-appspot-thieves.html:

We've created this page just for the copyright infringing AppSpot users who insist upon copying our content. In all reality we think they should rank very well in search engines for thieves, copyright infringement, and even: sorry people who can't build their own website with unique information.

If you would like to visit the site they're stealing from please search for: the-name-of-your-site.com keyword.

We apologize for any inconvenience this page may be causing you, but we refuse to let others steal from us and attempt to deceive you into thinking our work is theirs.

Thanks for understanding.

turbocharged




msg:4568477
 12:16 pm on Apr 27, 2013 (gmt 0)

Thanks for the .htaccess tips TOI. I've included all of them in the .htaccess file, with the exception of the custom 403. Outside of the many Appspot and Blogspot domains, there are far too many copies or partial copies of this client's site floating around on Wordpress, Tumblr and other free blogging places to do custom redirects.

By the middle of next week we should have all the DMCA notices sent out to the remaining blogging platforms. At this stage I would like to focus on prevention before I get back into the design work I originally was hired to do.

The client's site is verified through Google+ as a publisher, with a linked G+ icon on the homepage, and is linked/verified in his WMT account. That does not appear to make any difference in proving authenticity/ownership, since the markup is copied too and displayed on the Appspot proxies and on the full-page copies across many different free blogging sites. The Appspot pages are still ranking and the client site is still in the omitted results.

Does Google read the metadata (EXIF/IPTC) of images? Would this be a viable way to help an algorithm that otherwise shows no concern for the originator of content determine ownership? Yes, image metadata can be changed, but I doubt most scrapers would devote much if any time to the task. And if they do, I'd like to consume as much of their time in the process as possible. It also may make it easier to substantiate DMCA takedowns when a digitally signed image is discovered on partial page copies.
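If we end up going that route, the plan is to stamp the copyright notice into the IPTC block of each image before upload. A rough PHP sketch of the idea (the filename and notice are placeholders, and I have no idea whether Google actually reads any of it):

<?php
// Build one IPTC tag: record 2, dataset 116 is the copyright notice field.
function iptc_make_tag($rec, $dataset, $value) {
    $len = strlen($value);
    return chr(0x1C) . chr($rec) . chr($dataset)
         . chr(($len >> 8) & 0xFF) . chr($len & 0xFF) . $value;
}

$source = 'homepage-hero.jpg'; // placeholder filename
$notice = 'Copyright 2013 Example Client Inc. All rights reserved.';

// iptcembed() splices the IPTC block into the JPEG and returns the new image data.
$signed = iptcembed(iptc_make_tag(2, 116, $notice), $source);
if ($signed !== false) {
    file_put_contents($source, $signed);
}
?>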

rish3




msg:4568537
 4:25 pm on Apr 27, 2013 (gmt 0)

You can block ALL the appspot stuff...

RewriteCond %{HTTP_USER_AGENT} AppEngine [NC]
RewriteRule .* - [F]

The API for Appspot won't let them alter the fact that "AppEngine" appears in the HTTP_USER_AGENT.

I had this same issue, and the funniest thing is that Google denied my DMCA with the reasoning that the infringing app was a proxy. I replied that a sane proxy would include a meta robots tag instructing search engines not to index the proxied content. They still didn't honor the request.

The solution was the .htaccess change listed above, in combination with the Google public URL removal tool (which gets the offending content removed from the G cache quickly).

turbocharged




msg:4568569
 7:30 pm on Apr 27, 2013 (gmt 0)

rish3, thanks for the tip. I'm going to try to remove these Appspot proxies (the apparent strongest ones) from Google's cache. If it works, I may just get some help removing the other hundred or so from Google's cache come Monday.

It's really odd that Google caches their own proxies and not the original author. It's very, very frustrating. :(

tedster




msg:4568596
 12:04 am on Apr 28, 2013 (gmt 0)

It seems clear at this point that Google basically views the web first as a collection of data or content, and only secondarily as a collection of intellectual property.

turbocharged




msg:4568613
 1:46 am on Apr 28, 2013 (gmt 0)

That may be true tedster, but I think some people are more aware of the value of their content these days. I know my client is extremely hot about this Appspot stuff, and rightfully so. It's costing him money for me to clean things up. Although I'm cheaper than an SEO firm, it's still going to cost this guy more than he had originally budgeted, for something he wasn't even aware of.

In these days where Google is really driving home the importance of creating quality and compelling content, it's astonishing that they have allowed so many webmasters to be victimized by their own service. The user above "getcooking" reported a 10% rise in traffic after blocking Appspot. Obviously this is what my client wants to see. If his site is in the omitted results, it's kind of obvious that Google is treating him like the scraper. What kind of sitewide implications does this have? Nobody really knows, and it would be difficult to gauge when Google is continually tweaking their algorithm.

The user rish3 reported that Google denied his DMCA request because it was a "proxy." I would have an issue with this. If Google caches the proxy page, then they are storing a stolen version of someone else's property. Even though webmasters may use .htaccess to block Appspot, the burden of time, and of cost in my client's case, falls onto the site owner.

Although I don't know how Appspot works, I can't think of any legitimate reason to cache content from a proxy. If it is indeed a proxy, then it is pulling data from another source. It's that source that should receive credit for their work 100% of the time. A noindex tag would solve the proxy problem, but I don't think this is a high priority to Google. Appspot based proxies have been referenced in countless proxy hijacking posts for years, including on Google's webmaster help forum. For whatever reason, Google chooses to turn a blind eye to the theft of content and the burden it creates for website owners. The bright side is that I am being compensated to help correct the problem. But this problem should really have never existed from the start.

TheOptimizationIdiot




msg:4568626
 2:48 am on Apr 28, 2013 (gmt 0)

In these days where Google is really driving home the importance of creating quality and compelling content, it's astonishing that they have allowed so many webmasters to be victimized by their own service.

FYP: In these days where Google is really driving home the importance of creating quality and compelling content, it's astonishing that they have allowed so many webmasters [who do so] to be victimized.

But this problem should really have never existed from the start.

Absolutely agreed, and what's even more interesting is that apparently those "kids in Redmond" agree and think it's easy to stop, which might be why Bing doesn't have anywhere near the issues Google does with stolen content ranking.

Of course, Google has been following Bing for a while based on what I've been seeing, so maybe they'll follow along and stop showing the duplicators and thieves above the creators who are trying their best to do what G says in their guidelines.

diberry




msg:4568700
 3:22 pm on Apr 28, 2013 (gmt 0)

It seems clear at this point that Google basically views the web first as a collection of data or content, and only secondarily as a collection of intellectual property.


Very much so. In the early days, this made sense - after all, Yellow Pages was never responsible for determining if a company had a right to make the claim they made in their ads. Why should Google be expected to sort out copyright claims?

But it's grown beyond that, and lowered the quality of their SERPS. Plagiarism has become such a norm online that on some queries even the average searcher notices that the top several websites are just repeating the same article. This is something Google wants to combat, but they still don't want to take responsibility for the copyright claims, which frankly I can understand as a POV. (Google does a nice job responding to DMCA requests, which suggests to me they respect our intellectual property rights but feel it's not their responsibility and should be handled by a third party, i.e. ChillingEffects.)

Bing, conversely, seems to view the web as a collection of data that it's their job to curate. Google seems to whitelist everything and then try to blacklist as needed; Bing blacklists everything to begin with and carefully whitelists those sites and pages they feel have earned it. Bing gives the "cream of the crop" results, but occasionally if I can't find what I'm after, I put on my hip waders and head to Google where I know I'll have to surf through some muck, but the thing I'm after is almost sure to be in there somewhere.

If Google were a smaller, less dominant company, I would entirely agree it is not their responsibility to sort out copyright issues. However, Google is a special case. Because (a) their algo has caused part of the problem by making it so easy to rank and serve AdWords on stolen content, and (b) they are powerful and rich enough to do something about it, I really think they need to do better. Panda and Penguin are probably attempts at that, as is the cooperation with DMCA requests, but they could do better. As evidenced by Bing.

TheOptimizationIdiot




msg:4568708
 3:56 pm on Apr 28, 2013 (gmt 0)

But it's grown beyond that, and lowered the quality of their SERPS.

Yup. I saw 4 sites that were fairly obviously duplicates of each other in the top 10 yesterday, and they were just as obviously (to me) owned by the same person/business. I know determining ownership algorithmically may be difficult, but if the first one wasn't what I was looking for, neither were the other 3, so why on earth they feel it's necessary or a good idea to include duplicates/near-duplicates in the results is beyond me.

Of course 8 of the top 10 results from the same domain seeming like a good, useful idea is beyond me too, so maybe I "just don't get" what searchers really want?

tedster




msg:4568714
 4:29 pm on Apr 28, 2013 (gmt 0)

I saw a short TV interview with Eric Schmidt this morning, and it also occurred to me that Google has always been driven mostly by a vision of what they think the future needs of the computing public will be rather than handling the present moment well. Because Bing is in an intensely competitive mode, their focus is much more in-the-present.

turbocharged




msg:4568726
 5:16 pm on Apr 28, 2013 (gmt 0)

I've come to the conclusion that using Google's public URL removal tool is futile. It asks me to enter a word that appears on the cached page but no longer appears on the live page. I've tried many different single words and the result I am getting is:

The content you submitted for cache removal appears on a live page.

If the live proxy page is blocked by htaccess, and returns a 403/405 error, how can every word I am using be appearing on a live page according to Google? It's not like I am using the word "forbidden" or anything like that. I'll just chalk this one up as another wasted but billable effort compliments of Google.

seoskunk




msg:4568737
 7:11 pm on Apr 28, 2013 (gmt 0)

Time to dust off that old RDNS script for Googlebot, by the sounds of it.

[webmasterworld.com...]

But I would say, in my experience, proxy hijacks seem to occur when your site has other problems, as aristotle pointed out.
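In case that thread ever disappears, the core of it is just a reverse lookup followed by a forward lookup. A bare-bones PHP sketch (not the full script from the thread):

<?php
// Verify a claimed Googlebot: the reverse DNS of the IP must end in
// googlebot.com or google.com, and the forward DNS of that hostname
// must resolve back to the same IP.
function is_real_googlebot($ip) {
    $host = gethostbyaddr($ip);
    if (!$host || !preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    return gethostbyname($host) === $ip;
}

if (stripos($_SERVER['HTTP_USER_AGENT'], 'Googlebot') !== false
    && !is_real_googlebot($_SERVER['REMOTE_ADDR'])) {
    header('HTTP/1.1 403 Forbidden'); // fake Googlebot, quite possibly a proxy
    exit;
}
?>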

backdraft7




msg:4568810
 4:30 am on Apr 29, 2013 (gmt 0)

If scraped copies outrank you, it could be a sign that your site has been penalized by either Panda or Penguin


and where is Penguin and Panda to penalize the scraper site with their thin content and black hat webspam?
Not to mention the stolen content and most likely a cloaked PRD?
These sure are the dark ages of SE's.

diberry




msg:4568822
 5:23 am on Apr 29, 2013 (gmt 0)

and where is Penguin and Panda to penalize the scraper site with their thin content and black hat webspam?


If Panda and Penguin EVER once worked as advertised, I must've taken that day off and missed it. ;)

I think maybe a lot of current problems with the SERPs might be solved by Google just dialing down the "trust" factors. They just don't seem to have them quite right. For example, being the original publisher of content that has been duplicated ought to count for a lot in terms of trust, but it doesn't seem to. And while having a trustworthy result from Amazon at or near the top is often sensible, having 14 of them... not so much.

turbocharged




msg:4568862
 10:36 am on Apr 29, 2013 (gmt 0)

@seoskunk

I believe the problem is that the Appspot proxies were crawled, with the client's new homepage content, before Google cached the client's homepage. That was our error: our .htaccess blocked Appspot from everything but the homepage. The main Appspot proxies are slowly losing their caches since we corrected the issue quickly. There are 140 Appspot proxies in total that have scraped this client's homepage in the past (before we made changes).

Both backdraft7 and diberry raise interesting points. Why a Google-owned proxy is allowed to cache another's content is disturbing. No Panda, Penguin or other algorithm should promote the theft of content, but Google's algorithm and the lack of noindex use on their proxies presently condone such actions.

In my observation, Google has absolutely no respect for the originator of quality content. If you are one of the lucky ones who already have domain authority, you are free to rip off someone else's work for your own benefit. You are also probably benefiting from host crowding.

To be fair, I'm not as qualified an SEO as most on this forum. But I am finding myself doing things that Google says we should not do, like developing content for the search engines. If Google would quit allowing the theft of others' work via Appspot proxies, and be more aggressive in policing Blogspot scrapers, maybe I and thousands of other webmasters could focus on other tasks.

tedster




msg:4568867
 12:05 pm on Apr 29, 2013 (gmt 0)

Just before the first Panda release, there was a supposed upgrade to the scraper algorithm; we called it the Scraper update at the time. See [webmasterworld.com...] It supposedly set the stage for Panda, which was nearing its first release - so Google did see the need, at least.

I said at the time that it wasn't enough improvement to make Panda an even-handed algorithm, and that has proven to be the case even more as the months tick by.

TheOptimizationIdiot




msg:4568871
 12:28 pm on Apr 29, 2013 (gmt 0)

I'm not too happy with Google and the duplication in the results from scrapers, but I do think in fairness I should point out:

Why a Google-owned proxy is allowed to cache another's content is disturbing.

Unless you're serving a Cache-Control header for the document(s) or have a past expiration date set in the Expires header, documents are often cacheable by proxy servers according to RFC 2616.

Allowing caching is not a "Google proxy issue"; it's them following the protocol and people not using the tools available within the protocol to stop proxy caching. If you're using the w3.org protocols as specified to prevent caching and they're still doing it, then it's an issue on their end, but if not, then it's not their fault people don't know what they're doing or how to prevent caching.

[w3.org...]
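For anyone who wants to use those tools, a minimal sketch with mod_headers (assuming Apache; tune the values to your own caching needs):

# Tell proxies and other shared caches not to store or reuse these responses
Header set Cache-Control "private, no-store, no-cache, must-revalidate, max-age=0"
Header set Expires "Thu, 01 Jan 1970 00:00:00 GMT"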

Also, a proxy "injecting" something like a "noindex" header is absolutely against protocol and is a very slippery slope, so they're definitely correct to not do it for any reason.

And the chances of even cache control stopping someone from thieving content are not very good, because to have the newest version when it's available they really have to be revalidating the request. But it's still not Google's fault for running their system according to protocol; it's the fault of the thieves using their system.

* All that said: Indexing the content and allowing it to display over the originator is Google's fault and they've been doing it for years, which imo is something they should have figured out how to correct long ago, but as it's been said before, it's their search engine and they can do as they please with it, so if making the efforts necessary to show the originator isn't something they're interested in doing, then that's their decision.

seoskunk




msg:4568998
 7:09 pm on Apr 29, 2013 (gmt 0)

I don't know for certain why some sites are affected by proxies more than others, but I would guess it's some sort of trust issue.

It's very frustrating to have to deal with this and you have my sympathy, turbocharged. I strongly suggest you try forward/reverse DNS verification of Googlebot as a solution to the problem.

I fail to see why Google can't eliminate this from the web, especially when it's their own properties. I guess they have their reasons and may feel the site in some way deserves it.

turbocharged




msg:4569210
 3:49 am on Apr 30, 2013 (gmt 0)

@seoskunk

Thanks for your kind words and suggestion. As a designer, I'm not an expert in the intricacies of SEO and content theft, but I'm being paid to learn about the dark side of scrapers and how easily Google's services are abused, so I can't complain too much.

I'm not sure why the homepage is susceptible to harm from proxy hijacking, but it may have something to do with the combined 140 Appspot proxies targeted onto this client's homepage. It also may have something to do with the homepage text that the client had before we changed it. The text was keyword heavy, but within norms for his industry, and not structured well. We fixed that, but numerous Appspot proxies were cached first with the new page info while our client's site is still cached under the old content. Thankfully more of these Appspot proxies are losing their caches and hopefully will be dropped from the index entirely in the near future. Once they drop, we can rule that out. Otherwise, the site is healthy and gets xx,xxx visitors daily from good content/updates from other pages within his site.

@TOI

Injecting code is far different than securing a service with a noindex tag that protects the greater good of the internet community. I see no legal issues with Google restricting their services for lawful acts by preventing their proxies from being indexed and cached. That is the responsible thing to do. I would think that Google would be more liable for damages when their apps, some of which are named "vulgarity-business", are copying and damaging the good name of businesses, as in my client's case. Seriously, some of these Appspot proxies were made to intentionally harm my client's site. Some of the appid names are quite telling.

TheOptimizationIdiot




msg:4569211
 4:00 am on Apr 30, 2013 (gmt 0)

Injecting code is far different than securing a service with a noindex tag that protects the greater good of the internet community. I see no legal issues with Google restricting their services for lawful acts by preventing their proxies from being indexed and cached. That is the responsible thing to do.

Much like the public registration discussion, this is no solution to the problem. As soon as it doesn't work on AppSpot those who want to do damage to your client's site(s) will go somewhere else, because duplication is indexed by Google and can outrank the original, and it's not only AppSpot duplication.

The solution, in the case of duplication, is Google not indexing/showing duplicates.

"Adding directives" when proxy serving (that's done by adding (inserting, injecting) HTML code into the source of the page or adding (inserting, injecting) server-side headers not specified by the originator -- IOW: Either way they break protocol and inject, (insert, add) code into the original) is not a solution to the problem.

Them caring enough to only show the first discovered version (and if necessary giving people a way to submit content prior to publishing publicly) is a solution, because then copying does no good, people are forced to do something besides steal content to rank, and "harming the originator" becomes much more difficult and quite possibly even worthless in many cases.

They really do not need "noindex" on the page or in the HTTP header of a duplicate to not index the duplicate, really, they don't. All they need to do is stop their search engine from indexing duplicates and much of the duplication on the Internet will stop.
