Forum Moderators: Robert Charlton & goodroi
If the Yahoo Answers page doesn't at least link to you, you should demand that much. If this page's ranking on your own domain matters to you, then it sounds like a DMCA filing might be appropriate.
I should also add that my page has 20 external links pointing to it, including some from Yahoo Answers pages on other topics. The Yahoo page replacing mine for a string-of-text search has no external links, which the page about the patent says is the most important factor.
So I really do not understand what bit of logic in the code results in this. I wonder if it's indeed a mistake, as they had recently with the position 6 thing. But it takes enough people shouting about it for it to get looked at.
Oh, and I should also say the page still ranks for keywords, just not for text-string searches.
The quickest way to stop it is to hope they have AdSense on the page and fax a DMCA notice to AdSense. After that you have to wait until the cache updates, which can take 3+ months. Yahoo has fax numbers. Also look for hidden links.
The biggest problem you can encounter in this BB/scrapbook/social-bookmark area, though, is that many of these sites scream First Amendment free speech even though their own TOS prohibit copying. They can also hide your content in private areas. You're very lucky it's on Yahoo. It can get really rotten.
Has the text been pasted into Yahoo! Answers in the United States? Yahoo US has removed content copied from my site when I emailed them following the procedure outlined here [info.yahoo.com...]
Unfortunately, I have had some difficulties when trying to contact Yahoo! Copyright Agents in other countries.
OK, somewhat simplified, but you get the picture. The original page predates Yahoo Answers by YEARS and has 20 external links, while the Yahoo Answers page has none, and Google will not return the quoted phrase at all unless you click for omitted results. It then returns Yahoo Answers and two other sites carrying a selected copied chunk of text. So what bit of code could legitimately result in this? Surely this must be a coding error within the algo.
For a look at just part of this long list of algorithm factors, check out our thread on the Historical and Age Data patent [webmasterworld.com].
Also take into account that Google's first job is keeping their end users happy, and not keeping webmasters happy. They do care about webmasters - with clearly more communication for webmasters than any other search engine - but that will never be the most important goal of the Google algorithm.
I doubt that search can ever be free of errors - it's a MACHINE intelligence run over an immense data set the size of which you and I will most likely never need to manage. Google uses almost a million networked servers and they have hundreds of people writing and tweaking code all day.
So just do what you need to do to make it right and don't wait for Google. There's good advice up above. Another interesting experiment would be to get one new backlink to your url.
If you were hoping to get Google to change something in the algo based on your post here - well that's not very likely. It's also not the purpose of this forum, as we mention in the Google Forum Charter [webmasterworld.com].
You can get more direct access to the Google algo team by first doing the problematic search, and then using the link toward the bottom of the page that says "Dissatisfied? Help us improve."
Well, I'm currently back at #1 for many search terms and phrases, but often the forum page will be right under me at #2. For some phrases neither my page nor the forum page turns up in the results - even when there is no other page on the entire web that comes anywhere near providing that exact match. Overall, I've lost a lot of traffic from very specific search queries.
I'm currently in the process of rewriting and restructuring these pages to make the tutorial easier to understand and follow. Hopefully, I'll kill two birds with one stone in the process!
I'd say that the ability to match exact strings of text - especially to the original source - is also not high on the average user's list of priorities. Google has clearly been willing to trade off in that area, whether we like it or not (and I don't.)
I'm glad to hear ChicagoFan67's report of the effect of that link. It's the effect I would have expected, but it's good to hear confirmation. It's even possible that just one new direct backlink can pop a page out of the Supplemental index partition.
What I'm seeing is something very specific to long strings of text. I was interested to see if anyone could explain how this could actually come about. I really don't see how it would happen.
This is how Google has been reacting to duped content for several years now. I first observed it when a page of ours that gets scraped incessantly had temporarily been knocked out for a competitive two-word search. I immediately checked by searching for a whole sentence in quotes, and only one scraper page was ranking. It took adding &filter=0 to the SERPs page URL to bring our page back.
What was curious, though, was that our page was still ranking on a desirable three-word phrase. My thought is that we'd (temporarily) gotten classified as the dupe, probably because of a bunch of links pointing to the scraper. We had enough inbounds to overcome the filter for the three-word phrase, but not enough to overcome it for searches on the two-word phrase. This, even though the scraper was not ranking in our place for the two-word search, at least not on Google.
But we didn't have the kind of links that would rank us for the whole string I'd searched, which didn't include any words we were optimized for. So we had no link text boost or other sort of algo boost to overcome the dupe filter. That's my reasoning on it anyway... and I've observed this behavior a number of times over the past several years when we've gotten scraped.
Google has generally adjusted this kind of dupe problem pretty quickly... within a week or so... and there was a period of almost a year when I didn't see it happening very often. But in the past couple of months, several of our pages have apparently been reacting to scrapers again, and taking much longer to recover than they used to. Adding &filter=0 to the Google search is restoring them, and I can find the duping pages, generally with Copyscape if not with exact text searches.
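For anyone who wants to script the check described above, here's a minimal sketch of building the two search URLs - the normal one and the one with &filter=0 appended to show results Google would otherwise omit as duplicates. The function name and defaults are my own; only the q and filter=0 parameters come from the posts above.

```python
from urllib.parse import urlencode

def google_serp_url(query, unfiltered=False):
    """Build a Google search URL for a quoted-phrase check.

    With unfiltered=True, filter=0 is appended, which asks Google to
    include results normally omitted by the duplicate filter.
    """
    params = {"q": query}
    if unfiltered:
        params["filter"] = "0"  # show the omitted (filtered) results
    return "https://www.google.com/search?" + urlencode(params)

# Compare the filtered and unfiltered result pages for an exact sentence:
print(google_serp_url('"some exact sentence from my page"'))
print(google_serp_url('"some exact sentence from my page"', unfiltered=True))
```

If your page shows up in the second URL's results but not the first, that's the dupe-filter behavior being discussed in this thread.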
I should mention with regard to scrapers that long exact quotes won't always find them, because scrapers are often breaking up the text they scrape, borrowing parts of sentences instead of whole ones.
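Since scrapers break sentences apart, one workaround is to search on shorter overlapping word runs instead of whole sentences. Here's a rough sketch of generating those quoted fragments from your page text; the fragment size and overlap are arbitrary choices of mine, not anything Google-specific.

```python
import re

def phrase_fragments(text, size=6):
    """Split page text into overlapping runs of `size` words, each
    wrapped in quotes for an exact-phrase search. Overlapping the
    fragments covers text that a scraper split mid-sentence."""
    words = re.findall(r"\w[\w'-]*", text)
    step = max(1, size // 2)  # half-fragment overlap
    return ['"' + " ".join(words[i:i + size]) + '"'
            for i in range(0, max(1, len(words) - size + 1), step)]

for q in phrase_fragments("The quick brown fox jumps over the lazy dog near the bank"):
    print(q)
```

Running each fragment through a quoted search (or the &filter=0 variant mentioned earlier in the thread) will catch scrapers who only borrowed part of a sentence.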
Yes, it's as if Google gets bored. Google likes results with differentiation. On legitimately duped content, like the Declaration of Independence, Google will display many more results on a longish but occasionally-quoted passage (eg, "deriving their just powers from the consent of the governed") than on a longer and little-quoted passage (eg, "for the sole purpose of fatiguing them into compliance with his measures").