Forum Moderators: Robert Charlton & goodroi


noindex part of page? possible?

         

Ramian

8:44 am on Mar 15, 2015 (gmt 0)



Hi All,

Is there a way to tell search engines "don't index only this part" (because it will be duplicate content from major websites)?


Thanks all!

rish3

4:38 pm on Mar 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



No, there's not really a way to instruct the search engine to "noindex" only a portion of a page.

You can play some javascript tricks, like loading the duplicate content after the page load with ajax. However, there's no guarantee that search engines won't be able to see through that sometime in the future.

If the page has significant value and content beyond the duplicated part, I don't see it as a real issue. There are many high value pages that have long passages of content quoted from other sources. The trick, of course, is that they also have significant amounts of their own high quality content on the same page.

not2easy

7:21 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Surround it with <blockquote> tags? If you mean something like a product description, it's not critically important; those are duplicated all over the web. Whether it is something to worry about depends a lot on the content and context.

aristotle

7:42 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In some circumstances you might choose to display the text on an image.

lucy24

8:14 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



because it will be duplicate content from major websites

This sounds hinky. I hope you didn't mean it that way. I guess you could put the not-to-be-indexed part inside an iframe* and then the framed text would be subject to the original site's crawling-and-indexing directives.


* I cannot possibly be the only person who habitually reads this as "iFrame" and then wonders why Apple has an HTML element all its own.
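
The iframe idea could be sketched roughly like this. It is only an illustration, not something posted in the thread: the helper name buildQuoteFrame, the placeholder id "quoted-source", and the example.com URL are all hypothetical, and the original site may refuse to be framed at all.

```javascript
// Hypothetical helper (not from the thread): build an <iframe> that shows
// the quoted material from its original URL instead of copying the text
// into this page's own HTML.
function buildQuoteFrame(doc, sourceUrl, title) {
  const frame = doc.createElement("iframe");
  frame.src = sourceUrl;   // the original page, served by the original site
  frame.title = title;     // accessible name for screen readers
  frame.loading = "lazy";  // defer loading until the frame is near the viewport
  return frame;
}

// Browser wiring; skipped when no DOM is available (for example under Node).
if (typeof document !== "undefined") {
  const slot = document.getElementById("quoted-source"); // assumed placeholder id
  if (slot) {
    slot.appendChild(
      buildQuoteFrame(document, "https://example.com/original-article", "Quoted source")
    );
  }
}
```

Passing the document in as a parameter keeps the helper easy to try outside a browser; in a real page you would simply pass the global document.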

Hoople

10:03 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Or put the duplicate text in an image (can't believe I actually said that).

phranque

10:55 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, Ramian!

Google is likely to penalize you for cloaking your content or blocking resources.
they have gotten good at OCR and can use a headless browser that processes CSS and JavaScript.
anything you try will not be sustainable and might disappear next month so keep that in mind.

seoskunk

11:34 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wrap the text in a div tag with style="display:none" then use javascript to show the element to users.

Google will ignore the content inside the display:none div.

lucy24

11:56 pm on Mar 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wrap the text in a div tag with style="display:none" then use javascript to show the element to users.

Google will ignore the content inside the display:none div.

I am inclined to think that most of phranque's comment would apply equally to this approach.

seoskunk

12:02 am on Mar 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am inclined to think that most of phranque's comment would apply equally to this approach.


But then most sites with slideshows, which are often coded this way, would also be penalised.

johnhh

12:13 am on Mar 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



lucy24 is quite correct. Google will also read javascript variables.

seoskunk

12:29 am on Mar 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Interesting. However, as I just spent some time looking at why Google ignores slideshow content because it's wrapped in a display:none div, I would have to disagree with both of you on this occasion. If you code an inline style, i.e.

<div id="ignored_by_google" style="display:none">
Blah Blah Blah
</div>
<script>
// Once the page has finished loading, reveal the hidden div to visitors.
window.onload = function() {
  document.getElementById("ignored_by_google").style.display = "block";
};
</script>


Google will ignore "Blah Blah Blah" currently.

lucy24

1:24 am on Mar 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Dunno about John, but I was referring to the "and then turn around and secretly make it visible again via javascript" aspect.

Ramian

6:26 am on Mar 16, 2015 (gmt 0)



Hi All and Thanks!
Just to be clear, if I use this code:
<div id="ignored_by_google" style="display:none">
Blah Blah Blah
</div>
<script>
// Once the page has finished loading, reveal the hidden div to visitors.
window.onload = function() {
  document.getElementById("ignored_by_google").style.display = "block";
};
</script>

Is it safe? Without penalties?

Ramian

7:29 am on Mar 16, 2015 (gmt 0)



Update: I tried to use this code, but google bot can see the text :|.

image: [postimg.org...]

lucy24

7:50 am on Mar 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



google bot can see the text

Well, of course they can see it. The question was whether they will index it.

My impression is that the "Google will ignore" idea is a reaction to an earlier phase in SEO activity. In those halcyon early days, people would put huge wads of keyword-stuffed text inside {display: none;} sections, so Google would count it as part of the page and rank accordingly. Once a search engine gets wise, it's a simple matter to look at the CSS and discount anything that would be invisible to a human.

But if google can figure out which parts of a page's content aren't visible to humans, they can equally figure out which parts are visible to humans, no matter how many layers of scripting and stylesheets it's buried in.

rish3

11:41 am on Mar 16, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Here's an example of something that they currently don't index...loading content via ajax, after the main page itself has loaded.

[jsfiddle.net...]

Again, could change in the future.
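
For readers who cannot open the fiddle, the general shape of "load it with ajax after the page has loaded" might look like the sketch below. This is not the code from the link: the helper name loadFragment, the element id "quoted-content", and the path /fragments/quoted.html are assumptions made for illustration, and nothing here guarantees how any crawler will treat the result.

```javascript
// Hypothetical helper: fetch an HTML fragment and place it in a container.
// fetchImpl defaults to the standard fetch() and is a parameter only so the
// function can be exercised without a network.
async function loadFragment(target, url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  // Assumes the fragment is trusted, same-origin markup that you control.
  target.innerHTML = await response.text();
  return target;
}

// Browser wiring: wait for the main page to finish loading, then pull in
// the duplicated passage. Skipped when no DOM is available.
if (typeof document !== "undefined") {
  window.addEventListener("load", () => {
    const target = document.getElementById("quoted-content"); // assumed id
    if (target) {
      loadFragment(target, "/fragments/quoted.html").catch(console.error);
    }
  });
}
```

The initial HTML then contains only an empty container, and the duplicated text arrives in a second request.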

Barbados

1:17 pm on Mar 16, 2015 (gmt 0)

10+ Year Member



* I cannot possibly be the only person who habitually reads this as "iFrame" and then wonders why Apple has an HTML element all its own.

No Lucy, you're not ;-)

Ramian

1:46 pm on Mar 16, 2015 (gmt 0)



Well, of course they can see it. The question was whether they will index it.

it's INDEXED! :|
[postimg.org...]

gameon

2:42 pm on Mar 16, 2015 (gmt 0)

10+ Year Member



Hiding the content in an image? Google will spider the image! Iframes are a better option, or how about spending time writing unique content ;)

lucy24

3:48 pm on Mar 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Incidentally...
they have gotten good at OCR

but apparently aren't wasting it on Google Books yet ;) (Generic OCR is one thing. Training your OCR program to work with one specific book is a whole nother venture.)

Are there any clearcut, provable cases of a text image being OCR'd and indexed as text? Bonus points if the offending text image includes an alt and/or title whose content is intentionally different from what the text actually says.

aakk9999

6:14 pm on Mar 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, of course they can see it. The question was whether they will index it.

it's INDEXED! :|
[postimg.org...]


But would it be ranked? Maybe it is indexed but shows only when you do a query for this exact text in conjunction with the site: search.

I have seen numerous times where some content is indexed because Google shows it when you search for it alongside site: command or even then it is only shown when you click on "repeat this search with omitted results included", but is nowhere to be seen if you try to rank for it.
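
The check described above can be scripted as a small query builder. This is only a convenience sketch; the function names buildSiteQuery and buildSearchUrl and the example.com domain are hypothetical.

```javascript
// Hypothetical helper: build a query that looks for an exact phrase on one site only.
function buildSiteQuery(domain, phrase) {
  // Drop any double quotes inside the phrase so the exact-match quoting stays intact.
  const cleaned = phrase.replace(/"/g, "").trim();
  return `site:${domain} "${cleaned}"`;
}

// Turn the query into a search URL that can be pasted into a browser.
function buildSearchUrl(domain, phrase) {
  return "https://www.google.com/search?q=" +
    encodeURIComponent(buildSiteQuery(domain, phrase));
}

console.log(buildSiteQuery("example.com", "Blah Blah Blah"));
console.log(buildSearchUrl("example.com", "Blah Blah Blah"));
```

If the passage shows up only for this kind of query, and not for an ordinary search on the same words, that matches the "indexed but demoted" behaviour described in this post.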

Therefore this text is kind of demoted. The main question is: does the existence of this repeated text harm the page or the site?

If it is just being ignored and only shown when searching for the text in quotes using site: command, then all should be fine. But if it weighs against the page, and having lots of these weighs against the site as a whole, then one would want a way to not show it to Google.

As for cloaking... wouldn't blocking ANYTHING in robots.txt be in fact cloaking?

Ideally, OP should make sure that there is enough of their own unique content on the page to outweigh the content copied from "major websites".

<edit>Fixed quote</edit>

[edited by: aakk9999 at 11:28 pm (utc) on Mar 18, 2015]

Ramian

6:31 am on Mar 18, 2015 (gmt 0)



aakk9999 - My problem is that when I copy the content, I'll show it in a LightBox. That is the problem: I can't put a canonical/noindex there.

any suggestions?

Ramian

7:57 am on Mar 18, 2015 (gmt 0)



And what can you tell me about:
<!--googleoff: index-->
<!--googleon: index-->

Does it work?

netmeg

12:28 pm on Mar 18, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Work for what?

Ramian

12:32 pm on Mar 18, 2015 (gmt 0)



to tell Google "Don't index this part.."

rish3

1:14 pm on Mar 18, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



And what can you tell me about:
<!--googleoff: index-->
<!--googleon: index-->

That is for the Google Search Appliance (Google's enterprise search product), not for Google's web crawler. It won't help you.

Ramian

1:44 pm on Mar 18, 2015 (gmt 0)



So..I'm still looking for an answer......

rish3

1:53 pm on Mar 18, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



So..I'm still looking for an answer...


You've been given several. They all boil down to two high level choices:

1. Play tricks with javascript, knowing that Google may change what it chooses to crawl and index in the future.

2. Have enough unique, quality content around the duplicated snippets that it doesn't matter.

Since you're going to load the content into a lightbox, I would just place the duplicate content at a separate url that's marked as noindex (via a meta tag), and load it into the lightbox with ajax. Most lightbox implementations already support being passed a url.
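
A minimal sketch of the "separate url marked as noindex" half of that suggestion follows. The helper names renderNoindexPage and escapeHtml are hypothetical, and how the page gets served and handed to the lightbox depends on your own stack and lightbox script.

```javascript
// Escape text that will be placed inside the <title> element.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap the duplicated passage in a standalone page that
// carries a robots meta tag asking search engines not to index it.
// bodyHtml is assumed to be trusted markup that you control.
function renderNoindexPage(title, bodyHtml) {
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    '<meta charset="utf-8">',
    '<meta name="robots" content="noindex">',
    `<title>${escapeHtml(title)}</title>`,
    "</head>",
    `<body>${bodyHtml}</body>`,
    "</html>",
  ].join("\n");
}

console.log(renderNoindexPage("Quoted passage", "<p>Blah Blah Blah</p>"));
```

The main page then contains only a link or button, and the lightbox is given the url of this separate page rather than the text itself.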

olenoides

2:18 pm on Mar 18, 2015 (gmt 0)

10+ Year Member



I have a few eCommerce sites where every product listing has a substantial amount of boilerplate content but also has significant amounts of unique content. I've never had this boilerplate content cause issues. As far as I can tell it pretty much just gets ignored by Google for indexing/ranking purposes.

My experience matches what Google has said on the issue many times recently: in these situations the duplicate content more or less gets ignored. But the page needs to have enough unique content that it's not seen as a thin content page.