Forum Moderators: Robert Charlton & goodroi


SEO and Robots.txt that blocks Javascript

         

guggi2000

7:25 pm on Mar 31, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



For almost 10 years we have been blocking Javascript in robots.txt, because we thought search engines should not pick up some of the small dynamic content we inject with Javascript (hover tips for users, the cookie notice, auto-complete files, etc.). When we ran Google Search Console's "Fetch as Google" today, we saw that Googlebot does not see the rendered page the way users do. Obviously!
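For context, a robots.txt block of the kind described might have looked something like this (the paths are hypothetical, not the poster's actual setup):

```text
# Keep crawlers away from script files (the old approach described above)
User-agent: *
Disallow: /js/
Disallow: /scripts/
```

Removing the Disallow lines is what lets "Fetch as Google" render the page the way users see it.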

Do you think that the small difference in the rendering could have a negative SEO impact?

rainborick

12:19 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google has let it be known that dynamic content that is not visible to users when a page first loads is strongly devalued. That doesn't mean the page will rank lower; it's just that the dynamic content itself will not contribute much to rankings. So I'd say your rationale for blocking the JavaScript file no longer exists. Unblocking it can't hurt you and may actually help, but whatever you decide, I wouldn't expect a significant impact given your description of the situation.

keyplyr

3:29 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google & other SEs have been reading & indexing JS content for several years (for example, content written with document.writeln). Blocking JS snippets or trying to hide them via CSS has become futile. Better to just let the SEs have access to it all and remove any possibility of a negative action, IMO.

Now, if you want to stop AdSense from seeing some text and have been using a JS snippet to display it, you could instead use these wrappers:
<!-- google_ad_section_start(weight=ignore) -->text here
<!-- google_ad_section_end -->

lucy24

4:14 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Instead of telling them not to crawl, you could try a noindex, like
Header set X-Robots-Tag "noindex"
(the exact wording will depend, of course, on your server type). Not all search engines recognize the X-Robots-Tag header, but G### does, and that's who you're concerned about, isn't it? It's perfectly understandable that you don't want random searchers landing on your scripts--not that it's hard to find javascript once you're actually on the page, but why make it easier for people?

I'm still not letting search engines crawl piwik, though. But that's way at the bottom of the page where it can't possibly affect anything.

keyplyr

5:17 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@lucy24 if the JS is a file in another directory, yes, the X-Robots-Tag could be set in that directory's htaccess. But, as said above, it no longer makes sense.

lucy24

6:30 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the X-Robots-Tag could be used [in] that directory's htaccess

Mine's in my global htaccess, inside a FilesMatch envelope for .js. I don't mind them crawling if it makes them happy; it just makes no sense for scripts to be indexed.
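On Apache, a global htaccess rule along those lines might look like this (a sketch, assuming mod_headers is available):

```apache
# Let bots crawl .js files but ask them not to index the scripts themselves
<FilesMatch "\.js$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>
```

Because the header travels with the response rather than being a robots.txt rule, the files stay crawlable, so Google can still render the pages that use them.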

keyplyr

7:55 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, I think the OP said that JS was used to hide text from being read/indexed by SEs, not to stop indexing of the scripts themselves.

I'm a throwback to when it wasn't a crime to use a lot of JS. I have yet to see even one of my 30+ scripts indexed, and I don't block them.

guggi2000

8:59 am on Apr 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Some examples of dynamic content that we don't want to be part of the page. The goal is to avoid having repeated content on every single page that is not really topic-related:

- A cookie banner that is shown based on the visitor's geolocation
- An alert banner indicating that some of the expert-generated content has not been approved by our editors.

Both won't be shown to returning visitors.

We are concerned that Google seeing two different versions of the page when rendering it (one for Googlebot and one for the browser) may raise a flag or result in a small penalty.

keyplyr

9:13 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



All the more reason not to block the scripts. But IMO Googlebot has already parsed the content of those scripts. How much of that Google uses is unknown, but unblocking the JS would not cause anything worse.

Robert Charlton

9:51 am on Apr 1, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This is going to be a longer and more specific version of what keyplyr has already said.

A lot of these small differences are inconsequential as long as they're not specifically aimed at Googlebot. E.g., a logged-in returning user who's tracked by cookies will get a "welcome username" message on the page, which isn't what Googlebot will see, but that kind of small difference really doesn't matter to Google. Ditto, Google doesn't care about an IP-determined location that's rendered for the user in javascript. As far as Google is concerned, an IP-determined geo page is in Mountain View.

Google will index the default page, without the cookie-driven login, so it won't be able to see the logged-in material, and the form-driven material probably doesn't matter to it. It might be preferable, though, to allow Google to access your js content rendering, if it wants to try it, so it knows it is inconsequential.

Googlebot might conceivably get concerned about what it can't see, so unblocking that particular javascript could soothe its worries.

lucy24

5:35 pm on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



have yet to see even one of my 30+ scripts indexed

You'd only see it if you did an exact-text search for some string that occurs only in your script--which will then turn out to be a string that turns up in someone else's completely unrelated script. (I occasionally get inexplicable visits to one obscure page. I can only conjecture it's because I quote some bits of code and/or an unusual error message that inadvertently match whatever the searcher was really investigating.)

What you do want to avoid is anything that results in your page showing up in a SERP with a snippet that says only "This site uses cookies" or "This process requires javascript" or similar. Who's going to click on that?

iamlost

7:15 pm on Apr 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are three main categories of javascript use (note: I use as little javascript as possible, and never full libraries, but the principle remains the same):
* as part of dynamic rendering of the DOM.
* as NOT part of dynamic rendering of the DOM.
* as part of dynamic rendering of the DOM, but behind a 'wall' where you only want humans AND you don't want the pages indexed.
I keep the three types separate, so the first is crawled and the second and third are not.
If 'wall-ed', why bother hiding the javascript? Because it leaks information AND because I don't want it being returned in results.

A problem rarely mentioned is that the various 'official' SE bots are not the engines' only crawlers. They do like to check whether what you serve the official flavour is what you serve the unofficial one. They also like to get at pages ruled off limits by robots.txt, etc.

Careful use of robots.txt, X-Robots-Tag header, meta noindex, etc. enables appropriate SE page rendering but keeps actual scripts and private aka not for SE consumption pages from query results.
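For the 'wall-ed' pages themselves, the page-level equivalent is a meta robots tag, e.g.:

```html
<!-- Keep this page out of search results; links may still be followed -->
<meta name="robots" content="noindex, follow">
```

One caveat worth noting: a page that is disallowed in robots.txt can't be crawled, so a noindex on that page may never be seen; for a noindex to be honored, the URL has to be crawlable.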

Yes, iamlost: I view SEs as reasonable marketing vehicles and so am happy to share sufficient samples to get the traffic I need for the conversions I want. I am not willing to open every room and cupboard, especially as SEs' value is comparatively minor these days (sites' average conversion rate is ~9%, SE traffic's but ~3%).

Afterthought: these days, especially with WordPress et al, javascript is increasingly playing second fiddle to the CSSOM (CSS Object Model) as problematic, both in time to render and in what is rendered to whom and when - a subject for another thread.

guggi2000

10:16 pm on Apr 2, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



How do you tell Google not to take certain HTML parts into account, e.g. a repeating privacy message at the end of the page? I vaguely remember that there was a way to do it in HTML?

And more importantly, is there a way to block just some parts of the JS?