Forum Moderators: Robert Charlton & goodroi


How to stop Google indexing AJAX calls as pages

         

westcoast

4:31 pm on Jul 8, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



Main page "A" can call and embed a set of ajax pages, "/ajaxcall.php?id=X" based on what a user selects on page A.

There are thousands of these potential URLs.

What I've got happening is that Google is crawling all of these ajax calls and indexing them as individual pages:
mysite.com/ajaxcall.php?id=444
mysite.com/ajaxcall.php?id=1521251

etc.

These are all incredibly thin and useless outside the context of page A, and I don't want Google indexing them.

Can anyone tell me how one stops Google from doing what it's doing above? I could stick a NOINDEX tag at the top of ajaxcall.php, but does that not then risk noindexing main calling page "A"?

Note that I tried blocking the AJAX calls in robots.txt, but Google ignored that and indexed the AJAX URLs anyway!

Thanks for thoughts...

webcat

5:08 pm on Jul 8, 2022 (gmt 0)

Top Contributors Of The Month



Have you done a test of your robots.txt to make sure it's formatted correctly to stop those pages from being indexed? Also, could it be that Google is accessing those "filler" pages to get a better understanding of what a user would experience viewing the main page? Finally, what about setting up canonical tags to the main page and letting Google do its thing? So many questions ... ;)
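One offline way to run that kind of robots.txt test is Python's standard urllib.robotparser. This is only a sketch: the Disallow rule is a guess at what the file might contain (the thread doesn't show the real one), /page-a.html is a made-up path standing in for page A, and the standard-library parser is not guaranteed to match Google's own matcher in every edge case. Also worth keeping in mind: a Disallow rule only stops crawling; it does not by itself keep a URL Google already knows about out of the index.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ajaxcall.php
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # /page-a.html is a hypothetical path for the main page "A".
    for url in (
        "https://mysite.com/ajaxcall.php?id=444",
        "https://mysite.com/page-a.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

If the first URL comes back as allowed, the rule isn't matching the AJAX path and the file needs another look.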

westcoast

5:36 pm on Jul 8, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



* Have you done a test of your robots.txt to make sure it's formatted correctly to stop those pages from being indexed?

The format is correct. Google even says it reserves the right to ignore robots.txt.

* Also, could it be that Google is accessing those "filler" pages to get a better understanding of what a user would experience viewing the main page?

Sure, and that's fine, and it makes total sense in the context of the user experience of page A. It does not make any sense to take those little thin snippets out of the context of page A and create 2000 indexed thin pages out of them.

* Finally, what about setting up canonical tags to the main page and letting Google do its thing?

Because an AJAX call is not the same as Page A. It is a little piece of information that is part of page A. A couple of lines of content returned by an ajax callback is nowhere near canonical to the entirety of Page A.

Dimitri

6:05 pm on Jul 8, 2022 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



- What kind of content do these AJAX calls return? JSON? plain text? HTML?

- Are you sending these responses with an explicit MIME type (a Content-Type header)?

If your AJAX responses are "application/json" or "application/text" (plain text is normally sent as "text/plain"), Google shouldn't index them.

You can also try adding the X-Robots-Tag header to your AJAX responses: "X-Robots-Tag: noindex"
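As a rough sketch of the two header suggestions above, here is what the AJAX endpoint's response could look like. It is written in Python for illustration only; build_ajax_response and the payload fields are invented names, and in ajaxcall.php the same two headers would be sent with PHP's header() before any output. The X-Robots-Tag header applies only to the URL that returns it, so it would not noindex page A. Google does have to be allowed to fetch the URL to see the header, so a robots.txt Disallow on the same path would hide it.

```python
import json
from typing import Dict, Tuple


def build_ajax_response(payload: dict) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, body) for an AJAX fragment response.

    Hypothetical helper: it shows which headers to send, not how any
    particular framework sends them.
    """
    body = json.dumps(payload).encode("utf-8")
    headers = {
        # Declare the response as JSON instead of the default text/html.
        "Content-Type": "application/json; charset=utf-8",
        # Ask search engines not to index this URL on its own.
        "X-Robots-Tag": "noindex",
        "Content-Length": str(len(body)),
    }
    return 200, headers, body


if __name__ == "__main__":
    status, headers, body = build_ajax_response({"id": 444, "text": "snippet"})
    print(status)
    for name, value in headers.items():
        print(f"{name}: {value}")
```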

You can also check the User-Agent and IP address, and if the request comes from Googlebot, return an empty response, a 403 Forbidden, etc.
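The User-Agent string alone is easy to spoof, so the IP check matters. A minimal sketch of the usual reverse-then-forward DNS check follows, again in Python for illustration. The function names are made up, the hostname suffixes are an assumption based on Google's published crawler-verification guidance and may not be the full list, and gethostbyname_ex only covers IPv4.

```python
import socket

# Assumed suffixes for Google crawler hostnames; check Google's current docs.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname: str) -> bool:
    """True if hostname sits under one of the assumed Google crawler domains."""
    host = hostname.rstrip(".").lower()
    return host.endswith(GOOGLE_CRAWLER_SUFFIXES)


def is_verified_googlebot(ip: str) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
        if not is_google_hostname(hostname):
            return False
        _, _, addresses = socket.gethostbyname_ex(hostname)  # forward lookup
    except OSError:
        # Lookup failed (socket.herror / socket.gaierror): treat as unverified.
        return False
    return ip in addresses
```

A request would only be treated as Googlebot when both the User-Agent claims it and is_verified_googlebot(ip) returns True.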

westcoast

8:07 pm on Jul 9, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



"If your AJAX responses are "application/json" or "application/text", Google shouldn't index them."

Ahhh thanks, I will try that! It was sending whatever the default header is, probably text/html.