Google Indexing Pages They Can't Get To

Limited IP Firewall

IanTurner

9:33 am on Apr 24, 2019 (gmt 0)

I have an interesting situation in that Google is listing URLs in search that it can't actually get to. The pages are intranet pages in an organisation, and access to them is strictly limited to IPs within the organisation.

The pages have a robots meta tag and the equivalent HTTP response header, but of course these are irrelevant: the request is blocked, so the spider never sees anything beyond the domain being unavailable.

I'm assuming this is due to people inside the firewall using Chrome to access the pages, with Chrome then sending that data back to HQ, so the index ends up with a record of the URL.

We are planning to ensure that all Chrome instances in the organisation are set not to send data back. Are there any other measures we can take to ensure that these URLs are not listed in Google SERPs?

engine

9:56 am on Apr 24, 2019 (gmt 0)

Is the page actually indexed, or simply a link to it?
Is there a cache of the page?

IanTurner

11:06 am on Apr 24, 2019 (gmt 0)

No cache, just the URL and a 'No information is available for this page' notice. Some of the URLs are a little sensitive though, so I need to stop it happening if at all possible.

engine

11:54 am on Apr 24, 2019 (gmt 0)

I wonder if the data is coming from external links on the web. Check that nothing out there links to the content.

The other possibilities are Chrome, as you say, or that the site has been scraped.

Make sure all the tags are 100% correct on every page, and that both noindex and nofollow are employed.
[support.google.com...]
HTTP response header
[developers.google.com...]

You've probably done all that.

IanTurner

2:08 pm on Apr 24, 2019 (gmt 0)

Yes - I have done the tags - am doing a recheck on those at the moment, just in case someone has gone in and messed them up somewhere.
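
A recheck like this can be scripted rather than done by eye. What follows is only a minimal sketch in Python, assuming each page is meant to carry noindex and nofollow in both the robots meta tag and an X-Robots-Tag response header. The names robots_directives and is_fully_tagged are made up for illustration, and user-agent-prefixed header values (such as "googlebot: noindex") are not handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives |= _tokens(attr.get("content", ""))


def _tokens(value):
    """Split a comma-separated directive list into lowercase tokens."""
    tokens = {t.strip().lower() for t in value.split(",") if t.strip()}
    # "none" is shorthand for "noindex, nofollow".
    if "none" in tokens:
        tokens |= {"noindex", "nofollow"}
    return tokens


def robots_directives(headers, html):
    """Return (header_directives, meta_directives) for one page.

    headers: mapping of response header names to values
    html:    the page body as a string
    """
    header_tokens = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_tokens |= _tokens(value)
    parser = RobotsMetaParser()
    parser.feed(html)
    return header_tokens, parser.directives


def is_fully_tagged(headers, html, required=("noindex", "nofollow")):
    """True only if every required directive is in BOTH the header and the meta tag."""
    header_tokens, meta_tokens = robots_directives(headers, html)
    return all(r in header_tokens and r in meta_tokens for r in required)
```

The check has to run from a machine inside the firewall, since that is the only place the pages respond. The headers and body could come from something like urllib.request.urlopen, and any page where is_fully_tagged returns False would be one to look at by hand.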

It could be external links - thanks for picking that one up - can't necessarily do anything about that, but I suppose staff could be posting the links to Twitter/Slack or something like that. Will give the organisation a kick to remind their staff about that one.

Site shouldn't have been scraped as it is behind a firewall, but it's a possibility - will try some very specific searches to see if I can find other (scraper site) pages with the same content.
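
The "very specific searches" could be put together along these lines. This is only a rough sketch in Python: it assumes that an exact-phrase search (text in double quotes) combined with a -site: exclusion is a reasonable way to look for copies elsewhere, and the function name, the thresholds, and the host intranet.example are all hypothetical.

```python
import re


def scraper_search_queries(page_text, own_host, min_words=8, max_words=12, max_queries=3):
    """Build exact-phrase search queries from the longest sentences of a page.

    page_text: visible text of one intranet page
    own_host:  hostname to exclude from the results (hypothetical value in the example)
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    # Short sentences are rarely distinctive enough to identify a copy.
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    candidates.sort(key=len, reverse=True)

    queries = []
    for sentence in candidates[:max_queries]:
        # Keep the phrase short and drop quotes that would break the exact match.
        phrase = " ".join(sentence.split()[:max_words]).replace('"', "")
        queries.append(f'"{phrase}" -site:{own_host}')
    return queries


if __name__ == "__main__":
    sample = ("Short one. The quarterly maintenance window for the internal "
              "payroll cluster starts on Friday evening. Ok.")
    for query in scraper_search_queries(sample, "intranet.example"):
        print(query)
```

Each query would be pasted into the search box by hand. Pasting sensitive page text into a search engine has its own exposure risk, so it makes sense to pick sentences that are distinctive but not confidential.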