Google Indexing Pages They Can't Get To

Limited IP Firewall

     
9:33 am on Apr 24, 2019 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator ianturner is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 19, 2001
posts: 3667
votes: 55


I have an interesting situation in that Google is listing URLs in search which it can't actually get to. The pages are intranet pages in an organisation, and access to them is strictly limited to IPs within the organisation.

The pages have a robots meta tag and the equivalent HTTP response header, but of course these are irrelevant: the request is blocked, so the spider never gets anything other than the domain being unavailable.

I'm assuming that this is due to people inside the firewall using Chrome to access the pages, with Chrome then sending that data back to HQ and the index ending up with a record of the URL.

We are planning on ensuring that all Chrome instances in the organisation are set to not send data back. Are there any other measures we can take to ensure that these URLs are not listed in Google SERPs?
9:56 am on Apr 24, 2019 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 9, 2000
posts:26367
votes: 1034


Is the page actually indexed, or is it simply a link to it that is listed?
Is there a cache of the page?
11:06 am on Apr 24, 2019 (gmt 0)

Moderator from GB 

ianturner


No cache, just the URL and a 'No information is available for this page' notice. Some of the URLs are a little sensitive though, so we need to stop it happening if at all possible.
11:54 am on Apr 24, 2019 (gmt 0)

Administrator from GB 

engine


I wonder if the data is coming from external links on the web. Check that there are no links out there pointing to the content.

The other alternative, as you say, is Chrome, or the site's been scraped.

Make sure all the tags are 100% correct on every page, and that noindex and nofollow are employed.
[support.google.com...]
HTTP response header
[developers.google.com...]

You've probably done all that.
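
For rechecking the tags across a lot of pages, something along these lines might save time. It is only a rough sketch in Python: the function names (robots_directives, is_fully_tagged) are made up for illustration, it assumes the header in question is X-Robots-Tag, and it ignores the user-agent-prefixed form of that header (e.g. "googlebot: noindex"). Since the pages are blocked from outside, the headers and HTML would have to be fetched from a machine inside the firewall and passed in.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalise.
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(headers, html):
    """Return (header_directives, meta_directives) as two sets.

    headers: dict of response header name -> value (hypothetical input,
             fetched however you like from inside the network).
    html:    the page body as a string.
    """
    header_directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives.update(
                p.strip().lower() for p in value.split(",") if p.strip()
            )
    parser = RobotsMetaParser()
    parser.feed(html)
    return header_directives, parser.directives


def is_fully_tagged(headers, html, required=("noindex", "nofollow")):
    """True only if every required directive is in BOTH the header and the meta tag."""
    header_directives, meta_directives = robots_directives(headers, html)
    return all(
        r in header_directives and r in meta_directives for r in required
    )
```

Calling is_fully_tagged(headers, html) on each page's response flags any page where someone has dropped noindex or nofollow from either place. It says nothing about why the URLs are showing up, of course, only whether the tags are intact.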
2:08 pm on Apr 24, 2019 (gmt 0)

Moderator from GB 

ianturner


Yes - I have done the tags - am doing a recheck on those at the moment, just in case someone has gone in and messed them up somewhere.

It could be external links - thanks for picking that one up. I can't necessarily do anything about that, but I suppose staff could be posting the links to Twitter/Slack or something like that. Will give them a kick to remind their staff about that one.

The site shouldn't have been scraped as it is behind a firewall, but it's a possibility - will try some very specific searches to see if I can find other (scraper site) pages with the same content.