|Google refusing to index a page|
I've recently started managing the SEO for an ecommerce website.
The page for one of our primary products is not in Google's index.
All of the obvious checks have been carried out: valid code, no spam, no duplication, super-clean meta content, correctly configured headings and body copy, no bad inbound links, and good-quality internal linking from the page.
It is the only page on the site for this product, so there are no other similar pages.
We have AdWords running with ads pointing to this page and they work; no quality errors are reported for these ads either.
I submitted a sitemap with just that one page. WMT reported 1 item submitted, 0 indexed.
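For reference, a single-URL sitemap like the one described is just this (the URL below is a placeholder, not the poster's real page):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- placeholder URL; substitute the actual product page -->
  <url>
    <loc>https://www.example.com/products/widget</loc>
  </url>
</urlset>
```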
I tried using Fetch as Googlebot. This time WMT said that the page had been fetched and submitted to Google's index. It wasn't.
I've run out of things to look for and could do with some advice on where to look. Is there something I've not mentioned above that I should be checking?
First thing is to make sure there's no robots tag in the source that says "NOINDEX" (seems obvious, but one has to ask)
Are other pages from the site indexed?
Was this page ever indexed and fell out, or has it never been indexed?
Is it somehow blocked in robots.txt?
Does this page show up okay in Bing?
How long ago was this page created?
Is it possible there's some leftover redirect that someone forgot about?
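The first item on that list (plus the response-header variant that comes up later in the thread) can be scripted once you have fetched the page source and headers. A minimal sketch; the function name and structure are mine, not from the thread:

```python
import re

def find_index_blockers(html, headers):
    """Return a list of on-page/header signals telling Google not to index.

    `html` is the page source, `headers` the HTTP response headers as a dict.
    robots.txt rules have to be checked separately.
    """
    blockers = []
    # <meta name="robots" content="... noindex ..."> in the source
    for m in re.finditer(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.I):
        if 'noindex' in m.group(0).lower():
            blockers.append('meta robots noindex tag')
    # X-Robots-Tag: noindex in the response headers
    for name, value in headers.items():
        if name.lower() == 'x-robots-tag' and 'noindex' in value.lower():
            blockers.append('X-Robots-Tag: noindex response header')
    return blockers
```

An empty result only rules out these two signals, of course; the rest of the checklist still applies.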
You forgot to say how you know this specific page isn't indexed. Did you do an exact-text search? Or is the site so small that a site: search should turn up the entire contents of the site? Has the googlebot been crawling the page? For that matter, has the adwordsbot (I forget its formal name) been crawling regularly?
(Query: Would g### outright deindex a page-- as opposed to some random algorithmic drop-- without telling you in wmt? I thought that kind of thing was manual.)
They don't always tell you. I think it's still possible to not be indexed AND not have a manual penalty - or at least not one they tell you about.
|You forgot to say how you know this specific page isn't indexed. Did you do an exact-text search? Or is the site so small that a site: search should turn up the entire contents of the site? |
Can you answer this question? Also, if you run a search restricted to the exact page (for example, a site: or info: query on the full page URL, something like site:example.com/your-product-page):
do you have your page URL returned or not?
And if you don't, then you should work down netmeg's list.
I first noticed it was missing because it wasn't showing up in my WMT Structured Data section so out of interest I tried to use the Data Highlighter on the page and immediately got a message that I can't use the highlighter on a page that's not in the index. Then I used the Site: check and noticed it really wasn't there too.
No, not blocked by robots meta tag nor robots.txt and yes, it does show up in Bing.
The whole site is new (upgraded to HTML5) but from my initial site health report I carried out, I think the old version of the page was not in the index either. I can also state that as of today there are no back-links to this page (God knows what the previous SEO peeps were thinking, this is one of the primary product pages) and sure I'll be working on sorting that out but still, that in itself shouldn't preclude the page from being indexed. There are many pages on the site which are in the index but have no back-links at the moment.
This is farfetched but the response header for the page itself can issue a no-index request. That's how I finally got my robots.txt file de-indexed.
The format in the response header is: X-Robots-Tag: noindex
.htaccess or php code could cause this.
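For concreteness, these are generic examples of how that header typically gets set, not anything taken from the poster's site. In .htaccess (Apache with mod_headers):

```apache
# sends X-Robots-Tag: noindex on every response this .htaccess governs
Header set X-Robots-Tag "noindex"
```

The PHP equivalent is a call to header('X-Robots-Tag: noindex'); before any output. Either one would keep the page out of the index regardless of what the page's own meta tags say.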
Yeah, but then it probably wouldn't be indexed in Bing.
I guess I meant to add this afterthought: perhaps your site has been hacked by a competitor? Conditionally blocking Google from indexing a key page, when Google, frankly, is still all that matters.
Is there any chance at all that the product displayed is something G might not index for any reason? Just asking.
@bumpski Yep netmeg kinda closed the response header matter off, that's not the issue but worth thinking about (I'm fast running out of ideas for things to look at, so thanks.) Yes, it certainly is possible that it's a competitor hack, I can't exclude that but then I also don't know how that can be a) confirmed and b) repaired.
@tangor No, it's as regular as wire, a standard everyday product with both B2B and B2C appeal.
There is another factor which I forgot to mention. All of the pages on the new site have new URLs. The old site had .html pages, the new ASP site doesn't. The old URLs are all mapped to the new pages using 301s in the web.config file. All of the mappings work, even for this product page. However, for this particular product, Google is retaining the old version of the page in its index, even though it 301s to the new page. For all other products, Google has indexed the new pages. There is nothing unusually different about this product versus all other products on the site, no special coding or treatment for this page. That is why I felt that perhaps it was some kind of singled-out penalty for a quite separate transgression of the quality guidelines. Is that possible? Has anyone else ever experienced a penalty being used in this way?
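A web.config 301 mapping of the kind described looks roughly like this under the IIS URL Rewrite module; the rule name and URLs below are placeholders, not the poster's real paths:

```xml
<!-- web.config (IIS URL Rewrite): old .html URL permanently redirected
     to the new extensionless URL -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Old product page" stopProcessing="true">
        <match url="^products/widget\.html$" />
        <action type="Redirect" url="/products/widget"
                redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```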
|However, for this particular product, Google is retaining the old version of the page in its index, even though it 301s to the new page. |
What time span are we talking about? Google can be quite persistent when it comes to URL changes and prefers the old URL over the new one for many weeks, not displaying the new one.
Are URL parameters involved? Check parameter handling in WMT.
Applying WMT URL removal to the old URL could force Google but is not worth it as far as I can tell.
It might be worthwhile to try using the "Fetch as Googlebot" tool in WMT on one of these URLs and submitting it to the index just to see if it takes. As long as there are internal links to the URL it should, of course, get indexed. Anyway, it might give you something to follow up on.
|Google is retaining the old version of the page in its index, even though it 301s to the new page. |
Are you sure that the OLD page has been crawled, and if so, how long ago? Google needs to crawl the old page in order to see the 301. Also, could you double-check that the OLD page URL is not blocked by robots.txt? Otherwise Google will not see the 301.
If after checking the logs you see that the old URL has NOT been crawled (and therefore Google is not seeing 301), then it is possible that the new URL has been filtered out as duplicate content of the old URL.
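Checking the logs for that is easy to script. A sketch, assuming combined-format access logs where the user agent appears on each line; the function name and log format are assumptions, not from the thread:

```python
def googlebot_hits(log_lines, url_path):
    """Return the access-log lines where Googlebot requested url_path.

    A crude substring check: the line must contain 'Googlebot' in the
    user-agent and a GET request for exactly this path. The status code
    on the matching lines then tells you whether Google saw the 301.
    """
    hits = []
    for line in log_lines:
        if 'Googlebot' in line and ('GET ' + url_path + ' ') in line:
            hits.append(line)
    return hits
```

If this returns nothing for the old URL since the relaunch, Google has not seen the redirect at all, which fits the duplicate-content theory above.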
|Google can be quite persistent when it comes to URL changes and prefers the old URL over the new one for many weeks, not displaying the new one |
Seoholic is right, I have seen this many times.
Also, the time may depend on how big the site is - that is, how many pages have changed their URLs. Bigger change may take longer.
And lastly - I presume you are internally linking to new URLs? And that this product page which has a problem of being indexed is internally linked with the new URL and there are no internal links with old URLs?
Having said that, have you perhaps tried to link to a new URL of this product page somewhere from the home page in the attempt to emphasise this product page importance?
All of the above should be investigated.
I would be curious to see what happens if you give that page a new URL, comb the site to make sure any internal links to that page use the new URL, and do not 301 any OLD links to that new URL. Just post it as new.
What happens? (That's the next step)
(If a 301 is the problem, get rid of the problem)
I have found that Google will respond fairly quickly to:
<link rel="canonical" href="New Page"> installed in the old page, as long as the new page is in the same domain as the old page. (There's been no mention of canonical)
In order for this to work, though, the 301 from the old page to the new page has to be removed so Google can actually crawl the old page and confirm for itself that the old page is indeed similar to the new page. With the 301 in place, Google is never allowed to see the old page, and it must crawl the old page to see the "rel=canonical" pointing at the new one. So put a link to the old page somewhere out of the way, leaving the links to your new page in place.
Only once Google accepts the new page into the index, should the 301 redirect be reinstated.
Of course this is a judgement call; too much time may already have elapsed, and Google may be in the process of finally acknowledging the 301.
@seoholic: the new site went live at the end of Feb this year. I do understand about Google opting to still show old URLs in SERPs, but as far as I know, they still index the new page. Therein lies the prob: Google isn't indexing the new page. URL parameters were used extensively on the old site; the new site has none. We have 301'd the non-parameter part of the old URLs to the new pages (and for all other pages this is working fine too). I don't want to use the URL removal tool, but you do imply that perhaps returning a 410 on the old page could work. That's something I can try (in conjunction with tangor's suggestion of treating the new page like a "new" page rather than a 301'd version of an old page).
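If it helps, the IIS URL Rewrite module can serve that 410 directly from web.config by swapping the redirect action for a custom response; the URL pattern here is a placeholder:

```xml
<!-- web.config (IIS URL Rewrite): answer the old URL with 410 Gone
     instead of redirecting it -->
<rule name="Old product page gone" stopProcessing="true">
  <match url="^products/widget\.html$" />
  <action type="CustomResponse" statusCode="410"
          statusReason="Gone"
          statusDescription="This page has been permanently removed" />
</rule>
```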
@rainborick: Already tried this (I think I mentioned that in the initial preamble). But thanks.
@aakk9999: Yes, perhaps the old page hasn't been crawled yet, valid point. But... I don't see why the new page would not be in the index. I can understand your idea that the redirect may be lagging, but the new page should still be in the index (I know the new site has been crawled, and using Fetch as Googlebot forced a crawl of this particular page too). The site is 150 URLs so not massive at all. All current internal links are correctly pointing to the new pages and none point to old pages, but no, it is not currently directly linked from the home page. I'll try that. Thanks.
@tangor: I like that, certainly worth a pop.
Well guys, I've got a few things to try here. Thanks very much for your attention and ideas. I'll report back as soon as I've tried out your suggestions.
@bumpski Sorry, our replies seem to have crossed, I've only just seen your comment. I can't reinstate the old page because it was using PHP technology and the new site is ASP-based but as has been suggested already, I can remove the 301 for a short time.
|perhaps the old page hasn't been crawled yet |
Never, since February? That's impossible (she said, flatly) unless there's a new robots.txt blocking the old URL. And all the other new URLs link to the missing URL, right?
|as long as the new page is in the same domain as the old page |
I got the impression it's a new domain. Or is it just a physically new server?
|I've only just seen your comment. I can't reinstate the old page because it was using PHP technology and the new site is ASP-based but as has been suggested already, I can remove the 301 for a short time. |
ASP should allow you to create the exact old URL with the new content, BUT make sure the old URL has the "link rel=canonical ..." pointing to the new ASP page, even if you create a static page. And again, the 301 would have to go away temporarily.
Haven't dealt with ASP much.
Again I only recommend this for the same domain.