Google quickly indexes and then removes new pages

Forum Moderators: Robert Charlton & goodroi

lee_sufc

8:03 pm on Dec 3, 2015 (gmt 0)

Evening all!

I've got an established site (+15 years) that's been doing well (touch wood) in the SERPs for a while now.

On part of my site, I have a section where I post advice and articles where I try and add a new post every month. However, the past few weeks, I noticed a problem and cannot work out why it's happening.

Basically, I add a new page and within 1 day, I see it indexed on Google. However, a day later, it's gone. Searching for the page manually doesn't bring anything up, either. I'm not blocking it with robots.txt or anything and I cannot find any reason for this to be happening. There're also no messages on GWT. Is it just an anomaly and maybe the pages will come back at some point or could there be a more serious issue here?

Robert Charlton

10:57 pm on Dec 3, 2015 (gmt 0)

When you say "removes".. are the pages dropping down in the serps to where you can't find them, or are they vanishing from the index? What kind of tests have you run? Here are some basic low-hanging fruit type tests that initially come to mind...

Try a site:domain search with a unique text string in quotes, to see whether the page has remained in the index. The text, of course, should be html text on the page (as opposed, say, to the page title or description).

Also, try searching for the unique text string in quotes but without the site: operator, in case you're getting scraped. Also, try adding "&filter=0" at the end of the Google search page url, to turn off the dupe filter.
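The three searches described above can be assembled with a short script. This is only a sketch: `google_search_url` is a hypothetical helper, `example.com` and the phrase are placeholders, and the only Google parameters used are `q` and the `filter=0` suffix mentioned in the post.

```python
from urllib.parse import urlencode


def google_search_url(phrase, site=None, dupe_filter=True):
    """Build a Google search URL for an exact-phrase (quoted) query."""
    query = f'"{phrase}"'
    if site:
        # Restrict the search to one domain with the site: operator.
        query = f"site:{site} {query}"
    params = {"q": query}
    if not dupe_filter:
        # Same effect as appending "&filter=0" to the results URL by hand.
        params["filter"] = "0"
    return "https://www.google.com/search?" + urlencode(params)


phrase = "a unique five or six word phrase"  # placeholder body text from the page
print(google_search_url(phrase, site="example.com"))  # is the page still indexed?
print(google_search_url(phrase))                      # has the text been scraped?
print(google_search_url(phrase, dupe_filter=False))   # same search, dupe filter off
```

The phrase should come from the body text of the page, not from its title or description.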

Anything different about these pages from other pages on your site, in the way of inbound or outbound linking? Are the titles and descriptions unique?

How is traffic to the rest of your site? What happens if you post a new test page on your site, but in another section? Try a test page using the same template... and then a test page using a different template.

Also, check the canonical tag, if you have one, to make sure that's not creating problems. I'd even check the header response.
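A canonical check of this kind might look like the sketch below, which works on HTML that has already been saved or fetched. `HeadSignals` and `index_warnings` are hypothetical names, and the meta robots check is an extra assumption of mine, not something raised in the thread.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the rel=canonical href and the meta robots content, if present."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.robots = (a.get("content") or "").lower()


def index_warnings(html, page_url):
    """Return human-readable warnings about signals that could keep a page out of the index."""
    parser = HeadSignals()
    parser.feed(html)
    warnings = []
    # A canonical pointing at another URL asks search engines to index that URL instead.
    if parser.canonical and parser.canonical.rstrip("/") != page_url.rstrip("/"):
        warnings.append(f"canonical points elsewhere: {parser.canonical}")
    if parser.robots and "noindex" in parser.robots:
        warnings.append("meta robots contains noindex")
    return warnings


sample = '<head><link rel="canonical" href="http://www.example.com/blog/"></head>'
print(index_warnings(sample, "http://www.example.com/blog/new-post/"))
```

An empty list only means neither of these two signals was found; it does not prove the page is indexable.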

lee_sufc

11:34 pm on Dec 3, 2015 (gmt 0)

Thanks, Robert.

When I search for the page using the site:domain, it's not there at all.

I did a header check and get this:

HTTP/1.1 200 OK =>
Date => Thu, 03 Dec 2015 23:33:01 GMT
Server => Apache
X-Pingback => http://www.example.com/blog/xmlrpc.php
Link => ; rel=shortlink
Vary => Accept-Encoding,User-Agent
Connection => close
Content-Type => text/html; charset=UTF-8

Nothing else is different about the new pages; I simply added new posts via WordPress and nothing else?

Really not sure what else could be wrong?

Could it possibly be a Google blip and maybe it'll correct itself in time?
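A header dump in the "Name => value" layout pasted above can be checked mechanically. The sketch below is an illustration only: `parse_header_dump` and `header_warnings` are hypothetical helpers, and the two things it looks for (a non-200 status and a `noindex` in an `X-Robots-Tag` header) are my assumptions about what is worth flagging.

```python
def parse_header_dump(text):
    """Parse 'Name => value' lines into (status_code, {lowercased name: value})."""
    status = None
    headers = {}
    for line in text.splitlines():
        name, _, value = line.partition("=>")
        name, value = name.strip(), value.strip()
        if not name:
            continue
        if name.upper().startswith("HTTP/"):
            # Status line, e.g. "HTTP/1.1 200 OK": the second token is the code.
            status = int(name.split()[1])
        else:
            headers[name.lower()] = value
    return status, headers


def header_warnings(status, headers):
    """Flag response details that could explain a page dropping out of the index."""
    warnings = []
    if status != 200:
        warnings.append(f"status is {status}, not 200")
    if "noindex" in headers.get("x-robots-tag", "").lower():
        warnings.append("X-Robots-Tag contains noindex")
    return warnings


dump = """HTTP/1.1 200 OK =>
Date => Thu, 03 Dec 2015 23:33:01 GMT
Server => Apache
X-Pingback => http://www.example.com/blog/xmlrpc.php
Link => ; rel=shortlink
Vary => Accept-Encoding,User-Agent
Connection => close
Content-Type => text/html; charset=UTF-8
"""
status, headers = parse_header_dump(dump)
print(status, header_warnings(status, headers))
```

For the headers posted in this thread, neither check raises a warning.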

Robert Charlton

11:40 pm on Dec 3, 2015 (gmt 0)

And a PS.

You posted...

I have a section where I post advice

What kind of "section" is this? Is it, eg, a subdomain or a subdirectory? What I'm looking for is some common factor, possibly in your server setup, site nav, or page template, that might group all of these pages together and perhaps provide clues about where to look.

lee_sufc

11:46 pm on Dec 3, 2015 (gmt 0)

It's a sub-directory (Blog). I have other pages on there still ranking (at the moment, anyway). It just seems to be the most recent couple of pages that I've added.

Robert Charlton

12:16 am on Dec 4, 2015 (gmt 0)

When I search for the page using the site:domain, it's not there at all.

I'm assuming you're also including unique quoted text strings in the search.

The confusing part here is that you see the pages indexed before they are dropped, which makes me think it's some kind of filter. If it were a dupe content filter, the dropped page, unless penalized, in my experience would generally come back in a week or so... but it would always also show up on a search for an exact text string in quotes from the page.

There was a period, just as Panda was beginning, when Google was not returning searches for quoted titles. As my theory goes, Google felt that the quoted string... in this case, the page title... was not predictive of a useful result, and thus the phrase-based index dropped it from that "layer" of the algo. Note that "layer of the algo" here is my way of describing my hypothetical construct of what the algo is doing. I would make sure that the text strings that you include in your search do not include the page title, or sections of the page title. I would try some searches with a variety of short five or six word unique phrases.

Beyond that, I'd follow up the Blog subdirectory clue for what it's worth... perhaps try adding a somewhat similar test page to a different subdomain on your site and see what happens.

I'd want a much larger sample size, though... more pages over time... before calling this a symptom of a problem, but it's wise to keep on top of this. I would keep careful records of the dates, etc, in case others start pointing to similar problems happening around the same time.

If you've got a date reference now... say when each page was indexed and when it disappeared... that might be helpful to add to this thread.

lucy24

4:19 am on Dec 4, 2015 (gmt 0)

It's a sub-directory (Blog).

Sub-directory or subdomain (like blog.example.com)? Does it physically live on the same server as the rest of the site?

timemachined

12:13 am on Dec 6, 2015 (gmt 0)

RE original post, tell me about it, drives me crazy.

I end up doing a fetch, up to 3 times if necessary. I should monitor each page and do daily lookups, but that gets a bit much across several hundred pages. I'm sure pages keep dropping out, but it's especially frustrating with new pages.

There's nothing wrong with the new, unique content etc.; Google just indexes it and then sometimes drops it. So I do a fetch, because that's the idea, right? Publish a new page, it gets indexed and is supposed to stay there, especially when compared to some of the rubbish I'm trying to outrank, which is 9/10 automated (pulled-together dupe) content.

I do have a thousand pithy pages like my competitors. They have backlinks to support their useless dragged-together content, however; I rely on SEO, keyword phrasing and internal links. It's possible it is a filter, but one that can be beaten by using Google fetch enough. I'm unsure whether that pushes a different page out, and unsure whether I'm limited to x number of pages for inclusion with this site. Just trying to backfill the pithy pages; perhaps it will all click in the end...

nakkers

1:27 am on Dec 6, 2015 (gmt 0)

Dang timemachined, you really shouldn't be so obsessed with it like that ;P

timemachined

9:00 am on Dec 6, 2015 (gmt 0)

I think you might be on to something as that's my gf's favourite word. "you're obsessed"

glakes

1:36 pm on Dec 6, 2015 (gmt 0)

Unless there is a technical problem on your site, I'm guessing that your site does not appear in some of the data centers. Popular sites seem to get new pages pushed to all data centers fast because of their traffic volume. Most don't. I'm guessing in a week or less your page will appear in all of Google's data centers. Can you find it now?

lee_sufc

1:42 pm on Dec 6, 2015 (gmt 0)

glakes - it's been about two weeks now with no sign? Haven't got a clue why? Was thinking of the earlier suggestion of republishing into a different subsection of the site.

lee_sufc

4:01 pm on Dec 7, 2015 (gmt 0)

As there's still no sign of the two new pages being indexed, I'm going to now re-upload them to a new page in a different directory. However, before doing this, shall I simply 404 those other two pages or shall I 301 redirect them to the new URL?

timemachined

4:25 pm on Dec 7, 2015 (gmt 0)

404? 301? Surely duplicate content? Why would you redirect duplicate content?

If I was you, I'd leave the pages where they are and use google fetch in webmaster tools then see how long they last. Do social links to the new pages as well, from both Twitter and G+.

lee_sufc

4:31 pm on Dec 7, 2015 (gmt 0)

Hi Time Machined - they've been uploaded for a couple of weeks now with no sign? I originally thought I'd delete the original pages and re-upload to a new page?

I've used Google's fetch tool, promoted the pages multiple times on Facebook, G+, LinkedIn and Twitter - run out of ideas now?

timemachined

4:38 pm on Dec 7, 2015 (gmt 0)

Fetch again; I've had to do it three times before pages stayed. It's a unique content page, yes? I'm not an ex spurt but who is.

Have you done a site:domain.com "text" search to see if indexed?

not2easy

4:48 pm on Dec 7, 2015 (gmt 0)

In using "Fetch as Googlebot" does it show any blocked resources? And when you used "Fetch" did you submit to index? Do you have 301 domain canonicalization rewrite rules in place? (meaning the page can only be accessed at one URL) Just a few more things to check.
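To test the "one URL only" point, it helps to list the alternative addresses a page might answer on. The sketch below generates them; `url_variants` is a hypothetical helper, the URL is a placeholder, and the three variations covered (scheme, www prefix, trailing slash) are my assumption about the usual suspects.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(canonical_url):
    """List scheme / www / trailing-slash variants of a URL, excluding the URL itself."""
    parts = urlsplit(canonical_url)
    host = parts.netloc
    bare = host[4:] if host.startswith("www.") else host
    path = parts.path or "/"
    # Both the slash-terminated and the slash-free form of the path.
    paths = {path.rstrip("/") or "/", path if path.endswith("/") else path + "/"}
    variants = set()
    for scheme in ("http", "https"):
        for h in (bare, "www." + bare):
            for p in paths:
                variants.add(urlunsplit((scheme, h, p, parts.query, "")))
    variants.discard(canonical_url)
    return sorted(variants)


for variant in url_variants("http://www.example.com/blog/new-post/"):
    print(variant)
```

With canonicalization rules in place, each printed address should answer with a 301 to the one preferred URL rather than a 200 of its own. Requesting them is left out of the sketch.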

lee_sufc

4:51 pm on Dec 7, 2015 (gmt 0)

not2easy - I've just done a "fetch and render". Could see no issues (just images on the page being blocked by my robots.txt).

Weirdly, however:

I submitted the URL again (selected "just this URL"). Then I went back to Google, did a search and voilà, on page one! Granted, this might not stick (again), but I can't believe it worked that quickly?
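The blocked images mentioned in this post can also be spotted offline by running a copy of robots.txt through Python's standard `urllib.robotparser`. This is a rough sketch: `blocked_urls`, the rules and the URLs are all made up for illustration, and the standard-library parser does not match Googlebot's behaviour in every detail (wildcards, for instance).

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical rules and URLs, not taken from the site discussed here.
robots_txt = """User-agent: *
Disallow: /images/
"""
urls = [
    "http://www.example.com/blog/new-post/",
    "http://www.example.com/images/header.jpg",
]
print(blocked_urls(robots_txt, urls))
```

If the post URL itself ever shows up in that list, robots.txt is the first place to look.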

lee_sufc

4:54 pm on Dec 7, 2015 (gmt 0)

I just did it again for the second page and that too has been indexed - I just hope they stick this time (have been stressing something more sinister was wrong on my site).

I'm guessing this issue must mean my site isn't crawled regularly by Google so I should do this each time I create a new page?

timemachined

1:54 pm on Dec 8, 2015 (gmt 0)

Fetch is instant usually, however it may drop back out, move up a bit or move down a bit after entry.

Do me a flavour for a tick, go into URL Parameters under Crawl in webmaster tools and paste here what you have shown on that page. Just want to see if you have a similar issue as me.

Parameter | URLs monitored | Configured | Effect | Crawl

timemachined

1:56 pm on Dec 8, 2015 (gmt 0)

Oh and when you search and see yourself on page one, remember to use &pws=0 in the url. It turns off personalisation that we all hate.
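Adding the parameter by hand is easy to get wrong when the results URL already has a query string, so here is a small sketch that does it safely. `with_param` is a hypothetical helper; the only thing taken from the post is the `pws=0` parameter itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(url, name, value):
    """Return url with query parameter name set to value, replacing any existing one."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != name
    ]
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


results_url = "https://www.google.com/search?q=example+phrase"  # placeholder search
print(with_param(results_url, "pws", "0"))
```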

lee_sufc

2:16 pm on Dec 8, 2015 (gmt 0)

This is what I get at URL parameters:

"Currently Googlebot isn't experiencing problems with coverage of your site, so you don't need to configure URL parameters. (Incorrectly configuring parameters can result in pages from your site being dropped from our index, so we don't recommend you use this tool unless necessary.)"

timemachined

2:36 pm on Dec 8, 2015 (gmt 0)

You don't have a list of parameters underneath that text? If none showing, I think that's good at least.

Interesting that Google states that an empty page means 'no problems', yet on a page filled with parameters, they don't suggest it's a problem. Either it is or it isn't, and if it was a problem, they should have messaged me and alerted me to the fact.

This is mine; I'm currently trying to block these and remove them from Google. I've reduced some by thousands, but Google is incessantly monitoring others more.

Parameter | URLs monitored | Configured | Effect | Crawl
paging | 317 | Oct 29, 2015 | Paginates | No URLs
page_id | 316 | - | - | Let Googlebot decide
searchcupon | 315 | Oct 29, 2015 | Other | No URLs
ak_action | 197 | Dec 3, 2015 | Other | No URLs
attachment_id | 117 | Oct 28, 2015 | Other | No URLs
wpsc_action | 101 | Dec 3, 2015 | Other | No URLs
p | 77 | - | - | Let Googlebot decide
redirect_to | 48 | Oct 29, 2015 | Other | No URLs
ver | 30 | Nov 8, 2015 | Other | No URLs
gc_message_bar_redirect | 11 | Dec 3, 2015 | Other | No URLs
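A similar tally can be produced from your own crawl or server-log data, which makes it easier to see where parameters like these come from. The sketch below counts, per parameter name, how many URLs carry it. `count_parameters` is a hypothetical helper and the sample URLs are invented; only the parameter names are borrowed from the listing above.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_parameters(urls):
    """Count how many URLs contain each query parameter name."""
    counts = Counter()
    for url in urls:
        # Use a set so a parameter repeated within one URL is counted once.
        names = {k for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Invented sample URLs for illustration.
urls = [
    "http://www.example.com/?p=77",
    "http://www.example.com/?page_id=12&ver=4.3",
    "http://www.example.com/blog/?paging=2",
    "http://www.example.com/?p=78&ver=4.3",
]
for name, total in count_parameters(urls).most_common():
    print(name, total)
```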

lee_sufc

3:00 pm on Dec 8, 2015 (gmt 0)

That's odd - no, never seen anything listed (as you say; hopefully that's a positive).

I hope you resolve your issues - wish I could offer help (maybe someone will be along that knows more than me).

not2easy

4:55 pm on Dec 8, 2015 (gmt 0)

@ lee_sufc - Nothing would be listed under "URL Parameters" if you did not put any parameters in there. The list seen by timemachined is parameters they (or someone with access) added and configured. Google does not add parameters.

timemachined

6:10 pm on Dec 8, 2015 (gmt 0)

Incorrect, not2easy. Google sniffs around with spiders in places where it shouldn't and puts whatever it likes there. I never put those parameters there; I edited them, but I didn't add them. Unless you're suggesting someone hacked my account and put them there, in which case they must be really bored.

not2easy

7:38 pm on Dec 8, 2015 (gmt 0)

You may see information about your site in various locations in the Google Search Console (formerly called GWT) but URL Parameters is a tool for you to use. At some point in time, someone added those URL Parameters. If you did not, someone else did, but Google did not add them.

The URL Parameters Tool is offered by Google as a way to control various URL parameters generated on your site. Google does not populate that tool with anything, it is there for you to use to tell google to crawl, not crawl, index or not index or whatever on various URL parameters. Read it from Google: [support.google.com...]

timemachined

8:27 pm on Dec 8, 2015 (gmt 0)

My account hasn't been hacked as far as I know, and I didn't add those parameters. I only set most of them to "No URLs" out of frustration, as they had got in the way of my site being indexed properly. Repeat: Google added those parameters.

timemachined

8:31 pm on Dec 8, 2015 (gmt 0)

Quote from article "To access the feature, log into your Google webmaster tools account, click on the site you want to configure, and then choose Site configuration > URL parameters. You’ll see a list of parameters Google has found on the site, along with the number of URLs Google is “monitoring” that contain this parameter."

[searchengineland.com...]

You’ll see a list of parameters Google has found on the site,

Hence why Google stuffed me in the first instance by going places it shouldn't without warning me.

[edited by: timemachined at 8:36 pm (utc) on Dec 8, 2015]

timemachined

8:34 pm on Dec 8, 2015 (gmt 0)

Another post sorry, but didn't see this while tackling this problem the last two months.

"Seem overwhelming? Fortunately, another new feature Google has launched is the ability to download all of the parameter settings as a CSV file so you can sort though them offline."

Silly me, I think I did most already but will download the table and bulk deindex.
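Sorting such an export offline could look something like this. It is only a sketch: the column names "Parameter", "URLs monitored" and "Crawl" are assumed from the on-screen table in this thread and may not match the real CSV headers, and `rows_by_url_count` is a hypothetical helper.

```python
import csv
import io


def rows_by_url_count(csv_text):
    """Return the CSV rows as dicts, sorted by 'URLs monitored', largest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(reader, key=lambda row: int(row["URLs monitored"]), reverse=True)


# Stand-in for the downloaded file; the header names are assumptions.
export = """Parameter,URLs monitored,Crawl
p,77,Let Googlebot decide
paging,317,No URLs
ver,30,No URLs
"""
for row in rows_by_url_count(export):
    print(row["Parameter"], row["URLs monitored"], row["Crawl"])
```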
