Pages indexed chart (wmt) - bug? - Bing Search and SEO forum at WebmasterWorld

Forum Moderators: mack


Pages indexed chart (wmt) - bug?

SEOPTI

4:48 pm on May 14, 2012 (gmt 0)

I'm seeing a huge drop in indexed URLs in Bing WMT. It started on May 5th. Going from 500k URLs to 350k URLs is huge.

Either it's a bug or something changed on this day. This chart has been consistent for at least six months.

It affects all domains with millions of long tail local URLs. It doesn't matter what link profile or user engagement they have, they are all affected exactly in the same way.

There is probably a connection to this thread:

[webmasterworld.com...]

rustybrick

12:44 pm on May 15, 2012 (gmt 0)

My indexed charts are static pretty much across several sites.

BostonGuy

1:59 pm on May 15, 2012 (gmt 0)

We saw a similar drop on May 5th. We went from 1.25 million pages indexed to 1 million and saw a traffic decline of 20% on Bing and Yahoo.

textex

2:10 pm on May 15, 2012 (gmt 0)

We had three sites delisted on the 5th. Bing says there is no penalty, and our sites all show as having all their pages in the index. They are just not showing in results.

SEOPTI

2:25 pm on May 15, 2012 (gmt 0)

Thanks for the reports. I hope the mass deindexing of URLs will stop. It just does not make sense at all. It is exactly the same drop for all domains.

Indexed URLs Chart:
_ = 1 day

_ _
   \_ _
       \_ _
           \_ _
               \

The number of indexed URLs drops every two days. This is really strange.

bingdude

7:33 pm on May 15, 2012 (gmt 0)

@Boston & SEOPTI - you have mail here at the forum - can't investigate without the domain names...

lucy24

9:13 pm on May 15, 2012 (gmt 0)

Weird. Do your numbers use a different date from the chart? At the top, in words, is

"pages indexed(%)"

giving the number from 4 days ago and the percentage change from 4 weeks before that (popup gives the exact dates and numbers). In my case that's 8% -- and I can assure everyone it is not because my site is 8% more interesting than it was a month ago :)

But the graph goes 2 days beyond that, dropping to a count that is now 1 less than it was on the start date (same date for text and graph). My peak date was May 3rd with a one-time spike of about 10% both before and after. No relation to the day I manually removed a cluster of long-outdated URLs; that was about a week later, with mysteriously no downward hiccup in the graph.

Wish there were a way to figure out if those are the same 10% of pages, indexed and then promptly de-indexed again, or just some unnerving coincidence.

martaay

3:49 pm on May 16, 2012 (gmt 0)

I'm seeing a large traffic decline in my Bing Webmaster Tools account, starting on May 1st. bingdude, could you possibly investigate this for me as well? PM me and I can provide URLs.

SEOPTI

3:54 pm on May 19, 2012 (gmt 0)

The drop continues. What I see is nofollow links started disappearing completely in webmaster tools.

It could be the case they treat nofollow links in a different way now and they don't pass link juice at all.

lucy24

1:28 am on May 20, 2012 (gmt 0)

Interesting that you should mention nofollow, because I've got one particular page whose only links are no-followed-- and it was crawled only the other day. (I took a closer look specifically because this thread made me curious.) I can only hope that their sole reason for crawling the page was to confirm that it is still flagged "noindex".

And, yup, since the last time I looked, I've changed from +8% to -6%.

OK, this is clunky but I can't think of any other way to get the information.

Go to the Index Explorer page. Open every single directory on the list. (If there is an Open All button that I've overlooked, I am going to feel stupid.) Select the entire list. Paste into text editor.

:: pause for jaw dropping as I discover that this gives me, in one fell swoop, the full content of every popup on the page ::

Save. Now I've got something to compare to in a few days if the number is again different.
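The comparison step described above can be sketched in a few lines of Python. This is only a rough illustration, not anything Bing provides: it assumes each saved snapshot is the raw text pasted from Index Explorer, that every indexed page appears in it as a full http:// or https:// URL, and that the function names (extract_urls, diff_snapshots) are made up for this example.

```python
import re

# Assumption: URLs in the pasted text are whitespace-delimited and start
# with http:// or https://. Anything else in the paste (popup text, dates)
# is ignored.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(pasted_text):
    """Return the set of URLs found in one pasted Index Explorer snapshot."""
    return set(URL_PATTERN.findall(pasted_text))


def diff_snapshots(old_text, new_text):
    """Compare two snapshots.

    Returns (dropped, added): URLs present only in the old snapshot and
    URLs present only in the new one, each as a sorted list.
    """
    old_urls = extract_urls(old_text)
    new_urls = extract_urls(new_text)
    dropped = sorted(old_urls - new_urls)
    added = sorted(new_urls - old_urls)
    return dropped, added
```

To use it, read the two saved text files into strings and pass them to diff_snapshots; the lengths of the two returned lists give the "pages out" and "pages in" counts.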

BostonGuy

5:04 pm on May 21, 2012 (gmt 0)

The drop continues for me as well, nearly a 30% decline since early May. I have not investigated our nofollow links, but that is an interesting observation.

SEOPTI

6:02 pm on May 21, 2012 (gmt 0)

Duane, did you have time to check some of the URLs I sent last week via PM? I know you have tons of work to do and I'm simply stuck with this deindexing problem. Thank you!

lucy24

10:12 pm on May 29, 2012 (gmt 0)

Continuing...

I've got two text files in front of me. One based on Bing's Index Explorer from 19 May, one from 27 May. Drop between the two, 21 pages or about 25% -- just to show that they appear to be doing this to everyone, not only the Big Boys. More exactly, it's 23 pages out, 2 pages in. No idea why they added those two pages; they've been around forever. Not complaining, though.

It seems to have leveled off; I just wish I had known beforehand so I could have started checking on May 3, which was the spike date for me. The no-longer-indexed pages are really gone; Bing WMT isn't just saying so to alarm us. I tried a few unique-phrase searches and came up cold.

First discovery: There's a big lag between crawling and indexing. This may be proportional rather than absolute, so ymmv. The "last crawled date" on the earlier index varied, but never less than 13 days before the index date. The "last crawled date" on the later index is newer and-- here's the interesting part-- for many pages the crawl date given is before 19 May. That is, the "last crawl date" isn't really the most recent date; there's some kind of limbo in between. (I hadn't the energy to check raw logs and see how they compare-- in particular, how many pages are they crawling but not indexing?)
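For anyone who does have the energy for the raw-log check, here is a minimal sketch. It rests on several assumptions that are not from this thread: that the server writes Apache-style "combined" access logs, that Bing's crawler identifies itself with "bingbot" in the user-agent string, and that the indexed pages are available as a set of paths. The function name crawled_not_indexed is hypothetical.

```python
import re

# Assumption: Apache "combined" log format, e.g.
# 1.2.3.4 - - [20/May/2012:01:00:00 +0000] "GET /page.html HTTP/1.1" 200 512 "-" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)


def crawled_not_indexed(log_lines, indexed_paths, bot_token="bingbot"):
    """Return paths the bot fetched with a 200 status that are not indexed.

    log_lines     -- iterable of raw access-log lines
    indexed_paths -- iterable of paths (e.g. "/fonts/a.html") taken from
                     the index listing
    bot_token     -- lowercase substring used to spot the crawler's
                     user agent (an assumption, adjust as needed)
    """
    crawled = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        path, status, agent = match.groups()
        if status == "200" and bot_token in agent.lower():
            crawled.add(path)
    return crawled - set(indexed_paths)
```

Only requests answered with 200 are counted, since redirected or Gone pages are not expected in the index anyway.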

Even the current list of indexed pages isn't completely accurate, because it includes pages that I've explicitly removed from SERPs due to redirects and so on. They don't get de-indexed; they just sit in a back room somewhere.

So what got removed? Some are legitimate. For example, a few pairs like

:: cough-cough, ahem ::

http://example.com/ebooks [this is Bing's naming format for a directory's index file]
and
http://example.com/ebooks/index.html

A stray case of

oldestname.html
oldername.html
redirect.html

where they've finally dumped oldestname-- but not yet oldername. Likewise a few-- but by no means all-- files so old, I've gone from a year of redirecting to an unequivocal Gone.

One no-longer-indexed file is in a roboted-out directory. Well, thanks, Bing. To make up for it, two still-indexed files are explicitly labeled noindex-- and located in public directories, so no falling back on "Well, but how were we to know it's noindex? You wouldn't let us see!"

Others are more puzzling. A few very specialized pages from the /fonts/ directory. Two pages so new, they can only just have been indexed-- and then Bing turns right around and de-indexes them. Both are almost entirely in a language (and script) Bing doesn't know, which may be relevant. But surely they'd have noticed in the first place?

There's one specific deletion I can't figure out at all. No way, no how. Over a year ago I completely rearranged one directory. All pages are still there; they just have different paths and slightly different filenames. No particular change to title or text. The number of pages is out of all proportion to the weight of the content, so humans got a special 404/410 page and robots got a simple Gone. Let them find the new locations from scratch; the top level of the directory is unchanged.

A week ago, one matching pair was indexed both ways: Old URL (in spite of steady diet of 410) and new URL. A week later, the new URL is gone and the old URL-- the 410 version-- is still in the index.* Huh what?

:: insert "noidea" emoticon here ::

* And still getting crawled. At this point I threw in the towel and added redirects for a few specific long-gone pages that the search engines persist in looking for. Well, if they want it that badly-- but not badly enough to realize it's directly linked from a page they crawl regularly--

g1smd

11:25 pm on May 29, 2012 (gmt 0)

Bing takes forever to follow redirects and index the new URL for a page.

Search engines never forget a URL. They will recrawl every URL they have ever seen forever, even after it redirects or returns 404 or 410.

Pierre Far mentioned at SMX London that Google recrawls 404 and 410 pages and that those returning 410 are recrawled less often. Google continues to look at these pages because a significant number of them eventually come back into use returning real content served with 200 OK status.

BostonGuy

5:45 pm on May 30, 2012 (gmt 0)

This week has been a good week on Bing/Yahoo. Our index has returned to over a million and I have seen a 30% increase in sessions, nearly a full recovery from early May. I hope you all are seeing the same.

SEOPTI

3:56 pm on May 31, 2012 (gmt 0)

I don't see a recovery. I think in my case it has to do with sitewide links from authority sites. I always had a huge number of nofollow sitewide links from a few authority sites. It seems to me the links don't pass on the same authority as before. As posted above most nofollow links disappeared in Bing WMT suddenly.

Lesson learned: never stop link building, and never let nofollow links make up more than 1-5% of a healthy link profile. I'm curious how long it will take to get the 60% of lost URLs back in their index after gaining some new authority dofollow links.

I think "static rank" should not be a query-independent value. I don't get why they remove URLs with no negative feedback from users just because they don't have the same link power as before.