Forum Moderators: Robert Charlton & goodroi
Google is only indexing a very small portion of our site...
Matt Cutts is claiming that BD is over... The reason websites are having trouble with supplemental results and not being indexed is due to spam...
Reading WebmasterWorld for the past few weeks, everyone has claimed everything is Google's fault: just be patient with your site, Google needs to recover...
What is your opinion now? Does not being fully indexed mean that our site has duplicate content or some other spam issue?
This random up and down, here today and gone tomorrow in Google's new game, is giving me a king-size Google headache!
What important issues are you getting out of Matt Cutts blog?
To be honest, he is confusing me more...
I am trying to see if the pages that G is indexing differ (as far as SEO goes) from the rest of the pages, and I can't figure it out. I have followed all the guidelines on all of my pages.
My question: has anyone recently (after April 26) made a change that increased their pages indexed on Google?
You can have any number of technical issues that don't seem to affect your site for a long time, and then all of a sudden they do. Part of this current moment with Google may be (for some sites) that Big Daddy finally allowed them to handle more data, compare more data on the fly, and so on. This could even mean the site trips some spam filter that it previously avoided.
Sometimes there may be a dirty trick played by a competitor who notices a liability with your urls, and they begin to point troubled links at your site -- eventually they get spidered and your apparently trouble-free site starts to show trouble. Or you make what seems like a small change, but that change opens up a liability to googlebot that it never had access to before.
So it is important for the webmaster to do some due diligence, assuming there "may" be a problem they can address. These threads go into a lot of the detail that is worth checking:
Checklist for Sudden Drops in Rank [webmasterworld.com]
Dropped from Google - a checklist to find out why [webmasterworld.com]
Dropped Site Checklist [webmasterworld.com]
The url-only problem [webmasterworld.com]
All that said, I am seeing sites vanish from the index that make no sense to me -- for example, a PR7 home page, ten years on line, and over 100,000 very natural backlinks, most of them to deep pages on an extremely authoritative information site.
So I do think we have a mix of "fault" going on.
Just now I had an odd experience with G's search page. I was in classic search and I clicked to switch to personalized search. The result was this message: "We're sorry... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now."
That rather astonished me, and I tried again and got the same result. To repeat, I was logged in to Google and all I was doing was switching to my personalized search page.
This reinforces my feeling that their paranoia about spam has caused them to ratchet their filters up way too high.
I do agree with this statement. It is just sad that we don't know which is which at this point. How do you fix what is broken when you don't know what is broken? Why attempt to mess around and fix a site when it may NOT be broken in the first place? That could cause further problems and even destroy rankings in other engines that are having no problems with these same sites.
Most of us are just spinning in circles. We have 0, none, zilch, NADA feedback or even a clue as to what is going on, especially looking at some of the spam crap that shows up in the results. If there is a problem, most webmasters here have yet to pinpoint it. I don't think Google even knows.
Make sure that every page of the site has a unique title and meta description.
Make sure that every page of the site links back to "/" and to the main section indexes.
Make sure that all domain.com accesses are redirected to the same page in the www.domain.com version of the site.
If you have multiple domains, then use a 301 redirect on those such that only one domain is indexed.
If you have pages that say to bots "Error. You Are Not Logged In", for example "newthread", "newreply", "editProfile" and "sendPM" links in a forum, then make sure the link has rel="nofollow" on it, and the target page has <meta name="robots" content="noindex"> on it too.
If you have a CMS, forum, or cart that has pages that could have multiple URLs, then get the script modified to put a <meta name="robots" content="noindex"> tag on all but one "version" of the page.
Use the site: search to see what you have indexed, and work to correct these issues. The presence of Supplemental Results, URL-only entries, or hitting the "repeat this search with omitted results included" message very quickly are all indications that you have stuff that needs fixing.
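The "noindex all but one version of the page" advice above can be sketched as a tiny helper a CMS or forum script could call when rendering the page head. This is only an illustration, not code from any of these systems; the function name and the idea of a precomputed canonical path are my own assumptions.

```python
# Sketch: emit a robots meta tag when a page is served under a
# non-canonical URL, so only one "version" of the page gets indexed.
# requested_path / canonical_path are hypothetical inputs the CMS
# would already know about.

def robots_meta(requested_path, canonical_path):
    """Return a noindex robots meta tag for duplicate URL versions,
    or an empty string for the one canonical URL that should be indexed."""
    if requested_path != canonical_path:
        return '<meta name="robots" content="noindex">'
    return ""
```

The same helper covers the "Error. You Are Not Logged In" pages too: treat those paths as never-canonical and they always get the noindex tag.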
It is a sad fact that systems like vBulletin, PHPbb, osCommerce, and a whole range of popular scripted sites, have a large number of SEO-related design errors built in to them. The designers are clever programmers, but have no clue about SEO or how their site will interact with search engines; and the situation isn't getting any better.
Run Xenu LinkSleuth over your site, and run a few pages through [validator.w3.org] too - just in case.
If you have done all of that, then you'll just have to wait for Google to fix whatever they have broken at their end.
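The domain.com-to-www.domain.com item in the checklist above is usually done in server config, but the decision logic can be sketched in a few lines of Python, for example inside a WSGI app or CGI script. The host name here is a placeholder, and this is a minimal sketch of the idea rather than anyone's actual setup.

```python
# Sketch: answer any request on a non-canonical host with a 301
# (Moved Permanently) to the same path on the canonical host, so
# Google indexes only one version of the site.

CANONICAL_HOST = "www.example.com"  # hypothetical canonical host

def canonical_redirect(host, path):
    """Return (status_line, location) if the request should be
    301-redirected to the canonical host, or None if it is already there."""
    if host.lower() != CANONICAL_HOST:
        return ("301 Moved Permanently",
                "http://%s%s" % (CANONICAL_HOST, path))
    return None
```

The same check handles the multiple-domains case: any extra domain simply fails the host comparison and gets the 301 to the one domain you want indexed.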
The point is that, beyond the basics, Google is creating an uncalled-for workload. I now lecture myself not to tell people that things that take me a couple of minutes are easy; it often took me years to be able to do them. The chief dangers, though, are what Arubicus brings up and what I'm always mindful of.
Google even took a simple program like Adwords and turned it into some exotic “riddle” to pay less for clicks.
Google is now much more fussy about spidering some types of site. Sites might have got away with it before, but I think they stand less chance now.
So, I always advise to do all the basic stuff and then evaluate where you are after that... it often works.
You are right, I now see some examples where I have no idea where the problem lies, and advise waiting until Google fixes the bug at their end.
This is where some of us are stumped.
But yes you are right. Get the basics done first.
I'd say the answer to the title question is that sometimes it's the website and sometimes it's Google. Discovering which is the case is what's driving many webmasters up the wall right now.
Actually, there is a third possibility which is, for most sites, the most accurate. It is the INTERACTION between unknown variables on the site, and unknown variables and algos google uses.
With some sites, it's possible to guess if there is an onsite problem. With most sites, because the results (SERPs) are the end product of hundreds of variables related to the site and hundreds related to the algo, the whole question is rather pointless because, by and large, it's not answerable.
Even with more information, it's not knowable. Most things in complex systems (which is what we are talking about) are multi-caused, and multi-caused in non-linear and interacting ways.
That won't stop people from speculating, or thinking they know the answer. Sit back and enjoy the ride, cause you just ain't going to be able to figure out HOW the ride works. It's like a magic trick that amazes, and no matter how hard you try, you can't figure it out.
In short, if you look for ONE cause, or one simple answer, you will lead yourself astray, and perhaps do the wrong thing. If you happen to do a right thing, it will be by complete accident.
Yes, Google has increased people's workload, but I still see that many sites have not done even the basic stuff.
I agree that one needs to do those basics, which are fairly well known. But I'm afraid that's "old world" thinking. There is absolutely NO stability, or predictability about how search engines, particularly google, index pages.
That's THE obvious lesson that many aren't quite getting. Someone can do all the basics on a new site, or modified site, have the best darned site on its topic, have good inbound links, and get absolutely nowhere. You can fiddle, tweak, listen to "experts" who really don't know what's going on (since nobody does), and end up with a large investment of time, horrible rankings, and no income.
Or, you could hit the jackpot. It's like playing slot machines now, and for people who are into that, that's fine. But trying to run a business that is completely (and I mean completely) unpredictable is not a wise choice for many people.
A stable and somewhat predictable business environment is one of the most important requirements, if not the most important, for running a successful business long term. Right now we don't have it.
Speaking for us, until we have some stability, we're not investing much in web development beyond basic maintenance tasks. We aren't redesigning our sites (that too is a crap shoot - if we redesign do we lose whatever rankings we have left?) We aren't investing ad money, either, into adwords, and we aren't counting on adsense revenues. We'll be ready if things improve, but we're moving our business elsewhere, so to speak.
I'm through playing guessing games with google.
I'm not saying that this is the full explanation for everything we see right now -- far from it. But it is worth looking through Google's webmaster guidelines [books.google.com] one more time -- they may have just become better at enforcing some of them, or turned the dials up higher.
I changed URLs from the format /foo/id/ to /foo/bar.html. The old URLs generate 301s to the new ones. The old URLs were all supplemental in Google; now only a few are left, but the new ones aren't making it in.
So I'm pondering returning 404s for the old URLs to see if this resolves the issue.
It is a sad fact that systems like vBulletin, PHPbb, osCommerce, and a whole range of popular scripted sites, have a large number of SEO-related design errors built in to them. The designers are clever programmers, but have no clue about SEO or how their site will interact with search engines; and the situation isn't getting any better.
I'd have to vote for "Google is at fault/broken" since we have to do all the things you list to "SEO" our sites even though Google says we should develop our sites for visitors, not search engines.
-- Roger
I tried doing that for a couple of days; it made for even less traffic! So I've put all the pages back and replaced the content on each with a memo and a link to the current page. Even though they're no longer connected to the site, the old URLs were still sending a lot of visitors. Whenever Google finally gets the site re-indexed, I'll remove the old pages.
So I'm pondering returning 404s for the old URLs to see if this resolves the issue.
Persistent 404s will hurt you in Google.
Look at the HTTP status codes.
404 Not Found (and we don't know why, or we're just not saying why)
Google interprets this as the site being temporarily broken and will keep requesting the page.
410 Gone (this page no longer exists)
Google has no problem with that and will remove it from the index.
what I do is put a 301 (document moved permanently) redirect on the URL until the new URL is in the index and then (after requests for the old URL peter out due to the 301) change it to a 410 for a while until 410 errors peter out, then remove the old url (410) completely.
Be careful though - during the time you have the 301 status on the old URL make sure you find any links to that URL and get them updated else bots/visitors will keep coming from those links and requesting the old URL.
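The staged retirement described above (301 while the new URL gets indexed, then 410 once requests for the old URL peter out) can be sketched as a small dispatch function. The URL map and the phase flag are hypothetical illustrations of the poster's process, not anyone's actual implementation; in practice this would live in server config or a rewrite handler.

```python
# Sketch: decide how to answer a request for a retired URL.
# OLD_TO_NEW is an example mapping in the /foo/id/ -> /foo/bar.html
# style mentioned earlier in the thread.

OLD_TO_NEW = {"/foo/123/": "/foo/bar.html"}

def old_url_status(path, redirect_phase):
    """Return (http_status, location_or_None) for a retired URL.

    redirect_phase=True  -> still in the 301 (Moved Permanently) stage
    redirect_phase=False -> switched to the 410 (Gone) stage
    """
    new_path = OLD_TO_NEW.get(path)
    if new_path is None:
        return (404, None)        # not one of our retired URLs
    if redirect_phase:
        return (301, new_path)    # pass bots and visitors to the new URL
    return (410, None)            # Gone: tells Google to drop it for good
```

Flipping `redirect_phase` to False is the cutover point; per the advice above, that should wait until requests for the old URLs have petered out and any remaining inbound links have been updated.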
what I do is put a 301 (document moved permanently) redirect on the URL until the new URL is in the index and then (after requests for the old URL peter out due to the 301) change it to a 410 for a while until 410 errors peter out, then remove the old url (410) completely.
I've had this in place since early March, and all old URLs point to the new ones. Google visits the new ones, but they don't end up in the index. All the old ones are listed as supplementals or are gone.
What's interesting is that if you look at the cache for the supplementals, many have the new meta description that I added for the new URLs. So Google is connecting the old URLs to the new ones in some manner.
301'ed two related sites to it: sites [A] and [B].
After 3 weeks, 420 pages indexed.
Removed one of the 301s [B]; two weeks later, fewer than 200 pages indexed.
Put an outbound link on site [B]; two weeks later I had around 300 pages indexed.
Put a few outbounds on sites that had no PR but were getting crawled, and the indexed page count went up around 20 pages per site.
are you sure you are using 301 and not 302?
Reid:
Positive. I did the same thing for another site and it worked like a charm.
I've checked the site multiple times for any URLs in the old format and none exist. The majority of requests go to the new URLs, but I still get a few stray requests to the old ones. Those requests are usually from an SE bot; users for the most part go to the new URLs.
Googlebot hit 251 of these pages in the new format again this morning, but it did the same last weekend and the weekend before that, and they never made it into the index. It didn't hit any of the old pages.