Forum Moderators: Robert Charlton & goodroi


Google site: operator showing low number of indexed pages


Emmanuel2

2:56 am on Sep 15, 2010 (gmt 0)

10+ Year Member



Pretty simple problem,
I've recently overhauled a website: www.steps.org.au

Currently a "site:steps.org.au" search shows 269 pages found, when there should be (and there used to be) about 900 pages. The online shop alone has over 800 products.

I guess that means a penalty of some sort?

To the best of my knowledge I've done the right things, but perhaps not...

Any pointers?

tedster

2:37 pm on Sep 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello Emmanuel2 and welcome to the forums

Seeing a low number of indexed pages is not a penalty - as in Google flagging your domain to artificially lower your rankings. But it is often a sign that something about the site is not optimal. Some possibilities include:

Possibility #1
The server performance is slow, and this eats up the time googlebot has to crawl.

Possibility #2
Not enough backlink strength (PR) is available, or being circulated by the site's internal linking, to show that deeper pages are important.

Possibility #3
Canonical URL problems create a bot trap: googlebot finds the same duplicate content at many different URLs.
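To make Possibility #3 concrete: all of the following (hypothetical) addresses could serve the exact same page, and googlebot may crawl every one of them as a separate URL:

```
http://example.org/widgets.html
http://www.example.org/widgets.html
https://www.example.org/widgets.html
http://www.example.org/widgets.html?sessionid=abc123
http://www.example.org/Widgets.html
```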

You say you recently "overhauled" the website. How long ago was that? And did you change the URLs?

Emmanuel2

6:58 pm on Sep 15, 2010 (gmt 0)

10+ Year Member



I think the server performance is okay, but there are some things I want to do to speed up page load times, like minimizing MySQL queries.

That's a good point about backlink strength; I'll work on that one.

What do you mean by Canonical URL?

I changed everything about 3 or 4 months ago now. Yes, it was a complete overhaul, and I've used URL rewriting to change all the product URLs in the shop from something like this:
index.php?main_page=product&c_ID=2&p_ID=76
to this:
/Shop/product/Product-Name.html

I've got 301 redirects pointing all the old product links to the new ones.
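Roughly, my .htaccess does something like this (a simplified sketch - the real rules capture more parameters, and the IDs and product name here are just examples):

```apache
RewriteEngine On

# Serve the friendly URL internally from the old dynamic script
RewriteRule ^Shop/product/([A-Za-z0-9-]+)\.html$ index.php?main_page=product&p_ID=$1 [L,QSA]

# 301-redirect a request for the old dynamic URL to its new friendly URL
RewriteCond %{QUERY_STRING} ^main_page=product&c_ID=2&p_ID=76$
RewriteRule ^index\.php$ /Shop/product/Product-Name.html? [R=301,L]
```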

tedster

12:38 am on Sep 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Canonical URL issues occur when more than one URL serves any given piece of content without redirecting. The one URL you intend for that content is called the canonical URL.

There are possibly 45 factors that could contribute to a non-canonical URL - so if you are unfortunate enough to leave every vulnerability open, the variations multiply combinatorially: with 45 independent either/or factors, that's potentially 2^45 different URLs for one bit of content.

The most well known canonical URL problem comes from resolving both the "with-www" and "without-www" URLs as 200 OK -- but that's far from the only one. Here's a thread where we dug into many of the possibilities: Canonical URL Issues - including some new ones [webmasterworld.com]

Having canonical URL issues with your website can cause googlebot to waste time when it crawls, only to find duplicate content that it will not put into the main search index. Canonical problems can also cause your backlink equity to be split into several different "buckets", even though there's only one bit of content. That can cause ranking problems.
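For example, the classic with-www fix is a single 301 rule in .htaccess (an Apache sketch - substitute your own hostname, and test before deploying):

```apache
RewriteEngine On

# Any request for the bare domain gets a 301 to the with-www canonical host
RewriteCond %{HTTP_HOST} ^example\.org$ [NC]
RewriteRule ^(.*)$ http://www.example.org/$1 [R=301,L]
```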

Emmanuel2

3:25 am on Sep 16, 2010 (gmt 0)

10+ Year Member



Ah okay,
I'm sure I don't have any of those problems with my URLs.

So I don't know what the problem is...

tedster

3:58 am on Sep 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



After just a quick look:

1. According to the site: operator, you have URLs indexed with the https: protocol. I see that you do have a 301 redirect to http: so the https: versions "should" eventually fall out of the index. But something is a bit funky there. Until it gets sorted, the https: protocols will also slow down googlebot's crawl and that will limit the crawl depth for your site.

2. Your server does not strip extraneous query strings from the URL, so you have a vulnerability there.

3. Your URLs use mixed case. You will probably lose some backlinks because other sites will get the upper/lower case wrong, and your server will then return a 404 for those links.
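To illustrate the kind of server-side fixes I mean for #1 and #2 (Apache sketches with an example hostname - and whitelist your shop's real parameters before attempting anything like the second rule):

```apache
RewriteEngine On

# Force http: - 301 any https request to the http version
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://www.example.org/$1 [R=301,L]

# Strip extraneous query strings from URLs that shouldn't carry them
RewriteCond %{QUERY_STRING} .
RewriteCond %{QUERY_STRING} !^main_page=
RewriteRule ^(.*)$ /$1? [R=301,L]
```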

If you go through the possibilities cataloged in the thread I linked to - give them a test on your site - you will probably find more. This kind of thing is why I usually deploy the canonical link tag (with absolute URLs for the href attribute). I take care of the obvious canonical issues on the server and then, because all the edge cases can be hard to catch, I use the canonical link tag.
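The tag itself goes in the head of each page, pointing at the one absolute URL you intend for that content (hypothetical URL here):

```html
<link rel="canonical" href="http://www.example.org/Shop/product/Product-Name.html">
```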

-----

However, more backlinks to your best content will do a lot to help. One step further: the site: operator numbers are notoriously strange and lack accuracy. Many sites have recently seen a drop similar to what you reported, yet their traffic continues unchanged.

So I'd suggest monitoring how many of your URLs actually get search traffic, rather than trying to understand what Google is doing with the site: operator. Members here have some hunches, but no one knows for sure, as far as I know. Pages that get search traffic - that's the data that matters most, anyway.

Emmanuel2

7:20 am on Sep 16, 2010 (gmt 0)

10+ Year Member



Okay, thanks for your pointers. Just a week or two ago I modified things somewhat and changed the https to http.

I'll try tweaking things a bit, and yes, like you say, perhaps the site: operator isn't the most accurate Google site analysis tool!

tedster

3:52 pm on Sep 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I figured the https change might be recent for you. It will take some patience on that fix before you see Google sort it all out. If you placed a lot of 301 redirects recently, then even more patience. But it looks like long term you should see improvements.

Emmanuel2

7:42 pm on Sep 16, 2010 (gmt 0)

10+ Year Member



Okay then.

Thanks so much for taking the time to answer my question. It's greatly appreciated!