Forum Moderators: Robert Charlton & goodroi

WMT Sitemaps - How to see which URLs are indexed?

         

simple simon

3:36 pm on Jun 1, 2009 (gmt 0)

10+ Year Member



Hi all

I'm using WMT and a simple txt file sitemap. It shows 1400 URLs in the sitemap and 954 indexed. 954 were indexed within days of submission but it's been stuck at 954 for about 2 months...

Is there a way of seeing which URLs have NOT been indexed? I'd like to look at the non-indexed pages in case there's a theme running through them. Maybe then I could work out why they haven't been indexed and refine or add to them.

I've owned the domain for years. It's not had any real content for a long time. I added (substantial) fresh content around 2 months ago. I'm not expecting loads of traffic but would like each page at least indexed because it's all unique content.

Thanks in advance.

Simon

tedster

6:39 am on Jun 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello simple and welcome to the forums.

As far as I know, there's no direct method that will extract this information from Google. So you need to begin working with two lists: all the urls that Google returns for a site: operator search and all the urls that you know exist on your site (your Sitemap urls).

Make these two lists into one big list -- and then don't just de-dupe it, remove BOTH entries for any url that appears twice. What remains is the set of urls that show up in only one of the two lists. You could do this in Excel or Word, or just create a simple script of your own that works on a text file.
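For anyone who prefers the script route, here is a minimal sketch of that comparison in Python. It assumes each list has already been saved as a plain text file with one url per line; the function names and the file names in the note below are hypothetical. Treating a trailing slash as insignificant is also an assumption, so drop that step if your server treats the two forms as different pages.

```python
def normalize(url):
    """Trim whitespace and a trailing slash (an assumption, see above)."""
    return url.strip().rstrip("/")


def load_urls(path):
    """Read one url per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {normalize(line) for line in handle if line.strip()}


def compare(sitemap_urls, site_search_urls):
    """Return two sorted lists: urls only in the sitemap, urls only in the site: results.

    Together the two lists are what is left after removing BOTH entries
    for every url that appears in both sources.
    """
    sitemap = {normalize(u) for u in sitemap_urls}
    found = {normalize(u) for u in site_search_urls}
    return sorted(sitemap - found), sorted(found - sitemap)
```

Calling `compare(load_urls("sitemap.txt"), load_urls("site_results.txt"))` would give the sitemap urls that the site: search did not return, plus any urls Google returned that are not in the sitemap. Keeping the two halves separate makes it easier to see which side each leftover url came from.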

You may find that your resulting list is still larger than the 450 or so urls that you might expect in your case. Sometimes the site: operator returns incomplete results, and urls that actually are indexed still are not returned for site:example.com - but they may be returned for site:example.com/directoryname/ (It does happen - it's one of the Google mysteries!)
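If the site has many directories, the per-directory queries can be generated from the Sitemap urls rather than typed by hand. The following is only a sketch: the function name is hypothetical, and it assumes that the first path segment of each url is the directory worth checking.

```python
from collections import Counter
from urllib.parse import urlsplit


def directory_queries(urls):
    """Build one site: query per first-level directory, with a url count for each."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url.strip())
        segments = [s for s in parts.path.split("/") if s]
        # /page.html has no directory; /news/ and /gallery/1.html both do.
        if len(segments) >= 2 or (segments and parts.path.endswith("/")):
            counts[(parts.netloc, segments[0])] += 1
    return [
        (f"site:{host}/{directory}/", n)
        for (host, directory), n in sorted(counts.items())
    ]
```

The count next to each query is the number of Sitemap urls in that directory, which gives a rough figure to compare against what Google returns for the narrower search.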

But it may be small enough to see the patterns you're looking for. Or you may need to check the remaining urls one by one in order to winnow the list even further - and yes, that may be tedious.

simple simon

8:15 am on Jun 2, 2009 (gmt 0)

10+ Year Member



Thanks Tedster

I thought it might be long winded...

Just tried site:www.domain.com with 100 results per page and copied / pasted into Excel with a view to extracting URLs using a macro. But with preferences set at 100, Google only lists around 300 URLs - far fewer than the number indicated when prefs are set at 10 or 20 per page.
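As an alternative to an Excel macro, the URLs could be pulled out of the pasted text with a short script. This is a rough sketch only: the function name is hypothetical, and it assumes the pasted results show each URL starting with the host name, with or without a leading http://.

```python
import re


def extract_urls(pasted_text, host):
    """Return the unique URLs on the given host, in the order they first appear."""
    pattern = re.compile(r"(?:https?://)?" + re.escape(host) + r"(?:/[^\s<>]*)?")
    urls = {}
    for match in pattern.findall(pasted_text):
        url = match.rstrip(".,;)")  # drop punctuation that trails a URL in running text
        if not url.startswith("http"):
            url = "http://" + url  # assumption: plain http, as the results omit the scheme
        urls[url] = None  # dict keys keep first-seen order and drop repeats
    return list(urls)
```

Running this over the text copied from each results page and writing the output one URL per line would give a list that can be compared against the sitemap file.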

I've done a visual check through site:www.domain.com to see if there are any obvious omissions - nothing obvious.

Does anyone have any experience of whether the number of indexed URLs shown in WMT is reasonably close to the number actually shown for a site: check on Google?

Maybe a list of the indexed URLs in WMT would be a useful add on for Google to consider.

g1smd

8:32 am on Jun 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It probably doesn't "list less". It probably only changes the little number shown at the top-right of the screen. As seen from the first page of results, that number is often only a best guess anyway.

If you click through to the last page of results, you'll likely see that almost the same number of entries actually exist, whatever number of results per page you view them with.

Do also look for text like this at the end of the results and click that link to see what happens: "In order to show you the most relevant results, we have omitted some entries very similar to the 16 already displayed. If you like, you can repeat the search with the omitted results included."

simple simon

9:14 am on Jun 2, 2009 (gmt 0)

10+ Year Member



Hi g1smd

I should clarify. When I change prefs to 100 per page, it still shows the same total: "Results 1-100 of about 1470". (This is bigger than the 1400 in the sitemap because I only submitted my high quality content pages - Google has discovered other, what I would call lower quality, pages itself. WMT shows 954 indexed from the sitemap of 1400 URLs.)

At the bottom of the page it only shows links for 1-2-3 pages. I get to page 3 and click on the omitted results link. Google still only shows me 3 pages of results and has changed to Results 201-280 of about 280.

Maybe 2 months is not long enough to wait for my sitemap to be fully indexed. Or maybe my PR (3 after recent update) or trust level is just not up to listing all those URLs just yet.

As an aside, I've also recently submitted 3 more txt file sitemaps that relate to other parts of the site. I've kept these 3 areas separate from the main sitemap because the content is image related (galleries), more likely to be considered dups (affiliate links etc) or low value (little content or syndicated content), and I want to see what proportion Google indexes. I'm guessing the take up will be low because of the dup issues etc.

Anyway - thanks tedster and g1smd. I've followed your (MANY!) posts for a while and really appreciate your feedback!