
Website Analytics - Tracking and Logging Forum

    
WT "not selected" URLs number higher than "ever crawled"
MinosTheNinth




msg:4493186
 9:27 am on Sep 10, 2012 (gmt 0)

Hi, I recently noticed strange behaviour on one of my sites.

The site runs a WordPress installation, and on 8/26/12 it was moved to a different host. Recently the number of "ever crawled" and "not selected" URLs in Google Webmaster Tools started to grow very fast.

On 7/22/12 there were:
- 753 ever crawled
- 606 not selected

On 8/26/12 there were:
- 5,686 ever crawled
- 5,499 not selected

On 9/9/12 there were:
- 5,686 ever crawled
- 10,404 not selected

So the questions are:

How can "not selected" be higher than "ever crawled"?

Does it indicate a problem with my site? How can I find its source and fix it?

Any help or suggestions would be much appreciated.

 

g1smd




msg:4493195
 10:47 am on Sep 10, 2012 (gmt 0)

Google has seen links pointing to 10 000 URLs, but has only crawled 5600 of them.

If a significant number of crawled URLs are 404, soft 404, redirects (maybe), or other "non-pages", then crawling is throttled back so as not to waste their crawl budget.

phranque




msg:4493202
 11:32 am on Sep 10, 2012 (gmt 0)

i'm going with g1smd's answer.

i would start analyzing the urls crawled by googlebot and look for the responses given to non-canonical urls.
if the status code is a 302 or a 200 that's your problem.
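
The check described above can be sketched in Python. This is only an illustration: the function name `audit_response` and the example.com URLs are hypothetical, and treating a 301 to the canonical URL or a 404/410 as acceptable is an assumption, since the post itself only names 302 and 200 as the problem cases.

```python
from typing import Optional


def audit_response(requested: str, canonical: str, status: int,
                   location: Optional[str] = None) -> str:
    """Classify the response a server gave for one requested URL.

    Returns "ok", "problem", or "check" (needs a manual look).
    """
    if requested == canonical:
        # The canonical URL itself is expected to answer 200.
        return "ok" if status == 200 else "check"
    # From here on the requested URL is a non-canonical variant.
    if status == 301 and location == canonical:
        return "ok"        # assumption: a permanent redirect to the canonical URL is fine
    if status in (404, 410):
        return "ok"        # assumption: explicitly "not found" is fine too
    if status in (200, 302):
        return "problem"   # the two cases named in the post above
    return "check"


canonical = "http://example.com/post/"
variant = "http://example.com/post/?month=9&yr=2012"
print(audit_response(variant, canonical, 200))             # problem
print(audit_response(variant, canonical, 301, canonical))  # ok
```

The status codes and Location headers would come from the server's access log or from re-requesting the URLs that Googlebot crawled; that part is left out here.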

MinosTheNinth




msg:4493252
 1:46 pm on Sep 10, 2012 (gmt 0)

Thanks a lot, guys. I hope I have found the source of the problem.

It seems that the calendar plugin messed with the URLs by adding ?month=xxx&yr=xxxx to almost every URL. When I went to GWT to register this parameter as one that does not affect the displayed content, I found it was already listed there with the option "Let Googlebot decide" and 10,270 monitored URLs. So I changed it to the option "No: Doesn't affect page content" (just to be sure).

Thank you both for your very quick answers and help. I'll try to contact the developer of this plugin and report the issue.
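
To see how many crawled URLs are just calendar variants of the same page, one option is to strip the two parameters and count what collapses together. The sketch below uses Python's standard `urllib.parse`; the parameter names `month` and `yr` come from the post above, while the function names and the example.com URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CALENDAR_PARAMS = {"month", "yr"}  # parameter names taken from the post above


def strip_calendar_params(url: str) -> str:
    """Return the URL with the calendar parameters removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in CALENDAR_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def count_variants(urls):
    """Count how many of the given URLs collapse to each stripped URL."""
    return Counter(strip_calendar_params(url) for url in urls)


crawled = [
    "http://example.com/post/",
    "http://example.com/post/?month=8&yr=2012",
    "http://example.com/post/?month=9&yr=2012",
]
print(count_variants(crawled))
```

A count well above 1 for a single stripped URL suggests that the extra "not selected" URLs are duplicates created by the plugin rather than real pages.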

phranque




msg:4493269
 2:17 pm on Sep 10, 2012 (gmt 0)

a calendar plugin is a typical source of infinite url space.

g1smd




msg:4493294
 2:40 pm on Sep 10, 2012 (gmt 0)

I have no idea why calendaring systems don't limit the date range that is accessible and don't return 404 for empty dates. Almost all of them seem to suffer from this flaw.

[edited by: g1smd at 2:50 pm (utc) on Sep 10, 2012]
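
A minimal Python sketch of the behaviour asked for above: limit the accessible date range and answer 404 for months with no entries. The function name `calendar_status`, the 12-month window, and the way events are passed in are all assumptions for illustration, not how any particular plugin works.

```python
from datetime import date


def calendar_status(year: int, month: int, event_dates, today: date,
                    window_months: int = 12) -> int:
    """Return the HTTP status a calendar page for (year, month) should send."""
    if not 1 <= month <= 12:
        return 404  # not a real month
    # Distance in whole months between the requested month and the current one.
    offset = (year - today.year) * 12 + (month - today.month)
    if abs(offset) > window_months:
        return 404  # outside the accessible date range
    if not any(d.year == year and d.month == month for d in event_dates):
        return 404  # empty month: nothing to show
    return 200


events = [date(2012, 9, 15)]
today = date(2012, 9, 10)
print(calendar_status(2012, 9, events, today))   # 200
print(calendar_status(2012, 10, events, today))  # 404
print(calendar_status(2050, 1, events, today))   # 404
```

With a bounded window the number of calendar URLs that can return 200 is finite, so a crawler cannot follow "next month" links forever.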

MinosTheNinth




msg:4493300
 2:48 pm on Sep 10, 2012 (gmt 0)

The problem is that it appends the month and year selection parameters even to posts completely unrelated to the calendar.

I have no idea how the crawler finds these URLs, but I discovered them using a Unix command-line utility called webcheck (a very nice utility, by the way).

g1smd




msg:4493302
 2:50 pm on Sep 10, 2012 (gmt 0)

That's even worse.

phranque




msg:4493324
 3:26 pm on Sep 10, 2012 (gmt 0)

i've seen one site where any page could have any date, past or future, appended to the already non-canonical query string.
every page started out with links to all of "this month's" dates, whatever month that happened to be at the time the page was requested, and links to the next and previous months.

lucy24




msg:4493382
 4:44 pm on Sep 10, 2012 (gmt 0)

It's easier* to handle invalid queries after the fact than to prevent them from being added in the first place. Especially when people or googlebots can add anything they like to their address bar. So no matter what, you always need a bad-query-handling routine.


* Not "better" or "more desirable". Just easier.
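
A bad-query-handling routine of the kind mentioned above might look roughly like this in Python. The whitelist `ALLOWED_PARAMS`, the function name `handle_query`, and the choice to answer with a 301 to the cleaned URL (rather than a 404) are assumptions made for the sketch; the thread does not say which response is preferred.

```python
from urllib.parse import parse_qsl, urlencode

ALLOWED_PARAMS = {"p", "page"}  # hypothetical whitelist of parameters the site really uses


def handle_query(path: str, query: str):
    """Return (status, location) for a request with the given path and query string.

    Requests carrying only allowed parameters are served normally (200).
    Anything else is redirected to the same URL with the unknown parameters dropped.
    """
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in ALLOWED_PARAMS]
    if len(kept) == len(pairs):
        return 200, None  # nothing unexpected in the query string
    cleaned = urlencode(kept)
    location = f"{path}?{cleaned}" if cleaned else path
    return 301, location  # assumption: redirect; a 404 would also be a valid policy


print(handle_query("/post/", "month=9&yr=2012"))  # (301, '/post/')
print(handle_query("/", "p=5"))                   # (200, None)
```

Because the routine works from a whitelist, it also covers parameters that nobody anticipated, which is the situation described above where anyone can type anything into the address bar.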

