
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


Help with "Not Selected" please?

     
2:17 pm on Nov 25, 2012 (gmt 0)

New User

joined:Nov 25, 2012
posts: 5
votes: 0


Hi everyone, long time lurker here, just signed up. This is a great community, congratulations to everyone involved.

I am the webmaster for a 10-year-old website. It has approximately 250,000 unique pages but over 1.2 million "Not Selected" URLs.

Can you please point me to a way to discover what these URLs are? Why doesn't GWT tell us which URLs they are? I would like to fix them to improve Googlebot's crawl of the website, and it would help if Google told us which URLs they were. Can you experts give me any hints on how to find out which URLs are not selected so we can fix them?

This site is 100% in-house, with no PHP forums or WordPress installations. There is a blog, but it's hosted outside the main domain, so it's probably not counted there.

Thanks again and best wishes to all.
3:25 am on Nov 26, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10553
votes: 13


welcome to WebmasterWorld, johnsirella!

you could get a list of urls crawled by googlebot from your server access log file and compare that to urls reported in GWT.
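
A minimal sketch of the log side of that comparison, assuming the server writes the common "combined" access log format and that matching the string "Googlebot" in the user agent is good enough for a first pass (user agents can be spoofed, so this is not verification). The function name and sample lines are made up for illustration:

```python
import re
from collections import Counter

# Assumed combined log format:
# ip - - [time] "METHOD /path HTTP/1.x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_paths(lines):
    """Count requested paths on lines whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("path")] += 1
    return counts

# Hypothetical sample lines; in practice, iterate over the real log file.
sample = [
    '66.249.66.1 - - [26/Nov/2012:03:25:00 +0000] "GET /widgets?id=1 HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [26/Nov/2012:03:25:05 +0000] "GET /widgets?id=1 HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [26/Nov/2012:03:26:00 +0000] "GET /about HTTP/1.1" '
    '200 300 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]
counts = googlebot_paths(sample)
```

Sorting the resulting counts by frequency tends to surface URL patterns that the bot requests far more often than the site's real page count would explain.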
11:14 am on Nov 26, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 16, 2004
posts:1341
votes: 0


You could use something like Xenu and see what it uncovers...

[download.cnet.com...]
1:16 am on Nov 27, 2012 (gmt 0)

New User

joined:Nov 25, 2012
posts: 5
votes: 0


Thanks so much for your replies.

phranque: thanks for the welcome, glad to be here!

You mention URLs reported in GWT; where do I look for that? I can certainly get a report of Googlebot hits from the logs, but how do I cross-reference that with GWT data?

lexipixel: thanks for that, had never heard of it. is it safe to use against sites?
1:50 am on Nov 27, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10553
votes: 13


you can get some urls from the Traffic/Search Queries/Top Pages list but i'm not sure what the upper limit is on that - i doubt they will show you 250,000 urls.

are you using sitemaps?

it's really a matter of getting a list of "good" or canonical urls and comparing those to what googlebot is crawling.


Xenu Link Sleuth is a good tool for crawling your site.
you might also try Screaming Frog but the free version will only crawl a limited number of urls.
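
The comparison of "good" urls against crawled urls could be sketched roughly like this. It assumes both lists have already been reduced to plain path strings (for example, from a sitemap and from the access log); `compare_urls` and the sample paths are hypothetical:

```python
def compare_urls(canonical, crawled):
    """Split the crawl into urls that should not exist and urls never fetched."""
    canonical = set(canonical)
    crawled = set(crawled)
    return {
        # Crawled by the bot but not on the list of good urls.
        "unexpected": sorted(crawled - canonical),
        # On the list of good urls but never requested by the bot.
        "never_crawled": sorted(canonical - crawled),
    }

# Hypothetical inputs for illustration only.
canonical_urls = ["/", "/about", "/products/1"]
crawled_urls = ["/", "/products/1", "/products/1?sort=asc", "/PRODUCTS/1"]
report = compare_urls(canonical_urls, crawled_urls)
```

The "unexpected" bucket is the interesting one here: it is a candidate list of urls that may be inflating the count, though nothing in GWT confirms which of them it actually includes.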
10:57 am on Dec 3, 2012 (gmt 0)

New User

joined:Nov 25, 2012
posts: 5
votes: 0


Hi everyone. Still struggling with the "Not Selected" dilemma here.

Following the advice I got here, I am studying the logs to see exactly what Googlebot is pulling from the server, to try to identify the source of the 1.2 million unselected pages (on a site with 250k to 300k unique pages at most). Still no success, but I'm not giving up yet.

Here's a question for those of you with more experience: do the Not Selected URL's ever decline, or is that a cumulative number that only grows?

If I happen to fix whatever is wrong, will that graph decline? Or will it always show the maximum "Not Selected" count? It seems to me that no matter what I do, a few thousand URLs are added to it every week. Sorry if this is a dumb question.

Thanks in advance for your wisdom.

[edited by: tedster at 5:33 pm (utc) on Dec 3, 2012]

6:00 pm on Dec 3, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


do the Not Selected URL's ever decline, or is that a cumulative number that only grows?

The only way they should decline is if the URLs get indexed.

I would very definitely not worry about them, except that the constant increase may indicate some URLs are being generated that shouldn't be, so I might try to find the source of that, but the not selected number itself is not something I would lose any sleep over at all, personally.
6:54 pm on Dec 3, 2012 (gmt 0)

Full Member

joined:Oct 29, 2012
posts:292
votes: 19


I have a WordPress install that somehow produced tons of automatically generated URLs that were "not indexed" by Google. I still cannot find the source of the bug, but just this past weekend I added a Disallow rule to robots.txt for that string of automatically generated URLs. Now those pages appear with "A description for this result is not available because of this site's robots.txt".

The gibberish URL is something like ?gibberish/page2/page3/page2 and continues on and on. For some reason my WordPress install recognizes it as a valid URL, with robots tag = index and everything. They are "not indexed" by Google (supplemental index), though, because they have exactly the same content as my archive pages. I cannot seem to remove the generated pages because that is beyond my capability.

It will take a while to see whether Google recognizes it as a bug and removes those URLs accordingly. I just hope so. I will report back if my "not selected" count goes down in the future, or at least if it stops rising.

I do think that you may have to worry if the "not indexed" count continues to rise; it may be a bug in your code that generates gibberish URLs and feeds them to Google.
7:35 pm on Dec 3, 2012 (gmt 0)

New User

joined:Jan 9, 2012
posts:15
votes: 0


A couple weeks ago, we removed a big chunk of URLs from the site. Since the removal, the number of "Not Selected" URLs has plummeted from about 15,000,000 to about 1,000,000. I'm not certain what it was on those pages that was causing the issue, but I know those pages were the source.

We have not seen any change in our organic traffic since, but this change was made the day of the last Panda refresh. Fingers crossed.
9:00 pm on Dec 3, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 5, 2009
posts:1344
votes: 165


I got a bit panicky about this not selected issue. I heard of it, dug around a bit, and I consider it ultimately worthless information for troubleshooting possible issues. At first I was hopeful it would provide insight into possible Panda etc. issues, but at the end of the day I consider it a red herring. If your head hurts, you have a headache. If you tell me you have a headache, I can tell you your head hurts. That's not helpful information I just gave you. That's my view of "not selected". If I missed something during my investigation, I'm all ears. My experience with this says it's not something I'm analyzing further or even checking, for that matter. Again, just one geek's opinion for what it's worth (or not worth).
10:29 am on Dec 4, 2012 (gmt 0)

Junior Member

joined:Mar 9, 2012
posts: 87
votes: 13


My situation was similar to frankleeceo's, except that it was not a WordPress blog but a static HTML site. Somehow Google and other bots created 1,000,000 rubbish URLs out of my 1600 pages that my server recognized as valid. My "not selected" line spiked and my rankings dropped. I wouldn't say they tanked. Once the htaccess file was reconfigured to make the server return 410s, the "not selected" line stopped rising. Now, it's dropping by about 10 pages a week. Rankings have still not returned.
1:17 pm on Dec 4, 2012 (gmt 0)

Moderator This Forum from GB 

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2599
votes: 179


The only way they should decline is if the URLs get indexed.

And I think there are a few other ways for this number to decline. Here is my experience:

- If you return 404/410 for a "Not selected" page, the number will decline
- If a "Not selected" page gets indexed, the "Not selected" number will decline (as TMS said)
- If you redirect "Not selected" pages, it will NOT decline
- If you noindex a page that was previously indexed, it will increase
- If you noindex a page that is "Not selected", the "Not selected" number will stay the same
- I am guessing that if the page is blocked by robots, the "Not selected" number should decrease
- I am not sure what happens if the page has a canonical link element pointing to another page, but if it is treated the same as a redirect, then this will not have an impact on "Not selected"

I wish google would break "Not Selected" into two buckets, as I would like to see a number for "the page not blocked in robots, not redirecting, not noindexed, does not have canonical pointing elsewhere, but we have not selected it because we do not like it". That would be a really useful figure.
4:17 pm on Dec 4, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


Ah, I thought it was all URLs, but 404/410 & NoIndex/robots.txt might do it.

AFAIK redirecting and some of the others should not ... (I'm fairly certain redirecting will actually increase the number.) ... I'd have to look into it some more to know for sure, but I really don't have time to spend on something I don't care much about right now lol.
1:49 am on Dec 5, 2012 (gmt 0)

Moderator This Forum from GB 

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2599
votes: 179


According to the Google article, "Not selected" includes 301 redirects, so redirecting will certainly not decrease the number and, I agree, in some cases may increase it.

So theoretically, if you redirect one of the "not selected" URLs to an existing "indexed" URL, there should be no change to the number of "Not selected".

But if you introduce a new URL structure and redirect a "not selected" URL to a new URL, and the new URL also ends up being "Not selected", then "Not selected" would increase.

I am pretty certain about robots noindex pages increasing the "Not selected" pot, as I saw this 6 months ago when I dropped 6000 pages from one of my sites by noindexing them - the "indexed" went down and "Not selected" went up in parallel - the graph was symmetric.

I am also pretty certain about 404/410 reducing "Not selected", as I am in the process of sorting out the mess of 80,000 "Not selected" URLs on a site with 8000 pages indexed but only 1500 unique pages worth indexing (a huge amount of duplication owing to dates in URLs, capitalisation, parameter order and other classic URL mistakes). We are redirecting only about 2000 URLs and letting all the others go 404/410.

The new URL structure went live last week and Google has already dropped 1K URLs from "Not Selected" count.
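
For anyone hunting the same kind of duplication in their own crawl data, here is a rough sketch that groups URLs differing only by capitalisation or parameter order. It is not what was used on the site above; the function names are hypothetical, and lowercasing the path assumes the server treats paths case-insensitively, which is not true everywhere:

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def normalize(url):
    """Lowercase host and path, sort query parameters, drop the fragment."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Assumption: paths are case-insensitive on this server.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.lower(), query, "")
    )

def duplicate_groups(urls):
    """Return only the normalized keys that more than one raw URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}

# Hypothetical URLs for illustration only.
urls = [
    "http://example.com/Page?b=2&a=1",
    "http://example.com/page?a=1&b=2",
    "http://example.com/other",
]
groups = duplicate_groups(urls)
```

Each group with more than one member is a set of URL variants that probably serve the same content, so one of them can be kept (or redirected to) and the rest allowed to return 404/410.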
7:05 am on Dec 5, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


the page not blocked in robots, not redirecting, not noindexed, does not have canonical pointing elsewhere, but we have not selected it because we do not like it

"blocked by robots.txt" is already a separate category. Everything else ... yah. At a minimum, there's a difference between pages that can't be indexed (noindex meta, redirect) and pages that could be indexed but aren't ("I dunno, there's just something about this page we don't like").
8:18 pm on Dec 5, 2012 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 11, 2003
posts: 130
votes: 0


Anyone see any over optimization penalties in the past 2 days? Is this something that happens randomly to sites, or do many see it happen at once?
12:02 pm on Dec 12, 2012 (gmt 0)

New User

joined:Nov 25, 2012
posts: 5
votes: 0


Hello everyone, I was on the road and could not log in (yes, I'm old style, I only log in from my PC!) to thank you all for your help. I still have to digest all this, but my questions have been answered: the "not selected" number is not cumulative; that is, if we make the correct changes, it WILL go down.

Thanks so much for the help, appreciate it.