I was checking my site (PR 6) on the www-sj.google.com server and noticed that about 3,000 pages (out of 30,000) have been dropped by Google (which will be reflected in the next update, once the dance is over). I tried, but I simply cannot work out manually which pages have been dropped.
Is there a way to get a list of the URLs from a site that are indexed by Google? I want to compare the current normal server with the -sj server, so that I can work out why those pages were removed. It could be some fault in my index pages.
I don't think it could be a penalty, as I am a very honest webmaster and, to my knowledge, haven't used any wrong techniques.
Or could it be that it didn't deepcrawl enough?
My site was down for 3 hours during the deepcrawl. Do you think a period of 3 hours could cause a drop of about 3000 pages?
IMaster
site:www.mydomain.com -blablabla
This will show all indexed pages from your site (even those not yet spidered) that don't contain the string 'blablabla'. Google only shows 1,000 results, so that alone won't work for a site with 30,000 pages. To get a better understanding of the missing pages, you have to limit the search. If 0.5% of the pages are about green widgets, search for
site:www.mydomain.com "green widgets"
and you will see only about 150 pages (0.5% of 30,000) on www.google.com and maybe 135 pages (0.5% of the remaining 27,000) on www-sj.google.com. To make it easier, you could set Google to display 100 results per page instead of the default 10 (www.google.com >> Preferences >> Number of Results).
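Once you have saved the result URLs from both servers for the same query, the comparison itself can be scripted. Below is a minimal sketch, assuming you have copied the URLs by hand into two lists. The function names (site_query, missing_urls, count_by_section) and the sample URLs are made up for illustration, and nothing here talks to Google.

```python
from collections import Counter
from urllib.parse import urlparse


def site_query(domain, phrase=None, exclude=None):
    """Build a site: query string like the ones shown above."""
    parts = [f"site:{domain}"]
    if phrase:
        parts.append(f'"{phrase}"')
    if exclude:
        parts.append(f"-{exclude}")
    return " ".join(parts)


def missing_urls(main_urls, sj_urls):
    """Return URLs listed on the main server but absent from -sj, sorted."""
    return sorted(set(main_urls) - set(sj_urls))


def count_by_section(urls):
    """Count URLs per top-level directory, to spot a section that dropped out."""
    counts = Counter()
    for url in urls:
        # Strip the file name, then take the first directory of what is left.
        directory = urlparse(url).path.rsplit("/", 1)[0]
        segments = [s for s in directory.split("/") if s]
        counts[segments[0] if segments else "(root)"] += 1
    return counts


# Hypothetical sample data: URLs pasted from each server's results.
main = [
    "http://www.mydomain.com/widgets/green.html",
    "http://www.mydomain.com/widgets/blue.html",
    "http://www.mydomain.com/index.html",
]
sj = [
    "http://www.mydomain.com/index.html",
    "http://www.mydomain.com/widgets/blue.html",
]

gone = missing_urls(main, sj)
print(site_query("www.mydomain.com", phrase="green widgets"))
print(gone)
print(count_by_section(gone))
```

If most of the missing URLs fall into one directory, that points to the index page linking to that section as the first place to look.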
Wow, you are brilliant, and equally helpful. :)
After spending a lot of time analyzing (using your superb tips), I was able to exactly pinpoint some of the pages which are not being shown on the -sj dataserver. I will keep an eye on those pages, and see if they are shown on the normal server after the dance is over. (with fingers crossed)
You said "don't pay too much attention to the current data in www-sj.google.com".
How does this -sj server work? I would love to know more, and would appreciate it if you could answer the questions below.
- So what are the odds that those missing pages might come back after the dance? ;)
- Why do you think some pages are not shown on the -sj server? Doesn't -sj contain the old data, too?
- If those pages are not being shown on -sj, does it mean that they have been dropped by Google, or does it mean that Googlebot has yet to crawl those pages? Do you think Googlebot is crawling all those missing 3,000 pages at this very moment, and that they will show up again after the dance?
Thanks.
Internet Master (IMaster)