| 11:05 pm on Mar 15, 2013 (gmt 0)|
For starters: Pick a few of the tastiest pages and submit them manually. Once the googlebot has crawled them and noticed the absence of a "noindex" tag, it will not be able to stop itself from crawling all linked pages.
General question that is relevant here:
Once the googlebot has picked up a non-304 response and passed its findings along to the indexing computer, can Google distinguish among different amounts of change? Change one letter and it's no longer 304. Delete a "noindex" line and you're no longer a 304. But can they tell when there has been a substantive change?
I'm thinking that if they can tell the difference, then they would notice if there have been major changes. And this in turn would trigger a jump-up in crawling, almost as if it were a brand-new site.
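Some background on the 304 part of the question. A server sends 304 Not Modified only when the validator in the crawler's conditional request (If-Modified-Since or If-None-Match) still matches the current page. The sketch below assumes the server builds its ETag from a hash of the whole page body. That is a common setup but not the only one, and it says nothing about what Google does with the page afterwards. It compares a single ETag, which simplifies the real header handling. It shows that the status code is binary: a one-character edit and a complete rewrite both return a plain 200. The helper names are hypothetical.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag built from a SHA-256 hash of the full response body (assumed setup)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_status(body: bytes, if_none_match: Optional[str]) -> int:
    """Return 304 when the client's cached ETag still matches the current body, else 200."""
    if if_none_match is not None and if_none_match == make_etag(body):
        return 304
    return 200


page_v1 = b"<p>Blue widgets for sale</p>"
page_v2 = b"<p>Blue widgets for sale!</p>"  # one character added
page_v3 = b"<p>A completely rewritten page about widgets</p>"

cached = make_etag(page_v1)  # what the crawler remembers from its last visit
print(conditional_status(page_v1, cached))  # 304: nothing changed
print(conditional_status(page_v2, cached))  # 200: tiny change
print(conditional_status(page_v3, cached))  # 200: total rewrite
```

Any judgement about *how much* changed would have to come from comparing the fetched content later, not from the response code itself.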
| 3:51 am on Mar 16, 2013 (gmt 0)|
Just a picky little language point here about the difference between crawling and indexing. They are indeed two separate steps, handled by different programs at Google.
Googlebot does the crawling and collects the data from the URLs; indexing is then done by another program that examines the collected data. You would only need to look at your server logs to see whether Googlebot has crawled the problem URLs since you changed the meta tag. It can take several re-crawls after a change before the URL shows up in the index again.
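The log check can be scripted. The sketch below makes three assumptions. The server writes the standard Apache/Nginx "combined" log format. Googlebot is identified by its user-agent string, which can be spoofed; a reverse-DNS check is stricter. And the sample log lines and paths are made up for illustration. The script counts the status codes served to Googlebot for each path, so you can see both whether the problem URLs were fetched and how many of its requests ended in 404s.

```python
import re
from collections import Counter, defaultdict

# Apache/Nginx "combined" format (assumed):
# host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, watched_paths=None):
    """Count status codes served to Googlebot, per path.

    watched_paths: optional set of paths to keep; None keeps every path.
    Matching on the user-agent string is a simplification (it can be spoofed).
    """
    hits = defaultdict(Counter)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or "Googlebot" not in m.group("agent"):
            continue  # unparseable line or not Googlebot
        path = m.group("path")
        if watched_paths is None or path in watched_paths:
            hits[path][m.group("status")] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [16/Mar/2013:10:00:00 +0000] "GET /widgets/blue HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [16/Mar/2013:10:05:00 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [16/Mar/2013:10:06:00 +0000] "GET /widgets/blue HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

for path, statuses in googlebot_hits(sample_log).items():
    print(path, dict(statuses))
# /widgets/blue {'200': 1}
# /old-page {'404': 1}
```

Passing a set of the problem URLs as `watched_paths` narrows the report to just those pages.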
| 5:36 am on Mar 16, 2013 (gmt 0)|
|Just a picky little language point here about the difference between crawling and indexing. They are indeed two separate steps |
:: sitting on hands ::
In this case, the two are effectively the same, because the crawling tentacle takes its orders from the indexing tentacle. Or they both take orders from a higher-ranking tentacle. And that's what I wondered about: if the indexer (or some still Deeper Thought) notices significant changes, will it tell the crawler to crank up its activity level?
| 6:24 am on Mar 16, 2013 (gmt 0)|
First, try to get one or two good-quality links.
Next, head over to WMT and use the Submit to Index option (under Fetch as Google). It ensures crawling. Indexing, though not assured, usually follows.
| 12:07 pm on Mar 16, 2013 (gmt 0)|
Thank you for the tips guys!
Here is more information: I have a custom logging module that records every page request, and most of the requests coming from Googlebot are for 404 pages (I requested removal of these pages from Google's index, and they were removed around December last year). I know a little about how Google uses two sets of programs to crawl and index pages, and I guess that takes time, but 3 months is quite a long time for Google not to re-crawl and try to index all my pages. I have made substantial changes to these pages. I only let Google index pages that have a decent amount of content, to avoid Panda.
As for backlinks, I have contacted some colleges in my niche and managed to get 5 links. But the links are either on page rank 2 pages or on pages with no page rank. I have also managed to get one site wide link on a hobby blog with page rank 1.
My website has close to 10,000 pages, and I'm blocking more than 9,500 of them with robots.txt and the robots meta tag.
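One thing worth checking with that setup is robots.txt. A URL disallowed there is not fetched at all, so Googlebot cannot see that its noindex tag has since been removed. Python's standard `urllib.robotparser` can confirm which URLs the current rules still block. The rules and URLs below are hypothetical stand-ins for your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; paste your real file here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tag/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return [u for u in urls if not parser.can_fetch(agent, u)]


urls = [
    "http://example.com/widgets/blue",
    "http://example.com/search/blue",
    "http://example.com/tag/widgets",
]
print(blocked_urls(ROBOTS_TXT, urls))
# ['http://example.com/search/blue', 'http://example.com/tag/widgets']
```

Any page you want indexed that shows up in this list needs its Disallow rule lifted before a noindex change can be noticed.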
| 12:20 pm on Mar 16, 2013 (gmt 0)|
You could also try posting some pages to social media (Twitter, Facebook and particularly Google+)
| 4:47 pm on Mar 16, 2013 (gmt 0)|
|I have also managed to get one site wide link on a hobby blog with page rank 1. |
Sitewides are worthless at best, possibly (probably these days?) harmful.
And don't be too concerned about the PageRank of linking pages. A better measure is what pages link to them.
A couple of simple-stupid questions (from the we've all done stupid things school of webmastering):
--Did you remove the noindex meta from the pages you want indexed?
--Are the pages reachable through your site navigation?
--Are page titles and descriptions (if used) different?
--Have you considered an XML sitemap?
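The first question on that list can be checked mechanically. Here is a minimal sketch that uses Python's standard `html.parser` to report whether a page's HTML still carries a `noindex` directive in a `<meta name="robots">` or `<meta name="googlebot">` tag. Fetching the page is left out, and the sample HTML is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # HTMLParser lowercases tag names
            return
        attrs = {k.lower(): (v or "") for k, v in attrs}
        if attrs.get("name", "").lower() in ("robots", "googlebot"):
            # content is a comma-separated list, e.g. "noindex, follow"
            for part in attrs.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))  # True
print(has_noindex('<head><meta name="ROBOTS" content="INDEX,FOLLOW" /></head>'))  # False
```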
| 11:20 pm on Mar 16, 2013 (gmt 0)|
I created a Google+ account but haven't used it, so I have very little audience there. Maybe I can use it for this purpose. Thanks for that.
|--Did you remove the noindex meta from the pages you want indexed? |
|--Are the pages reachable through your site navigation? |
Yes, but they are two or three clicks away from the homepage.
|--Are page titles and descriptions (if used) different? |
My titles are different, but many of them share common key phrases. Some vary only by local modifiers and some numbers. I don't have descriptions for most of these pages; I'm hoping Google will figure that out on its own.
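To see how "different" such titles really are, you can strip out the parts that vary and group what is left. The sketch below uses a hypothetical heuristic: remove digits plus a list of local modifiers you supply. It is not a rule Google documents. The titles and place names are invented examples.

```python
import re
from collections import defaultdict


def title_template(title, modifiers):
    """Reduce a title to its template by removing digits and known local modifiers."""
    t = title.lower()
    for m in modifiers:
        t = re.sub(r"\b" + re.escape(m.lower()) + r"\b", "", t)
    t = re.sub(r"\d+", "", t)
    return " ".join(t.split())  # collapse leftover whitespace


def group_similar_titles(titles, modifiers):
    """Map each shared template to the titles that reduce to it (groups of 2+ only)."""
    groups = defaultdict(list)
    for title in titles:
        groups[title_template(title, modifiers)].append(title)
    return {k: v for k, v in groups.items() if len(v) > 1}


titles = [
    "Blue Widgets in Boston - 12 Dealers",
    "Blue Widgets in Denver - 7 Dealers",
    "Red Gadgets Guide",
]
print(group_similar_titles(titles, ["Boston", "Denver"]))
```

Large groups point to pages whose titles (and probably content) follow one template, which is worth knowing before relying on Google to tell them apart.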
|--Have you considered an XML sitemap? |
Yes, I have an XML sitemap and have submitted it to Google a number of times. I see the exact number of submitted URLs in my GWT account. According to the stat on my account only 50% of the URLs submitted are indexed but actually only 70% of whats reported on GWT are indexed.
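For reference, a sitemap needs a `urlset` root in the sitemaps.org namespace and one `<url><loc>` entry per page; `<lastmod>` is optional. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The URLs and dates are hypothetical. Listing only the pages you actually want indexed, rather than ones blocked by robots.txt or noindex, keeps the "submitted" count in GWT comparable to the "indexed" count.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize as the default xmlns, no prefix


def build_sitemap(pages):
    """pages: iterable of (url, lastmod) pairs; lastmod is a W3C date string or None."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap = build_sitemap([
    ("http://example.com/", "2013-03-16"),
    ("http://example.com/widgets/blue", None),
])
print(sitemap)
```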
| 7:38 am on Mar 18, 2013 (gmt 0)|
There must be a reason why the bot dropped your website from the index. Most likely it is a bad-content problem.
We had the same issue.
We changed the links on our subpages, added some new text, and the bot started indexing again.
| 1:29 pm on Mar 18, 2013 (gmt 0)|
I'm having a problem understanding this:
|According to the stat on my account only 50% of the URLs submitted are indexed but actually only 70% of whats reported on GWT are indexed. |
What exactly does this mean?
| 4:58 pm on Mar 18, 2013 (gmt 0)|
@jimbeetle Sorry, English is not my first language. Sometimes I write funny statements :)
What I was trying to say is that I submitted about 400 pages to GWT via an XML sitemap. Of these 400 pages, GWT says about 200 are now indexed, but when I search Google with site:example.com I only see 140 pages.
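Those counts are where the two percentages in the earlier post come from. GWT reports half of the submitted URLs as indexed, and the site: query shows 70% of the GWT figure. The numbers are the approximate ones given in the post.

```python
submitted = 400    # URLs in the XML sitemap (approximate)
gwt_indexed = 200  # indexed count reported by GWT
site_query = 140   # results shown for site:example.com

print(f"GWT indexed / submitted: {gwt_indexed / submitted:.0%}")        # 50%
print(f"site: results / GWT indexed: {site_query / gwt_indexed:.0%}")   # 70%
print(f"site: results / submitted: {site_query / submitted:.0%}")       # 35%
```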
I submitted 8 category pages through GWT on March 16 (these are the pages that I removed previously via GWT) and today they are all indexed. Thanks for the tip @McMohan!
| 6:17 pm on Mar 18, 2013 (gmt 0)|
|English is not my first language |
You should see some of the stuff on this board written by people whose first language is English ;)
The site: operator can be (and usually is) very erratic; I would disregard it and go by the Webmaster Tools number.