
Google SEO News and Discussion Forum

    
How to encourage Googlebot to recrawl deindexed pages?
dHex



 
Msg#: 4555454 posted 8:39 pm on Mar 15, 2013 (gmt 0)

I have a custom CMS for my new website (which has been live since September last year). I launched the website while the CMS was still in alpha, and Google indexed a lot of pages that had very little content or duplicate content. I removed most of these pages through Webmaster Tools and added a noindex meta tag until the pages were ready.

Now the pages are ready to be included back into Google's index but Google is not indexing them. How can I encourage Googlebot to index these pages?

[edited by: Robert_Charlton at 8:48 pm (utc) on Mar 15, 2013]
[edit reason] fixed typo [/edit]

 

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors of the Month



 
Msg#: 4555454 posted 11:05 pm on Mar 15, 2013 (gmt 0)

For starters: Pick a few of the tastiest pages and submit them manually. Once the googlebot has crawled them and noticed the absence of a "noindex" tag, it will not be able to stop itself from crawling all linked pages.

General question that is relevant here:

Once the googlebot has picked up a non-304 response and passed its findings along to the indexing computer, can Google distinguish among different amounts of change? Change one letter and it's no longer 304. Delete a "noindex" line and you're no longer a 304. But can they tell when there has been a substantive change?

I'm thinking that if they can tell the difference, then they would notice if there have been major changes. And this in turn would trigger a jump-up in crawling, almost as if it were a brand-new site.
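The "change one letter and it's no longer 304" point can be illustrated with a minimal sketch of a conditional GET. This assumes a server that derives its ETag from a hash of the response body; many servers use Last-Modified instead, and nothing here describes what Google does with the result. The function names and sample HTML are hypothetical.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (one common approach, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match=None):
    """Return (status, payload) the way a conditional-GET handler might."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""   # nothing changed since the crawler's cached copy
    return 200, body      # any change at all, large or small

# Hypothetical page as the crawler last saw it
old = b"<html><head><meta name='robots' content='noindex'></head><body>Hello</body></html>"
cached = make_etag(old)

one_letter_changed = old.replace(b"Hello", b"Hellp")
noindex_removed = old.replace(b"<meta name='robots' content='noindex'>", b"")

print(respond(old, cached)[0])                 # unchanged page
print(respond(one_letter_changed, cached)[0])  # trivial edit
print(respond(noindex_removed, cached)[0])     # substantive edit
```

Both edits produce the same 200 status, so the status code alone says nothing about how much changed. Any judgment about a substantive change would have to come from comparing the content itself.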

tedster

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4555454 posted 3:51 am on Mar 16, 2013 (gmt 0)

Just a picky little language point here about the difference between crawling and indexing. They are indeed two separate steps and handled by different programs/software at Google.

Googlebot does the crawling and collects the data from the URLs - then indexing is done by another program that examines the collected data. You would only need to look at your server logs to see if googlebot has crawled the problem URLs since you changed the meta tag. It can take several re-crawls after a change before the URL shows up in the index again.
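The log check described above could look something like this minimal sketch. It assumes an access log in the common Apache/nginx "combined" format and identifies Googlebot only by its user-agent string, which can be spoofed. The function name, paths and sample log lines are hypothetical.

```python
import re

# Combined log format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines, paths):
    """Return (time, path, status) for Googlebot requests to the given paths."""
    wanted = set(paths)
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        if "Googlebot" in m.group("agent") and m.group("path") in wanted:
            hits.append((m.group("time"), m.group("path"), int(m.group("status"))))
    return hits

# Hypothetical sample lines
sample = [
    '66.249.66.1 - - [16/Mar/2013:03:51:00 +0000] "GET /category/widgets HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [16/Mar/2013:03:52:10 +0000] "GET /category/widgets HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [16/Mar/2013:04:02:31 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for hit in googlebot_hits(sample, ["/category/widgets"]):
    print(hit)
```

With a real log you would pass the open file object instead of `sample`. An empty result for a URL means the crawler has not requested it in the period the log covers, so the indexer cannot yet have seen the changed meta tag.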

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors of the Month



 
Msg#: 4555454 posted 5:36 am on Mar 16, 2013 (gmt 0)

"Just a picky little language point here about the difference between crawling and indexing. They are indeed two separate steps"

:: sitting on hands ::

In this case, the two are effectively the same, because the crawling tentacle takes its orders from the indexing tentacle. Or they both take orders from a higher-ranking tentacle. And that's what I wondered about: If the indexer-- or some still Deeper Thought-- notices significant changes, will it tell the crawler to crank up its activity level?

McMohan

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4555454 posted 6:24 am on Mar 16, 2013 (gmt 0)

dHex -
First, try and get one or two good quality links.
Next, head over to WMT and use the Submit to Index option (under Fetch as Google). It ensures crawling. Indexing, though not assured, is all but inevitable.

dHex



 
Msg#: 4555454 posted 12:07 pm on Mar 16, 2013 (gmt 0)

Thank you for the tips guys!

Here is more information: I have a custom logging module that records every page request, and most of the requests coming from Googlebot are for pages that return 404 (I requested removal of these pages from the Google index, and they were removed around December last year). I know a little bit about how Google uses two sets of programs to crawl and index pages, and I guess that takes time, but 3 months is quite a long time for Google not to re-crawl and try to index all my pages. I have made substantial changes to these pages. I only let Google index pages that have a decent amount of content, to avoid Panda.

As for backlinks, I have contacted some colleges in my niche and managed to get 5 links. But the links are on pages that either have PageRank 2 or no PageRank at all. I have also managed to get one sitewide link on a hobby blog with PageRank 1.

My website has close to 10,000 pages and I'm blocking more than 9,500 of them by robots.txt and meta tag.
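When robots.txt and the meta tag are used together, it can help to confirm which URLs robots.txt actually blocks: a crawler that obeys a Disallow rule never fetches the page, so it cannot see whether a noindex tag is present or has been removed. A minimal sketch using Python's standard `urllib.robotparser` follows; the robots.txt content and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

def blocked_for_googlebot(robots_txt: str, url: str) -> bool:
    """True if the given robots.txt rules disallow Googlebot from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", url)

for url in ("http://example.com/drafts/page-1",
            "http://example.com/category/widgets"):
    print(url, "blocked" if blocked_for_googlebot(ROBOTS_TXT, url) else "crawlable")
```

Pages meant to return to the index need to come out as crawlable here as well as being free of the noindex tag.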

netmeg

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 4555454 posted 12:20 pm on Mar 16, 2013 (gmt 0)

You could also try posting some pages to social media (Twitter, Facebook and particularly Google+)

jimbeetle

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4555454 posted 4:47 pm on Mar 16, 2013 (gmt 0)

"I have also managed to get one site wide link on a hobby blog with page rank 1."

Sitewides are worthless at best, possibly (probably these days?) harmful.

And don't be too concerned about the PageRank of linking pages. A better measure is what pages link to them.

A couple of simple-stupid questions (from the we've all done stupid things school of webmastering):

--Did you remove the noindex meta from the pages you want indexed?
--Are the pages reachable through your site navigation?
--Are page titles and descriptions (if used) different?
--Have you considered an XML sitemap?
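The first question on the list can be checked mechanically. Here is a minimal sketch, assuming the page HTML is already in hand as a string. It looks only at robots/googlebot meta tags and ignores any `X-Robots-Tag` HTTP header, which would need a separate check. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())

def has_noindex(html: str) -> bool:
    """True if the HTML still carries a noindex directive in a meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
print(has_noindex('<head><meta name="robots" content="index, follow"></head>'))
```

Running this over the HTML your server actually returns, rather than over the template source, catches cases where the CMS still emits the tag for some pages.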

dHex



 
Msg#: 4555454 posted 11:20 pm on Mar 16, 2013 (gmt 0)

@netmeg

I have created a Google+ account but I haven't used it. I have very little audience there. Maybe I can use it for this purpose. Thanks for that.

@jimbeetle

--Did you remove the noindex meta from the pages you want indexed?

Yes.

--Are the pages reachable through your site navigation?

Yes, but they are two or three pages away from the homepage.

--Are page titles and descriptions (if used) different?

My titles are different but share a lot of common key phrases. Some of them vary only by local modifiers and some numbers. I don't have descriptions for most of these pages. I'm hoping Google will figure that out on its own.

--Have you considered an XML sitemap?

Yes, I have an XML sitemap and I have submitted it to Google a number of times. I see the exact number of URLs submitted in my GWT account. According to the stat on my account only 50% of the URLs submitted are indexed, but actually only 70% of what's reported on GWT are indexed.

sbook



 
Msg#: 4555454 posted 7:38 am on Mar 18, 2013 (gmt 0)

There must be a reason why the bot deindexed your website. It is probably a bad content problem.
We had the same issue.
We changed the links on our sub pages and added some new text, and the bot started to index again.

jimbeetle

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4555454 posted 1:29 pm on Mar 18, 2013 (gmt 0)

I'm having a problem understanding this:

"According to the stat on my account only 50% of the URLs submitted are indexed but actually only 70% of whats reported on GWT are indexed."


What exactly does this mean?

dHex



 
Msg#: 4555454 posted 4:58 pm on Mar 18, 2013 (gmt 0)

@jimbeetle Sorry, English is not my first language. Sometimes I write funny statements :)

What I was trying to say was that I submitted about 400 pages to GWT via an XML sitemap. Of these 400 pages, GWT says that about 200 are now indexed, but when I search Google like this: site:example.com, I only see 140 pages indexed.
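The two percentages from the earlier post follow directly from these three approximate counts, as this small worked example shows (the variable names are only for illustration):

```python
submitted = 400            # URLs submitted via the XML sitemap
reported_indexed = 200     # indexed according to GWT
found_by_site_query = 140  # results shown by a site: search

share_reported = reported_indexed / submitted           # GWT figure vs. submitted
share_visible = found_by_site_query / reported_indexed  # site: figure vs. GWT figure
share_overall = found_by_site_query / submitted         # site: figure vs. submitted

print(f"{share_reported:.0%}")  # 50%
print(f"{share_visible:.0%}")   # 70%
print(f"{share_overall:.0%}")   # 35%
```

So "50%" compares GWT's count with the sitemap, while "70%" compares the site: count with GWT's count.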
---------

I submitted 8 category pages through GWT on March 16 (these are the pages that I removed previously via GWT) and today they are all indexed. Thanks for the tip @McMohan!

jimbeetle

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4555454 posted 6:17 pm on Mar 18, 2013 (gmt 0)

"English is not my first language"

You should see some of the stuff on this board written by people whose first language is English ;)

The site: operator can be (and usually is) very erratic; I would disregard it and go by the Webmaster Tools number.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved