Forum Moderators: Robert Charlton & goodroi

Googlebot refuses to crawl new pages

It instead continues searching for old pages

         

gregdi

7:20 pm on Jan 9, 2006 (gmt 0)

10+ Year Member



About a month ago I did a complete overhaul of a portion of my site, deleting several hundred old product pages and replacing them with new pages with different filenames. The old pages had the NOINDEX NOFOLLOW meta in them for approximately 2 months before they were removed.

Instead of crawling the new pages, which are reachable from the same home page, googlebot continues to attempt to crawl the nonexistent pages. It hasn't crawled a single one of the new pages.

The new pages are all .php pages with a single query at the end. They look like:

mysite.tld/product.php?page=1

I don't think the single query parameter should have an effect on getting these pages crawled. I see other pages all the time with many more parameters in their query string in Google's index.

Help? Ideas? Suggestions?

Vadim

7:34 am on Jan 11, 2006 (gmt 0)

10+ Year Member



Disclaimer: only Google knows how it works.

1. It may be too late, but I believe that deleting many old pages was not a good idea. I would say that deleting even a single page is not a good idea.

What if people still link to these pages? What if Google still sees these links? Could Google consider your site reliable and authoritative if you neglect the people who link to you? If Google still crawls the old pages, I would put them back.

2. Google crawls pages that have inbound links. It does of course follow the links from your own pages, but only if those pages have (directly or indirectly) external inbound links and some PR. However, in your case, Google might see the remaining pages as a new site, and as with many new sites you may experience the sandbox.

Vadim.

gregdi

7:03 pm on Jan 11, 2006 (gmt 0)

10+ Year Member



If I follow what you are saying correctly, that would mean I have to keep pages for products I no longer carry just to satisfy Google. I don't think there is a good reason for keeping useless pages that no longer have any value for visitors to my site. Who wants to visit a page about red widgets only to find out they are no longer available? As I stated in my first post, I put the NOINDEX NOFOLLOW meta in the head of these pages about 2 months before I deleted them.

My site is regularly crawled by MSN and Google, and MSN had no problem removing every one of these pages from their index.

I believe Google should be able to do the same. Instead, they make everything complicated and confusing for site owners. Haven't they ever heard of the KISS principle?

Keep
It
Simple
Stupid

tedster

2:03 am on Jan 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First, I would make 100% sure your server is returning a 404 header for these URLs and that there are no old links pointing to them. At that point, unless those pages are still being returned for some search or other, I don't think you have a problem to be concerned with regarding the old pages. It's just Google being a bit tardy in their housekeeping. Their "supplemental results" have all kinds of debris.
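For anyone who wants to check the header without guessing, here is a minimal Python sketch. It assumes a plain HEAD request is enough to see what a crawler would see, which may not hold for every server. The function names `fetch_status` and `describe_status` are made up for this example. The thing to watch for is a "not found" page that is actually served with a 200 status.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # with no redirect request, urllib raises HTTPError


def fetch_status(url, timeout=10):
    """Return the raw HTTP status code the server sends for url."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses end up here


def describe_status(code):
    """Give a rough reading of a status code for a deleted page."""
    if code in (404, 410):
        return "gone: the server says the page does not exist"
    if code in (301, 308):
        return "permanent redirect"
    if code in (302, 303, 307):
        return "temporary redirect"
    if 200 <= code < 300:
        return "still served: a deleted page should not answer with 2xx"
    return "other: check the server configuration"
```

Usage would be something like `describe_status(fetch_status(old_url))` for each deleted URL, where `old_url` is one of your own old addresses.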

Also verify that you've got good, valid links to the new pages, to stimulate googlebot to find them. Some deep linking to at least one new page from elsewhere on the web can help if you can swing that.
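A quick way to verify the links, sketched in Python under the assumption that the new pages are linked with ordinary `<a href>` tags in the HTML of the home page (links built by JavaScript would not be seen). `LinkCollector` and `missing_links` are names invented for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URLs of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, value))


def missing_links(html, base_url, expected_urls):
    """Return the expected URLs that the page does not link to."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return sorted(set(expected_urls) - collector.links)
```

Feed it the HTML of the home page and the list of new product URLs. Anything it returns is a page googlebot cannot reach from there by following a plain link.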

Pradeep

2:14 pm on Jan 12, 2006 (gmt 0)

10+ Year Member



Hi All,

Would you be able to help me? I have a similar problem.

Google has become extremely unpredictable these days. Both of my sites had PR 2 and now show zero. The sites are around 6 to 8 months old, and neither of them has been completely crawled.
Can anybody help me understand the reason? At the same time, the PR has dropped completely.

I have another problem: the listing says "File Format: Unrecognized - View as HTML". Can anybody tell me the reason behind this?

I would appreciate the reasons for both of my queries, and solutions if there are any.

Thanks in advance

Cheers

Pradeep SV

Vadim

8:32 am on Jan 15, 2006 (gmt 0)

10+ Year Member



gregdi wrote: "If I follow what you are saying correctly, that would mean I have to keep pages for products I no longer carry just to satisfy Google."

Rather, to satisfy W3C guidelines. Google naturally follows them.
Read "Cool URIs don't change":
[w3.org...]

All public (indexed) links (URLs, URIs) should be stable and never change. That means you should rather link to a class of product, or reuse a specific old product link for a similar new product. If you absolutely must have a temporary URL for a product, it should be NOINDEX from the start.
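To confirm that a temporary page really carries NOINDEX, a small Python sketch like the following may help. It assumes the directive is set with a `<meta name="robots">` tag in the page HTML, as described earlier in this thread, and it does not look at HTTP headers. `robots_directives` is a name made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated, case-insensitive list.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Checking `"noindex" in robots_directives(html)` for each temporary page would then show whether the tag is actually in place.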

Vadim.

hawleyca77

9:27 pm on Jan 26, 2006 (gmt 0)

10+ Year Member



I am having the same problem with one of my sites. After redesigning the site, Google continues to crawl only the old pages.

For a while (a couple of months), I left these as 404 error pages, with no change. Then I changed them to 301 redirects (3-4 months).

I was thinking of moving the URLs back to 404s once the site was indexed, to see if they eventually drop off. However, the site continues not to be indexed. I am worried that if those files go away, I won't have any pages coming up in the SERPs, and I think it is better for users to get a redirect from htaccess than to find a non-existent page.

I have done all the standard SEO bits, but no luck. The site also has a fair number of backlinks (most of which Google doesn't recognize), and content is updated regularly.

Suggestions?

Vadim

4:29 am on Jan 27, 2006 (gmt 0)

10+ Year Member



As I mentioned above, I believe that deleting the pages is not a good idea.

However, since you have already set up 301 redirects, I would stick with them, because 301 means "moved *permanently*". Do not look irresponsible in the eyes of Google.

Then I would place on the target pages (the ones the 301s point to) content as close to the deleted pages as possible, or at least on the same topic. Then I would add some links from those target pages to the new part of your site.
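If there are hundreds of old URLs, the redirect lines for htaccess can be generated rather than typed. A rough Python sketch, assuming Apache with mod_alias and its `Redirect` directive, and assuming paths without spaces. The old filename in the usage example and the helper name `redirect_lines` are hypothetical.

```python
def redirect_lines(mapping, site="http://mysite.tld"):
    """Build one 'Redirect 301' line per old path.

    mapping: dict of old URL path -> new URL path, both starting with "/".
    site: scheme and host that the new paths are appended to.
    """
    lines = []
    for old_path, new_path in sorted(mapping.items()):
        if not old_path.startswith("/") or not new_path.startswith("/"):
            raise ValueError("paths must start with '/': %r -> %r"
                             % (old_path, new_path))
        lines.append("Redirect 301 %s %s%s" % (old_path, site, new_path))
    return lines


if __name__ == "__main__":
    # Hypothetical old page mapped to the closest new page.
    for line in redirect_lines({"/red-widgets.html": "/product.php?page=1"}):
        print(line)
```

Each old page is mapped to the new page closest in topic, which fits the advice above better than sending everything to the home page.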

Vadim