You can create your own 404 handler that will redirect the broken links to your home page or to the relevant page.
Don't wait for Google to improve; do it on your side.
There are status codes which you can send in the HTTP response headers that go out with your web pages. A 404 means "Not Found", as I'm sure you're aware.
Perhaps a better code would be 301 - "Moved Permanently", which instructs any user agent to forget the old URL, and use only the new URL to that resource.
In a .htaccess file on Apache, you can set up your 301 headers like this:
RedirectMatch permanent /somefolder/anotherfolder/index\.html http://www.example.com/short/index.html
The RedirectMatch set as "permanent" will result in a 301 being sent. Note the period in the matched string is escaped, and then the new URL is a full URL with no escaped characters.
This isn't the only way of sending 301 codes and rewriting URLs but it's an easy method. With 20,000 pages you might want to look deeper into mod_rewrite.
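For anyone wondering what the mod_rewrite route might look like, here is a minimal sketch. It assumes mod_rewrite is enabled, that the file is the .htaccess in the site root, and that a whole folder moved while keeping the same file names underneath it. The folder names /oldcatalog/ and /catalog/ are made up for illustration.

```apache
# .htaccess in the site root (assumes mod_rewrite is available)
RewriteEngine On

# Hypothetical layout: everything under /oldcatalog/ now lives under /catalog/
# In .htaccess the pattern is matched without the leading slash.
# $1 carries over whatever followed the old folder name.
RewriteRule ^oldcatalog/(.*)$ http://www.example.com/catalog/$1 [R=301,L]
```

One pattern rule like this can cover thousands of old URLs at once, but only if the old and new addresses differ in a predictable way.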
When I wrote "404 handler" I meant a page that will smoothly redirect the user to the right place. Indeed, 302 it is.
You should take all the 404 errors from the broken links, catch them, and convert them to 302 redirects to a page that is good for you.
My experience is that a 302 (temporary) will leave the old URLs in the Google index indefinitely, but indexed with the content of the new URL. A 301 is a permanent redirect that at least stands a chance of working.
Sad to tell you this trraju, but this is but one of the crazy things that are happening with Google right now.
The only thing that you can really do is just implement what some of the other WW members are saying, then hope for the best.
This is just great and awesome.. at least you know how to solve it..
I am now just confused between 302 and 404 and 301 and other numbers, hence...
I have a few more questions:
1) With 20,000 wrong URLs, what would be the best way to redirect all of them to my home page or to their respective pages?
2) Is it bad for Googlebot, and can they ban me for this?
3) I am very bad at coding. Can someone help me with the code I need to add?
I am happy knowing that I am close to a solution; however, I would be more than glad if you could help me solve the problem.
Just wanted to add that the response code I am getting is 404.
|Sad to tell you this trraju, but this is but one of the crazy things that are happening with Google right now. |
You want crazy? I have a site whose pages I deleted a year ago, and the pages are still shown in the index for the site: command. Maybe they are driving traffic also behind my back.
Have you tried Google Sitemaps? I have had good luck getting pages removed from the index or reindexed using Google Sitemaps.
I also have the magic number of 20,000 pages.
But when you look at them, they are all supplemental and cached back in Aug of last year.
I too have a similar problem. Nowhere to be found in Google.
In the past two weeks I have tried using Sitemaps, but no luck.
Perhaps I need to try doing this redirect also.
I am still baffled by the thousands of good content pages that you all say you have. How many average Google searchers look at all 20,000 pages? I would bet not many at all. I take it you are selling a lot of products, which is great. But if you are selling widgets, does Google necessarily have to index every type of widget? Blue widgets, XL widgets, pink and purple widgets, etc. As long as you rank well for the SERP keyword "widget", does Google have to index all your pages?
The strange part is that none of my 20,000 pages are supplemental, and all are content-based pages..
God knows how I will repair it.. still looking for the right solution.. please help..
>>>As long as you rank well for the SERP keyword "widget", does Google have to index all your pages? <<<<
The problem is that if a page does not have a direct menu link from the site's front, or landing, page, then the surfer will have a hard time finding the particular information or item he is looking for.
Even with the link, it puts the surfer at least two clicks from google to his goal. Bad boogie.
|1) With 20,000 wrong URLs, what would be the best way to redirect all of them to my home page or to their respective pages? |
For Apache: mod_rewrite
For Windows: ISAPI_Rewrite
You'll need to be familiar with writing regular expressions and pattern matching. If you are not, your best option is to locate a professional to assist you in setting this up. You will need to permanently redirect (301) each and every one of those 20,000 pages to their new URIs (or at least most of them anyway).
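When the old and new URIs don't follow any pattern, one option on Apache is a lookup table via RewriteMap, so each old path gets its own 301 target without writing 20,000 rules. This is only a sketch under some assumptions: RewriteMap has to go in the main server or virtual host config (it is not allowed in .htaccess), and the map name "redirects", the file path, and the sample entry are all hypothetical.

```apache
# Server or virtual host config, NOT .htaccess
RewriteEngine On

# Hypothetical plain-text map: one "old-path new-path" pair per line, e.g.
#   /old/widget-blue.html /widgets/blue.html
RewriteMap redirects txt:/etc/apache2/redirect-map.txt

# Look up the requested path; if the map returns something, capture it as %1
RewriteCond ${redirects:%{REQUEST_URI}} ^(.+)$
# ...and send a permanent redirect to the mapped path
RewriteRule ^ %1 [R=301,L]
```

Paths with no entry in the map fall through untouched, so they keep returning whatever they returned before.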
|2) Is it bad for Googlebot, and can they ban me for this? |
Bad for Googlebot? Yes, as there is now duplicate content being indexed. The longer it is up there, the longer it is going to take to undo what was done. And yes, Google may purge your site from the index temporarily while it sorts things out. How long that will take is all relative to everything else going on. It could be a long time.
|3) I am very bad at coding. Can someone help me with the code I need to add? |
Hiring someone with experience in this area would be suggested. We do have a Commercial Exchange Forum here at WebmasterWorld where you can post your requirements.
Commercial Exchange Forum
|As long as you rank well for the SERP keyword "widget", does Google have to index all your pages? |
That is the goal of developing and promoting a website.
Well, instead of one site with 20,000 products, how about 20 sites with 1,000 products each? Build small, very relevant sites and cross-link them.
If your searchers can not find what they are looking for from the home page within 3 clicks, then the site is very user unfriendly!
No wonder why google is not indexing you. That is stated plainly in their guidelines.
>>>>If your searchers can not find what they are looking for from the home page within 3 clicks, then the site is very user unfriendly!
No wonder why google is not indexing you. That is stated plainly in their guidelines. <<<<
Who said 3 clicks? We were talking two. And for large sites, there is often need for 3 clicks.
And I can find nowhere in the guidelines that Google says only two clicks.
Up until Big Daddy, Google was clearly indexing all pages. It was in the best interest of the searcher. And now that Google doesn't, it clearly violates everything in their business footprint. I think Big G is looking for trouble.
|Up until Big Daddy google was clearly indexing all pages. |
Indexing and performing are two different animals. Google will index anything. What happens to those pages after the indexing is what counts.
Big Daddy was a major update with Google. Crawl priorities changed and it appears that many people complaining are victims of that crawl priority. Too many factors involved to try and guess what the issues are unless a complete discovery is done for each and every site affected.
There are blanket issues which may have affected a large portion of those indexed pages and it appears that pages below a certain level on sites with a certain PR are affected.
So, you wait a few months in hopes of Google reindexing and re-evaluating those pages and hopefully adding them back into their index. If your site meets any of the criteria that Matt Cutts has spoken about at his blog or any of the criteria we've discussed here at WebmasterWorld, then you'll want to make the appropriate changes.
Don't run out and make immediate changes thinking it was this when in fact it was something totally different. I've watched people keep their sites in constant upheaval because they are chasing algos. They lose a few pages and then start making rash decisions as to what caused it when in most cases it's just the constant flux of Google's index. Usually within 30-45 days many go back to what they used to be. When they don't, that is when you start looking for the cause.
We know what most of the causes are in Big Daddy and if you focus on the cause and make the appropriate changes, all you can do is wait it out. Googlebot will continue to index and at some point in the next couple of unpublished updates, you'll start to see your pages appearing again or getting indexed again at which time you can start a Google Update Topic. ;)
I cleaned up my site easily by just submitting the robots.txt file to the URL removal tool.
Make sure you have a valid robots.txt file.
Log on to the Google URL removal tool and use the "submit URL of robots.txt" option. It will crawl your site and remove all 404 pages and all disallowed pages.
If it does not automatically catch the 404 pages, then use a wildcard * in the Disallow line to do a sweep of your old URLs.
Be careful though, it is a powerful tool.
Disallow: / will remove your entire site from Google for 6 months.
"We know what most of the causes are in Big Daddy"
Is there a more or less agreed-upon listing of these causes?