Forum Moderators: Robert Charlton & goodroi
[edited by: Robert_Charlton at 11:20 am (utc) on Jan 8, 2013]
[edit reason] examplified domain [/edit]
I suppose the 50,025 pages are due to the 404 page.
I think not. That number would indicate either that some pages are not returning a proper 404/410 response, or that you have too many pages that are noindexed or redirecting, or that you have a duplicate content / thin content issue.
Right now I am in the process of tidying up a site that should have roughly 2000 URLs indexed, where WMT reported over 80,000 "Not selected" URLs. These were due to:
- the server previously returning 200 OK for pages that were Not Found
- having many URLs with dates in the URL that should not have been allowed to be indexed in the first place.
At the beginning of December we asked for a change to be implemented: return 410 for all pages that should not have been indexed owing to dates in the URL, and return a proper 404 response when the page is genuinely not found.
This resulted in "Not selected" initially dropping at a rate of approximately 500 URLs/day, and then last week WMT recorded a big drop of almost 40,000 URLs in the "Not selected" chart. After 5 weeks the site is now down to 20,000 "Not selected" URLs.
From what we can see, it seems that URLs returning 410 are dropped from "Not selected" quicker than URLs returning 404.
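The 410/404 split described above can be sketched as a small decision function. This is only an illustration: the date pattern, the `KNOWN_PATHS` set and the name `status_for` are assumptions made for the example, not the actual rules used on that site.

```python
import re

# Hypothetical pattern: a /YYYY/MM/DD or /YYYY-MM-DD segment anywhere in the path.
DATE_IN_PATH = re.compile(r"/\d{4}[/-]\d{2}[/-]\d{2}(/|$)")

# Hypothetical set of paths that really exist on the site.
KNOWN_PATHS = {"/", "/about.htm", "/contact.htm"}


def status_for(path: str) -> int:
    """Return the HTTP status the server should send for a path."""
    # Dated URLs should never have been indexed: tell crawlers they are gone.
    if DATE_IN_PATH.search(path):
        return 410
    # Real pages are served normally.
    if path in KNOWN_PATHS:
        return 200
    # Everything else is genuinely not found.
    return 404
```

The date check comes first on purpose, so a dated URL gets 410 even if a file happens to exist at that path.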
I would therefore carefully inspect your URLs, perhaps using "site" command narrowed down by using "inurl" string using some filters, to see where these "Not selected" are coming from. I don't think they are because of 404 errors.
[edited by: helenp at 8:18 pm (utc) on Jan 8, 2013]
Reading your post above again: now that you have fixed the redirection to the home page for pages not found, and hopefully you are returning a 404 Not Found status, you should see the number of "Not selected" URLs start to drop over the next few weeks.
I don't quite understand what you mean by this:
"I would therefore carefully inspect your URLs, perhaps using "site" command narrowed down by using "inurl" string using some filters, to see where these "Not selected" are coming from. I don't think they are because of 404 errors. "
Do you mean searching in Google?
What I meant is that you could do something like:
site:example.com inurl:*php*someparam=
which will (for example) return all URLs that are php pages, that Google has in its index, and that use this particular parameter in the URL. Often you will get the message "...we have omitted some entries very similar to the 3 already displayed..." in which case you should click on "repeat the search with the omitted results included" to get the number of URLs indexed but "Not selected".
Trying this with different URL patterns and different parameters can show you which URLs may have a problem with (perhaps) duplicate content, thin content and similar, and you can check whether you have addressed the problem by blocking them via robots.txt, noindexing them, or in some other way.
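If you also have a list of your own URLs (from server logs or a crawl, say), a rough way to decide which parameters to try in those searches is to count how many URLs carry each one. A minimal sketch; the function name `count_params` and the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_params(urls):
    """Count how many URLs contain each query-string parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # A set, so a parameter repeated within one URL is counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical sample data.
sample = [
    "http://example.com/a.php?someparam=1&x=2",
    "http://example.com/b.php?someparam=",
    "http://example.com/c.htm",
]
counts = count_params(sample)
```

Parameters with unexpectedly high counts are the ones worth checking with a site:/inurl: search.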
[edited by: helenp at 10:04 pm (utc) on Jan 8, 2013]
Could I do this, or is that bad practice?
In htaccess, serving a 410 instead of a 404:
# Custom 404 errors
ErrorDocument 404 <local-path>/error-410.html
and then a personalized 410 page with a redirection to my homepage.
How can Google Webmaster Tools tell me there are 969 URLs with the parameter $propiedad?
@helenp
You posted in December about a ranking drop for your english language pages in this thread: [webmasterworld.com...]
Have your rankings recovered? If they did, then perhaps the ranking problem was caused by these 50,000 URLs Google has "discovered" and now that they are returning 404, Google reports these 404 errors.
If you did fix your site so that a 404 response is correctly returned (when it wasn't previously), then the message about an increased number of 404s in WMT is normal. You should briefly review these URLs, then mark them as "fixed"; if they are no longer linked from anywhere (e.g. they were the result of some kind of error on the site/hosting) then they will not reappear.
From your sample URL I would imagine you have had a relative path problem OR a redirect problem somewhere. A relative path problem can occur if you link internally with an href that does not start at the root / (i.e. it lacks the full path) and the page the link is on sits inside a folder.
It could also be caused by badly implemented site move.
It is interesting that you said some of these URLs that return 404 have good incoming links, which would indicate that such URLs have existed for some time, long enough to acquire good links.
"From your sample URL I would imagine you have had somewhere relative path problem OR redirect problem."
Exactly right. Creating links like that without the path or / can allow users and Google to resolve those pages in any public directory you have. It can turn a 100 page website into a virtual 10,000 page monster.
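To see why a link without a leading / multiplies URLs, here is a small sketch using Python's standard `urljoin`, which applies the same resolution rules a browser or crawler does. The page address (including the /english/villas/ folders) is an assumption for illustration; the href is the one discussed in this thread.

```python
from urllib.parse import urljoin

# Hypothetical page that sits inside folders.
page = "http://example.com/english/villas/index.htm"

# Relative href: resolved against the folder of the current page.
relative = urljoin(page, "svenska/index.htm")
# → http://example.com/english/villas/svenska/index.htm

# Root-relative href: always resolved from the site root.
rooted = urljoin(page, "/svenska/index.htm")
# → http://example.com/svenska/index.htm
```

If the server answers 200 for the first form, every folder on the site yields its own copy of the linked page.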
"how can one convert a nearly 700 page sites links.... manually a nightmare."
No experience with Dreamweaver, but FrontPage and Notepad++ have mass find and replace functions. If Google is showing they are 404, it sounds like the links they DID have are no longer valid and may never have been. Can you visit any of those links and see a real content page? It almost sounds like you need to discern between never-existed pages (404) and those you deleted (410).
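If an editor's find and replace is awkward across that many files, the same substitution can be scripted. A rough sketch, assuming every relative href was meant to be relative to the site root; that will not hold for every site, so review the output before uploading. The regex and the name `root_relative` are illustrative, not a tested tool.

```python
import re

# Match double-quoted hrefs that are NOT already root-relative (/...),
# in-page anchors (#...), dot-relative (./ or ../), or absolute URLs with a
# scheme (http:, mailto:, ...).
RELATIVE_HREF = re.compile(
    r'href="(?!/|#|\.|[a-zA-Z][a-zA-Z0-9+.-]*:)([^"]+)"'
)


def root_relative(html: str) -> str:
    """Prefix plain relative hrefs with / so they resolve from the site root."""
    return RELATIVE_HREF.sub(r'href="/\1"', html)


fixed = root_relative('<a href="svenska/index.htm">Svenska</a>')
# → <a href="/svenska/index.htm">Svenska</a>
```

It only touches double-quoted href attributes and deliberately leaves ../ style links alone, since those need a case by case decision.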
@helenp - I hope you aren't having to repeat yourself - Did you mention where Google is finding the links in the "Linked from" section of GWT? Are you able to judge that your site is the cause of most or all of them?
"Not sure what you mean, think you mean where the links comes from to those odd 404"
In GWT, on the "crawl error" page you can select each error and by type of error. When you check the links by clicking them, what additional information does it give you about where the links are "Linked From" (that's the name of the tab)?
So this is bad href="svenska/index.htm"?
I guess maybe I can configure Dreamweaver to do href="/svenska/index.htm" instead for future links.
href="/svenska/" is enough; the DirectoryIndex directive should take care of delivering the correct content.