Forum Moderators: Robert Charlton & goodroi
OK, on the 16th of May I posted that I was removing the Google Sitemap for one of my sites. After 1 week Google dropped all pages except for the homepage. This was no big deal, as these were all old pages that were still indexed under their old SEF-unfriendly URLs. For the past 6 months I have had SEF URLs turned on. Google had not indexed any new pages for this site since December last year. We'll see if Google adds any more pages from this site to the index. The home page is PR4.
It seems this experiment was successful. After showing the old 35 pages as supplemental results for a few weeks, Google re-indexed the whole site and now has all 119 of its pages, from all depths, in the index, without any supplementals. These pages are ranking well for their chosen terms as well.
I had a feeling submitting a sitemap was a waste of time. I know one site's results don't prove it, but I won't be wasting my time submitting sitemaps for any new sites I do. I must say, though, that having a Sitemaps *account* (without a sitemap submitted) is VERY handy for checking Google's crawling problems with your site/s.
Now I am going to remove the sitemaps for 2 other poorly indexed sites I manage and see how they go. I'll keep you posted.
When you say sitemaps, do you mean an on-site sitemap or the Google Sitemaps service?
I'll just make it clear:
I still use the Google Sitemaps service to check for indexing problems, missed pages, 404s etc. - but I took away my sitemap file, so Google had no idea what was on this site unless they spidered it themselves.
I also didn't have a sitemap on the site itself. The main page did have a PR of 4 so Google went deep enough and fetched and indexed all pages.
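For anyone who hasn't used the service: the "sitemap" I removed is just an XML file listing the site's URLs, sitting in the site root and submitted through the Sitemaps account. Something like this - the URLs here are made-up examples, and check Google's documentation for the exact schema they currently expect:

<?xml version="1.0" encoding="UTF-8"?>
<!-- minimal Google Sitemap: one <url> entry per page; only <loc> is required -->
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/some-page.html</loc>
  </url>
</urlset>

Removing the sitemap just means deleting that file and its entry in the account - the site itself doesn't change.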
On another site I started less than 3 weeks ago, I put a sitemap on the index page, and Google has already indexed the whole site (although there are only 22 pages on it). This was a brand new site, but on an old domain I resurrected (it had not been used for 3 years). It also had only 2 links pointing to it: one from a .gov site and one I added 2 weeks ago. It's been a long while since I've seen Google index a site so quickly - they even beat MSN on this one!
The medical community has been in uproar: how many unique ways can you describe the common cold? Think how much duplication that creates across online medical journals etc. Do they all suffer because of that duplication? The filters might have been adjusted slightly, releasing some sites. Obviously large chunks replicated across sites are going to fall short, but I think they hit it a bit hard when Big Daddy was rolled out.
sitemap.xml
versus
sitemap.htm or .html (or .asp or .php)
I actually found a site whose "Site map" hyperlink went to the sitemap.xml file - useless for a site visitor!
Either way, referring to the XML file as "sitemap.xml" makes it clear which one you mean.
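A visitor-facing site map, on the other hand, should just be an ordinary HTML page of links - a rough sketch, with made-up URLs:

<html>
<head><title>Site Map</title></head>
<body>
<h1>Site Map</h1>
<!-- plain links a human (and any spider) can follow -->
<ul>
  <li><a href="/">Home</a></li>
  <li><a href="/products.html">Products</a></li>
  <li><a href="/contact.html">Contact us</a></li>
</ul>
</body>
</html>

The "Site map" link in your navigation should point at something like that; sitemap.xml is for the bots, not for people.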
I recently dumped one of my sitemap.xml files too, but more out of concern about the lack of indexing of image files (the .jpg, .gif, etc. files were not in my XML file).
(Off Topic) But I actually think it was the frame-buster code that was preventing Google from indexing any new images - and Yahoo had removed all my images! Yahoo hates frame-buster code (for images, at least)!
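For anyone wondering what I mean by frame buster: it's the usual snippet that breaks a page out of someone else's frameset, typically something along these lines (a generic example, not necessarily the exact code I had):

<script type="text/javascript">
// if this page has been loaded inside a frame, replace the
// whole window with this page, breaking out of the frameset
if (top != self) {
    top.location.replace(self.location.href);
}
</script>

Image search results load your page inside a frame below the thumbnail, so this code throws the viewer straight out of the results page - which could well be why the engines dropped the images.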
All the sites I have put up after Big Daddy was introduced get crawled and indexed with no problems whatsoever - even a Joomla CMS site without SEF URLs. It is only sites that were put up before Big Daddy that have had problems, and I put it down to three things (in order of importance, bearing in mind that they have good content):
1) Not enough quality inbound links
2) Having Google Analytics on the site (why let Google know about your traffic? If they see low traffic, they ain't gonna index ya!)
3) Having Google Sitemaps - once again, maybe it's better to keep Google in the dark and let them keep crawling to find out how many pages we have.