Forum Moderators: Robert Charlton & goodroi
I am scared to submit my site to Google.
I have a site with 10,000 pages, of which 4,500 have been indexed.
Now I was told to submit my site through Google Sitemaps to get the rest of the pages indexed by Google...
From my observations I see that every day at least 10-20 pages get indexed by Google, as my site gets crawled daily.
Now what I was thinking is: if I submit my site through Google Sitemaps, will that hurt the current scenario of daily indexing of new pages?
Or should I go ahead and submit the sitemap?
This is a very critical issue for me because I don't want to get into any trouble.
Hope to get some good advice...
Regards,
KaMran - White Eagle
[edited by: engine at 4:03 pm (utc) on June 21, 2005]
[edit reason] TOS [/edit]
Not worth the hassle as I change my content (some of it) daily.
Ann
> Not worth the hassle as I change my content (some of it) daily.
Google snags our XML map twice daily. I set a cron job to automatically update the sitemap in the early mornings to reflect any new pages added. For any changes made to existing pages I can easily set a last-modified date by writing a script to handle that part (haven't played with that yet, nor do I see a huge need as of yet). I believe Google will continue to inspect the XML sitemap, possibly visiting more frequently if the sitemap changes more often.
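For anyone curious what such a cron-driven generator might look like, here is a minimal sketch in Python. This is not the poster's actual script - the document root, site URL, and file layout are all assumptions. It walks the web root, keeps only HTML pages, and emits sitemap XML with a last-modified date taken from each file's modification time:

```python
import os
import time
from xml.sax.saxutils import escape

SITE_ROOT = "/var/www/html"           # hypothetical document root
BASE_URL = "http://www.example.com"   # hypothetical site URL

def build_sitemap(root=SITE_ROOT, base=BASE_URL):
    """Walk the document root and emit sitemap XML with <lastmod> dates."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith((".html", ".htm")):
                continue  # skip images, scripts, and other non-page files
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            # lastmod from the file's modification time, W3C date format
            lastmod = time.strftime("%Y-%m-%d",
                                    time.gmtime(os.path.getmtime(path)))
            entries.append(
                "  <url>\n"
                f"    <loc>{escape(base + '/' + rel)}</loc>\n"
                f"    <lastmod>{lastmod}</lastmod>\n"
                "  </url>"
            )
    # Namespace should match whichever protocol version Google documents.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )

if __name__ == "__main__":
    with open(os.path.join(SITE_ROOT, "sitemap.xml"), "w") as f:
        f.write(build_sitemap())
```

A daily cron entry pointing at this script would refresh the file each morning, as the poster describes.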
>If you read the Google groups a little more you'll also see that noise on this subject is slowly picking up.
I did, and came to the conclusion that most of the complaints were just that: noise. There may be bugs in Google's beta software, but to the best of my knowledge none has been revealed yet.
When I fail at something, I prefer to start searching for the causes in my own stuff. I rethink my concept, test and check the developed approach and its implementation, I even check the code and read the developer's release notes for changed scripts, and if I don't find the cause I go back and validate the initial idea again. In most cases the fault is on my side, and I've learned exciting new things.
> I for one do not see the sense in making the sitemap and then, when something is changed on your site, adding a new section or making major changes to a page you must delete the old map and resubmit a new one.
It makes perfect sense if you automate it.
> Warning: TEST and CHECK the auto-created file and be VERY careful about filtering out any extra pages you have lying around, and .jpgs and .gifs which you DON'T want found. We've already learned a few other tricks by watching its motion, but all in all I'd say it works as advertised for us.
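As a sketch of what that filtering pass could look like (the file patterns here are illustrative, not a definitive list), the following Python reads an auto-created sitemap, drops entries pointing at images, backups, and editor leftovers, and reports what it removed so you can eyeball the list before resubmitting:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns for files that shouldn't be in a sitemap:
# images, backups, and editor leftovers often swept up by auto-generators.
UNWANTED = re.compile(r"\.(jpe?g|gif|png|bak|old|tmp)$|~$", re.IGNORECASE)

def clean_sitemap(xml_text):
    """Parse sitemap XML and drop <url> entries whose <loc> looks unwanted.

    Returns the cleaned XML string and the list of removed URLs.
    """
    root = ET.fromstring(xml_text)
    # The sitemap namespace varies by protocol version; read it off the root tag.
    ns = root.tag.split("}")[0] + "}" if root.tag.startswith("{") else ""
    removed = []
    for url in list(root):          # iterate over a copy so removal is safe
        loc = url.find(ns + "loc")
        if loc is not None and UNWANTED.search(loc.text.strip()):
            removed.append(loc.text.strip())
            root.remove(url)
    return ET.tostring(root, encoding="unicode"), removed
```

Printing the removed list before uploading the cleaned file is exactly the kind of test-and-check step the warning above is about.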
I second that.
[edited by: engine at 4:34 pm (utc) on June 21, 2005]
[edit reason] TOS [/edit]
Given all this, is there any real downside to NOT using Google's new sitemap feature?
My site is much smaller than kamran's, but the sitemap I have up gets spidered regularly.
So do many or most pages, and new ones are indexed within days.
Will it hurt to NOT submit to Google Sitemaps? - Larry
Warning - do not submit; nothing but grief, and the Google Sitemaps team not answering emails, just auto-responses.
Indigojo, would you please elaborate on that? My main site was one of the countless sites trashed in Bourbon. Several days ago I got back most of my Google SERPs to their previous spots, but some I did not. I'm wondering if doing this sitemap thing will help me or hurt me. My site is only about 120 (HTML) pages of sales products (and also about that many PDF files or more).
Thanks.
Why *could* it hurt to provide a sitemap?
1. Google could be evil and just dump the submitted URLs in the index.
- Not possible, since lots of site owners report increased crawling and improved visibility on the SERPs.
2. The sitemap system could be buggy, dumping the submissions on every 27th web site or so.
- Possible, it's a beta program, but only Google can tell us.
3. Googlebot could become greedy and slow down our servers by massive crawls.
- There is no reason why the sitemap URLs should not end up in the standard crawling process, which does not put too much load on web servers.
4. Googlebot could find stuff in sitemaps it should better not see, and red-flag the server or a bunch of pages.
- A webmaster can test and check the sitemap's content before its submission/resubmission. There remains a risk, because Googlebot could find links to fishy stuff by following links on never-crawled pages. However, this could happen without a sitemap too.
I think it's safe enough, provided there is no outdated junk on the web server, the sitemaps are well maintained and audited, and the site in question is not the cash cow or even the whole operation.
> 3. Googlebot could become greedy and slow down our servers by massive crawls.
> - There is no reason why the sitemap URLs should not end up in the standard crawling process, which does not put too much load on web servers.
FWIW, one of the neg's I saw regarding this was servers crashing due to a massive load. They don't know if it was created by the G bot spidering too much, the sitemap format, or what.
> I think it's safe enough, if there is no outdated junk located on the web server, and if the sitemaps are well maintained and audited, and if the site in question is not the cash-cow or even the whole operation.
So are you saying that it's not a good idea to use this sitemap on your main B&B site? Your last line seems to imply that. If so, please explain.
Thanks.
Thursday - add 1 sitemap
Friday - add 3 more sitemaps covering all remaining pages
Late Friday - See huge traffic drop 45% - check logs see all google referrals gone
Late Friday - remove sitemaps
Late Saturday - Back to normal
Early Sunday - Back to zero again
Late Sunday - Decide to add back sitemaps - all pulled down by Google
Monday - Still no indexed pages
Early Tuesday - Remove sitemaps again and now waiting
Now I realize that most people appear to be OK. My word of warning is that there may be a slim possibility that sitemaps will be detrimental if you are a large content-based site. As my boss subsequently said, "if it ain't broken, why fix it" - great insight with the benefit of hindsight, but I really didn't expect this to happen as we are clean.
Yeah, I fully live by "if it ain't broke don't fix it", too bad G doesn't also live by that (ala Bourbon).
At least it's not a good idea before it has been tested with a few less important but comparable sites. That's not so much an issue of Google Sitemaps as an issue of implementing a new technology or component in running systems, which vary greatly from client to client. Also, Sitemaps is a beta program launched three weeks ago, which means there remains an unknown risk of failure or even damage. Honestly, I do trust Google's engineers, so after some tests with a small site of mine I've implemented sitemaps for others - successfully, when it comes to results, by the way. Beforehand I sent out a letter explaining Google's new service and stating that there remains an unknown 'beta risk'. A few firms decided to wait, since their sites are completely indexed and rank fine.
The smaller of the two sites has been having trouble getting indexed lately, so I'm diving into Google Sitemaps headfirst with that one. I've heard good things from others on this forum (provided you're careful about what's in your sitemap file). There is some element of danger if you don't pay attention to what you're doing, but most good tools are like that. I'm lucky in that my small site has very few pages, so I was able to go through the autogenerated sitemap file by hand and spot problems before the file was submitted. I know this just isn't feasible for those of you out there who have websites with hundreds of thousands of pages; you'll have to be very careful.
The next step for me is to add a component to my CMS to do the sitemaps file because none of the automated tools are quite what I'm looking for and I don't want to have to eyeball every update.
As far as my larger site, I'm going to hold off... Taking the "it's not broken so I'm not going to fix it" approach for the moment until I get a bit more hands on experience with this sitemap stuff and I get the CMS module done up the way I want it.
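For what it's worth, splitting a large site across several sitemap files (as the poster above did with four) is handled by a sitemap index file that points at the individual sitemaps. A minimal example, with hypothetical file names and the namespace taken from the protocol version Google documents, might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
  <sitemap>
    <loc>http://www.example.com/sitemap-articles.xml</loc>
    <lastmod>2005-06-21</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-products.xml</loc>
    <lastmod>2005-06-21</lastmod>
  </sitemap>
</sitemapindex>
```

You then submit the index file once, rather than adding and removing the individual sitemaps one by one.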
Usually it takes longer to phase out a bunch of pages; what you have seen here looks like a temporary replication issue. Taking sitemaps up and down at that frequency does not allow for serious testing.
Did Googlebot crawl a huge number of pages from your sitemaps shortly after downloading the sitemap file? If so, that *could* indicate a software glitch. It *could* be the case that current entries were replaced by fresh crawl results, and that the delete message overtook the fresh data on its way to the data centers (I know it doesn't work exactly like that, but I think it's an allowable simplification). On the other hand, that would be such an obvious point of failure that I doubt Google's engineers failed to foresee it. So most probably it's coincidence.
>Thursday - add 1 sitemap
>Friday - add 3 more sitemaps covering all remaining pages
>Late Friday - See huge traffic drop 45% - check logs see all google referrals gone
>Late Friday - remove sitemaps
>Late Saturday - Back to normal
>Early Sunday - Back to zero again
>Late Sunday - Decide to add back sitemaps - all pulled down by Google
>Monday - Still no indexed pages
>Early Tuesday - Remove sitemaps again and now waiting
You know Google may have to go through each of those phases before it stabilizes? Pick a method and stick with it until it propagates. The net is not instant response; sometimes it takes weeks.
Will sitemaps hurt or help?
With the internet languages available, I took the philosophy that "HTML will never die". It might even become the most useful (braille, voice, etc.). The newer flashy, dynamic, posh environments have their rightful place (such as a rock-band website), but text is best handled through HTML, so for a text-based website HTML can never go wrong. The internet will always be HTML - look at your server headers.
For software I have the exact opposite philosophy - never buy the beta; wait till at least 1.0, wait till the guinea pigs have found all the bugs, crashed their computers, and reported them so others like me don't have to reboot.
What is google sitemaps?
a way to get your site indexed.
I thought they did that with bots?
Well, they still do it with bots, but this tells Googlebot what to crawl.
I thought they did that with robots.txt - will Googlebot still follow robots.txt?
Well, yes - always, but robots.txt will be followed for bandwidth control on the server, as it was initially intended, not for inclusion in a search engine index.
So this affects only the Google index, and not robots.txt?
Yes - robots.txt is an adopted standard; there's no turning back now. Even Google needs robots.txt.
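To make the division of labor concrete: robots.txt is a short exclusion list - the opposite of a sitemap's inclusion list. A typical file (the paths here are just examples) looks like:

```
# robots.txt - exclusion: what crawlers should NOT fetch
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

# A sitemap, by contrast, is inclusion: a list of URLs you WANT crawled,
# submitted separately to Google rather than placed in this file.
```

Googlebot honors the Disallow rules regardless of whether you submit a sitemap.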