Forum Moderators: Robert Charlton & goodroi
Seems strange that Google are introducing this service now instead of trying to improve googlebot's link following.
[google.com...]
There's a lot of truth to that, given that Google themselves said this tool will not have an effect on ranking.
> The page priority setting might be useful in some
> circumstances. If you have two or more pages listed
> on the same page of the SERPS, you should be able to
> use the page priority to list the better one
I disagree. Relevancy is not an inherent value of a page but is directly related to a search query. It would make no sense to use the priority setting when ordering results, because the priority setting is not related to a search query in any way.
My guess is that priority says more about what you would rather have crawled by Google. Let's say your site has low PR and few backlinks, so Google will only spend a little time/effort to deep crawl it. Up to now it was quite possible that all your intermediate and navigational pages got into Google while the bot missed most of the "meat pages".
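To make that concrete, here's a rough idea (URLs and values invented; Python 2 used to print the entries) of how you could weight the "meat pages" above the navigational ones:

# Hypothetical sketch: give content pages a higher crawl priority than
# navigational pages. <priority> ranges 0.0-1.0; the default is 0.5.
pages = [
    ('http://www.example.com/widget-article.html', '0.9'),  # meat page
    ('http://www.example.com/site-nav-3.html', '0.2'),      # navigation
]
for loc, priority in pages:
    print '<url>'
    print '  <loc>%s</loc>' % loc
    print '  <priority>%s</priority>' % priority
    print '</url>'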
I think the whole sitemap feature will be most helpful to sites that have had little or limited exposure to Googlebot.
> Google on the other hand benefit, because they'll use
> up less bandwidth (they'll be able to crawl pages
> less frequently). Also their index size will increase
> quite substantially, which'll come in handy next time
> they want to get a bit of news coverage.
Well, less bandwidth is also a benefit to the webmaster. If you have a dynamic site where most of the content pages don't change much once they are published but always send a 200, and you don't or can't use Last-Modified, then this will be very helpful.
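If you can control the headers, here's a minimal sketch of the Last-Modified alternative (a Python 2 CGI script; the publish date and page body are placeholders) that answers 304 Not Modified on repeat visits instead of a full 200:

# Sketch: honour If-Modified-Since so a rarely-changing dynamic page can
# answer 304 Not Modified instead of resending the whole page.
import os, time

PUBLISHED = (2005, 6, 1, 0, 0, 0, 0, 1, 0)  # assumed publish date (local time tuple)
HTTP_DATE = time.strftime('%a, %d %b %Y %H:%M:%S GMT',
                          time.gmtime(time.mktime(PUBLISHED)))

# Real servers parse the date; comparing the exact string is a shortcut.
if os.environ.get('HTTP_IF_MODIFIED_SINCE') == HTTP_DATE:
    print 'Status: 304 Not Modified'
    print
else:
    print 'Last-Modified: %s' % HTTP_DATE
    print 'Content-Type: text/html'
    print
    print '<html>...page body here...</html>'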
More pages in the index is also helpful for a webmaster. For competitive terms you may be right that those pages won't rank well. But something a lot of people here may find hard to believe: there are far more topics and phrases _not_ to be found anywhere in Google (or other search engines) than you'd think, and those might finally make their way into the index. So this will be an improvement for phrases with fewer than 10 matches overall in the index.
I'm going to set up a cron and have it run once an hour, since it doesn't seem to put any undue load on the server. We add about 20 to 30 new pages of content per day, so hopefully it'll speed up how soon those pages get indexed.
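In case anyone wants to copy the idea, a rough sketch of such a job (paths, filenames and the example.com URL are placeholders; Python 2, since that's what Google's tool wants, and the ping endpoint is the one from their docs):

# Hourly job, installed with a crontab entry along the lines of:
#   0 * * * * /usr/bin/python /home/billy/update_sitemap.py
import urllib

SITEMAP_URL = 'http://www.example.com/sitemap.xml'  # assumed location

def main():
    # ... regenerate sitemap.xml here (see the generation sketch further down) ...
    # then tell Google the sitemap has changed:
    ping = ('http://www.google.com/webmasters/sitemaps/ping?sitemap='
            + urllib.quote(SITEMAP_URL, ''))
    urllib.urlopen(ping).read()

if __name__ == '__main__':
    main()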
"Billy" can you please explain how you do that? I don't see anyone mentioning here server crashes or overloads. Look at the thread on WebProWorld.com "Google Sitemaps: RSS For The Entire Website?" and see what happened to one site. Have any of you checked your server loads?
I'd go for "monthly" too, since this tag seems to be meant as an 'educated guess'. In <lastmod> you'll put the ISO-8601 timestamp of the page's last modification, and you'll ping Google on every change, so Googlebot should use that value and crawl the page asap, regardless of what's given in <changefreq>.
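A tiny sketch of producing that <lastmod> value in the W3C-datetime flavour of ISO 8601 (in practice you'd feed in the page's real modification time, e.g. via os.path.getmtime):

# Derive a UTC <lastmod> value from a modification timestamp.
import time

mtime = time.time()  # placeholder; normally os.path.getmtime('/path/to/page.html')
lastmod = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime(mtime))
print '<lastmod>%s</lastmod>' % lastmod  # e.g. <lastmod>2005-06-08T14:47:00Z</lastmod>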
But I'm having a heck of a time getting shell access and finding out if I even have python 2.2!
Do you suppose this sitemap thing is enough of an advantage to change to a host that has Python 2.2?
Because for some months, many fewer of my 2000+ pages at pulsemed.org have been indexed in Google... at least according to what they say when you search the domain.
B
"Billy" can you please explain how you do that? I don't see anyone mentioning here server crashes or overloads. Look at the thread on WebProWorld.com "Google Sitemaps: RSS For The Entire Website?" and see what happened to one site. Have any of you checked your server loads?
I have checked loads during update of the sitemap and they're fine. Not sure what caused the problem that user described.
If you want to PM me I'll give you my IM and I can help you with the cron.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/1.html</loc>
    <changefreq>always</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/2.html</loc>
    <changefreq>always</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/3.html</loc>
    <changefreq>always</changefreq>
  </url>
</urlset>
And is there any limit on the number of URLs in the sitemap, or can I add thousands of URLs to it?
> And is there any limit on the number of URLs in the sitemap, or can I add thousands of URLs to it?
As it says in the documentation, there is a limit of 50,000 URLs or (more likely to be hit first) a 10MB limit on the uncompressed sitemap.
With any more than this you can, as documented, use a sitemap index that can point to up to 1,000 sitemaps. There's no obvious limit on the number of sitemap indexes, so there's no limit on the pages per domain.
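As a sketch of what that looks like in practice (domain, filenames and the URL list are invented for the example), split the URL list into chunks of at most 50,000 and point a sitemap index at the pieces:

# Split a large URL list into <=50,000-URL sitemap files, then write a
# sitemap index pointing at them. All names here are illustrative.
MAX_URLS = 50000
NS = 'http://www.google.com/schemas/sitemap/0.84'
urls = ['http://www.example.com/page%d.html' % i for i in xrange(120000)]

names = []
for start in xrange(0, len(urls), MAX_URLS):
    name = 'sitemap%03d.xml' % (start // MAX_URLS)
    f = open(name, 'w')
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="%s">\n' % NS)
    for u in urls[start:start + MAX_URLS]:
        f.write('  <url><loc>%s</loc></url>\n' % u)
    f.write('</urlset>\n')
    f.close()
    names.append(name)

f = open('siteindex.xml', 'w')
f.write('<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="%s">\n' % NS)
for name in names:
    f.write('  <sitemap><loc>http://www.example.com/%s</loc></sitemap>\n' % name)
f.write('</sitemapindex>\n')
f.close()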
For one domain I've built a sitemap index pointing to a few dozen sitemaps that contain 1.8 million URLs. It'll be interesting to see if this helps more pages get indexed from the site (it recently dropped to 0.6 million indexed pages, many of them URL-only).
BTW, it's really worthwhile compressing the sitemaps for a large domain; the compression ratio can be better than 50:1, so it's a large saving in bandwidth.
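Compressing is trivial to script, since Google accepts .xml.gz sitemaps; a small sketch (filenames assumed from the example above):

# Gzip a sitemap before serving it; the highly repetitive XML compresses well.
import gzip, os

data = open('sitemap000.xml', 'rb').read()
out = gzip.open('sitemap000.xml.gz', 'wb')
out.write(data)
out.close()
print '%d bytes -> %d bytes' % (len(data), os.path.getsize('sitemap000.xml.gz'))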
Googlebot comes and checks robots.txt and sitemap.xml about twice a day. Previously they only crawled the root file; now they have crawled, but not yet indexed, 3 more pages. The homepage that was URL-only for about 3 weeks has now got a snippet.
It certainly is not forcing googlebot into the deep crawl that I would like, but it seems that it is at least taking the hint.
BigDave - I have a URL-only homepage problem. Is the snippet showing on all DCs? If not, have you got the DC? Do you definitely think this is a result of the sitemap submission? GG did talk about re-inclusion requests coming in around now.
I wrote a short tutorial on Google Sitemaps which should provide answers to a couple of your questions above. I'm not sure whether a link to my site is permitted, so please use the link in my profile, go to 'Internet' and click on 'How to make use of Google SiteMaps'.
Sebastian X- this rocks :-) Thanks!
Particularly helpful was the PHP tool, phpSitemap, so I didn't have to mess with Python.
python... bad google, bad!
"Billy" can you please explain how you do that? I don't see anyone mentioning here server crashes or overloads. Look at the thread on WebProWorld.com "Google Sitemaps: RSS For The Entire Website?" and see what happened to one site. Have any of you checked your server loads?
I have checked loads during update of the sitemap and they're fine. Not sure what caused the problem that user described.
If you want to PM me I'll give you my IM and I can help you with the cron.
> Do you definitely think this is a result of the sitemap submission?
While it certainly could be coincidence that it went snippet in the middle of the sitemap activity, it would be a very big coincidence. The behaviour of the bot leads me to believe that the other pages getting crawled is almost certainly related to the sitemap.
Thanks, but my hosts using cPanel don't appear to offer Python. So it would appear I'm going to have to use the plain text method, if I understand this correctly. Or can the XML method still be used? They don't exactly make that clear.
I had the same issue. Forget Python - it's not standard enough, evidently. I used phpSitemap, and all I had to do was chmod 666 the sitemap.xml and enter my info in the PHP file from the browser...
That's the way to go.
B
On the sitemap admin page, when a sitemap has been downloaded, does that at the same time mean that those pages have been crawled?
No.
There is no guarantee that they will crawl every (or even any) URL you list. Not now, not ever.
All you are doing is telling them about the pages you have, how often you expect to update them, and possibly what priority you would give them for crawling.