
Google SEO News and Discussion Forum

Google Sitemaps
Googlebot getting tired?
shadows2000 · msg:723687 · 1:13 am on Jun 3, 2005 (gmt 0)

I found an interesting new service from Google called Google Sitemaps (I haven't seen this mentioned elsewhere). It seems you can give Googlebot a helping hand if some pages are not getting indexed?

It seems strange that Google is introducing this service now instead of trying to improve how Googlebot follows links.

https://www.google.com/webmasters/sitemaps/

 

SimmoAka · msg:723717 · 8:21 am on Jun 5, 2005 (gmt 0)

> Can you make a simple text-based list of every URL on your site that you want crawled? That works just as well.

BigDave: yes, you can. It sounds like the text version will receive a lower priority than the Python-generated XML versions, however.
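For reference, the text variant is about as simple as it gets: one fully-qualified URL per line, nothing else, saved as plain UTF-8 (example.com is a placeholder here):

http://www.example.com/
http://www.example.com/widgets.html
http://www.example.com/widgets/blue.html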

twebdonny · msg:723718 · 3:54 pm on Jun 5, 2005 (gmt 0)

Both my XML file and URL-list text file were accepted as OK last night. Now what? Has anyone seen their site spidered afterward, or have pages been showing increased fresh dates? What comes next?

weela · msg:723719 · 3:56 pm on Jun 5, 2005 (gmt 0)

"Has anyone seen their site
spidered afterward"

Yes, it took a few hours though.

ruserious · msg:723720 · 4:03 pm on Jun 5, 2005 (gmt 0)

> I get the feeling that the only people who will benefit from the Google sitemaps are Google themselves.

There's a lot of truth to that, given that Google themselves have said this tool will not have an effect on ranking.

> The page priority setting might be useful in some circumstances. If you have two or more pages listed on the same page of the SERPs, you should be able to use the page priority to list the better one

I disagree. Relevancy is not an inherent value of a page; it is directly related to a search query. It would make no sense to use the priority setting when ordering results, because the priority setting is not related to a search query in any way.

My guess is that priority says more about what you would rather have crawled by Google. Let's say your site has low PR and few backlinks, so Google will spend only a little time/effort deep-crawling it. Up to now, it was quite possible that you got all your intermediate and navigational pages into Google while the bot missed most of the "meat pages".

I think the whole sitemap feature will be most helpful to sites that have had little or limited exposure to Googlebot.

> Google on the other hand benefit, because they'll use up less bandwidth (they'll be able to crawl pages less frequently). Also their index size will increase quite substantially, which'll come in handy next time they want to get a bit of news coverage.

Well, less bandwidth is also a benefit to the webmaster. If you have a dynamic site where most content pages don't change much once published, but every request still returns a 200 and you don't/can't send Last-Modified, this will be very helpful.

More pages in the index also helps the webmaster. For competitive terms you may be right that those pages won't rank well. But something a lot of people here may find hard to believe: there are many more topics and phrases _not_ yet to be found anywhere in Google (or other search engines) that might finally make their way into the index. So this will be an improvement for phrases with fewer than 10 matches overall in the index.

twebdonny · msg:723721 · 4:04 pm on Jun 5, 2005 (gmt 0)

Weela, so your log files actually show regular ole Googlebot spidering all the pages in your sitemap file? Is that what you are saying?

dazzlindonna · msg:723722 · 4:15 pm on Jun 5, 2005 (gmt 0)

For those of you who submitted a sitemap, how long did it take to get approved? I'm sitting here looking at the page that shows I submitted a sitemap, but it still says pending. Not sure if I should wait for that to change or walk away for a few hours and come back later. Just curious.

twebdonny · msg:723723 · 4:18 pm on Jun 5, 2005 (gmt 0)

It took about 2 to 3 hours to be approved in my case. Again, my question is: what's next? Does approval indicate spidering has occurred, or will occur, etc.?

MarkJH · msg:723724 · 4:32 pm on Jun 5, 2005 (gmt 0)

I wonder if this will help those who have url-only pages listed in Google?

Clint · msg:723725 · 5:04 pm on Jun 5, 2005 (gmt 0)

> I'm going to set up a cron and have it run once an hour, since it doesn't seem to put any undue load on the server. We add about 20 to 30 new pages of content per day, so hopefully it'll speed up how soon those pages get indexed.

Billy, can you please explain how you do that? I don't see anyone here mentioning server crashes or overloads. Look at the thread on WebProWorld.com "Google Sitemaps: RSS For The Entire Website?" and see what happened to one site. Have any of you checked your server loads?

elsewhen · msg:723726 · 5:54 pm on Jun 5, 2005 (gmt 0)

For those who have already submitted their sitemaps, what did you choose for the <changefreq> parameter?

I have pages that are altered irregularly... whenever an error is found, or there is something in the news about a topic. "Monthly" seems the most appropriate; any thoughts on this?

SebastianX · msg:723727 · 8:01 pm on Jun 5, 2005 (gmt 0)

> I have pages that are altered irregularly... whenever an error is found, or there is something in the news about a topic. "Monthly" seems the most appropriate; any thoughts on this?

I'd go for "monthly" too, since this tag seems to be meant as an 'educated guess'. In <lastmod> you'll put the ISO-8601 timestamp of the page's last modification, and you'll ping Google on every change, so Googlebot should use that value and crawl the page ASAP, regardless of what's given in <changefreq>.
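As a sketch, a single entry combining the two tags might look like this (URL and timestamp are placeholders; <lastmod> takes a W3C-style ISO-8601 datetime):

<url>
<loc>http://www.example.com/article.html</loc>
<lastmod>2005-06-05T20:01:00+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>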

bbcarter · msg:723728 · 1:15 am on Jun 6, 2005 (gmt 0)

Man, I can't wait to get this started!

But I'm having a heck of a time getting shell access and finding out if I even have Python 2.2!

Do you suppose this sitemap thing is enough of an advantage to switch to a host that has Python 2.2?

Because for some months now, far fewer of my 2000+ pages at pulsemed.org have been indexed in Google... at least according to what they report when you search the domain.

B

Billy Batson · msg:723729 · 1:36 am on Jun 6, 2005 (gmt 0)

"Billy" can you please explain how you do that? I don't see anyone mentioning here server crashes or overloads. Look at the thread on WebProWorld.com "Google Sitemaps: RSS For The Entire Website?" and see what happened to one site. Have any of you checked your server loads?

I have checked loads during update of the sitemap and they're fine. Not sure what caused the problem that user described.

If you want to PM me I'll give you my IM and I can help you with the cron.
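For anyone wondering what such a cron job looks like: a minimal sketch, assuming the standard Python generator and its config file sit in your home directory (both paths are placeholders):

0 * * * * python /home/user/sitemap_gen.py --config=/home/user/config.xml

That runs the generator at the top of every hour; if I remember the script's options right, it also pings Google after regenerating unless you pass --testing.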

MarkJH · msg:723730 · 6:03 am on Jun 6, 2005 (gmt 0)

Has anybody trying to submit a simple .txt file got the OK yet? Even one created and saved as Unicode in SuperEdi keeps getting a Parsing Error for me.
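One guess at the cause: the spec wants the text file in plain UTF-8, and an editor's "Unicode" save is often UTF-16, which the parser may choke on. If you have shell access, a conversion like this (standard iconv flags) is one way to test that theory:

iconv -f UTF-16 -t UTF-8 urllist.txt > urllist-utf8.txt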

jozomannen · msg:723731 · 3:22 pm on Jun 6, 2005 (gmt 0)

I want my pages to be crawled as often as possible; is this code correct then?

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.example.com/1.html</loc>
<changefreq>always</changefreq>
</url>
<url>
<loc>http://www.example.com/2.html</loc>
<changefreq>always</changefreq>
</url>
<url>
<loc>http://www.example.com/3.html</loc>
<changefreq>always</changefreq>
</url>
</urlset>

And are there any limits on the number of URLs in a sitemap, or can I add thousands of URLs to it?

[edited by: ciml at 2:47 pm (utc) on June 8, 2005]
[edit reason] Examplified [/edit]

SebastianX · msg:723732 · 3:37 pm on Jun 6, 2005 (gmt 0)

I wrote a short tutorial on Google Sitemaps which should provide answers to a couple of your questions above.
I'm not sure whether a link to my site is permitted, so please use the link in my profile, go to 'Internet' and click on 'How to make use of Google SiteMaps'.
HTH

robho · msg:723733 · 3:43 pm on Jun 6, 2005 (gmt 0)

> And are there any limits on the number of URLs in a sitemap, or can I add thousands of URLs to it?

As the documentation says, there is a limit of 50,000 URLs, or (more likely to be hit first) 10MB uncompressed, per sitemap.

With any more than that, you can (as documented) use a siteindex, which can point to up to 1,000 sitemaps. There's no obvious limit on the number of siteindexes, so there's effectively no limit on pages per domain.

For one domain I've built a siteindex pointing to a few dozen sitemaps containing 1.8 million URLs. It'll be interesting to see if this helps more pages get indexed from the site (it recently dropped to 0.6 million indexed pages, a lot of them URL-only).

BTW, it's really worthwhile compressing the sitemaps for a large domain; the compression ratio can be better than 50:1, which is a large saving in bandwidth.
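For anyone who hasn't seen one, a minimal sketch of such a siteindex file pointing at two gzipped sitemaps (URLs and dates are placeholders; namespace as in the protocol docs of the time):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2005-06-06</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2005-06-06</lastmod>
</sitemap>
</sitemapindex>

A plain "gzip sitemap1.xml" is all it takes to produce the .gz files.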

BigDave · msg:723734 · 3:58 pm on Jun 6, 2005 (gmt 0)

Well, the text-based sitemap worked on a relatively new site without a lot of incoming links.

Googlebot comes and checks robots.txt and the sitemap about twice a day. Previously it only crawled the root page; now it has crawled, though not yet indexed, 3 more. The homepage, which was URL-only for about 3 weeks, has now got a snippet.

It certainly is not forcing Googlebot into the deep crawl that I would like, but it seems to at least be taking the hint.

Dayo_UK · msg:723735 · 4:01 pm on Jun 6, 2005 (gmt 0)

> The homepage, which was URL-only for about 3 weeks, has now got a snippet.

BigDave - I have a URL-only homepage problem. Is the snippet showing on all DCs? If not, which DC have you got? Do you definitely think this is a result of the sitemap submission? GG did talk about re-inclusion requests coming through around now.

europeforvisitors · msg:723736 · 4:04 pm on Jun 6, 2005 (gmt 0)

I wish Google would offer something like the old InfoSeek submission form, where you could type in the URLs of multiple pages by hand and submit them with the click of a button. This would be a quick, simple, easy way for technophobes to help Googlebot find new content on small to medium-sized sites. (Right now Google does a great job of indexing new pages that have links from the home page, but other new pages that are merely linked from those pages--such as pages 2, 3, and 4 of an article--take longer to get indexed.)

Speedmaster · msg:723737 · 5:10 pm on Jun 6, 2005 (gmt 0)

I submitted a text file with all the URLs two days ago. Google Sitemaps shows it with a status of OK. It looks like Google has downloaded the file at least twice: yesterday it said it was downloaded 8 hours ago, and today it says 2 hours ago. So it looks like submitting a text file works just fine.

bbcarter · msg:723738 · 5:16 pm on Jun 6, 2005 (gmt 0)

> I wrote a short tutorial on Google Sitemaps which should provide answers to a couple of your questions above. I'm not sure whether a link to my site is permitted, so please use the link in my profile, go to 'Internet' and click on 'How to make use of Google SiteMaps'.

SebastianX - this rocks :-) Thanks!

Particularly helpful was getting the PHP tool, phpSitemap, so I didn't have to mess with Python.

Python... bad Google, bad!

Clint · msg:723739 · 5:37 pm on Jun 6, 2005 (gmt 0)

"Billy" can you please explain how you do that? I don't see anyone mentioning here server crashes or overloads. Look at the thread on WebProWorld.com "Google Sitemaps: RSS For The Entire Website?" and see what happened to one site. Have any of you checked your server loads?

I have checked loads during update of the sitemap and they're fine. Not sure what caused the problem that user described.

If you want to PM me I'll give you my IM and I can help you with the cron.


Thanks but my hosts using cPanel don't appear to use Python. So it would appear I'm going to have to use the plain text method if I understand this correctly. Or, and the XML method still be used? They don't exactly make that clear.

SebastianX · msg:723740 · 5:46 pm on Jun 6, 2005 (gmt 0)

You can use anything you have to produce the XML file, even vi or Notepad, because it's a pretty simple format.

BigDave · msg:723741 · 6:38 pm on Jun 6, 2005 (gmt 0)

> BigDave - I have a URL-only homepage problem. Is the snippet showing on all DCs? If not, which DC have you got? Do you definitely think this is a result of the sitemap submission? GG did talk about re-inclusion requests coming through around now.

While it could certainly be coincidence that the homepage gained a snippet in the middle of the sitemap activity, it would be a very big coincidence. The bot's behaviour leads me to believe that the other pages getting crawled is almost certainly related to the sitemap.

dschreib · msg:723742 · 9:07 pm on Jun 6, 2005 (gmt 0)

Has anyone had problems running the python script? I keep getting the same message:

File "sitemap_gen.py", line 474
linenum += 1
^
SyntaxError: invalid syntax

Not sure what I'm doing wrong, anyone have suggestions?
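A likely cause, for what it's worth: augmented assignment (+=) only exists in Python 2.0 and later, so a SyntaxError on that line usually means an older interpreter is picking up the script (and the generator wants 2.2). A quick way to see what you're actually running, even on ancient versions:

python -c "import sys; print sys.version"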

bbcarter · msg:723743 · 9:41 pm on Jun 6, 2005 (gmt 0)

> Thanks, but my cPanel hosts don't appear to offer Python. So it would appear I'm going to have to use the plain-text method, if I understand this correctly. Or can the XML method still be used? They don't exactly make that clear.

I had the same issue. Forget Python - it's not standard enough, evidently. I used phpSitemap, and all I had to do was chmod 666 the sitemap.xml and enter my info into the PHP file from the browser...

That's the way to go.

B
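(For anyone following along: that chmod makes the file world-writable so the browser-driven PHP script can overwrite it, i.e. chmod 666 sitemap.xml; once you're done generating, you can likely tighten it back to 644.)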

jozomannen · msg:723744 · 11:01 pm on Jun 6, 2005 (gmt 0)

On the sitemap admin page, when a sitemap has been downloaded, does that also mean that those pages have been crawled?

BigDave · msg:723745 · 11:06 pm on Jun 6, 2005 (gmt 0)

> On the sitemap admin page, when a sitemap has been downloaded, does that also mean that those pages have been crawled?

No.

There is no guarantee that they will ever crawl every (or even any) URL you list. Not now, not ever.

All you are doing is telling them about the pages you have, how often you expect to update them, and possibly what priority you would give them for crawling.

badtigger · msg:723746 · 2:24 am on Jun 7, 2005 (gmt 0)

I tried the PHP script floating around Google Groups, and it worked like a charm, generating the XML output perfectly according to their schema.

If anyone is interested, PM me.

(Or would it be OK to post it here with instructions?)

bbcarter · msg:723747 · 4:00 am on Jun 7, 2005 (gmt 0)

If you have fewer than 100 pages on a site, there's a good one you can use from the browser - it came from Google Groups also - <snip>

I've tried several: some take forever, some hang, some don't work. That one performs well.

B

[edited by: lawman at 2:56 pm (utc) on June 8, 2005]
[edit reason] No tools please [/edit]
