Google Sitemaps

Forum Moderators: Robert Charlton & goodroi

Googlebot getting tired?

shadows2000

1:13 am on Jun 3, 2005 (gmt 0)

I found an interesting new service from Google called Google Sitemaps (I haven't seen this mentioned elsewhere). It seems you can give Googlebot a helping hand if some pages are not getting indexed.

It seems strange that Google is introducing this service now rather than trying to improve how Googlebot follows links.

[google.com...]

BigDave

8:13 pm on Jun 19, 2005 (gmt 0)

>In my opinion, robots.txt should take preference.

While I can see your point, there is a perfectly valid opposing viewpoint that you have now specifically told the search engine to index that URL.

The real problem is that this situation is undocumented. It should be documented, regardless of whether robots.txt or the sitemap takes precedence.

I would recommend that you post it to the sitemap newsgroup where a Google engineer is more likely to spot it.
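Until the precedence is documented, one way to sidestep the conflict is to check your own sitemap against your own robots.txt before submitting. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt rules, the example.com URLs, and the helper name `blocked_sitemap_urls` are all hypothetical, and the check only reflects how Python's parser reads the rules, not how Googlebot resolves the conflict.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and sitemap contents, for illustration only.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

SITEMAP_URLS = [
    "http://www.example.com/",
    "http://www.example.com/private/report.html",
]

conflicts = blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_URLS)
for url in conflicts:
    print("Listed in sitemap but disallowed by robots.txt:", url)
```

Any URL this reports is one where you are giving the crawler two contradictory instructions, so it is worth removing from one place or the other.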

bnhall

8:50 pm on Jun 19, 2005 (gmt 0)

Sorry to be dense - which sitemap newsgroup?

surfer67

8:57 pm on Jun 19, 2005 (gmt 0)

Does the submitted sitemap file need to have a .xml extension?

Reason I'm asking is I have created an .asp file using an asp script.

dmorison

9:06 pm on Jun 19, 2005 (gmt 0)

>Sorry to be dense - which sitemap newsgroup?

[groups-beta.google.com...]

There's a thread there right now about the sitemap / robots.txt conflict.

Edit: Looks like you started it... :)

>Does the submitted sitemap file need to have a .xml extension?

No - Google will just request the URI that you submit and work out what to make of it.

That said, it is still good practice to make your script issue the Content-Type: text/xml header.
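As a rough illustration of a script that issues that header, here is a minimal sketch written as a Python WSGI application rather than the ASP script discussed above. The URL list, the function names, and the use of the present-day sitemaps.org namespace are assumptions for the example; check the Google Sitemaps docs for the exact schema they expect.

```python
from xml.sax.saxutils import escape

# Hypothetical page list; a real script would pull this from a database.
PAGE_URLS = [
    "http://www.example.com/",
    "http://www.example.com/about.html",
]

# Assumption: namespace of the current sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build the sitemap XML document as a string."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
    )


def sitemap_app(environ, start_response):
    """WSGI app: serve the sitemap with an explicit XML Content-Type."""
    body = build_sitemap(PAGE_URLS).encode("utf-8")
    headers = [
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, sitemap_app).serve_forever()` from the standard library should do. The point is only that the file extension of the URL does not matter as long as the response declares itself as XML.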

[edited by: dmorison at 9:12 pm (utc) on June 19, 2005]

surfer67

9:09 pm on Jun 19, 2005 (gmt 0)

What are your thoughts on creating a sitemap page and linking to it from all pages within your site? The only difference is that this page would contain actual links as opposed to XML text elements.

Maybe this would be a good supplement to the Google Sitemap feature.

taps

7:05 am on Jun 20, 2005 (gmt 0)

surfer: as far as I can see, any extension should work as long as it sends XML. I tried it with a PHP file that generates its content dynamically. After I submitted it, Google said that it is "OK".

waitman

7:34 am on Jun 20, 2005 (gmt 0)

"Does the submitted sitemap file need to have a .xml extension?"

I have had a few people report that they receive "not found errors" when the filename has underscores or dashes. Changing the name from ex-sitemap.xml to exsitemap.xml seemed to fix it.

The Google parser appears to be picky. It does seem to require that the server report the correct mime type for xml or gzip.

Take care

Waitman
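For the gzip case mentioned above, here is a minimal sketch of producing a compressed sitemap with Python's standard `gzip` module. The sample XML and the helper name `gzip_sitemap` are hypothetical. Which MIME type the server should report for the .gz file is not spelled out in this thread, so that part is left to your server configuration.

```python
import gzip

# Hypothetical, minimal sitemap body used only for the demonstration.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    "  <url><loc>http://www.example.com/</loc></url>\n"
    "</urlset>\n"
)


def gzip_sitemap(xml_text):
    """Compress sitemap XML into gzip bytes, ready to save as sitemap.xml.gz."""
    return gzip.compress(xml_text.encode("utf-8"))


compressed = gzip_sitemap(SAMPLE_XML)

# Sanity check: the compressed data decompresses back to the same document.
restored = gzip.decompress(compressed).decode("utf-8")
print("round trip ok:", restored == SAMPLE_XML)
```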

pallaton

1:22 pm on Jun 20, 2005 (gmt 0)

Hi all,

New at this forum...

I want to ask if anyone has had any results from submitting a Google sitemap, and what do you think about submitting a Google sitemap for a banned website? :-)

Thanks,
Pallaton.

sailorjwd

1:25 pm on Jun 20, 2005 (gmt 0)

Surfer,

I submitted an .asp sitemap and it works great. Almost all my pages are now indexed; they weren't before submitting.

sailorjwd

1:27 pm on Jun 20, 2005 (gmt 0)

Using underscores...

I submitted site_map.asp

and it works fine.

surfer67

1:34 pm on Jun 20, 2005 (gmt 0)

sailor,

How long after your sitemap was downloaded did it take to index your pages? Mine was downloaded 5 hours ago and I don't see any new indexed pages.

Do you have a large site?

pallaton

2:45 pm on Jun 20, 2005 (gmt 0)

What do you use to create the sitemap... and did Google index pages from the sitemap that don't have any links pointing to them?

I want to try sending a Google sitemap for a new website that doesn't have a link from anywhere and has not been submitted to any search engine...

What do you think?

Pallaton.

kamran mohammed

3:16 pm on Jun 20, 2005 (gmt 0)

Dear Pallaton

That would be useful and it will surely help. I have seen the results...

Cheers...Good Luck

KaMran - White Eagle

surfer67

3:19 pm on Jun 20, 2005 (gmt 0)

Does google post a "crawled" status message after a site's sitemap file has been executed?

BigDave

5:06 pm on Jun 20, 2005 (gmt 0)

surfer,

Sitemaps are not a way to *make* Google crawl your site. They are just a way for you to tell Google about pages that you would like to have crawled. Google makes no guarantee that it will crawl those pages within any specific timeframe, if ever.

That said, they do seem to do a significant crawl within a couple of days of submitting the sitemap. One new site that only had the home page indexed got 42 of the 68 pages crawled after about a day. The rest of them have just been crawled and indexed over the weekend.

digitalv

7:02 pm on Jun 20, 2005 (gmt 0)

Can anyone post a sample sitemap that I could modify?

BigDave

7:12 pm on Jun 20, 2005 (gmt 0)

>Can anyone post a sample sitemap that I could modify?

Go to the google sitemaps site and check the docs. They have a sample that you can easily cut and paste.

Or, just make a list of URLs, one per line, in a text file. If you aren't going to add any additional information, then there is little point in making the XML file instead of going with straight text.
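To make the trade-off concrete, here is a minimal Python sketch that writes the same pages both ways: as a plain one-URL-per-line list, and as XML carrying one piece of additional information (a last-modified date). The URLs, dates, and function names are made up for the example, and the XML uses the present-day sitemaps.org namespace as an assumption, so compare it against the sample in Google's docs before relying on it.

```python
from xml.sax.saxutils import escape

# Assumption: namespace of the current sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def text_sitemap(urls):
    """Plain-text format: one URL per line, nothing else."""
    return "\n".join(urls) + "\n"


def xml_sitemap(entries):
    """XML format: entries are (url, lastmod) pairs; lastmod may be None."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for url, lastmod in entries:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        if lastmod:
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Hypothetical pages and dates, for illustration only.
ENTRIES = [
    ("http://www.example.com/", "2005-06-20"),
    ("http://www.example.com/contact.html", None),
]

plain = text_sitemap([url for url, _ in ENTRIES])
xml = xml_sitemap(ENTRIES)
print(plain)
print(xml)
```

If every entry would have `None` for the extra fields, the XML version carries nothing the text list does not, which is the point made above.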

alexo

10:46 pm on Jun 20, 2005 (gmt 0)

hello

1 question.

Do I need to resubmit my sitemap (via [google.com...]) every time I update my site (and sitemap)?

Or is this button (resubmit) only for when Google doesn't accept your previous sitemap format?

thx

SebastianX

11:43 pm on Jun 20, 2005 (gmt 0)

>Do I need to resubmit my sitemap (via [google.com...]) every time I update my site (and sitemap)?

Yes. Google recommends resubmitting on content changes. Currently it works smoothly without resubmits. Googlebot requests sitemaps like clockwork, and mostly does deep crawls afterwards (at least on popular sites). However, this behavior is not documented, so you can't rely on it.

sailorjwd

11:54 pm on Jun 20, 2005 (gmt 0)

Surfer,

I have a small site <300 pages.. it took maybe 2 or 3 days for the new listings to propagate through the various DCs... I suspect it showed up in one of them within 24 hours but I don't know which gets it first.

sailorjwd

11:58 pm on Jun 20, 2005 (gmt 0)

About resubmitting..

I submitted an ASP program, as I've mentioned. So it is dynamically generating the XML listing every time G goes and opens it.

I think that is the best way to go. I don't know what you need on a Unix server... PHP perhaps.

I am wondering if I should leave it up there with G or remove it until I change something on the website? As it stands now, G is rerunning the sitemap at least every day, if not twice a day.

bumpaw

1:42 am on Jun 21, 2005 (gmt 0)

I used the same PHP script on 5 similar sites (not content) and only 4 were found by googlebot. It's been two days and I have been over and over them all and can't find a reason for the lone miss. It was finding the Python generated sitemap.xml.gz but can't find the PHP generated sitemap.xml file.

webbyfro

4:46 am on Jun 23, 2005 (gmt 0)

Hey guys, I've submitted my sitemap, but it's a short list of links, generated from the home page and listing only the pages that are linked from the main page. Since most of those links go to my archives, which themselves link to EVERY SINGLE PAGE ON MY SITE, does it matter that my sitemap list is small and doesn't include EVERY SINGLE page that exists on my site? Or is the idea to submit ALL of your pages at once? (I have quite a few, over 9,000.)

I hope I made this question clear; it's kinda hard to understand even to me, and I wrote it! Anyways, thanks.

chinook

1:22 pm on Jun 23, 2005 (gmt 0)

webbyfro
The idea is to list every page on the site, including any dynamic ones. For example, if you have a product page that changes depending on which product is requested (....mysite. com/product.asp?productID=widgetX), then each instance of this page should get indexed. To be really clear:
product.asp?productID=widgetY
product.asp?productID=widgetZ
product.asp?productID=widgetG

Each of the above pages should be considered a distinct and separate page. The site map helps in that it provides the googlebot (who is not really tired as the title suggests) with a hint of what pages exist.
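A minimal Python sketch of that idea: generate one sitemap entry per product ID so each dynamic URL is listed separately. The base URL on example.com and the function names are hypothetical; the product IDs are the ones from the post above. URLs are XML-escaped because a query string with more than one parameter contains `&`, which must be written as `&amp;` inside XML.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Hypothetical base URL; the product IDs come from the example above.
BASE_URL = "http://www.example.com/product.asp"
PRODUCT_IDS = ["widgetY", "widgetZ", "widgetG"]


def product_urls(base_url, product_ids):
    """Build one distinct URL per product ID."""
    return [f"{base_url}?{urlencode({'productID': pid})}" for pid in product_ids]


def url_entries(urls):
    """Render each URL as its own <url> entry, escaping XML special characters."""
    return "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)


urls = product_urls(BASE_URL, PRODUCT_IDS)
print(url_entries(urls))
```

In a real site the ID list would come from the product database, so the sitemap stays in step with the catalogue.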

surfer67

2:36 pm on Jun 23, 2005 (gmt 0)

Three days after submitting my sitemap I noticed this morning that google had indexed about a hundred new pages. I just ran another check and saw that the newly indexed pages are now all gone.

Any ideas or similar experiences?

webbyfro

5:07 pm on Jun 23, 2005 (gmt 0)

@chinook,

<<The idea is to list every page on the site including any dynamic ones.>>

Thanks. I thought all Google needed was a page or two, and then they would just spider everything that was linked to that page, etc. I'll resubmit a longer list, one with ALL my pages.

TerryMc

2:32 am on Jun 27, 2005 (gmt 0)

So if I submit a sitemap, and it does not have all my pages on it, will pages that are already in the index continue to be spidered and included in the index?

fearlessrick

5:14 am on Jun 27, 2005 (gmt 0)

This whole thing is a crock. Next month Google will come out with some other feature for you to feed their hungry engine. Does nobody stop to look at the incredible sums of money Google is making off YOUR websites?

It's either a crying shame or a laughably bad joke the way webmasters so readily submit to Google's every piece of advice.

Suppose next week, Bill Gates tells you all to shut your sites down for two days so that his new whizbang technocrawler can properly index the 'net. Would you do it? I'd wager a few would if promised "higher rankings."

Kinda sad, how shortsighted people are these days.

BigDave

7:49 am on Jun 27, 2005 (gmt 0)

Yup, I'm pretty short-sighted.

I submitted a new site to froogle, and so far, my only sales have come from my froogle listings. I haven't even gotten to the point of buying any ads yet, and the site is in the black. Somehow, I seem to be confused here because it seems to me that I made money off Google in this case, not the other way around.

That site also only had the home page crawled. Then I submitted a sitemap and the next day 45 pages were crawled. The next day those pages were in the index. By the end of the week, all pages were crawled and indexed.

On another site, I created a subdomain yesterday, and submitted the sitemap this morning. The sitemap was read within an hour. Crawling has begun.

Given past experience, I would not expect such fast response, and would not expect either site to be fully indexed any time soon.

If you are so bothered by Google making money off your site, block them. I on the other hand, like the traffic they send me, so I will take advantage of every tool they offer to help get indexed faster.

Janiss

4:31 pm on Jun 27, 2005 (gmt 0)

I tried using a Google sitemap generator for a site of mine (the tool worked great for another site I have), but it only listed the home page and no other pages. What sort of things would be keeping a tool like this from crawling my site?

This 188 message thread spans 7 pages.
