site maps - does Googlebot recognize them as such?

What is the best url/link text for a site map?

         

HarryM

2:28 am on Oct 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does Googlebot recognize a site map as such or does it see it merely as another page with a lot of links on it? Do the words "site map" or "sitemap" in the link text or the url of a site map page mean anything to Googlebot? Is there something specific the bot looks for?

The reason for the question is I have three site maps with urls crawler1.php, crawler2.php, and crawler3.php, with the words "site map" included in the link texts from the index page. These are only infrequently visited by Google. Is there something specific I should change them to?

My apologies if this has been covered before, but a search did not turn up anything on this question.

Harry

HarryM

12:58 am on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A page is a page is a page.

Yes... unless it's a site map. GoogleGuy has urged members many times to use them, and the recommendation is documented in Google's guidelines.

One can only assume that Googlebot finds them a quick and easy way to index a site.

My view is that it would be surprising for Google to urge web sites to have them, but make no provision for recognizing them for what they are. And yet that is what seems to be the case.

ogletree

1:11 am on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Actually sitemaps are only good on Google when a site is new or if you add new pages. They are also very good for sites that are dynamic or have DHTML menus. Once Google knows about a page it keeps looking for it. I still get requests for pages that have been gone for a year. If your whole site is in Google you can disable the sitemap. Of course blind people really like sitemaps, because they help with screen reader software.

HarryM

1:44 am on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually sitemaps are only good on Google when a site is new or if you add new pages

I continually add more pages, which is why I am so interested. I used to be able to rely on Google picking up new pages when it did a deep crawl, but now that it only grabs pages at random, it's purely chance whether Google finds the new pages or not.

To overcome this I create links to new pages from my index page which is some help but not a very elegant solution.

Harry

t2dman

3:07 am on Oct 31, 2003 (gmt 0)

10+ Year Member



If your whole site is in Google you can disable the sitemap.

How can you say that? Once there is no link to a page, the page drops out: there can be no PR transfer and no value from the link text. I have seen fully indexed sites that were "upgraded" so that no Googlable path to the pages existed, and the pages dropped out of Google over time. Google can know a page exists, but that does not give it a high position in the SERPs. I tell Google a page exists by putting it on my index page. Once it is found, the long-term link goes on a site map page. A new page listed on an index page takes less time to be found than one on a site map page, but I don't need the high PR for that inner page long term.

A page is a page is a page

An index page is only another page. Yes, a site map is recommended by Google, but what is to say that every page can't carry every link via a css/li menu structure, or that a site can't be indexed in many other ways with the same end result? Google wants to be able to access every page, and so a site map is what it recommends. That is not to say that there is something magical about a page called "site map". DMOZ does not have a site map with every link on it; the whole site is a site map going from general to specific.

biggles

3:16 am on Oct 31, 2003 (gmt 0)

10+ Year Member



I'd be interested to hear how most people link to their sitemap. Are you featuring just 1 link to it from the homepage or are all pages linking to it?

Rightly or not, I've been under the impression that the best approach is just a single link from the homepage. This way you don't end up with the site map getting an inflated PR from all the pages linking to it & as a result appearing in the SERPS ahead of real content pages. Instead the site map simply provides a path for spiders to follow and PR is conserved for appropriate pages.

My view has in part been influenced by an article on seo-guy.com which, for PR conservation reasons, argues against traditional sitemaps.

Or have I got it wrong? :( Some of the previous posts suggest to me people have all pages link to site maps so they can use them to funnel PR to pages.

willybfriendly

3:26 am on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd be interested to hear how most people link to their sitemap. Are you featuring just 1 link to it from the homepage or are all pages linking to it?

Every page. Absolute, not relative links.

WBF

biggles

3:40 am on Oct 31, 2003 (gmt 0)

10+ Year Member



Every page. Absolute, not relative links.

What's your rationale for that approach WBF?

willybfriendly

4:00 am on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What's your rationale for that approach WBF?

One of many gifts given to me by these forums. This approach to a site map has given me top SERPs in my niche across multiple search terms. I have a PR of 5 that is consistent across most of the 150 or so pages of my main site. I am spidered consistently. I don't have to worry about the SEs confusing www.mysite.com with www.mysite.com/index/ (all of my links are absolute).

Take some time to surf the WW boards on this subject. There is a wealth of information.

WBF

biggles

5:05 am on Oct 31, 2003 (gmt 0)

10+ Year Member



WBF - thanks. But my question wasn't about absolute vs relative links. Just wondering why you chose to link all pages to the sitemap, rather than just the homepage.

willybfriendly

5:33 am on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Biggles - my answer was to your question. Absolute links was a bonus. Do I get extra credit? :o

[edited by: willybfriendly at 5:40 am (utc) on Oct. 31, 2003]

shaadi

5:38 am on Oct 31, 2003 (gmt 0)

10+ Year Member



Site maps help in getting the site crawled properly and one should use them for the same purpose.

BlueSky

5:57 am on Oct 31, 2003 (gmt 0)

10+ Year Member



My sitemap is in the header of every page. I do this because it helps get my pages indexed more thoroughly plus it's actually used by visitors. Many sites do the same. I've never had a problem of the sitemap popping to the top of the results over other pages in searches.

steveb

7:38 am on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Title your site map something relatively unfriendly for your main search terms like "twowords.com site map".

EarWig

7:55 am on Oct 31, 2003 (gmt 0)

10+ Year Member



This summary might help.

1. GoogleGuy has always advocated the use of a site map.

2. Google recommend using a site map - http://www.google.com/webmasters/guidelines.html
"Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages"

3. Google use a site map themselves - [google.com...] (It is called sitemap.html)

4. All links on Google's site map use the full url i.e. [google.com...]

5. IMO, and it seems in the opinion of most other people at WW, all or most pages on your site should link to the site map, as do most of the "Help" pages in Google.
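The guideline quoted in point 2 (break the site map into separate pages once it passes 100 or so links) can be sketched in a few lines of Python. This is only an illustration: the function name `split_site_map`, the `example.com` URLs, and the hard limit of 100 are assumptions, and Google's wording ("100 or so") is not a strict cutoff.

```python
from typing import List


def split_site_map(urls: List[str], max_links: int = 100) -> List[List[str]]:
    """Split a flat list of URLs into site map pages of at most max_links each."""
    if max_links < 1:
        raise ValueError("max_links must be at least 1")
    return [urls[i:i + max_links] for i in range(0, len(urls), max_links)]


# Hypothetical site with 250 pages on a made-up domain.
urls = [f"http://www.example.com/page{n}.html" for n in range(250)]
pages = split_site_map(urls)
page_sizes = [len(page) for page in pages]
print(page_sizes)  # [100, 100, 50]
```

Each inner list would then become one site map page (for example site-map.html, site-map-2.html, and so on).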

Regards

EW

Namaste

10:15 am on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



a rose by any other name is still a rose :)

HarryM

3:30 pm on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been under the impression that the best approach is just a single link from the homepage. This way you don't end up with the site map getting an inflated PR from all the pages linking to it & as a result appearing in the SERPS ahead of real content pages. Instead the site map simply provides a path for spiders to follow and PR is conserved for appropriate pages.

This doesn't work in a low PR site. The PR inflation is needed to trigger Googlebot into following the links from the site map. Presumably due to resource restrictions, Google puts a low priority on following links from low value pages. Furthermore site map links on every page are useful to users.

This business of optimising PR distribution within a site has given me one of my main headaches. During the deep-crawl days, PR was less important on the home page than on the specific keyword-optimised pages.

My mental image of a web site is not a pyramid hierarchy, but a globular-shaped organism floating in space. There is a nucleus of an index page, sitemap(s), robots.txt, help and copyright pages, etc. From there grow the second level theme pages, which then fan out like fronds with sticky pages at the ends to catch the unwary visitor. With that scenario in mind, I structured the site so that PR arriving at the index page (or any other) was directed out to maximise the pages at the business end of the fronds. It worked very well too.

Unfortunately Google moved the goal posts. I responded by reverting to a standard link structure which puts PR back on the index page. But until Google has re-indexed all my pages and re-calculated the PR, my index page PR remains artificially low. The result is Catch 22. I need Google to re-index my entire site, but due to the site's perceived low status, this is going to take forever.

Even getting more high PR links doesn't help as it should because the PR is still being sucked out to the end pages. As my old granny used to say: he's been so sharp, he cut himself.

HarryM

3:33 pm on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This summary might help.

Thanks EarWig, I think that's really useful.

HyperGeek

6:56 pm on Oct 31, 2003 (gmt 0)

10+ Year Member



I've found that providing site maps actually helps get your content indexed faster.

My site maps rarely include anything other than a blank page with links. No logos, no template HTML frame, just links to pages within my site between the <body> tags.
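A bare, links-only page of this kind could be generated with something like the following Python sketch. The function name `render_site_map`, the `<br>` separators, and the example URL are assumptions made for illustration; nothing here is the poster's actual code.

```python
from html import escape
from typing import List, Tuple


def render_site_map(links: List[Tuple[str, str]], title: str = "Site Map") -> str:
    """Return a minimal HTML page containing nothing but one link per line."""
    items = "\n".join(
        f'<a href="{escape(url, quote=True)}">{escape(text)}</a><br>'
        for url, text in links
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"{items}\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical link list: (absolute URL, link text).
page = render_site_map([("http://www.example.com/a.html", "Page A & B")])
print(page)
```

Escaping the URL and link text keeps characters such as `&` from breaking the markup.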

HarryM

11:12 pm on Oct 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



EarWig,

Something else for your summary.

Google uses the words "Site Map" as link text, not "Sitemap".

Whether this is of any significance, who knows. :)

EarWig

9:42 am on Nov 1, 2003 (gmt 0)

10+ Year Member



HarryM

Yes I also noticed the variants in the words site map and sitemap

GoogleGuy uses the term site map
[webmasterworld.com...]

Strange how some words only evolve on the web.
Take for example website
Google search for website returns 41,000,000 results
An online dictionary says:
"The development of website as a single uncapitalized word mirrors the development of other technological expressions which have tended to evolve into unhyphenated forms as they become more familiar."

As for sitemap?
No entry found for sitemap
Yet there are 21,500,000 results for sitemap on Google!
:-)

Anyone else seen sitemap in a dictionary?

EW

HarryM

10:38 am on Nov 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



EW

GoogleGuy uses the term site map
[webmasterworld.com...]

I took a look at the above and noted GoogleGuy's comment:

My fave tip from Brett's suggestions: add a page of content a day to your site

Presumably this was in the days you could count on pages getting indexed. :)

BlueSky

11:19 am on Nov 1, 2003 (gmt 0)

10+ Year Member



Anyone else seen sitemap in a dictionary?

There are more English words (and their local variations) NOT in the dictionary than are in it.

Oaf357

2:59 pm on Nov 2, 2003 (gmt 0)

10+ Year Member



This is how I do things with my site map:

1) My site map is named "sitemap.php". Why? Google's is named "sitemap.html". I once had it named "site-map.php" but after doing some looking around WW I changed it to its current name (and created the appropriate 301 redirect).

2) My site map is linked to from just about every page (in a header include) using an absolute link.

3) The site map itself resides in the root directory of the web site and uses relative links. I think this method (with a simple, well formed directory structure) cuts down on the possibility of spiders "getting lost". Using relative links keeps page size down and I've yet to hear a solid argument for absolute links. I've yet to have a spider not find its way across my site.

4) My site map doesn't link to every page. Just the major sections and sub-sections. Linking to every page could be helpful if your site is small but remember the 100 Link Rule.
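To make the absolute-versus-relative discussion concrete, here is a small Python sketch of how a crawler-style client resolves relative links against the URL of the page they appear on. The domain and file names are hypothetical; `urljoin` is the standard-library resolver.

```python
from urllib.parse import urljoin

# Hypothetical site map located in the root directory of the site.
SITE_MAP_URL = "http://www.example.com/sitemap.php"

relative_links = ["widgets/index.html", "/about.html", "widgets/blue/"]
absolute_links = [urljoin(SITE_MAP_URL, link) for link in relative_links]
print(absolute_links)

# The same relative link resolves differently from a page in a subdirectory,
# which is one way a spider can end up at an unintended URL.
from_subdir = urljoin("http://www.example.com/widgets/page.html", "widgets/index.html")
print(from_subdir)
```

Because the site map sits in the root directory, its relative links resolve to the intended URLs; the last line shows what happens when the same link text is reused on a deeper page.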

But, the biggest key to Google is if it doesn't help your visitors/users then it isn't worth it. That's the best tip I've taken away from these forums.

I hope this answers a few questions and at the least provides an example of how someone implements their site map.

illudium

4:43 pm on Nov 3, 2003 (gmt 0)

10+ Year Member



Google indexes without problem my sitemap files which I've broken out into chunks per letter of the alphabet. I've got ones that are well over 100k and well over this "100 link limit". Mine get indexed just fine, show up in the results just fine and drive quite a bit of traffic for me.

HarryM

7:49 pm on Nov 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Illudium wrote:

Google indexes without problem my sitemap files

Pleased for you. But it would be helpful if you could say what the names of your site maps are, what is the linking text, and how often they are indexed.

Harry

gefilte

6:36 am on Nov 4, 2003 (gmt 0)

10+ Year Member



I have a question along these lines. For one, I'm unsure of the technical difference between sitemap and index.
I have an index page widgets.com/index.html that Google won't index. My homepage is indexed. The homepage links to the index, which links to all the other pages. A few of the other pages are indexed, those with inbound links.
A while ago, people on Webmasterworld suggested I split the index since it had over a hundred links (all internal). I did this two months ago, but no luck. Does the Google Algo dislike index.html? I can't see what else is wrong.

HarryM

2:13 pm on Nov 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For one, I'm unsure of the technical difference between sitemap and index. I have an index page widgets.com/index.html that Google won't index. My homepage is indexed.

What do you call your home page?

"/index.html" is generally used as the default page for the site, and therefore the natural "home" page. You may be confusing Google, especially if it has indexed your default home page as "widgets.com/" without specifying the actual name.

As to site maps, you appear to be using your index page as one. In book terminology, the home page provides the contents list you would find at the beginning of a book, while the site map is like the detailed index you would find at the back of the book.

I would suggest renaming your pages to the more usual names, and set up a redirect in htaccess for the old home page. Your home page is renamed to "/index.html" and your old "/index.html" becomes your site map.

For best chances on getting your site map indexed, I would call it "site-map.html", or "site-map-2.html" etc., and use "site map" as the linking text.

[added] You will also need to ensure your server knows that /index.html is the default page [/added]
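The redirect itself is normally a line or two of server configuration (in Apache, a `Redirect 301` directive in .htaccess), but the logic is easy to sketch in Python. The path `/old-home.html`, the `MOVED` table, and the `redirect_for` helper below are all hypothetical names chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table of old path -> new path after the rename.
MOVED: Dict[str, str] = {"/old-home.html": "/index.html"}


def redirect_for(path: str, host: str = "www.example.com") -> Optional[Tuple[int, str]]:
    """Return (status, Location) for a moved page, or None if the path is unchanged."""
    new_path = MOVED.get(path)
    if new_path is None:
        return None
    # 301 tells browsers and spiders that the move is permanent.
    return 301, f"http://{host}{new_path}"


print(redirect_for("/old-home.html"))
print(redirect_for("/about.html"))
```

A permanent (301) status, rather than a temporary one, is what signals that the old address should be replaced by the new one.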

Harry

gefilte

7:04 am on Nov 6, 2003 (gmt 0)

10+ Year Member



My homepage is www.widgets.com
My site map is www.widgets.com/index.html
My inbound links are all towards www.widgets.com
If I change this now, they'd all be invalid. Is this really necessary? How could I get my index page in Google without changing the address of either of these pages?

steveb

9:54 am on Nov 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To ask the question again:

"What do you call your home page?"

widgets.com is not a page. The URL widgets.com needs a file/page for anything to show. These files are commonly called default.htm or index.htm or index.html or index.php or index.shtml... etc. Yours has to be called something. Using default.htm for the homepage and index.html for a sitemap is non-standard to say the least, and it isn't much of a surprise it would trick a bot.
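How a server turns a bare URL like widgets.com/ into an actual file can be sketched roughly as follows. The candidate list and its order are assumptions; real servers take them from their own configuration, and the `resolve_default` helper is hypothetical.

```python
from typing import Iterable, Optional, Sequence

# Assumed lookup order; a real server reads this from its configuration.
DEFAULT_NAMES: Sequence[str] = (
    "index.html", "index.htm", "index.php", "index.shtml", "default.htm",
)


def resolve_default(
    files_in_root: Iterable[str],
    candidates: Sequence[str] = DEFAULT_NAMES,
) -> Optional[str]:
    """Return the first candidate file that exists in the directory, else None."""
    available = set(files_in_root)
    for name in candidates:
        if name in available:
            return name
    return None


# With both files present, the configured order decides which one is "home".
print(resolve_default(["default.htm", "index.html"]))
print(resolve_default(["default.htm", "index.html"], candidates=("default.htm", "index.html")))
```

This is why a site with both default.htm and index.html can behave differently from host to host: the file served for the bare domain depends on the order the server is configured to try.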

HarryM

4:24 am on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My homepage is www.widgets.com
My site map is www.widgets.com/index.html
My inbound links are all towards www.widgets.com

If someone clicks on a link "www.widgets.com" at another site, your server will serve the default home page. This is set in your server. If it is not, the attempt will fail.

To check what page is your default you could browse to one of the sites that have a link to you, click on it, and see which page of your site is displayed. Or just type in www.widgets.com at the address bar.

It is probable that Google has indexed that page as www.widgets.com/ without giving the full name. To check see what page Google has cached under www.widgets.com.

There are two possibilities that I can think of:

1) the page displayed will be www.widgets.com/index.html.
2) it will be something else, probably www.widgets.com/default.html.

1) If it is index.html then all you have to do is create a new page (any name) which will become your new index page. Then rename index.html to site-map.html (or similar), and rename your new page to index.html.

Your index.html page should have links to the major areas of your site and also to site-map.html (Two or more site maps if the link content is more than 100).

It is best to put the links to the site map or maps at the top of index.html. You want Google to spot these easily.

All Google will notice is that the content of your index.html page has changed. If it was indexed it will stay indexed.

2) The case where the page displayed is not index.html. If it is default.html, then that is fine. All you have to do is rename your index.html page to site-map.html.

As SteveB pointed out, home pages are usually called default.html or index.html, etc. Your hosting provider may insist on a specific name. My feeling is that index.html is becoming the standard, but default.html is still used.

Harry
