The reason for the question is I have three site maps with urls crawler1.php, crawler2.php, and crawler3.php, with the words "site map" included in the link texts from the index page. These are only infrequently visited by Google. Is there something specific I should change them to?
My apologies if this has been covered before, but a search did not turn up anything on this question.
Harry
A page is a page is a page.
Yes... unless it's a site map. GoogleGuy has many times urged members to use them, and the recommendation is documented in Google's guidelines.
One can only assume that Googlebot finds them a quick and easy way to index a site.
My view is that it would be surprising for Google to urge web sites to have them, but make no provision for recognizing them for what they are. And yet that is what seems to be the case.
Actually, sitemaps are only good on Google when a site is new or when you add new pages.
I continually add more pages, which is why I am so interested. I used to be able to rely on Google picking up new pages when it did a deep crawl, but now that it only grabs pages at random, it's purely chance whether Google finds the new pages or not.
To overcome this, I create links to new pages from my index page, which is some help but not a very elegant solution.
Harry
If your whole site is in Google you can disable the sitemap.
How can you say that? Once there is no link to a page, the page drops out: there can be no PR transfer and no value from the link text. I have seen fully indexed sites "upgraded" so that no Googlable path to some pages remained, and those pages dropped out of Google over time. Google can know a page exists, but that does not give it a high SERP position. I tell Google a page exists by putting it on my index page. Once it is found, the long-term link goes on a site map page. A new page listed on an index page is found faster than one listed on a site map page, but I don't need the high PR for that inner page long term.
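The workflow just described (link a new page from the index page until Google finds it, then move its long-term link to a site map page) can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the page list and the `crawled` set are hypothetical inputs, for example the paths Googlebot has requested according to your server logs, and `place_links` is an illustrative name, not anything Google provides.

```python
def place_links(pages, crawled):
    """Decide where each page's link should live.

    pages:   iterable of URL paths on the site
    crawled: set of paths Googlebot has already requested
             (hypothetical input, e.g. gathered from server access logs)
    Returns (index_links, site_map_links): pages not yet found are linked
    from the index page; found pages get their long-term link on a site map.
    """
    index_links = sorted(p for p in pages if p not in crawled)
    site_map_links = sorted(p for p in pages if p in crawled)
    return index_links, site_map_links


# Hypothetical example data.
pages = ["/widgets/blue.html", "/widgets/red.html", "/widgets/green.html"]
crawled = {"/widgets/blue.html", "/widgets/red.html"}
index_links, site_map_links = place_links(pages, crawled)
print(index_links)     # ['/widgets/green.html']
print(site_map_links)  # ['/widgets/blue.html', '/widgets/red.html']
```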
A page is a page is a page
An index page is only another page. Yes, it is recommended by Google, but who is to say that a CSS/li menu structure can't put every link on every page, or that there aren't many other ways of indexing a site with the same end result? Google wants to be able to reach every page, and a site map is what it recommends for that. That is not to say there is something magical about a page called "site map". DMOZ does not have a site map with every link on it; the whole site is a site map, going from general to specific.
Rightly or not, I've been under the impression that the best approach is just a single link from the homepage. This way you don't end up with the site map getting an inflated PR from all the pages linking to it & as a result appearing in the SERPS ahead of real content pages. Instead the site map simply provides a path for spiders to follow and PR is conserved for appropriate pages.
My view has in part been influenced by an article on seo-guy.com which, for PR conservation reasons, argues against traditional sitemaps.
Or have I got it wrong? :( Some of the previous posts suggest to me that people have all pages link to site maps so they can use them to funnel PR to pages.
What's your rationale for that approach, WBF?
One of many gifts given to me by these forums. This approach to a site map has given me top SERPs in my niche across multiple search terms. I have a PR of 5 that is consistent across most of the 150 or so pages of my main site. I am spidered consistently. I don't have to worry about the SEs confusing www.mysite.com with www.mysite.com/index/ (all of my links are absolute).
Take some time to surf the WW boards on this subject. There is a wealth of information.
WBF
1. GoogleGuy has always advocated the use of a site map.
2. Google recommend using a site map: http://www.google.com/webmasters/guidelines.html
"Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages"
3. Google use a site map themselves - [google.com...] (It is called sitemap.html)
4. All links on Google's site map use the full URL, i.e. [google.com...]
5. IMO, all or most pages on your site should link to the site map, as do most of the "Help" pages on Google. That also seems to be the view of most other people at WW.
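Point 4 notes that Google's own site map uses full URLs. Whichever style you pick, a crawler resolves every link against the URL of the page it appears on. The Python sketch below shows that with the standard library's `urllib.parse.urljoin`; the domain, the site map name and the paths are hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href against page_url, as a browser or crawler would."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical site map at the web root of a hypothetical domain.
site_map = "http://www.widgets.com/sitemap.html"
hrefs = [
    "http://www.widgets.com/widgets/blue.html",  # absolute: used unchanged
    "widgets/red.html",                           # relative to the site map's directory
    "/about.html",                                # relative to the site root
]
for url in resolve_links(site_map, hrefs):
    print(url)
# http://www.widgets.com/widgets/blue.html
# http://www.widgets.com/widgets/red.html
# http://www.widgets.com/about.html

# The same relative href on a page one directory down points somewhere else.
print(urljoin("http://www.widgets.com/widgets/blue.html", "widgets/red.html"))
# http://www.widgets.com/widgets/widgets/red.html
```

Absolute links resolve to the same address wherever they appear, while relative links depend on where the linking page sits, which is why a relative-link site map is safest in the root directory.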
Regards
EW
I've been under the impression that the best approach is just a single link from the homepage. This way you don't end up with the site map getting an inflated PR from all the pages linking to it & as a result appearing in the SERPS ahead of real content pages. Instead the site map simply provides a path for spiders to follow and PR is conserved for appropriate pages.
This doesn't work in a low PR site. The PR inflation is needed to trigger Googlebot into following the links from the site map. Presumably due to resource restrictions, Google puts a low priority on following links from low value pages. Furthermore site map links on every page are useful to users.
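The PR argument in this exchange can be illustrated with the textbook PageRank formula, PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over the pages q linking to p, where L(q) is the number of links on q and d = 0.85. This is a toy model, not Google's current ranking, and the five-page site below is hypothetical. It compares a site map linked from every page with a site map linked only from the home page.

```python
def pagerank(links, d=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Every page in these examples has at least one outbound link, so no
    dangling-node handling is needed. Scores always sum to 1.
    """
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for page, targets in links.items():
            share = d * pr[page] / len(targets)  # PR split evenly over outbound links
            for target in targets:
                new[target] += share
        pr = new
    return pr


# Hypothetical site: a home page, three content pages (a, b, c) and a site map.
site_map_everywhere = {
    "home": ["a", "sitemap"],
    "a": ["home", "sitemap"],
    "b": ["home", "sitemap"],
    "c": ["home", "sitemap"],
    "sitemap": ["home", "a", "b", "c"],
}
site_map_from_home_only = {
    "home": ["a", "sitemap"],
    "a": ["home"],
    "b": ["home"],
    "c": ["home"],
    "sitemap": ["home", "a", "b", "c"],
}

pr_every = pagerank(site_map_everywhere)
pr_home_only = pagerank(site_map_from_home_only)
print(sorted(pr_every, key=pr_every.get, reverse=True))
# ['sitemap', 'home', 'a', 'b', 'c']
print(sorted(pr_home_only, key=pr_home_only.get, reverse=True))
# ['home', 'a', 'sitemap', 'b', 'c']
```

In the first layout the site map ends up with the highest score on the site, ahead of the home page, which is the "inflated PR" effect described earlier; in the second, content page a outranks the site map. Whether the extra PR actually makes Googlebot follow the site map's links sooner is the claim made in this post, and the sketch cannot show that.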
This business of optimising PR distribution within a site has given me one of my main headaches. During the deep-crawl days, PR was less important on the home page than on the specific keyword-optimised pages.
My mental image of a web site is not a pyramid hierarchy, but a globular-shaped organism floating in space. There is a nucleus of an index page, sitemap(s), robots.txt, help and copyright pages, etc. From there grow the second level theme pages, which then fan out like fronds with sticky pages at the ends to catch the unwary visitor. With that scenario in mind, I structured the site so that PR arriving at the index page (or any other) was directed out to maximise the pages at the business end of the fronds. It worked very well too.
Unfortunately Google moved the goal posts. I responded by reverting to a standard link structure, which puts PR back on the index page. But until Google has re-indexed all my pages and recalculated the PR, my index page PR remains artificially low. The result is a Catch-22: I need Google to re-index my entire site, but due to the site's perceived low status, this is going to take forever.
Even getting more high PR links doesn't help as it should because the PR is still being sucked out to the end pages. As my old granny used to say: he's been so sharp, he cut himself.
Yes, I also noticed the variants "site map" and "sitemap".
GoogleGuy uses the term site map
[webmasterworld.com...]
Strange how some words only evolve on the web.
Take, for example, website.
A Google search for website returns 41,000,000 results.
An online dictionary says:
"The development of website as a single uncapitalized word mirrors the development of other technological expressions which have tended to evolve into unhyphenated forms as they become more familiar."
As for sitemap?
No entry found for sitemap
Yet there are 21,500,000 results for sitemap on Google!
:-)
Anyone else seen sitemap in a dictionary?
EW
GoogleGuy uses the term site map
[webmasterworld.com...]
I took a look at the above and noted GoogleGuy's comment:
"My fave tip from Brett's suggestions: add a page of content a day to your site"
Presumably this was in the days when you could count on pages getting indexed. :)
1) My site map is named "sitemap.php". Why? Google's is named "sitemap.html". I once had it named "site-map.php" but after doing some looking around WW I changed it to its current name (and created the appropriate 301 redirect).
2) My site map is linked to from just about every page (in a header include) using an absolute link.
3) The site map itself resides in the root directory of the web site and uses relative links. I think this method (with a simple, well-formed directory structure) cuts down on the possibility of spiders "getting lost". Using relative links keeps page size down, and I've yet to hear a solid argument for absolute links. I've never had a spider fail to find its way across my site.
4) My site map doesn't link to every page. Just the major sections and sub-sections. Linking to every page could be helpful if your site is small but remember the 100 Link Rule.
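Point 4 (linking only the major sections and sub-sections) can be automated when a site's URLs mirror its directory structure. A minimal Python sketch, assuming that sections are directories whose URLs end in "/", that anything deeper than two levels is left off, and that the 100-link rule applies; `major_sections` and the example paths are hypothetical.

```python
def major_sections(paths, max_depth=2, max_links=100):
    """Return the section and sub-section URLs to list on a site map.

    Assumes (hypothetically) that sections are directories, so "/widgets/blue/"
    has depth 2, while "/widgets/blue/model-1.html" is an individual page and
    is left off. The home page "/" (depth 0) is skipped as well.
    """
    keep = []
    for path in paths:
        if not path.endswith("/"):
            continue  # an individual page, not a section index
        depth = len([part for part in path.split("/") if part])
        if 1 <= depth <= max_depth:
            keep.append(path)
    if len(keep) > max_links:
        raise ValueError(f"{len(keep)} links: split the site map into several pages")
    return sorted(keep)


paths = ["/", "/widgets/", "/widgets/blue/", "/widgets/blue/small/",
         "/widgets/blue/model-1.html", "/gadgets/"]
print(major_sections(paths))  # ['/gadgets/', '/widgets/', '/widgets/blue/']
```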
But the biggest key with Google is this: if it doesn't help your visitors/users, then it isn't worth it. That's the best tip I've taken away from these forums.
I hope this answers a few questions and at the least provides an example of how someone implements their site map.
I'm unsure of the technical difference between a sitemap and an index. I have an index page, widgets.com/index.html, that Google won't index. My homepage is indexed.
What do you call your home page?
"/index.html" is generally used as the default page for the site, and therefore the natural "home" page. You may be confusing Google, especially if it has indexed your default home page as "widgets.com/" without specifying the actual name.
As to site maps, you appear to be using your index page as one. In book terminology, the home page provides the contents list you would find at the beginning of a book, while the site map is like the detailed index you would find at the back of the book.
I would suggest renaming your pages to the more usual names and setting up a redirect in .htaccess for the old home page. Your home page is renamed to "/index.html", and your old "/index.html" becomes your site map.
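On Apache the redirect mentioned above would normally be a one-line rule in .htaccess. To show what such a rule does at the HTTP level, here is a minimal Python WSGI sketch that answers a request for the old home page with a 301 (permanent) redirect. The old name /default.htm is a hypothetical example; the same pattern covers a renamed site map, such as site-map.php to sitemap.php.

```python
# Hypothetical mapping of retired URLs to the URLs that replace them.
REDIRECTS = {
    "/default.htm": "/",               # old home page -> site root (served by index.html)
    "/site-map.php": "/sitemap.php",   # renamed site map
}


def app(environ, start_response):
    """Minimal WSGI application that issues permanent redirects for old URLs."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>Normal page handling would go here.</p>"]


# Exercise the app without a server by passing a fake start_response.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


app({"PATH_INFO": "/default.htm"}, fake_start_response)
print(captured["status"])               # 301 Moved Permanently
print(captured["headers"]["Location"])  # /
```

To try it in a browser, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library would serve it locally.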
For best chances on getting your site map indexed, I would call it "site-map.html", or "site-map-2.html" etc., and use "site map" as the linking text.
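Google's guideline quoted earlier in the thread suggests splitting a site map of more than 100 or so links into separate pages. A minimal Python sketch of that split, using the file names and "site map" link text suggested here; the function names and the 250 page URLs are hypothetical.

```python
from html import escape


def split_site_map(links, per_page=100):
    """Split links into site map pages of at most per_page links each.

    Pages are named site-map.html, site-map-2.html, site-map-3.html, ...
    """
    pages = {}
    for start in range(0, len(links), per_page):
        number = start // per_page + 1
        name = "site-map.html" if number == 1 else f"site-map-{number}.html"
        pages[name] = links[start:start + per_page]
    return pages


def site_map_nav(names):
    """HTML for the top of index.html: one "site map" link per site map page."""
    items = []
    for number, name in enumerate(names, start=1):
        text = "site map" if number == 1 else f"site map {number}"
        items.append(f'<a href="{escape(name)}">{text}</a>')
    return " | ".join(items)


links = [f"/widgets/page-{n}.html" for n in range(1, 251)]  # 250 hypothetical pages
maps = split_site_map(links)
print({name: len(urls) for name, urls in maps.items()})
# {'site-map.html': 100, 'site-map-2.html': 100, 'site-map-3.html': 50}
print(site_map_nav(maps))
# <a href="site-map.html">site map</a> | <a href="site-map-2.html">site map 2</a> | <a href="site-map-3.html">site map 3</a>
```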
[added] You will also need to ensure your server knows that /index.html is the default page [/added]
Harry
"What do you call your home page?"
widgets.com is not a page. The URL widgets.com needs a file/page for anything to show. These files are commonly called default.htm or index.htm or index.html or index.php or index.shtml, etc. Yours has to be called something. Using default.htm for the homepage and index.html for a sitemap is non-standard, to say the least, and it isn't much of a surprise that it would trick a bot.
My homepage is www.widgets.com
My site map is www.widgets.com/index.html
My inbound links are all towards www.widgets.com
If someone on another site clicks a link to "www.widgets.com", your server will serve the default home page. Which page that is gets set in your server configuration; if no default is set, the request will fail.
To check which page is your default, you could browse to one of the sites that link to you, click the link, and see which page of your site is displayed. Or just type www.widgets.com into the address bar.
It is probable that Google has indexed that page as www.widgets.com/ without the file name. To check, see which page Google has cached under www.widgets.com.
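Which file answers a request for www.widgets.com is decided by the server's list of default document names (on Apache, the DirectoryIndex directive). The Python sketch below mimics that lookup by returning the first name in the list that exists in the document root. The order of DEFAULT_NAMES is an assumption; check your own server's configuration or ask your host.

```python
import os
import tempfile

# Assumed default-document order; the real list is part of the server configuration.
DEFAULT_NAMES = ["index.html", "index.htm", "index.php", "index.shtml",
                 "default.html", "default.htm"]


def resolve_default(doc_root, names=DEFAULT_NAMES):
    """Return the file name the server would use for "/", or None if none exists."""
    for name in names:
        if os.path.isfile(os.path.join(doc_root, name)):
            return name
    return None  # no default document found in doc_root


with tempfile.TemporaryDirectory() as root:
    # A site whose home page is default.htm and whose site map is index.html.
    for name in ["default.htm", "index.html"]:
        open(os.path.join(root, name), "w").close()
    served_both = resolve_default(root)
    os.remove(os.path.join(root, "index.html"))
    served_default_only = resolve_default(root)

print(served_both)          # index.html
print(served_default_only)  # default.htm
```

With both files present, this assumed order serves the site map (index.html) as the home page, which is the kind of mix-up discussed in this thread.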
There are two possibilities that I can think of:
1) the page displayed will be www.widgets.com/index.html.
2) it will be something else, probably www.widgets.com/default.html.
1) If it is index.html, then all you have to do is create a new page (any name), which will become your new index page. Then rename index.html to site-map.html (or similar), and rename your new page to index.html.
Your index.html page should have links to the major areas of your site and also to site-map.html (use two or more site maps if there are more than 100 links).
It is best to put the links to the site map or maps at the top of index.html. You want Google to spot these easily.
All Google will notice is that the content of your index.html page has changed. If it was indexed it will stay indexed.
2) The case where the page displayed is not index.html: if it is default.html, then that is fine. All you have to do is rename your index.html page to site-map.html.
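Both cases above come down to a couple of file renames, which can be scripted. A minimal Python sketch using pathlib, run here against a temporary directory rather than a live site; the new page's name new-home.html is hypothetical. Order matters in case 1: index.html is moved out of the way before the new page takes its name.

```python
import tempfile
from pathlib import Path


def swap_in_new_home(doc_root, new_page=None, site_map="site-map.html"):
    """Rename index.html to the site map name; then, if given, new_page to index.html.

    Case 1: pass new_page (a hypothetical name such as "new-home.html").
    Case 2: leave new_page as None when the home page is default.html.
    """
    root = Path(doc_root)
    target = root / site_map
    if target.exists():
        raise FileExistsError(target)  # don't silently overwrite an existing site map
    (root / "index.html").rename(target)
    if new_page is not None:
        (root / new_page).rename(root / "index.html")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.html").write_text("old site map links")
    (root / "new-home.html").write_text("new home page")
    swap_in_new_home(tmp, new_page="new-home.html")
    home_text = (root / "index.html").read_text()
    map_text = (root / "site-map.html").read_text()

print(home_text)  # new home page
print(map_text)   # old site map links
```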
As SteveB pointed out, home pages are usually called default.html or index.html, etc. Your hosting provider may insist on a specific name. My feeling is that index.html is becoming the standard, but default.html is still used.
Harry