
Google SEO News and Discussion Forum

This 188-message thread spans 7 pages; this is page 4.
Google Sitemaps
Googlebot getting tired?
shadows2000
msg:723687
 1:13 am on Jun 3, 2005 (gmt 0)

I found an interesting new service from Google called Google Sitemaps (I haven't seen this mentioned elsewhere). It seems you can give Googlebot a helping hand if some pages are not getting indexed.

It seems strange that Google is introducing this service now rather than improving how Googlebot follows links.

https://www.google.com/webmasters/sitemaps/
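
From the docs, the file itself looks tiny - something like this minimal example (example.com is a placeholder; as far as I can tell only <loc> is required):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-06-01</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>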

 

lamoya
msg:723777
 9:20 am on Jun 11, 2005 (gmt 0)

that is another google trick to nuke your sites.

Dawg
msg:723778
 11:35 am on Jun 11, 2005 (gmt 0)


that is another google trick to nuke your sites.

Guys, don't be paranoid.

There are several good reasons why Google could have launched this sitemap stuff. Catching a few spammers might be a side effect (if at all).

nemo2
msg:723779
 12:03 pm on Jun 11, 2005 (gmt 0)

"that is another google trick to nuke your sites."
Looks to me like G$$ is becoming more and more EVIL, trying to hold on to its slipping position as the number 1 search engine (look at the Bourbon update gigathread).

SEOtop10
msg:723780
 3:10 pm on Jun 11, 2005 (gmt 0)

With due respect, why can't we limit our discussion to the topic at hand - that is, Google Sitemaps?

Please don't misunderstand me; I have suffered very badly in the last year because of the Google upheaval. But I have no hope of getting anything back by cursing anyone. If we do meaningful analysis of what is happening, though, I am sure all of us will learn plenty of strategies to avoid being penalised and to improve our rankings.

<ducks to avoid a barrage of flames coming his way>

Bluepixel
msg:723781
 3:16 pm on Jun 11, 2005 (gmt 0)

I added the sitemap to my newly created domain.

Google has so far requested the sitemap 30 times, but never visited any pages, although the site is linked from other domains as well.

bazzais
msg:723782
 4:00 pm on Jun 11, 2005 (gmt 0)

I have a mixed reaction to the new Sitemaps service. In theory it should be a good one, but I can imagine it will remain one of the last things Google looks at before applying rank to a page.

My experience this week:
I registered two domains last week, one the .co.uk and one the .com.

I pointed them both at exactly the same page and proceeded to make a sitemaps file for the .com domain (I had not even linked to the .co.uk at this point, so it shouldn't have been known to Google).

The XML sitemap was downloaded within an hour and Googlebot came calling - the .com was listed straight away.

Two days later the .com disappeared and the .co.uk appeared (even though there are no external links to it).

A week later the .com is invisible but the .co.uk is still listed.

I am aware that most new sites have a speed wobble in the first week or two (depending on Google's update cycle), and this is not enough evidence to base any conclusions on - but if you're having a similar ride, you know you're not the only one. The preliminary picture is that this is not a fast-entry route into the Google index; it is probably exactly the same as using the 'add URL' page or having an inbound link from an indexed page.

dazzlindonna
msg:723783
 4:53 pm on Jun 11, 2005 (gmt 0)

I added the sitemap to my newly created domain.

Google has so far requested the sitemap 30 times, but never visited any pages, although the site is linked from other domains as well.

Bluepixel, that is EXACTLY my experience. Well, maybe not exactly 30 times, but at least that many.

willamowius
msg:723784
 5:44 pm on Jun 11, 2005 (gmt 0)

Did anyone have success submitting a sitemap without an account? The FAQ says it can be done.

While sitemaps submitted via an account seem to be crawled within a few hours, that doesn't seem to be the case for sitemaps that were only submitted via the ping URL.
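
If it helps anyone, the ping is just an HTTP GET with your sitemap's URL as a parameter. A quick Python sketch, assuming the endpoint the FAQ gives (the sitemap URL is a placeholder):

import urllib.parse
import urllib.request

# Resubmit a sitemap by pinging Google. The endpoint below is my reading
# of the sitemaps FAQ (assumption); the sitemap URL is a placeholder.
sitemap = "http://www.example.com/sitemap.xml"
ping = ("http://www.google.com/webmasters/sitemaps/ping?sitemap="
        + urllib.parse.quote(sitemap, safe=""))
with urllib.request.urlopen(ping) as resp:
    print(resp.status)  # 200 means the ping was received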

chinook
msg:723785
 6:32 pm on Jun 11, 2005 (gmt 0)

Google will be posting third-party apps for Sitemaps at the following:
[code.google.com...]
As they say, the list is incomplete; they didn't include our ASP.NET tool.

bazzais
msg:723786
 12:01 am on Jun 12, 2005 (gmt 0)

They included my tool though ;)

Still, there is more time for development and progression of any tools that utilise the Sitemaps program.

seunosewa
msg:723787
 8:28 am on Jun 12, 2005 (gmt 0)

Same experience. They've been downloading my sitemap several times a day for the past week and seem to have stopped deep-crawling my site: now they only pick up the home page :(

dmorison
msg:723788
 12:08 pm on Jun 12, 2005 (gmt 0)

I've just posted an experimental free tool (URL in profile) that uses a different method to generate the sitemap - one that will pick up unlinked and database driven pages.

Users of the tool simply need to place a tracking pixel on every page of their website (using HTML or CSS) and then browse through the site, or wait for their normal traffic to have covered every page for them. The great thing about this approach is that priority values can then be scaled based on the actual popularity of pages on the site.

Having said that, I just checked the popular hit-counter and website stats services (who obviously have the database in place to offer this for their customers) and none seem to have picked up on it yet, so it might not be that great an idea... :o
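
For what it's worth, the popularity scaling is only a few lines. A rough sketch with made-up hit counts (the data source would be the tracking-pixel logs):

# Scale sitemap <priority> by page popularity: the busiest page gets 1.0,
# everything else is proportional, floored at 0.1. Hit counts are made up.
hits = {
    "http://www.example.com/": 980,
    "http://www.example.com/products.html": 310,
    "http://www.example.com/contact.html": 12,
}

top = max(hits.values())
for url, views in sorted(hits.items()):
    priority = max(0.1, round(views / float(top), 1))
    print("<url><loc>%s</loc><priority>%.1f</priority></url>" % (url, priority))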

bumpaw
msg:723789
 6:11 pm on Jun 12, 2005 (gmt 0)

I just got the Python sitemap script running on one site and am wondering how much trouble to go to in filtering out images and other non-page files.
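
For starters I'm thinking a plain extension filter is probably enough - a sketch (the extension list is just my guess at what to exclude):

# Drop images and other non-page files by extension before they go
# into the sitemap.
SKIP = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".pdf", ".zip")

def is_page(url):
    return not url.lower().endswith(SKIP)

urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/images/logo.gif",
]
pages = [u for u in urls if is_page(u)]  # keeps only index.html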

Billy Batson
msg:723790
 10:14 pm on Jun 12, 2005 (gmt 0)

I have started filtering for "printer friendly" pages.
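
Something along these lines, in the same spirit (the URL patterns are just my own conventions - match whatever your site uses):

import re

# Drop "printer friendly" duplicate pages by URL pattern before writing
# the sitemap.
PRINT_RE = re.compile(r"printer[-_]?friendly|[?&]print=1", re.I)

urls = [
    "http://www.example.com/article.asp?ID=7",
    "http://www.example.com/article.asp?ID=7&print=1",
]
keep = [u for u in urls if not PRINT_RE.search(u)]  # keeps only the first URL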

wattsnew
msg:723791
 2:55 am on Jun 13, 2005 (gmt 0)

Sitemaps might be a way to get GoogleBot interested in crawling:

a) the 50% of site pages that went URL only just before Bourbon
b) a new section of the site added during Bourbon (and mainly tagged "Google noindex" to keep the bot from getting confused...)

GoogleBot has only hit my index page and one other each day since Bourbon began. I can't tell if my changes (canonical 301 redirect and removal of anything remotely duplicated) will be effective with no crawl!

BUT...

I'm going to be doing a lot of tinkering with the site for a while and I don't want to make a new SITEMAPS file ( .txt probably, it's only 140 pages) every day. Do I?

What happens if you post a sitemap for Google today and then remove it two weeks from now (assuming good things happen in that time) letting GoogleBot do its normal thing?

Could GoogleBot miss the SITEMAPS and decide the site is GONE? :o I guess it's a bit early to know this.
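
(For the record, the .txt flavour is just full URLs, one per line, UTF-8 encoded - placeholders here - so regenerating it is cheap:

http://www.example.com/
http://www.example.com/about.htm
http://www.example.com/widgets.htm

No tags, no dates, nothing else.)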

panicbutton
msg:723792
 6:47 am on Jun 13, 2005 (gmt 0)

Sitemaps looks like a band-aid solution to get around the fact that Google's black box algorithm is more and more often "refusing" to crawl a large number of sites. No one at Google knows how the algorithm works anymore and anyway, the black box nature of the algorithm prevents direct tweaking, so they are force-feeding it via Sitemaps.

Or perhaps a better analogy is that they have removed the algorithm's gag reflex.

phpmaven
msg:723793
 6:53 pm on Jun 13, 2005 (gmt 0)

Did anyone have success submitting a sitemap without an account? The FAQ says it can be done.

While sitemaps submitted via an account seem to be crawled within a few hours, that doesn't seem to be the case for sitemaps that were only submitted via the ping URL.


I had the same thing happen to me. When I ran the Python script and it finished saying that Google had been pinged, the sitemap never got downloaded by Googlebot. When I submitted the same sitemap through my account, it was downloaded within a few hours.

Ocean10000
msg:723794
 8:25 pm on Jun 13, 2005 (gmt 0)

Small update:
My sitemap files have been downloaded about three times a day since I submitted them via my Google account on the first day this was made public here. Before that, Google had been visiting maybe 4 or 5 pages a day, tops, since February. Googlebot is now doing a deep crawl of my site. I usually add 20 to 30 pages a month, about comic books. At last check it had visited only 54% of the total pages on my site; maybe it will actually beat MS at how many pages it has visited. For those who are curious, MS has visited 71% of my available pages.

sailorjwd
msg:723795
 1:06 am on Jun 14, 2005 (gmt 0)

I used the .asp tool today that is pointed to on the support page for Google Sitemaps.

I only had to limit the file types it was picking up.

Now what is this about not being able to use it for a commercial site? If you make any money in any way (like AdSense), then you shouldn't use this?

I think I'm going to change the plaque on my desk from 'Murphy's Law' to 'Google's Law'

Johan007
msg:723796
 11:25 am on Jun 14, 2005 (gmt 0)

"We also accept RSS 2.0 and Atom 0.3 syndication feeds, using the link/lastMod fields."
lol, saves a lot of work! https://www.google.com/webmasters/sitemaps/docs/en/faq.html

Simple Text file: https://www.google.com/webmasters/sitemaps/docs/en/faq.html#s9
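
So presumably a bare-bones RSS 2.0 feed like this would do the job (my own example values; I'm guessing <pubDate> is what they read as the lastmod):

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Example Widgets</title>
<link>http://www.example.com/</link>
<description>Site feed doubling as a sitemap</description>
<item>
<link>http://www.example.com/widgets/blue.htm</link>
<pubDate>Tue, 14 Jun 2005 09:00:00 GMT</pubDate>
</item>
</channel>
</rss>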

Armi
msg:723797
 12:11 pm on Jun 14, 2005 (gmt 0)

Is it a good idea to put URLs with a 301 redirect into a Google Sitemap?

I have the problem that Google doesn't delete these URLs, because it doesn't visit them - they have no links.

I hope that with a Google Sitemap, Google will visit these URLs, get the 301 information, and delete them!?!?

cyberfyber
msg:723798
 3:40 pm on Jun 14, 2005 (gmt 0)

Errr, I don't know if this'll help anyone, but I've got over 900 pages on one site, and maybe some of you who are stuck with hundreds of pages can make use of this method.

What I did was first take my original site map and paste it into MS Word. There I used Find & Replace (in as creative a fashion as possible) to remove all the extras such as CSS code and other words and such. (Wildcards came in very, very handy.)

I ended up with a list of URLs, one atop the other, line by line.

I then took a look at the sample XML code for a couple of pages and did the following Find and Replace on my list:

Replaced every instance of http://
with

<url>
<loc>http://

Then did a Search and Replace for .htm
replacing it with

.htm</loc>
<lastmod>2005-06-14</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>

From there, of course, I had to go back in and change those lastmod, changefreq and priority values as needed. But it all turned out to be a hell of a lot easier than other methods... unless of course you're code-savvy enough to go the other way.

The online sitemap generators I found only went up to a maximum number of pages, so I was stuck.

Hope this helps someone.
Note: I'm not even finished yet here on my end. ;-((
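
PS: if anyone is more script-savvy than me, I gather the same wrap-up is only a few lines of Python - a sketch (file names and the tag values are placeholders):

# Read urls.txt (one full URL per line, e.g. exported from Word) and
# wrap each line in the same sitemap XML as the Find & Replace above.
from xml.sax.saxutils import escape

with open("urls.txt") as src, open("sitemap.xml", "w") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">\n')
    for line in src:
        url = line.strip()
        if not url:
            continue
        out.write("<url><loc>%s</loc>" % escape(url))
        out.write("<lastmod>2005-06-14</lastmod>")
        out.write("<changefreq>weekly</changefreq>")
        out.write("<priority>0.5</priority></url>\n")
    out.write("</urlset>\n")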

Johan007
msg:723799
 6:52 pm on Jun 14, 2005 (gmt 0)

For dynamic websites I really think using the <lastmod> date is too dangerous. I don't use it, and it isn't needed, because:

1. Changes to things like title tags etc. will not register, and Google will not spider you!?
2. Unlike MS SQL etc., Access only generates the created date automatically, and you don't want to use that in case you later make changes to the db!

This sitemap stuff is a pretty cool idea, as my entire site map is only 50k for Google to use.

Johan007
msg:723800
 11:49 am on Jun 15, 2005 (gmt 0)

If you have a DB-driven site, feel free to use my classic ASP code (easy to convert to PHP); call it sitemap.asp. To save on file size, do not convert this code to inline code (it's worth the slight server hit of the multiple "response.write" calls, because only Google will be using it). Again, I would not use the lastmod date, as explained in my post above.


<!-- #Include virtual="/database-connection.asp" -->
<%
' Serve the output as XML, not HTML.
Response.Buffer = true
response.ContentType = "text/xml"
response.write "<?xml version='1.0' encoding='UTF-8'?>"
response.write "<urlset xmlns='http://www.google.com/schemas/sitemap/0.84'>"

' List your static URLs
' (changefreq goes before priority, matching the order in Google's examples)

response.write "<url>"
response.write "<loc>http://www.domain.co.uk/</loc>"
'response.write "<lastmod>" & Danger & "</lastmod>"
response.write "<changefreq>daily</changefreq>"
response.write "<priority>0.5</priority>"
response.write "</url>"

response.write "<url>"
response.write "<loc>http://www.domain.co.uk/sub-home</loc>"
'response.write "<lastmod>" & Danger & "</lastmod>"
response.write "<changefreq>daily</changefreq>"
response.write "<priority>0.5</priority>"
response.write "</url>"

' List your dynamic URLs

Dim n
n = 0

' [Table] is a placeholder for your pages table: newest pages first,
' soft-deleted rows skipped.
strSql = "SELECT Table.ID AS [pageID] FROM [Table] WHERE Table.[Delete] <> 'Y' ORDER BY Created DESC"
Set db = Server.CreateObject("ADODB.Connection")
Set Rs = Server.CreateObject("ADODB.Recordset")
db.Open strDBConnection
Rs.Open strSql, db

Do While Not Rs.EOF
    intID = Rs.Fields("pageID").Value
    response.write "<url>"
    response.write "<loc>http://www.domain.co.uk/article.asp?ID=" & intID & "</loc>"
    'response.write "<lastmod>" & Danger & "</lastmod>"

    ' The ten newest pages get top priority, the next ten medium, the rest low.
    If n < 10 Then
        response.write "<changefreq>daily</changefreq>"
        response.write "<priority>1.0</priority>"
    ElseIf n < 20 Then
        response.write "<changefreq>monthly</changefreq>"
        response.write "<priority>0.7</priority>"
    Else
        response.write "<changefreq>yearly</changefreq>"
        response.write "<priority>0.2</priority>"
    End If

    n = n + 1

    response.write "</url>"
    Rs.MoveNext
Loop

Rs.Close
db.Close
Set Rs = Nothing
Set db = Nothing

' End of dynamic URLs (maybe have another table?)

response.write "</urlset>"
%>

lastmod, priority and changefreq are all optional tags! It's unlikely they will be used, but if they are, don't forget it's all relative, so do try to give low values to all your old pages. I suggest a priority of 0.5 for the homepage, 0.2 for old pages and 1.0 for new pages, which you can do dynamically. The same goes for changefreq.

Limitations:
This code obviously does not limit the URL count to 50,000, so keep an eye on that if you have a mega site - maybe add another counter to show the number of URLs in admin mode. The way it's coded (non-inline), the file size would never get to 10MB for 50,000 URLs - more like 5MB.

sailorjwd
msg:723801
 4:10 pm on Jun 15, 2005 (gmt 0)

Wow!

Sitemap really works.

I finally got the coding tweaks right for the sitemap.asp last night and submitted it.

Today I see ALL my pages are now indexed, most with a fresh cache date, after having gone 80% URL-only in the middle of last month.

<added>
Not the sitemap.asp in the previous message
</added>

netmeg
msg:723802
 4:50 pm on Jun 15, 2005 (gmt 0)

I set this up on one of my small sites (50 pages) as a test. So far, in a little over a week, Google comes and gets my XML file maybe eight times a day (oddly enough, it comes once and then comes back five minutes later, every time), but it was still only fetching the index page. Now it hasn't picked up either the XML file or any other page in over 24 hours. Prior to this the site was being crawled pretty regularly by Googlebot; although it wasn't getting all the pages, at least the ones it was getting were regular. I'm kind of flummoxed by this behavior.

Johan007
msg:723803
 5:35 pm on Jun 15, 2005 (gmt 0)

netmeg, what do you expect with a 50-page site? We are talking about sites with hundreds or thousands of pages.

Well done, sailorjwd - my pages get indexed quickly too. What code did you use - custom or an ASP script? Did you use the lastmod date, etc.?

silverbytes
msg:723804
 5:58 pm on Jun 15, 2005 (gmt 0)

What about including image file URLs, such as [mydomain.com...], in your sitemap? I think that would help your image search. But I searched the Google newsgroup, and the straight answer to that question was: include only document URLs.

Why?

sailorjwd
msg:723805
 6:31 pm on Jun 15, 2005 (gmt 0)

Johan,

I used sitemap.asp from one of the recommended sites.

Note to anyone not getting it to work:

Make sure you don't reference the sitemap file itself in the list of files to parse! I think that's why I initially had problems: it had sitemap.asp as one of the files to parse, which put the bot in a loop.
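
In other words, whatever generator you use, something like this (Python here; names are made up):

# Exclude the generator and its output from the URL list so the sitemap
# never lists itself and sends the bot round in circles.
EXCLUDE = ("sitemap.asp", "sitemap.xml")

urls = [
    "http://www.example.com/index.asp",
    "http://www.example.com/sitemap.asp",
]
urls = [u for u in urls if not u.lower().endswith(EXCLUDE)]  # drops sitemap.asp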

StuffOfInterest
msg:723806
 7:19 pm on Jun 15, 2005 (gmt 0)

What about including image file URLs, such as [mydomain.com...], in your sitemap? I think that would help your image search. But I searched the Google newsgroup, and the straight answer to that question was: include only document URLs.

Really simple answer: Google indexes images based on the text appearing on the page referencing the image. If you pass just an image URL, how will Google know how to categorize it?

I doubt GoogleBot is smart enough to say "naked girl sunning on a beach" based on the image alone. :)

As for sitemaps in general, I'm seeing really good results. I added several pages last night and updated the sitemap to include them. They were picked up by GoogleBot this afternoon. If the pattern holds, they'll show up in the index tomorrow or Saturday.

GoogleBot seems to be paying plenty of attention to some of the features like "lastmod". I'm guessing this will really help GoogleBot be more efficient in crawling a site and not recrawling unchanged content too often.

netmeg
msg:723807
 7:43 pm on Jun 15, 2005 (gmt 0)

Johan, I have other sites with hundreds and thousands of pages, but I'm not going to make the effort on those sites till I find out whether it's worth the time it takes and won't damage my current listings/spidering. I don't necessarily expect the sitemap feature to do any better; what I DON'T expect is for it to do worse than before I created and submitted the file, which is what seems to be happening.
