It seems strange that Google is introducing this service now rather than trying to improve how Googlebot follows links.
[google.com...]
Please don't misunderstand me; I have suffered very badly over the last year because of the Google upheaval. But I have no hope of getting anything back by cursing anyone. If we do a meaningful analysis of what has happened, though, I am sure all of us will learn plenty of strategies for avoiding penalties and improving our rankings.
<ducks to avoid a barrage of flames coming his way>
My experience this week:
I registered two domains last week, one .co.uk and one .com.
I pointed them both at exactly the same page and proceeded to make a Sitemaps file for the .com domain (I hadn't even linked to the .co.uk at this point, so it shouldn't have been known to Google).
The XML sitemap was downloaded within an hour and Googlebot came calling - the .com was listed straight away.
Two days later the .com disappeared and the .co.uk appeared (even though there are no external links to it).
A week later the .com is invisible but the .co.uk is still listed.
I am aware that most new sites have a speed wobble in the first week or two (depending on Google's update cycle), and this is not enough evidence to draw conclusions - but if you're having a similar ride, you know you're not the only one. The preliminary impression is that this is not a fast-entry route into the Google index; it's probably much the same as using the 'add URL' page or having an inbound link from an indexed page.
Users of the tool would simply need to place a tracking pixel on every page of their website (using HTML or CSS) and then browse through the site, or wait for their normal traffic to cover every page for them. The great thing about this approach is that priority values can then be scaled based on the actual popularity of pages on the site.
Having said that, I just checked the popular hit-counter and website stats services (who obviously have the database in place to offer this for their customers) and none seem to have picked up on it yet, so it might not be that great an idea... :o
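Just to make the idea concrete, here's a rough sketch of how the pixel half could work in classic ASP - the file names and paths here are invented for the example, not something any stats service actually offers. Every page carries <img src="/pixel.asp" width="1" height="1" alt=""> and the script logs which page requested it; tally the log afterwards and you can scale each URL's <priority> by its share of the hits (say 0.1 + 0.9 * hits / max hits).
<%
' pixel.asp (hypothetical) - append the referring page to a log, then hand back a real 1x1 GIF
Dim fso, logFile
Set fso = Server.CreateObject("Scripting.FileSystemObject")
Set logFile = fso.OpenTextFile(Server.MapPath("/logs/hits.txt"), 8, True) ' 8 = append, True = create if missing
logFile.WriteLine Request.ServerVariables("HTTP_REFERER")
logFile.Close
Set logFile = Nothing
Set fso = Nothing
Response.Redirect "/images/1x1.gif" ' a static transparent GIF somewhere on the site
%>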
a) the 50% of site pages that went URL only just before Bourbon
b) a new section of the site added during Bourbon (and mainly tagged "Google noindex" to keep the bot from getting confused...)
GoogleBot has only hit my index page and one other each day since Bourbon began. I can't tell if my changes ( canonical - 301 redirect and removal of anything remotely duplicated ) will be effective with no crawl!
BUT...
I'm going to be doing a lot of tinkering with the site for a while and I don't want to make a new SITEMAPS file ( .txt probably, it's only 140 pages) every day. Do I?
What happens if you post a sitemap for Google today and then remove it two weeks from now (assuming good things happen in that time) letting GoogleBot do its normal thing?
Could GoogleBot miss the SITEMAPS and decide the site is GONE? :o I guess it's a bit early to know this.
Or perhaps a better analogy is that they have removed the algorithm's gag reflex.
Did anyone have success submitting a sitemap without an account? The FAQ says it can be done. While sitemaps submitted via an account seem to be crawled within a few hours, that doesn't seem to be the case for sitemaps that were only submitted via the ping URL.
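For anyone trying the no-account route: as far as I can tell from the FAQ, the ping is just an HTTP GET with your sitemap's address (URL-encoded) passed as the parameter, along these lines (swap in your own domain):
http://www.google.com/webmasters/sitemaps/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml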
I only had to limit the filetypes it was picking up.
Now what is this about not being able to use this for a commercial site? If you make any money in any way (like AdSense), then you shouldn't use this?
I think I'm going to change the plaque on my desk from 'Murphy's Law' to 'Google's Law'
Simple Text file: [google.com...]
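For a small site like the 140-page one mentioned above, the simple text version really is just a plain UTF-8 file with one full URL per line and nothing else - something like this (example URLs made up):
http://www.example.com/
http://www.example.com/page1.htm
http://www.example.com/page2.htm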
What I did was first take my original sitemap and paste it into MS Word. There I used Find & Replace (in as creative a fashion as possible) to remove all extras such as CSS coding and other words and such. (Wildcards came in very very handy).
I then ended up with a list of URLs situated one atop the other, line by line.
I then took a look at the sample XML code for a couple of pages and did the following Find and Replace to my list:
Replaced every instance of http://
with
<url>
<loc>http://
Then did a Search and Replace for .htm
replacing it with
.htm</loc>
<lastmod>2005-06-14</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
From there, of course, I'd have to go back in and change all those specifications for lastmod, changefreq and priority. But it all turned out to be a hell of a lot easier than other methods... unless of course you're code-savvy enough to go the other way.
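Just so the end result is concrete, a finished entry from that find-and-replace ends up sitting inside the usual XML declaration and <urlset> wrapper, something like this (the URL here is invented for the example):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.example.com/somepage.htm</loc>
<lastmod>2005-06-14</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
</urlset>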
Those online sitemap generators I found only handled up to a maximum number of pages, so I was stuck.
Hope this helps someone.
Note: I'm not even finished yet here on my end. ;-((
1. Changes to non-HTML elements like title tags etc. will not register, and Google will not spider you!?
2. Unlike MS SQL etc., Access only generates the created date automatically, and you don't want to use that in case you make changes to the db!
This sitemap stuff is a pretty cool idea, as my entire sitemap is only 50k for Google to use.
<!-- #Include virtual="/database-connection.asp" -->
<%Response.Buffer = true
response.ContentType = "text/xml"
response.write "<?xml version='1.0' encoding='UTF-8'?>"
response.write "<urlset xmlns='http://www.google.com/schemas/sitemap/0.84'>"
' List your static URL's
response.write "<url>"
response.write "<loc>http://www.domain.co.uk/</loc>"
'response.write "<lastmod>" & Danger & "</lastmod>"
response.write "<priority>0.5</priority>"
response.write "<changefreq>daily</changefreq>"
response.write "</url>"
response.write "<url>"
response.write "<loc>http://www.domain.co.uk/sub-home</loc>"
'response.write "<lastmod>" & Danger & "</lastmod>"
response.write "<priority>0.5</priority>"
response.write "<changefreq>daily</changefreq>"
response.write "</url>"
' List your dynamic URL's
Dim n
n = 0
' [Table] stands for whatever your articles table is called
strSql = "SELECT *, Table.ID AS [pageID] FROM [Table] WHERE Table.Delete <> 'Y' ORDER BY Created DESC"
Set db = Server.CreateObject("ADODB.Connection")
Set Rs = Server.CreateObject("ADODB.Recordset")
db.Open strDBConnection
Rs.Open strSql, db
Do While Not Rs.EOF
intID = Rs.Fields("pageID").Value
response.write "<url>"
response.write "<loc>http://www.domain.co.uk/article.asp?ID=" & intID & "</loc>"
'response.write "<lastmod>" & Danger & "</lastmod>"
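' Tier the values: the ten newest articles get top priority, the next ten medium, the rest low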
If n < 10 Then
response.write "<priority>1.0</priority>"
response.write "<changefreq>daily</changefreq>"
ElseIf n < 20 Then
response.write "<priority>0.7</priority>"
response.write "<changefreq>monthly</changefreq>"
Else
response.write "<priority>0.2</priority>"
response.write "<changefreq>yearly</changefreq>"
End If
n = n + 1
response.write "</url>"
Rs.MoveNext
Loop
Rs.Close
db.Close
Set Rs = Nothing
Set db = Nothing
' End Dynamic URL (maybe have another table?)
response.write "</urlset>"
%>
lastmod, priority and changefreq are all optional tags! It's unlikely they will be used, but if they are, don't forget it's all relative, so do try to have low values for all your old pages. I suggest a priority of 0.5 for the homepage, 0.2 for old pages and 1.0 for new pages, which you can do dynamically. Same goes for changefreq.
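On the Access point above (no automatic modified date): one workaround - the LastModified column name here is just an example, it's not in the code above - is to keep your own date field, update it whenever an article is saved, and then uncomment the lastmod lines using a YYYY-MM-DD string built from it:
' In the article edit/save page (hypothetical column name):
strSql = "UPDATE [Table] SET LastModified = Now() WHERE ID = " & intID
db.Execute strSql
' Then in the sitemap loop, instead of the commented-out lastmod lines:
dtMod = Rs.Fields("LastModified").Value
response.write "<lastmod>" & Year(dtMod) & "-" & Right("0" & Month(dtMod), 2) & "-" & Right("0" & Day(dtMod), 2) & "</lastmod>"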
Limitations:
This code obviously does not limit the URL count to 50,000, so keep an eye on that if you have a mega site - maybe add another counter to show the number of URLs in admin mode. The way it's coded (non-inline), the file size would never get to 10MB for 50,000 URLs - more like 5MB.
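If anyone does want that 50,000 guard, a single line just inside the loop (before the first response.write) would do it:
If n >= 50000 Then Exit Do ' the sitemap protocol caps each file at 50,000 URLs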
Why?
I used sitemap.asp from one of the recommended sites.
Note to anyone not getting it to work:
Make sure you don't reference the sitemap file itself in the list of files! I think that's why I initially had problems: it had sitemap.asp as one of the files to parse, which would put the bot in a loop.
What about including image file URLs such as [mydomain.com...] in your sitemap?
I think that would help your image search. But I searched the Google newsgroup and the straight answer to that question was: include only document URLs.
Really simple answer. Google indexes images based on the text appearing on the page referencing the image. If you pass just an image URL, how will Google know how to categorize it?
I doubt GoogleBot is smart enough to say "naked girl sunning on a beach" based on the image alone. :)
As for sitemaps in general, I'm seeing really good results. I added several pages last night and updated the sitemap to include them. They were picked up by GoogleBot this afternoon. If the pattern holds, they'll show up in the index tomorrow or Saturday.
GoogleBot seems to be paying plenty of attention to some of the features like "lastmod". I'm guessing this will really help GoogleBot be more efficient in crawling a site and not recrawling unchanged content too often.