But I can't get my new documents crawled very often... in fact, some HTML files were uploaded a week ago, and Google doesn't seem to have accessed those resources in at least the last 2 weeks...
Will it be?
Can someone explain Googlebot's behaviour? Visits, crawls, deep crawls...
Use absolute links rather than relative.
Get links from external sources to the aforementioned pages.
Make sure your pages are linked from more than one page on your site; a sitemap helps a bit.
Doing the above might help a bit. Sometimes it's a waiting game, but increasing your links will definitely increase your chances.
Validate = yes it does
8 months, 3 months listed in Google...
documents in root and 1 folder deep... linked with standard HTML,
links using absolute paths from index to internal pages, and vice versa.
some backward links to home recognized by Google, up to 7
Robots.txt
Also silverbytes, you don't need to use a robots.txt file to allow all bots. That's the default setting. If you're only using the robots.txt file to allow bots, then remove it.
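To illustrate the point about the default, here is a minimal sketch using Python's standard urllib.robotparser. The site www.example.com and the page path are made up for the example. It feeds the parser an empty robots.txt, an explicit allow-all file, and a file with one Disallow rule, then asks whether a crawler calling itself Googlebot may fetch the page. This only shows how one standard-library parser reads the rules; it is not a description of how Google's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical page used only for this illustration.
PAGE = "http://www.example.com/docs/new-page.html"

EMPTY = ""                                        # no rules at all
ALLOW_ALL = "User-agent: *\nDisallow:\n"          # explicit "allow everything"
BLOCK_DOCS = "User-agent: *\nDisallow: /docs/\n"  # one real restriction

print(allowed(EMPTY, PAGE))       # True
print(allowed(ALLOW_ALL, PAGE))   # True
print(allowed(BLOCK_DOCS, PAGE))  # False
```

With this parser, the empty file and the explicit allow-all file give the same answer, so an allow-all robots.txt adds no permission the bot didn't already have.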
Is this true or just superstition?
Absolute links tell the bot exactly where on your site a file is located whereas relative links kind of push it along which makes it easier for the bot to get lost.
I can only see this being true if:
a) The spider somehow "forgets" which page it is on.
b) The link is a 404 anyway.
I can't really see it being likely that absolute/relative URLs affect spidering at all unless the spider is buggy.
Maybe MSN's new bot? ;)
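For what it's worth, the resolution step being argued about can be sketched with Python's standard urllib.parse.urljoin, which combines a page's own URL with an href in the way URL-resolving clients generally do. The URLs below are hypothetical, and this is only a sketch of the general rule, not of any particular spider's code.

```python
from urllib.parse import urljoin

# Hypothetical page the spider is currently on.
BASE = "http://www.example.com/jokes/page2.html"

# The same two targets written as relative, root-relative and absolute links.
hrefs = [
    "page3.html",
    "../index.html",
    "/index.html",
    "http://www.example.com/jokes/page3.html",
]

resolved = [urljoin(BASE, href) for href in hrefs]
for href, url in zip(hrefs, resolved):
    print(f"{href!r:45} -> {url}")
```

The first and last hrefs resolve to the same URL, as do the second and third, so the request that finally goes to the server is identical whichever form the link was written in. A relative link only goes wrong if the spider has the wrong base URL (case a) or the href points at a page that doesn't exist (case b).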
I just wonder whether waiting a month will get my site deep crawled because of some regular Googlebot schedule, or whether I really have a problem.
Is this a myth or not: Googlebot crawls your site in depth once a month, and does a light scan daily or several times a month.
Can someone explain?
It's only been up and going for about 3 weeks, but already has about 15 pages listed in their search engine, and I have about 400 that should be spidered. Each of them has its own meta description and unique title.
Does the deep bot come on a "normal schedule" or is it different for every site?
Also, I do not have a robots.txt, but I do have a meta robots=all tag on all my pages. Should I change this?
P.S. Most of those pages are dynamic pages... ex:
?whatever=whatever after the page file... will this affect the bot in any way? All of the pages link back to other pages, and every "content page" links back to the main page.
I have a jokes page with <previous next> links for moving between the jokes... do you think this would confuse the bot?
My index page is being crawled every 1-2 days. I know because every day I hard code a line that says "Site updated on mm/dd/yy". Then I simply look at Google's cache for the last bot visit. However, pages one and two layers deep have dates that go back several weeks, maybe longer.
I thought that Google did away with the deep crawl and only makes adjustments based upon periodic fresh bots. Or is it only the monthly "dance" that has disappeared, while deep crawls still occur on a weekly or monthly basis? If so, how often does Google run the deep crawl? Perhaps my deep-layered pages are getting updated by the deep crawl only.
Are you absolutely sure about this?
I've always used the robots.txt file to allow everything so that the bot a) sees that the file exists, and b) knows for sure that it's allowed to crawl (i.e. I haven't just forgotten to put one up).
If it should be removed, then why?
I agree, bots should be able to find both of them.
Also, and I was hesitant to say this before because I whole-heartedly disagree, but I heard that absolute links might be treated as external links by some bots because the bot actually calls your domain again from the outside, whereas with relative links the bot never leaves your site. And we all know external links are good. But like I said, I have no reason to believe such a statement, but what the hell, I use external links just in case ;) What's it hurt?
Googlebot has switched to a rolling update... it's nothing like it was a few months ago... my site gets hit every day on many different pages... I see no pattern to what the bot requests, when it requests, or what triggers it to drop by... I've been updating existing pages and they are in the index and cache within days...
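One way to look for a pattern in the bot's visits is to tally them from the server's access log. The sketch below assumes the log is in Apache "combined" format and simply matches "googlebot" in the user-agent field; the sample lines, IP addresses and paths are made up, and a user-agent string can be faked, so treat the counts as a rough guide only.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_day(lines):
    """Count requests per day whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            hits[match.group("day")] += 1
    return hits


# Made-up sample lines; a real script would read the access log file instead.
SAMPLE = [
    '192.0.2.10 - - [10/Oct/2004:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2004:14:02:11 -0700] "GET /jokes/page2.html HTTP/1.0" 200 1840 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.55 - - [10/Oct/2004:14:05:00 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.10 - - [11/Oct/2004:03:20:45 -0700] "GET /index.html HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

hits = googlebot_hits_per_day(SAMPLE)
for day, count in hits.items():
    print(day, count)
```

Grouping by path instead of by day (using the `path` group) would show which pages the bot keeps coming back to and which ones it has never requested.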
Dave
On the subject of Absolute vs Relative links, the question for me is 'why wouldn't you use Absolute links'?
Absolute links that start with http:// use up an http socket on your server when accessed. Relative links and absolute links without http:// do not. Unless the server gets a decent volume of traffic, the difference is probably not very noticeable. There are only so many sockets on it, though. When they are all used up at the exact same time, the next requester will either get an error message (like a 404, 500, etc.) or experience slowness until one is freed up. Those who use overloaded budget hosts' servers often experience this condition, and they'll see their pages hang and/or time out.
Those who opt to use relative links really ought to use spidering software to make sure the bot will be fed the correct page and not a 404.
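As a rough idea of what such a check involves, here is a minimal offline sketch in Python. It takes a made-up two-page site as a dictionary of URL to HTML, resolves every href against the page it appears on, and reports internal links that point at a page that isn't there. The names LinkCollector and find_broken_links and all the URLs are hypothetical; a real spidering tool would fetch the pages over HTTP and look at the status codes instead.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def find_broken_links(pages):
    """Return sorted (source, target) pairs for internal links with no page."""
    broken = []
    for source, html in pages.items():
        collector = LinkCollector(source)
        collector.feed(html)
        for target in collector.links:
            same_host = urlparse(target).netloc == urlparse(source).netloc
            if same_host and target not in pages:
                broken.append((source, target))
    return sorted(broken)


# Made-up site: the jokes page links "home" with a relative href that
# resolves inside /jokes/ rather than at the site root.
PAGES = {
    "http://www.example.com/index.html":
        '<a href="jokes/page1.html">Jokes</a>',
    "http://www.example.com/jokes/page1.html":
        '<a href="index.html">Home</a> <a href="/index.html">Home</a> '
        '<a href="http://other.example.org/">Elsewhere</a>',
}

broken = find_broken_links(PAGES)
for source, target in broken:
    print(source, "->", target)
```

Here the only link reported is the relative "index.html" on the jokes page, which resolves to /jokes/index.html. That is the typical relative-link mistake: the link is resolved correctly, it just doesn't point where the author meant.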
A long time ago (1996?) I read that absolute links take longer to load than relative links.
I guess that would be true, as more text = larger page. However, I would like to think that most (buyers) now have better connections. As my site is mainly business software and the vast majority of sales occur during US work hours, I suspect most have a high-speed connection.
They're more time consuming, especially for bigger sites.
One thing I never do anymore is type links. I always copy/paste (from a link that I know works), so this doesn't apply in my case.
BlueSky, your comments are of great interest to me. I would like to think that my host isn't a "Budget host" as it costs me plenty! How would one find out?
Dave
Some hosts will oversell their servers, counting on the fact that most sites will stay very tiny and only use a small fraction of the bandwidth and other services they purchased. These hosts will often put 400, 500, or more domains per server. I know of one that puts in excess of 2,000. What usually happens is that a few sites will start to grow significantly at the same time. That's when everyone on the server will see the problem I previously described, until those sites are moved off. Then it will stabilize until the next wave starts growing. Although the number of domains may give an indication of overselling, it really boils down to the amount of traffic that is hitting the server at the same time. You can have 400 low-traffic sites and never run out of http sockets, or have five busy sites, or even one, and regularly run out.
If your server has been running fine and you don't see periodic page timeouts and/or errors, then I don't think you should worry about it. When/if your site starts monopolizing the sockets, your host will come knocking then and say fix your site.
How long before new pages show up in Google?
[webmasterworld.com...]
/claus