Google visits index page and stops

max363

4:22 am on Aug 30, 2002 (gmt 0)

Googlebot visits my site and requests the home page and robots.txt, but it doesn't go any further than that. Googlebot first visited about 2 months ago and my site still isn't listed. Does anyone know what could be wrong? If Googlebot doesn't deep crawl, does that mean it doesn't like my site?

Brett_Tabke

4:29 am on Aug 30, 2002 (gmt 0)

Welcome to the board.

Get some inbound links. Google likes to crawl sites that have inbound links. There is no guarantee that it will index an entire site that doesn't have links (PR value).
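To illustrate why inbound links (PR value) matter, here is a minimal sketch of the simplified, published textbook form of PageRank computed by power iteration. It is not how Google actually schedules crawls or ranks pages; the damping factor 0.85 is the commonly cited textbook value, and the page names are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified textbook PageRank by power iteration (not Google's actual system)."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outlinks = links.get(page, [])
            if outlinks:
                # A page passes its rank on, split evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical link graph: "orphan" links out, but nobody links to it.
    web = {
        "portal": ["yoursite", "blog"],
        "blog": ["yoursite", "portal"],
        "yoursite": ["portal"],
        "orphan": ["portal"],
    }
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:10s} {score:.4f}")
```

In this model a page nobody links to ends up with only the baseline (1 - d)/N, while pages with inbound links accumulate more. That matches the intuition behind getting inbound links, but the model says nothing about how deep any real crawler will go.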

dcheney

4:30 am on Aug 30, 2002 (gmt 0)

Make sure that you don't have any problems on your site (i.e., validate the code) and that the links off the main page are regular HTML (not Java, JavaScript, Flash, etc.).
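The advice about plain HTML links assumes a spider that reads href attributes but does not run scripts. A minimal sketch of such a link extractor, using only Python's standard library, shows the difference: a plain `<a href>` is found, while a `javascript:` link or an onclick handler is invisible to it. The base URL www.example.com is hypothetical, and real crawlers may behave differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from plain <a> tags, roughly as a simple, non-JS crawler would."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs and javascript: pseudo-URLs; a parser that
        # does not execute scripts has no way to follow them.
        if href and not href.lower().startswith("javascript:"):
            # Resolve relative links such as "example.php" against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = """
    <a href="example.php">Plain link</a>
    <a href="javascript:openPage('products.php')">Script link</a>
    <span onclick="location.href='about.php'">Click handler</span>
    """
    print(extract_links(page, "http://www.example.com/"))
```

Only `http://www.example.com/example.php` comes back; products.php and about.php would never be discovered through this page.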

Other than that, just wait. Eventually Googlebot will settle in and grab it all.

Woz

4:36 am on Aug 30, 2002 (gmt 0)

Hi max363, Welcome to WebmasterWorld.

Check out Paynt's Getting Started [webmasterworld.com] thread for some more details about finding information here.

Brett is right about getting some incoming links, as they will certainly help.

I would also ask whether it is only Google that has trouble spidering deeply, or whether you have noticed other spiders sniffing around but not coming in. If the latter, then you might want to double-check your robots.txt file. First, make sure you have one (I assume you do), and second, make sure the syntax is correct. You may think you are inviting spiders in, but you may be inadvertently sending them away.

Use the Search link at the top of the page to search for robots.txt and you should get some good clues on how to fine-tune your robots file.
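One way to double-check the syntax is to run the file through a robots.txt parser. This sketch uses Python's standard `urllib.robotparser` to show how a one-character difference matters: an empty `Disallow:` lets everyone in, while `Disallow: /` shuts out the whole site. The URL is hypothetical, and Python's parser is only an approximation of what any given search engine does.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An empty Disallow value means "nothing is disallowed".
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# A single slash blocks every URL on the site: an easy way to turn spiders away by mistake.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    url = "http://www.example.com/example.php"
    print("allow-all file:", is_allowed(ALLOW_ALL, "Googlebot", url))
    print("block-all file:", is_allowed(BLOCK_ALL, "Googlebot", url))
```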

Onya
Woz

max363

5:12 am on Aug 30, 2002 (gmt 0)

I currently have only a couple of inbound links; most of my pages are PHP and a few are HTML.

The FAST crawler also visits my site and does the same thing as Googlebot.

I currently don't have a robots.txt. Will creating one help my chances of being indexed?

Beachboy

5:13 am on Aug 30, 2002 (gmt 0)

No.

Woz

5:31 am on Aug 30, 2002 (gmt 0)

>I currently don't have a robots.txt, will creating one help my chances of being indexed?

Theoretically no, but I believe there was a time when an engine failed to index some sites because they didn't have a robots.txt, apparently due to a programming error on the engine's side. These days I like to be sure, so I include one even if it is just a standard "let everyone in" file.

The other advantage of a robots file is that you can restrict parts of the site you don't want indexed. This may not be an issue at the moment, but it may be later on. Now is as good a time as any to start learning how to implement your robots file.
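As a sketch of restricting only part of a site, here is a hypothetical robots.txt that keeps all spiders out of /admin/ and /cgi-bin/ while leaving everything else open, checked against a list of URLs with Python's standard `urllib.robotparser`. The directory names and URLs are assumptions for illustration; adjust them to your own layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything may be crawled except two directories.
ROBOTS_TXT = """\
# Keep all spiders out of the admin area and scripts directory.
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
"""


def blocked_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the subset of urls that robots_txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    urls = [
        "http://www.example.com/",
        "http://www.example.com/example.php",
        "http://www.example.com/admin/login.php",
    ]
    print(blocked_urls(ROBOTS_TXT, "Googlebot", urls))
```

Running a check like this before uploading the file is a cheap way to confirm you are only keeping spiders out of the parts you intended.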

Onya
Woz

hutchins13

5:45 am on Aug 30, 2002 (gmt 0)

What do the links from the main page look like? Do they contain a user ID? If so, that could be the problem. It kept Google from moving through one of my sites (which uses ColdFusion) until the user ID was removed from the URL.
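A likely reason IDs in URLs cause trouble is that the same page then appears under a different address on every visit, so a spider sees endless "new" URLs (or may avoid them). This sketch shows the normalization you would want your own links to follow: drop the ID parameters so each page has one stable URL. The parameter names are assumptions; CFID and CFTOKEN are the usual ColdFusion session tokens, and "userid"/"sessionid" are hypothetical placeholders for whatever your site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of per-visitor parameters; check which ones your own URLs carry.
SESSION_PARAMS = {"userid", "sessionid", "cfid", "cftoken"}


def strip_session_params(url: str) -> str:
    """Return url with per-visitor ID parameters removed from the query string."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


if __name__ == "__main__":
    print(strip_session_params("http://www.example.com/products.cfm?cat=5&CFID=1234&CFTOKEN=5678"))
```

Two visits that differ only in their IDs now map to the same address, which is what a spider needs to recognize it has already seen a page.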

max363

6:22 am on Aug 30, 2002 (gmt 0)

All my links look like this: <a href="example.php">link</a>

max363

8:08 pm on Aug 30, 2002 (gmt 0)

Looking through my logs, it looks like Googlebot came this morning and actually went through some of my pages :)

Most of my pages have includes (example: <?php include ("menu.html"); ?>). Would this have any effect on search engine crawlers?

SebastianX

10:59 pm on Aug 30, 2002 (gmt 0)

> Most of my pages have includes (example: <?php include ("menu.html"); ?>). Would this have any effect on search engine crawlers?

No. The server will deliver the same content to the spider and the browser (as long as you don't cloak).

If you're worried, you can add this line to the .htaccess file in your site's root directory:
AddType application/x-httpd-php .htm
and *.htm files will be parsed for PHP. Then just name your pages *.htm instead of *.php.

colemanator

12:36 am on Sep 1, 2002 (gmt 0)

I had a similar problem about 2 weeks ago, visited the forum, made many adjustments, and now I am being crawled just fine. I did add a robots.txt, though I'm not sure if it made a difference. I did notice that URLs with an undefined argument seemed to stump the spider.

EX: widgets.com/widgets.asp?type=&

The spider didn't like to go any further than that. I changed them to the following:

EX: widgets.com/widgets.asp?type=black
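If you want to audit your own links for the pattern described above, here is a small sketch that lists query parameters with no value, such as `type` in `?type=&`. It only finds the pattern; the claim that spiders stall on such URLs is based on the experience reported above, not on any documented rule.

```python
from urllib.parse import parse_qsl, urlsplit


def empty_params(url: str) -> list[str]:
    """Return names of query parameters that have no value, e.g. 'type' in '?type=&'."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps "type=" as ("type", "") instead of dropping it.
    return [name for name, value in parse_qsl(query, keep_blank_values=True) if value == ""]


if __name__ == "__main__":
    for link in ["http://widgets.com/widgets.asp?type=&",
                 "http://widgets.com/widgets.asp?type=black"]:
        print(link, "->", empty_params(link))
```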

Does anyone know if the size of a file prevents a crawl?

bcc1234

4:08 am on Sep 1, 2002 (gmt 0)

Check your links with something like Xenu. I published a site just 3 days ago that has only 2 links from other sites with PR 2 or 3, and about 300 pages have already been crawled.

The domain is new, there is no robots.txt file.
I did not submit the site to google, just let it re-spider the pages with links and find the new site.
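Xenu's Link Sleuth fetches pages over HTTP; the sketch below is not that tool, just the same kind of check run on a hypothetical, hand-written map of internal links. Starting from the home page, it follows links breadth-first and reports broken links (targets that don't exist) and pages no link chain reaches. A spider that only ever sees your index page can't find those unreachable pages either.

```python
from collections import deque


def audit_site(pages: dict[str, list[str]], home: str) -> tuple[list[str], list[str]]:
    """Return (broken_links, unreachable_pages) for a site given as page -> internal links."""
    seen = {home}
    queue = deque([home])
    broken: list[str] = []
    while queue:
        page = queue.popleft()
        for target in pages.get(page, []):
            if target not in pages:
                # Link points at a page that doesn't exist on the site.
                broken.append(f"{page} -> {target}")
            elif target not in seen:
                seen.add(target)
                queue.append(target)
    # Pages that exist but can't be reached by following links from home.
    unreachable = sorted(p for p in pages if p not in seen)
    return broken, unreachable


if __name__ == "__main__":
    # Hypothetical site map.
    site = {
        "index.php": ["products.php", "about.php"],
        "products.php": ["index.php", "widget.php"],
        "about.php": ["index.php", "contact.php"],  # contact.php is missing
        "widget.php": ["products.php"],
        "old-news.php": ["index.php"],  # nothing links here
    }
    broken, unreachable = audit_site(site, "index.php")
    print("broken:", broken)
    print("unreachable:", unreachable)
```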