Google not spidering deep into site

         

uci_bink

7:19 pm on Apr 4, 2003 (gmt 0)

10+ Year Member



Hey everyone, sorry if this is a dumb question, but I have been going over the forums and have asked a couple of other people and have not gotten an answer... it's time for some of your expert help :)

I have a site that Google knows about and the bot visits often, but it does not go through the internal pages of my site. I have created a site map, a robots.txt and everything I can think of to get Google to go through everything... this is upwards of 10,000 pages that Google is not getting to! Please help me! :)

The only problem I can think of is that my site is dynamic, using PHP with some variables in the URL, but no session IDs, no ID= or anything else that I have read can mess up search engines...

Will Google not go deep into pages if PR is low? (I just started my site a couple of months ago)

It is weird, though, because even some links on my home page aren't getting followed.

So I was hoping someone would take the time and check out my site and let me know if they could see what was wrong.

I would really appreciate it!

Kevin

deft_spyder

7:27 pm on Apr 4, 2003 (gmt 0)

10+ Year Member



I'm not going to be able to answer this question well, but the helpful people here who will answer it are going to need to see the URL string to really give you a good answer... so post something like www.widgets.com/page.php=345.4i5hg345=345?variablesexample123 for us to see what you've got going.

[edited by: deft_spyder at 8:00 pm (utc) on April 4, 2003]

uci_bink

7:53 pm on Apr 4, 2003 (gmt 0)

10+ Year Member



[widgets.com...]

Those are the links for my site map... one for each state, which then links to a listing of stuff in that state.

However, Google does not even follow this link from my site map.

Hope that info helps a little!

garylo

12:46 pm on Apr 5, 2003 (gmt 0)

10+ Year Member



Make sure your site map does not contain more than 100 links. Googlebot probably spiders only up to 100 links per page.
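If you want to check a page against that rule of thumb, here is a minimal sketch using only the Python standard library. The 100-link figure is the guess from this post, not a documented Googlebot limit, and `count_links`, `LINK_LIMIT` and the sample markup are made-up names for illustration.

```python
from html.parser import HTMLParser

LINK_LIMIT = 100  # rule of thumb from this thread, not a documented limit


class LinkCounter(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_links(html_text):
    """Return the number of <a href="..."> links in the given HTML."""
    parser = LinkCounter()
    parser.feed(html_text)
    parser.close()
    return len(parser.hrefs)


# Hypothetical sample: two real links and one named anchor without an href.
sample = '<a href="/a">A</a> <A HREF="/b">B</A> <a name="top">top</a>'
n = count_links(sample)
status = "over" if n > LINK_LIMIT else "within"
print(f"{n} links, {status} the rule-of-thumb limit")
```

To use it on a real page, save the page source to a file and pass its contents to `count_links`.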

nirelan2

4:04 pm on Apr 5, 2003 (gmt 0)



Alltheweb searches all the way through a site! Maybe it's just time to forget about Google until it cleans up its act.

MetropolisRobot

4:34 pm on Apr 5, 2003 (gmt 0)

10+ Year Member



Sometimes, if your site is slow and the response is not generated in time, a Googlebot that was attempting to get into certain pages will time out, and that seems to be the end of that Googlebot's run on your site (in my experience).

In the early days I had a Java/JSP site, and my host had issues with more than X pages being produced from the JVM/server in a short space of time, and I saw the same behavior.

One way around this (apart from changing host, which I did) is to improve the links between your pages so that Google has different ways to reach parts of the site. This improves your site's resilience to failure in any one dynamic page, which is always a possibility no matter what language and methodology you use.
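A rough way to see which pages depend on a single route in is to count inbound links per page. The sketch below assumes you already have a map of which page links to which; the `site` dictionary, its paths and the `weakly_linked` helper are all hypothetical, and the threshold of two inbound links is an arbitrary choice.

```python
from collections import Counter


def weakly_linked(site_links, minimum=2):
    """Return pages linked from fewer than `minimum` distinct other pages."""
    inbound = Counter()
    for source, targets in site_links.items():
        for target in set(targets):  # count each source page only once
            if target != source:     # ignore links from a page to itself
                inbound[target] += 1
    pages = set(site_links) | set(inbound)
    return sorted(page for page in pages if inbound[page] < minimum)


# Hypothetical link map: page -> pages it links to.
site = {
    "/": ["/sitemap", "/about"],
    "/sitemap": ["/state-AL", "/state-AK", "/"],
    "/state-AL": ["/", "/sitemap"],
    "/state-AK": ["/"],
    "/about": ["/"],
}

print(weakly_linked(site))
```

Any page this reports can only be reached through one other page, so if that one page fails or times out, the crawler has no other way in.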

garylo

4:41 pm on Apr 5, 2003 (gmt 0)

10+ Year Member



"Alltheweb searches all the way through a site! Maybe it's just time to forget about Google until it cleans up its act."

nirelan2, looks like you have a full stomach against Google. Let it go, man.

nirelan2

5:02 pm on Apr 5, 2003 (gmt 0)



Hey, if it's OK for you to criticize MS, why can't I complain about Google's wrongdoings? When a company spams in addition to delisting only some link farms (PIZZA HUTT), and harasses bloggers, complaining is justified.

ronin

5:26 pm on Apr 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, Google and that damn Saddam bin Laden, they're probably in it together.

Curiously I find the same thing - there are links on my front page which don't seem to get spidered. I have no idea why not. Only thing I can suggest is making sure each of your pages is linked to by at least a couple of your other pages... most importantly this makes it easier for the user.

We are for users, right? Not for the robots...

nirelan2

5:28 pm on Apr 5, 2003 (gmt 0)



Why edit your page to get Google to index it when other search engines can index it correctly? It's time for Google to get lost.

uci_bink

9:13 pm on Apr 5, 2003 (gmt 0)

10+ Year Member



Thanks for the replies everyone...

My page has under 100 links on it, and I think it is generated by the server relatively quickly.

My links look like this:

[widgets.com...]

And here are a couple of lines from my page source:

<html>
<head>
<META NAME="keywords" CONTENT="widgets">
<META NAME="description" CONTENT="widgets">
<META NAME="ROBOTS" CONTENT="ALL">
<META NAME="revisit-after" CONTENT="31 days">
<title>Site Map : widgets.com</title>
<LINK REL=STYLESHEET TYPE="text/css" HREF="main.css" TITLE="Main css">

<body>
<center>
<a href="http://www.widgets.com"><img src="logo.jpg" width="350" height="172" alt="" border="0"></a>
</center>
<div class="content"><br>
Some content<br><br>
</div>
<div class="map">
<strong>Courses:</strong><br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.widgets.com/index.php?show=demo&country=USA"><strong>United States:</strong></a><br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.widgets.com/index.php?show=demo&state=AL">Alabama</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.widgets.com/index.php?show=demo&state=AK">Alaska</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.widgets.com/index.php?show=demo&state=AZ">Arizona</a><br>

_____________________________________________________________

Google gets to this page fine, but then it does not continue on with the links... also, Google doesn't even spider some of the links on my main page, even though it gets to other links on my index just fine!

Any further ideas or help would be greatly appreciated!
Thanks,
Kevin

johnsmith2003

9:21 pm on Apr 5, 2003 (gmt 0)



Google crawls are weird. I happen to know how googlebot works, but I won't go into details.

ikbenhet1

9:24 pm on Apr 5, 2003 (gmt 0)

10+ Year Member



You might wanna put </HEAD> before <BODY> in there.

<added>What mcavic said, but make sure you put the </HEAD> tag in there because your HTML structure is not correct. Try a search for "webmasterworld validator" on Google.</added>
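For a quick local check of just this one problem, something like the sketch below would do. It is not a substitute for a real validator: it only flags a <body> that opens while <head> is still open, and `check_head_closed` plus the sample strings are invented for illustration.

```python
from html.parser import HTMLParser


class HeadBodyChecker(HTMLParser):
    """Flag a <body> start tag seen while <head> is still open."""

    def __init__(self):
        super().__init__()
        self.head_open = False
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.head_open = True
        elif tag == "body" and self.head_open:
            self.problems.append("<body> opened before </head>")

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_open = False


def check_head_closed(html_text):
    """Return a list of problems found (empty if none)."""
    checker = HeadBodyChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems


# Hypothetical samples modelled on the page source posted above.
broken = "<html><head><title>Site Map</title><body><p>hi</p></body></html>"
fixed = "<html><head><title>Site Map</title></head><body><p>hi</p></body></html>"

print(check_head_closed(broken))
print(check_head_closed(fixed))
```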

[edited by: ikbenhet1 at 9:32 pm (utc) on April 5, 2003]

mcavic

9:27 pm on Apr 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd suggest getting some sites to link to you, get listed in Yahoo and DMOZ, and give it another 2 or 3 months.

If you search site:www.debian.org debian, it'll come up with 43,600 pages. So obviously, Google goes deep if it likes the site. :)

uci_bink

10:04 pm on Apr 5, 2003 (gmt 0)

10+ Year Member



Thanks a lot for the heads up on the </HEAD> tag!

Just missed it, I guess... I will take a look at the HTML validator and check that out, but like others have suggested, maybe it is just a matter of time.

Hopefully after the next update my PR will go up a little bit, and then maybe the bot will have a little more love for me and crawl the site a little deeper. It just doesn't make much sense for it to only hit pages linked from the home page and go no further than that.

Well... we will see when the update begins. Hopefully the deep crawler will go through everything next time. If not, then you will see me on here asking a bunch more questions :)

Thanks a lot!
Kevin