New site not being indexed at all

Spiders (especially Googlebot) are not going beyond the index page

shakaal

7:09 pm on Nov 19, 2003 (gmt 0)

10+ Year Member



Hello Fellow Forum Visitors,

I would appreciate any insight into my question. I searched the forums but could not find a satisfactory answer.

Problem:
A few months ago we noticed that Googlebot was no longer indexing any pages of our domain xyz.com except the index page. The content for xyz.com had been taken from our main domain, zyx.com, so we thought Google was probably penalizing us for duplicate content and had therefore excluded the inside pages. We redesigned xyz.com and added a lot of new content. The new site has now been up for a month and a half. Googlebot still indexes only the index page: it spiders the site very frequently, but it refuses to include any of the inside pages. We cannot figure out what the problem is. Does anyone know what exactly is happening? How long will it take Google to include all the pages of xyz.com in its index?

We would appreciate any honest attempt at answering these questions.

Thank you

DerekH

11:30 pm on Nov 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unless your site already has lots of links from other popular pages, this isn't at all unusual.

I have a number of sites on the Internet, and the most recent that I have put up for a local charity went up with only two links from other sites. It has 50 pages, but only the index page got indexed. After two months, one more page has now made it into the index.

The others will arrive in due course, I'm sure. How quickly you get indexed depends on how many pages Google has to visit and how many popular paths there are to your own sub-pages.

Three months doesn't seem to be all that long for a site that's not well connected to other sites on the web.
DerekH

Jesse_Smith

1:10 am on Nov 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you have session IDs in the URLs? Google hates those.

Are the pages static, or dynamic like .cgi/php/asp?blah1=1&blah2=2&blah3=3...? Use mod_rewrite if they're not static.
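Jesse's first two questions can be checked mechanically. The Python sketch below flags URLs that carry a query string or a session-ID parameter. The names in SESSION_PARAM_NAMES are an assumption (common conventions, not a list Google published), and the check says nothing about how any search engine actually treated such URLs.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list: parameter names that commonly carry session IDs.
SESSION_PARAM_NAMES = {"sid", "session", "sessionid", "phpsessid", "jsessionid"}


def crawl_concerns(url):
    """Return reasons a URL may be hard for a crawler to follow (empty if none)."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    concerns = []
    if params:
        concerns.append(f"dynamic URL with {len(params)} query parameter(s)")
    if any(name.lower() in SESSION_PARAM_NAMES for name in params):
        concerns.append("session ID in query string")
    # Some servers append ";jsessionid=..." to the path instead of the query.
    if ";jsessionid=" in parts.path.lower():
        concerns.append("session ID in path")
    return concerns


for url in ["http://www.xyz.com/widgets.html",
            "http://www.xyz.com/page.php?blah1=1&blah2=2&PHPSESSID=abc123"]:
    print(url, crawl_concerns(url))
```

A static page such as widgets.html comes back with an empty list, which matches shakaal's later description of the site.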

Do you have If-Modified-Since handling on your .shtml files? That could be keeping Google from crawling the pages.
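The If-Modified-Since point is about HTTP conditional GET: a crawler that already holds a copy sends the date of that copy, and the server may answer 304 Not Modified instead of resending the page. A server that cannot tell when a page changed (server-parsed .shtml pages often lack a Last-Modified date) should just send the full page. A minimal sketch of that server-side decision, assuming the server knows each page's last-modified time; conditional_get_status is a hypothetical helper, not an Apache API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since):
    """Return 304 if the crawler's copy is still current, else 200.

    last_modified: timezone-aware datetime of the page's last change, or None
    if the server cannot tell. if_modified_since: raw request header or None.
    """
    if last_modified is None or if_modified_since is None:
        return 200  # no safe basis for a 304, so send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # invalid date: treat it as a normal GET (RFC 2616, 14.25)
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


changed = datetime(2003, 11, 20, 16, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status(changed, "Thu, 20 Nov 2003 16:13:47 GMT"))
print(conditional_get_status(changed, "Wed, 19 Nov 2003 10:00:00 GMT"))
```

The first call returns 304 (the page has not changed since the crawler's copy) and the second returns 200.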

shakaal

2:35 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



Thanks, Derek and Jesse.
Derek, you are right: this particular site does not have many inbound links. In fact, Google reports only 23 for its link popularity. Your explanation has really given me some hope and encouragement.

Hi Jesse,
The site is completely static: regular HTML pages with no session IDs. It's an information site, so I do not think that is my problem.
Thank you again to both of you for your input.

takagi

2:45 pm on Nov 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you use frames? Are the links just HTML or is it Javascript?

How did you search for the sub-pages? Try:

site:www.xyz.com -qwerty

that should show all the indexed pages of your site not containing the word 'qwerty'.

trillianjedi

2:53 pm on Nov 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



in fact, Google reports only 23 for its link popularity

23 inbound links from PR4-or-higher pages should be sufficient to get more than just the index page into Google.
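TJ's rule of thumb rests on PageRank, where a page's score comes from the scores of the pages linking to it. A minimal sketch of the commonly used normalized form, PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over pages q linking to p, where L(q) is q's number of outgoing links and d = 0.85, computed by power iteration. The link graph is hypothetical, and the toolbar's 0-10 "PR4" figures were a coarse public display, not the raw values computed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to. A page with no outgoing
    links spreads its score evenly over every page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same small "random jump" share...
        new = {p: (1.0 - damping) / n for p in pages}
        # ...plus a damped share of the score of each page linking to it.
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical link graph: three outside sites link to the xyz.com home page,
# which links to two sub-pages that link back home.
web = {
    "siteA": ["xyz/"],
    "siteB": ["xyz/"],
    "siteC": ["xyz/", "siteA"],
    "xyz/": ["xyz/sub1", "xyz/sub2"],
    "xyz/sub1": ["xyz/"],
    "xyz/sub2": ["xyz/"],
}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

The home page collects score from every inbound link and passes part of it on to its sub-pages, which is why inbound links help the inside pages too.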

I'd look at the other suggestions: dynamic URLs, takagi's thoughts on your own internal link structure, and poor use of frames.
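One way to audit the internal link structure is a breadth-first search from the home page over plain HTML links: it gives the fewest clicks needed to reach each page and exposes pages that no plain link reaches. A minimal sketch; the site map in site_links is hypothetical and would in practice come from crawling your own pages.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks from the home page to each page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def unreachable(links, home, all_pages):
    """Pages that no chain of plain HTML links from the home page reaches."""
    reached = click_depths(links, home)
    return sorted(p for p in all_pages if p not in reached)


# Hypothetical site: only plain HTML links are listed; old.html is reachable
# only through a JavaScript menu, so it never appears as a link target.
site_links = {
    "index.html": ["about.html", "sitemap.html"],
    "sitemap.html": ["articles/a1.html", "articles/a2.html"],
    "articles/a1.html": ["articles/a2.html", "index.html"],
}
pages = ["index.html", "about.html", "sitemap.html",
         "articles/a1.html", "articles/a2.html", "old.html"]

print(click_depths(site_links, "index.html"))
print(unreachable(site_links, "index.html", pages))
```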

TJ

shakaal

3:14 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



Hi takagi,
I tried your suggestion, and only the index page comes up.
Almost all links are HTML; very few are in JavaScript, maybe two or three.
We are not using any frames, so that is not the problem. We did change the internal structure of the website, which has left plenty of dead links, at least in AltaVista. I don't have to worry about Google, since it indexes only the index page. So far AltaVista has not updated its database either; they have included one or two links and that is it. Yahoo, too, indexes just the index page.

lasko

3:36 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



Can you sticky me your index page?

Your link structure may be causing this; either that or you have been very unlucky. I created a site with 2,300 pages and every single page was indexed within a month.

The most common mistakes are a lack of links on the main index page, or links embedded in a JavaScript/Flash-type nav bar.
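To see which links a simple HTML-only crawler could follow from a page, you can collect the ordinary `<a href>` links and set aside the `javascript:` and onclick-only ones. Links written out by `document.write` inside a `<script>` block, or living inside a Flash nav bar, are not `<a>` tags in the HTML at all and are invisible here. A minimal sketch using Python's standard html.parser; LinkCollector is a hypothetical helper, and real crawlers may be smarter (or dumber) than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sort <a> tags into plain crawlable links and script-only links."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.script_only = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = (attrs.get("href") or "").strip()
        if href.lower().startswith("javascript:"):
            self.script_only.append(href)
        elif href in ("", "#") and "onclick" in attrs:
            self.script_only.append(attrs.get("onclick") or "")
        elif href and href != "#":
            self.crawlable.append(href)


page = """
<a href="/articles/widgets.html">Widgets</a>
<a href="javascript:openMenu('gadgets')">Gadgets</a>
<a href="#" onclick="go('/faq.html')">FAQ</a>
<script>document.write('<a href="/hidden.html">Hidden</a>');</script>
"""
collector = LinkCollector()
collector.feed(page)
collector.close()
print("crawlable:", collector.crawlable)
print("script only:", collector.script_only)
```

Script contents are treated as text by the parser, so /hidden.html shows up in neither list.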

lasko

3:55 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



Your structure seems to be in order. The only small thing I can see is your robots.txt file.

When I request it, I receive your error page, which does show a 404; however, it redirects within a few seconds.

Google and all other search engines look for the robots.txt file before they spider. If no file can be found, Google will index the whole site.

I may be wrong, but in your case you don't have a robots.txt file and the redirect may be causing the problem.

If you can't turn the redirect off, I suggest you create an empty robots.txt file so Google finds it properly but reads nothing.
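The effect of an empty robots.txt can be checked with Python's standard urllib.robotparser: an empty file, or one with an empty `Disallow:` line, places no restrictions, while `Disallow: /` blocks everything. Googlebot uses its own parser, so this sketch only illustrates the robots.txt convention; the allowed helper and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="Googlebot"):
    """Would a crawler obeying these robots.txt lines fetch url?"""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


url = "http://www.xyz.com/articles/page.html"
print(allowed([], url))                               # empty file
print(allowed(["User-agent: *", "Disallow:"], url))   # explicit "allow all"
print(allowed(["User-agent: *", "Disallow: /"], url)) # block everything
```

The first two calls return True and the last returns False.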

I could be wrong, so maybe someone else could add more to this.

Otherwise your site is nice :)

Added
---------------------

After more testing: sometimes your robots.txt request produces a 200. The file is not a text file and it redirects. I am about 90% sure this is the problem.
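The symptom lasko describes, a /robots.txt request that is redirected and answered with a 200 HTML error page instead of a plain 404, is often called a "soft 404". A minimal checklist sketch: diagnose_robots_response is a hypothetical helper, and its inputs (status, Content-Type, body, whether a redirect happened) are assumed to come from fetching the URL yourself.

```python
def diagnose_robots_response(status, content_type, body, redirected=False):
    """Flag robots.txt responses that a crawler might misread."""
    problems = []
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if redirected:
        problems.append("request for /robots.txt was redirected")
    if status == 200 and media_type == "text/html":
        problems.append("200 response served as HTML (possible soft 404)")
    if status == 200 and (body or "").lstrip().lower().startswith(("<!doctype", "<html")):
        problems.append("body looks like an HTML page, not robots.txt rules")
    # A plain, unredirected 404 is not flagged: it simply means "no robots.txt".
    return problems


print(diagnose_robots_response(
    200, "text/html; charset=iso-8859-1",
    "<html><body>Sorry, page not found</body></html>", redirected=True))
print(diagnose_robots_response(200, "text/plain", "User-agent: *\nDisallow:\n"))
```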

takagi

4:07 pm on Nov 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't see any problems with the robots.txt (Shakaal sent me the URL). But I did notice that the site map has the home page as a backlink, although the site map itself cannot be found in the SERPs. The same goes for other sub-pages. Checking backlinks for non-existent pages (e.g. changing the extension from .html to .htm) gives no backlinks, so the backlinks really are found by Google. Another strange thing is that the backlinks for the home page and the backlinks for an almost identical home page (on the hyphenated domain name) are identical. So it looks like the problem is that Google thinks the two domains should be treated as one.

lasko

4:16 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



You may have seen his robots.txt file after he added it a few minutes ago.

Before that, he had no text file and the request was redirected.

His text file looks OK now, but before, Google could have had a problem with the redirect, because there was no page found and it didn't produce the normal 404 error.

takagi

4:17 pm on Nov 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The hyphenated domain (like 'www.x-yz.com') is redirected to the normal one (like 'www.xyz.com') with a 301 redirect:

HTTP/1.1 301 Moved Permanently
Date: Thu, 20 Nov 2003 16:13:47 GMT
Server: Apache/1.3.26 (Unix)
Location: http*//www.xyz.com/
Connection: close
Content-Type: text/html; charset=iso-8859-1
real URL replaced by http*//www.xyz.com/
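Browsers follow a 301 silently, so the redirect is only visible in the raw response headers like the ones takagi posted. A minimal sketch that pulls the status code and Location out of such a header block; parse_status_and_location is a hypothetical helper, and the forum's http*// obfuscation is written as http:// with xyz.com kept as the thread's placeholder domain.

```python
def parse_status_and_location(raw_headers):
    """Return (status code, reason phrase, Location or None) from raw headers."""
    lines = [line.strip() for line in raw_headers.strip().splitlines() if line.strip()]
    parts = lines[0].split(" ", 2)  # e.g. "HTTP/1.1 301 Moved Permanently"
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip anything that is not a "Name: value" header line
            headers[name.strip().lower()] = value.strip()
    return code, reason, headers.get("location")


raw = """HTTP/1.1 301 Moved Permanently
Date: Thu, 20 Nov 2003 16:13:47 GMT
Server: Apache/1.3.26 (Unix)
Location: http://www.xyz.com/
Connection: close
Content-Type: text/html; charset=iso-8859-1
"""
print(parse_status_and_location(raw))
```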

takagi

4:25 pm on Nov 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You may have seen his robots.txt file after he added it a few minutes ago.

Yeah, you're right. That file is only 25 minutes old, so it was made just after you wrote message 9.

[edited by: takagi at 4:25 pm (utc) on Nov. 20, 2003]

lasko

4:25 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



As far as I know, the 301 redirect from the hyphenated domain is the correct way to do it, and Google should discard the old site and re-index the new one.

It could be that Google is getting very confused by the old site redirecting with a 301 to the new site and then being redirected again when it tries to find the robots.txt file.

Two redirects usually set the alarm bells ringing.
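The double-redirect worry can be made concrete by following a chain of Location targets and counting the hops. A minimal sketch: the redirect map is hypothetical (it assumes the hyphenated domain's /robots.txt 301s to the main domain, whose missing robots.txt then redirects to an error page named error.html), and follow_redirects is an illustrative helper, not part of any crawler.

```python
def follow_redirects(start, redirects, max_hops=5):
    """Follow a URL -> Location map and return the full chain of URLs."""
    chain = [start]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError("redirect loop: " + " -> ".join(chain + [target]))
        chain.append(target)
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} redirects")
    return chain


# Hypothetical redirect map for the situation described in the thread.
redirects = {
    "http://www.x-yz.com/robots.txt": "http://www.xyz.com/robots.txt",
    "http://www.xyz.com/robots.txt": "http://www.xyz.com/error.html",
}
chain = follow_redirects("http://www.x-yz.com/robots.txt", redirects)
print(" -> ".join(chain), f"({len(chain) - 1} redirects)")
```

With a real robots.txt in place on www.xyz.com, the second entry disappears and the chain shrinks to the single 301.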

Just another thought :)

shakaal

5:11 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



Hello guys,
When I type in www.x-yz.com, there is a seamless transfer to www.xyz.com. I do not see a 301 redirect explicitly, unless it's being done in the background.
Like I mentioned earlier, Google has an index for www.x-yz.com, which we hardly use and which is full of old dead links. What beats me is this: if Google treats the two domains as one (which they are anyway), then why is it not updating its database for www.x-yz.com? Any pointers or suggestions?

lasko

5:46 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



I presume you are redirecting correctly from x-yz to xyz with a 301, which we see in the server headers. Now that the robots.txt file is not redirecting, xyz should get indexed and x-yz will get completely deleted.

I can't see any other reason for you not to get indexed. I think only GoogleGuy can tell us if we are right or wrong, and at the moment he's a little busy persuading people not to jump out of the windows :)

All we can do is try this and be patient. If nothing gets added within the next 30 days, try something else.

shakaal

6:13 pm on Nov 20, 2003 (gmt 0)

10+ Year Member



Thank you, lasko.
I am keeping my fingers crossed. Hopefully we will get indexed now that we have the robots.txt file.