
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Googlebot Visits But No Google Listing
Three months of crawling, yet nothing in site: search.
DanB

10+ Year Member



 
Msg#: 27894 posted 5:46 pm on Feb 7, 2005 (gmt 0)

My site's been up since around Sept 2004. Googlebot has been coming by since Nov and crawling the pages, but when I do a site:mydomain search it says there are no pages to display. Last GB visit was yesterday (2/07/05).

I haven't found any info that my domain was blacklisted and there's no entry for it in archive.org.

How can I figure out what's going on?

 

lorax

WebmasterWorld Administrator lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 27894 posted 2:31 pm on Feb 9, 2005 (gmt 0)

Hello DanB,
Welcome to WebmasterWorld!

There could be any number of issues related to why your site hasn't been indexed.

You could be in the sandbox [google.com], you could have an error in your robots.txt or .htaccess file that explicitly prevents Googlebot from indexing your site, or perhaps there is a noindex directive in your meta tags?
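
To rule out the noindex possibility, a page's markup can be scanned for a robots meta tag. Here is a minimal Python sketch; the names RobotsMetaFinder and has_noindex are made up for illustration, and it only inspects meta tags in the HTML you pass in, not robots.txt or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so upper-case markup works too.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html_text):
    """Return True if any robots meta tag in html_text contains 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noindex" in finder.directives
```

Feeding it the saved source of a page should be enough: has_noindex(page_source) returns True only when a robots meta tag lists noindex among its directives.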

DanB

10+ Year Member



 
Msg#: 27894 posted 2:56 pm on Feb 9, 2005 (gmt 0)

I don't think it's the sandbox. From what I've read, sandboxing doesn't prevent site:domain.com from returning indexed pages; it only prevents keyword searches from returning them.

I've looked at the pages with a spider simulator and I'm not seeing anything that would prevent the spider from indexing the page, and looking at the log files it's apparent that the spider is picking up the internal links and following them.

Here's the meta snippet:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<META HTTP-EQUIV="EXPIRES" CONTENT="0">
<META NAME="RESOURCE-TYPE" CONTENT="DOCUMENT">
<META NAME="DISTRIBUTION" CONTENT="GLOBAL">
<META NAME="ROBOTS" CONTENT="INDEX, FOLLOW">
<META NAME="REVISIT-AFTER" CONTENT="30 DAYS">

Here's my robots.txt:
User-Agent: *
Disallow: amateur_in.jsp
Disallow: /amateurs/images
Disallow: /amateurs/objects
Disallow: /amateurs/site
Disallow: clickin.jsp
Disallow: clickout.jsp
Disallow: clickthru.jsp
Disallow: errorpage.jsp
Disallow: /images
Disallow: /objects
Disallow: /outclicks
Disallow: /temp
Disallow: /templates
Disallow: /users
Disallow: /scripts
Disallow: /bling

Shurik

10+ Year Member



 
Msg#: 27894 posted 3:46 pm on Feb 9, 2005 (gmt 0)

DanB, I have the same problem. My robots.txt file looks awfully similar to yours. Lots of daily spidering, but not a single page in the index.

Sanenet

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27894 posted 4:26 pm on Feb 9, 2005 (gmt 0)

Don't know if it makes any difference (depends on your file structure), but try putting / after the directory names.

"Disallow: /help" disallows both /help.html and /help/index.html, whereas "Disallow: /help/" would disallow /help/index.html but allow /help.html.
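
The difference can be sketched in a few lines of Python, assuming plain prefix matching on the URL path (the behaviour described above). The function name is_disallowed is made up for illustration, and the sketch ignores User-Agent groups, Allow lines, and wildcards.

```python
def is_disallowed(path, disallow_rules):
    """Return True if path starts with any non-empty Disallow value (simple prefix match)."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# "Disallow: /help" blocks both the file and the directory.
print(is_disallowed("/help.html", ["/help"]))         # True
print(is_disallowed("/help/index.html", ["/help"]))   # True

# "Disallow: /help/" blocks only what lives under the directory.
print(is_disallowed("/help.html", ["/help/"]))        # False
print(is_disallowed("/help/index.html", ["/help/"]))  # True
```

Under this strict prefix model, a value with no leading slash (like the clickin.jsp lines in the robots.txt above) would never match a path such as /clickin.jsp. How individual crawlers treat such lines is something the sketch cannot tell you.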

Also, in theory the attribute "follow, index" will be ignored as you cannot allow indexing in robots (either metatag or .txt file), only disallow.

DanB

10+ Year Member



 
Msg#: 27894 posted 5:35 pm on Feb 9, 2005 (gmt 0)

re: index,follow

I've read some spiders default to noindex, so it's wise to have the tag there.

Sanenet

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27894 posted 5:44 pm on Feb 9, 2005 (gmt 0)

Never heard of anything like that. The standard for robots is "Allow" unless otherwise specified.

lammert

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 27894 posted 5:54 pm on Feb 9, 2005 (gmt 0)

The Inktomi Slurp bot didn't have "index,follow" as the default in the ancient times, but that is solved now. There is a thread about all the Search engine metas at [webmasterworld.com...]

Using these meta tags doesn't hurt the page, but it also doesn't help much. My guess is that there are other reasons that your site does not appear in the SERPs. Maybe not enough incoming links, or you have content that is a duplicate of something already present in the index.

Powdork

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 27894 posted 7:13 pm on Feb 9, 2005 (gmt 0)

Duplicate content, as with the sandbox, shouldn't be an indexing issue, just a ranking one.
Try any of these.
1. Double check to make sure you don't have something obvious like an open <head> tag.
2. Run your site through a spider simulator.
3. Visit your site with a Googlebot user agent (UA) and see what that brings up.
4. Write to webmaster at google.com with the subject line of "reinclusion request".
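
For step 3, a rough Python sketch of requesting a page while sending a Googlebot-style User-Agent header. The UA string, the function names, and the timeout are assumptions for illustration; check Google's own documentation for the strings its crawler actually sends. This only spoofs the header, so a server that checks the visitor's IP address will still see an ordinary client.

```python
import urllib.request

# Assumed user-agent string for illustration; verify against Google's documentation.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status_and_body(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Fetch url with the given user agent; return (HTTP status, decoded body)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, body
```

Calling fetch_status_and_body once with the default UA and once with a normal browser UA, then comparing the two bodies, would show whether the server hands the spider something different. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, which is itself a useful signal.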

DanB

10+ Year Member



 
Msg#: 27894 posted 7:50 pm on Feb 9, 2005 (gmt 0)

I don't see any HTML errors that would prevent indexing (using an HTML editor with rules checking turned on).

I've run the site through various simulators, like PoodlePredictor, with no problems.

Not sure how to set a user agent that mimics Googlebot. Do some browsers let you change the UA they send in the headers?

Powdork

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 27894 posted 8:08 pm on Feb 9, 2005 (gmt 0)

I sent you a sticky about a user agent spoofing tool.

DanB

10+ Year Member



 
Msg#: 27894 posted 8:35 pm on Feb 9, 2005 (gmt 0)

Found a User Agent switcher plug-in for Firefox. No difference when viewing the site as googlebot.

Also checked site with Lynx browser and things look fine as well.

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 27894 posted 8:43 pm on Feb 9, 2005 (gmt 0)

You can safely dump these tags:

<META NAME="RESOURCE-TYPE" CONTENT="DOCUMENT">
<META NAME="DISTRIBUTION" CONTENT="GLOBAL">
<META NAME="ROBOTS" CONTENT="INDEX, FOLLOW">
<META NAME="REVISIT-AFTER" CONTENT="30 DAYS">

Additionally, make your site future-proof by using lower case for the HTML code. A simple site-wide search and replace for each type of tag, one type of tag at a time, is fairly quick and easy to do.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved