

Has lack of a robots.txt confused Googlebot?

Site not penalized, but not included in the index.

dragonlady7

1:39 pm on Jul 23, 2003 (gmt 0)

10+ Year Member



OK, I hate to add to the white noise in this forum, but I have been engaged in debate on another forum about a truly bizarre situation, and I think it's something worth debating on here where there's a lot more knowledge floating around. A member of this other forum has had a website indexed in Google since 1997 for a completely non-competitive keyphrase (his strange hobby), and has dozens of high-quality backlinks and a PR of about 5.
This past month he's noticed that typing his domain name into Google doesn't yield his site as a result. However, his backlinks all show up as links to his domain, and if he looks at his site the toolbar shows his PR intact. It's very weird and nobody over there, including me, has ever seen anything like it. Basically, everything around and pertaining to his site is there, but the pages actually in his site aren't returning as results. (No adult filter would be triggered, either; it's an utterly innocent and innocent-sounding hobby.)
He has never used anything remotely sketchy to optimize his site. The only thing he changed was that he switched his entire site from HTML to XHTML. It validates in every validator to be found on the Web-- XHTML 1.0 Strict, I think. I checked it myself and there's really nothing wrong with it at all-- it's more than correct.
Another person on that board insists that it's his XHTML-- Google isn't indexing it. I countered that there was no way that was true-- Google indexes HTML, XHTML, Word, PowerPoint, PDF, TXT... All kinds of formats. And XHTML was designed to be backwards-compatible in older browsers, besides. So that's simply not the answer, right?
A possibility I've thought of, though, for a site being excluded but not penalized could be this:
Robots.txt.
Last month, I noticed Googlebot was coming to my site, asking for my robots.txt, and not getting it because I didn't have one. It would then repeat the request over and over again, up to maybe a dozen times. And since it never got robots.txt, it would leave.
So, I uploaded a blank robots.txt and all was well. I'm indexed, I'm in, everyone's happy.
(Since then I've gone back and added some lines to the robots.txt so it's actually functional, but that hasn't really changed anything except protecting my testing directory from embarrassingly getting spidered. It was careless of me not to protect it before.)
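For anyone wanting to do the same, a minimal robots.txt along those lines might look something like this (the "/testing/" path is just a placeholder for whatever directory you want kept out):

```
# Let every crawler in, but keep the testing directory unspidered
User-agent: *
Disallow: /testing/
```

Note that an entirely blank robots.txt also works -- it simply allows everything -- but then your testing directory is fair game.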

Could that possibly be the problem? I'm not sure whether the site owner in question has a robots.txt, but I am sure it's not his XHTML that's made his site go missing. (I'll post back if it turns out he already had one, in which case that isn't the answer -- but a missing robots.txt is my hunch.)

Anyhow, thanks for any suggestions and any knowledge you can impart.

rainborick

4:09 pm on Jul 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



When a site for all the world looks well-positioned in Google except for the fact that it won't show up in a search result, you have to suspect it's being penalized. And since this is a recent occurrence, I'd check two things that I believe Google is focusing on with its recent changes -- invisible text and bad-neighborhood links. So, I'd go over the HTML by hand to make sure he hasn't accidentally made some text invisible. I've seen people use WYSIWYG page editors to re-edit a page so often that the multiple contradictory <font> tags are so nested and intertwined that it's a miracle any of the page gets rendered. Then check the PageRank of any site he's linking to, especially the main page of any web ring or similar co-operative linking scheme. I wouldn't be surprised if one of these two things is at the root of your friend's problems with Google. Good luck!

BigDave

4:37 pm on Jul 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are there custom 404 pages on these sites?

If there are, do they return the 404 status they are supposed to, or do they redirect and return a 200?

If the custom 404 page returns a 200, then Google will try to interpret the returned page and won't be able to. It may then assume that you do not want to be crawled.

By putting up a blank robots.txt, you stop returning the defective 404 page for that request.
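One way to check this by hand is to request a page that definitely doesn't exist and look at the status code that comes back. A rough sketch in Python (the URL in the comment is made up -- substitute a nonexistent path on the site in question):

```python
# Check whether a missing page returns a real 404, or a "soft 404"
# (a custom error page served with status 200, which can confuse spiders).
import urllib.request
import urllib.error

def fetch_status(url, user_agent="Mozilla/5.0"):
    """Return the HTTP status code that url responds with."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx arrive as exceptions; the code is on them

def diagnose_missing_page(status):
    """Interpret the status code a nonexistent URL came back with."""
    if status == 404:
        return "good: real 404, spiders know the page is gone"
    if status == 200:
        return "soft 404: error page served as 200 -- may confuse Googlebot"
    if status in (301, 302):
        return "redirects instead of 404 -- check where it lands"
    return "status %d -- inspect by hand" % status

# e.g. print(diagnose_missing_page(fetch_status(
#     "http://www.example.com/no-such-page-xyz.html")))
```

If the diagnosis says "soft 404", the custom error page is the thing to fix.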

dragonlady7

4:43 pm on Jul 23, 2003 (gmt 0)

10+ Year Member



It turns out that he did, indeed, have a robots.txt up, so that theory's been shot down.
The page's code is definitely squeaky-clean. Just totally redone.

Which leaves the question of links. I noticed he's in a web ring. I never understood how those could be bad; before search engines, they were how I got around. But I'll tell him to look into that. It seems the most likely culprit.

So the toolbar still isn't sorted out, then? His page was showing up as PR 5 but not showing in Google searches. Most odd.

Well, thanks for help anyway. :D

dragonlady7

5:07 pm on Jul 23, 2003 (gmt 0)

10+ Year Member



OK, upon further consultation we have inspected the webrings he belongs to.
They're cute webrings-- for people sharing his odd style of hobby, etc., and one is The Original WebRing, the first one ever. He is very fond of these webrings and is distressed to think that linking to them could incur a penalty.
Does it seem likely that the webrings are it? I don't understand how webrings are bad things-- they're a collection of people with similar websites. Why not link to them?
How can we tell where the problem is?

rainborick

5:44 pm on Jul 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'd check the main page of the web ring master site. If the Toolbar PageRank shows 0 or graybar, I'd suggest that your friend stop linking to it.

ogletree

6:05 pm on Jul 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What comes up if you type in site:domain.com -asdf? If you see old results, put 301s on those. If you just get the domain, then you'll just have to wait for Google to spider him. Has Googlebot ever spidered the new files? There have been a lot of people just falling off; Google may simply be acting weird. I agree -- I don't think it's right for Google to penalize legit webrings. I think they're spending too much time trying to hurt SEO and not enough time making results better where there is no SEO. There are a lot of keywords out there that bring up worthless results.
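If the server is Apache (an assumption -- adjust for whatever the host runs), those 301s can go in an .htaccess file; both file names below are made-up examples:

```
# Permanent (301) redirects from retired file names to their replacements
Redirect permanent /old-page.html http://www.example.com/new-page.html
Redirect permanent /old-section/ http://www.example.com/new-section/
```

A 301 tells Google the old URL has moved for good, so it can transfer the listing to the new address instead of dropping it.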

dragonlady7

6:18 pm on Jul 23, 2003 (gmt 0)

10+ Year Member



But the original site in question -- which may or may not be penalized, but either way is excluded -- has a toolbar PR of 5... so is the toolbar working or not, or is that unknown?

Hmm... if that's not it, someone else has pointed out that his doctype declaration has a linebreak in it and perhaps that's confused Googlebot.
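For comparison, this is the XHTML 1.0 Strict doctype written out on a single line. (A linebreak inside the declaration is legal per the spec, so this probably isn't the culprit, but putting it on one line is a harmless way to rule the theory out.)

```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```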

Man, I do not know. These things are why webmasters who really really know what they're doing make the big bucks. There is too much crap that's essential for machines and nearly invisible to humans. Blechh, I need a nap!

ogletree

6:36 pm on Jul 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



PR won't matter if there are no entries in Google. I type in site:domain.com -asdf every day to see how many entries I have in Google. It changes all the time, even if I don't make any changes. Google is very weird. I have pages that are not indexed even though the pages right above and below them on my sitemap are indexed. Google is random sometimes.

HitProf

7:01 pm on Jul 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi dragonlady,

I hope the problem is solved by fixing the doctype. If not:

Do you have access to the log files? Does Googlebot try to index pages or not? Are the links plain hrefs?

Did the file names change while redoing the site? Perhaps the old pages are now 404s and the new ones are not yet indexed.

There may be a problem if the old pages are redirected to a custom 404 page with a 301 / 200. You can check this with the server header checker:
[searchengineworld.com...]
(type in one of the old file names that no longer exist).

dragonlady7

7:06 pm on Jul 23, 2003 (gmt 0)

10+ Year Member



Thanks! Very helpful. We'll have to give your suggestions a shot if the doctype thing doesn't work out. Not like anyone will know anytime soon... but I'll tell him to keep an eye on the logs and let us all know. :D

tschild

7:24 pm on Jul 23, 2003 (gmt 0)

10+ Year Member



One other thing that might have gone wrong: I heard of a site in another forum a few months ago which had also suddenly dropped out of Google. What we finally found out was that the web host, to stop people downloading sites' content with wget or similar, had barred user-agents that did not have "Mozilla" or "MSIE" in their user-agent string. That made the site display fine in browsers but excluded search engine spiders...

So it might also be a good idea to check whether the site's pages can be retrieved with wget, with the user-agent string set to 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)'.
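A quick way to test for that kind of user-agent filtering without wget is to fetch the same URL twice with different user-agent strings and compare the status codes. A rough sketch in Python -- the browser UA string is just an example, and the Googlebot UA is the one quoted above:

```python
# Compare how a site responds to a browser-like User-Agent vs Googlebot's.
import urllib.request
import urllib.error

GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

def status_for(url, user_agent):
    """Fetch url with the given User-Agent and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as err:
        return err.code

def ua_filtered(browser_status, bot_status):
    """True if the site serves browsers fine but blocks the bot UA."""
    return browser_status == 200 and bot_status != 200

# e.g. (URL is hypothetical):
# b = status_for("http://www.example.com/", BROWSER_UA)
# g = status_for("http://www.example.com/", GOOGLEBOT_UA)
# ua_filtered(b, g) being True suggests the host is blocking spiders by UA
```

If the browser-style request succeeds and the Googlebot-style one is refused, the host-level filter is the problem, not the pages.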

dragonlady7

7:57 pm on Jul 23, 2003 (gmt 0)

10+ Year Member



>That made the site display fine to browsers but excluded search engine spiders...

Yo! That's a bad one. Wow. A true "d'oh" moment.
I'll pass that one on...