Forum Moderators: open
Please help!
[edited by: Marcia at 4:14 pm (utc) on Dec. 12, 2002]
[edit reason] Please see forum charter [/edit]
Except, maybe?
The searchengineworld robots.txt validator says everything is okay and shows your robots.txt file as two lines:
1 User-agent: *
2 Disallow:
Notepad also shows it as two lines.
When I access the robots.txt files through the browser they look like this:
User-agent: * Disallow:
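That "two lines in Notepad, one line in the browser" symptom often means the file was saved with unusual line endings (for example, bare CR, the old Mac convention), which some parsers treat as a single line. A minimal Python sketch to classify a file's line endings; the sample bytes at the bottom are hypothetical, not your actual file:

```python
def detect_line_endings(data: bytes) -> str:
    """Classify the line endings found in raw file bytes."""
    if b"\r\n" in data:
        return "CRLF (Windows)"
    if b"\r" in data:
        return "CR (old Mac) - many parsers see the whole file as one line"
    if b"\n" in data:
        return "LF (Unix)"
    return "no line breaks at all"

# Hypothetical example: a robots.txt saved with bare-CR endings
print(detect_line_endings(b"User-agent: *\rDisallow:\r"))
```

You could feed it the raw bytes of the live robots.txt (fetched with a browser's "save as" or any HTTP client) to see what the spiders are actually being served.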
Noticed that AltaVista also only indexed the home page of each site and isn't going any deeper.
Can't see any reason other than the robots.txt anomaly, but it's going to take someone with more savvy than me to say whether that's the problem.
Do your logs show other spiders crawling deeper?
I don't think it's the robots.txt file that is creating the problem. The sites were having problems even when I didn't use the file. Actually, I only started using the file after my post in September and this is what I was advised at that time. I didn't believe that it would solve the problem, but I figured it was worth trying.
Actually, the file is configured correctly to allow robots to index the sites.
Correct me if I'm wrong, but I believe that the formatting of the robots.txt is incorrect.
The robots.txt in question is written on a single line.
The specifications say that it should be written on two lines.
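For reference, the allow-everything record as the spec describes it is one User-agent line followed by one Disallow line, each on its own line:

```
User-agent: *
Disallow:
```

An empty Disallow value means "nothing is disallowed," so this file permits all compliant robots to crawl the whole site.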
I use a large and complex robots.txt, and this method works well for me.
There was a report of a French(?) search engine that followed the robots exclusion standard to the letter and required a blank line at the end of robots.txt, but Google doesn't require it, and will even accept a blank file for robots.txt if you want to try that.
Jim
Knock the robots meta tag out:
<META NAME="ROBOTS" CONTENT="All, Index, Follow">
It's incorrect syntax. The robots meta content accepts either one or two values, not three.
"All" or "None" are simply shorthand:
All equals "index,follow"
None equals "noindex,nofollow"
<meta name="robots" content="all">
<meta name="robots" content="none">
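So if you want to spell it out instead of using the shorthand, the explicit two-value equivalents are:

```
<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,nofollow">
```

Either the shorthand or the explicit form is valid; mixing "all" with "index, follow" in one tag is the redundant three-value syntax being flagged above.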
Think that's it,
Jim
The <META NAME="ROBOTS" CONTENT="All, Index, Follow"> I added about a month or so ago, and only tried it out of desperation. I noticed it on a site which is listed on Google with no problems. I don't think this line of code is causing the problem. I also don't believe the robots.txt file is causing the problem. There has to be something else.
Obviously, the symptom is that the search engines are pulling a blank for your URL, as if it doesn't exist. That's exactly the type of response you would expect if you were banning the bots from your web site.
What this possibly points to is that either the robots meta tag is getting in the way, or the robots.txt is. I would advise you to ditch both of them and wait two months for a new spidering and indexing cycle.
You must select File->Save As->Other encoding->US ASCII + End Lines with: LF only to get this option. I'm not sure if the option is available in all versions of Word, though.
What is the history of your domain names? Is it possible that they were previously-owned and therefore could have been blacklisted for past offenses before you got them?
Jim
Are there any 404s in your logs prior to you adding your robots.txt?
If so, it could mean that the SE can't understand your robots.txt.
Go to [textpad.com...] and download this shareware editor. Find out what type of server the robots.txt file is on (NT, Mac, or Unix). Open the file in TextPad and press F12 (Save As); it will give you the choice to save the file in PC, Mac, or Unix format. Save it, upload it, and cross your fingers.
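If you'd rather not install an editor, the same conversion can be scripted. A minimal sketch that normalizes any mix of line endings to Unix LF; the file-handling part is left out, so this is just the core transformation on raw bytes:

```python
def to_unix_endings(data: bytes) -> bytes:
    """Convert CRLF (Windows) and bare CR (old Mac) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Hypothetical example: a robots.txt saved with Mac-style CR endings
fixed = to_unix_endings(b"User-agent: *\rDisallow:\r")
print(fixed)
```

Read your robots.txt in binary mode ("rb"), run it through this, write it back in binary mode ("wb"), and re-upload.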
Nick.
It looks like the two sites you stickied to me and all the rest of the "Networked" sites that are interlinked -- as well as the host -- are hosted on the same server. It might be possible that you finally hit "spider overload" and they refuse to index any more interlinked sites on that server.
You might take a closer look at all of that linking before any more of your sites get hurt. There are a few discussions of interlinking among sites on the same server on this board. My own observation is that corporate whales can get away with it, while us small fry have to be very, very careful.
Other than that this is still very perplexing.
Jim
Could be:
- Source code problem (though I checked it)
- A previous owner of the URL could have gotten the site blacklisted (however, I don't think there was a previous owner)
- Act of God
I really don't know. :(
When checking your 2 domains, it appears that neither of them has a valid DOCTYPE declaration or character-encoding labeling... I think this could be it, as the W3C validator gives a fatal error on the first one and complains about the character encoding of the second.
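For reference, a minimal document head that passes both of those checks: an HTML 4.01 DOCTYPE plus a character-encoding declaration. The iso-8859-1 charset here is just an example; use whatever encoding your pages are actually saved in:

```
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Example page</title>
</head>
<body></body>
</html>
```

Without the DOCTYPE, the validator can't pick a DTD to validate against (hence the fatal error), and without the charset label it has to guess the encoding.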
Greetings from France, although I'm not a frenchie ;-)
Dan