Google loves my RSS files

But not my site

         

wackybrit

7:43 pm on Sep 16, 2004 (gmt 0)

10+ Year Member



Is Google developing an RSS-only crawler? Here's why I have my suspicions.

I've been enjoying the discussion about the sandbox lately, and my new site is having fun there (only just over a month online, and only a few days since I started updating it daily). Unfortunately, despite having several links for a little while now (a couple for about 6-8 weeks, several more in the last 2 weeks), GoogleBot hasn't turned up on any of my HTML pages.

However (and this is where it gets weird) it HAS been turning up to request three files: robots.txt, index.rdf, and index.xml. The latter two are RSS feed files (a syndication format used by weblogs and news sites). I can't work out how it knows about these files (which do exist) yet doesn't bother to look at ANY of my HTML files (not even index.html). Technically there's nothing special about the site, and it's fetching those other files fine.

My logs date back a month, and it turned up on September 15th, read those three files, then did it again at almost the same time today. Always just robots.txt, index.rdf, and index.xml.

A bug in GoogleBot.. or is it just getting lazy with all the new sites? Or maybe they're making an RSS-only crawler? From my reading in the Sandbox thread, it seems most Sandbox sites are at least getting their pages crawled, even if they don't get good SERPs. Is anyone else experiencing this weird behavior? I haven't had it happen on any of my sites in the past.
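For anyone who wants to run the same check on their own logs, here is a minimal Python sketch that counts which paths Googlebot requested. It assumes the Apache "combined" log format, and the sample lines are made up for illustration (they are not taken from the logs described above); the name `googlebot_paths` is hypothetical.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_paths(lines):
    """Count requested paths on lines whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [15/Sep/2004:19:40:01 +0000] "GET /robots.txt HTTP/1.0" 200 208 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [15/Sep/2004:19:40:02 +0000] "GET /index.rdf HTTP/1.0" 200 4096 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [15/Sep/2004:19:40:03 +0000] "GET /index.xml HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [15/Sep/2004:20:01:10 +0000] "GET /index.html HTTP/1.1" 200 9000 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

if __name__ == "__main__":
    for path, hits in sorted(googlebot_paths(SAMPLE).items()):
        print(path, hits)
```

To use it on a real log, pass an open file object instead of `SAMPLE`. A user-agent match alone does not prove the request came from Google, so treat the counts as a first pass.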

encyclo

2:40 pm on Sep 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Allowing Googlebot to index your RSS files is not doing you any good at all - you really need to block the index.rdf and index.xml files in the robots.txt.

But before you do that, the real question is: are you doing any server-side user-agent sniffing that delivers the RSS feed by default to certain user agents instead of your usual index.html? Are you sure that index.html is properly defined as the DirectoryIndex file? Where do your backlinks point? If they only point to the RSS files, that would explain things too.
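If you do end up blocking the feed files, the rules can be sanity-checked before uploading. Below is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content is just one possible way to block the two files, and `www.example.com` and the helper name `allowed` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# One possible robots.txt: block only the two feed files, leave the rest crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.rdf
Disallow: /index.xml
"""


def allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BASE = "http://www.example.com"  # placeholder domain

if __name__ == "__main__":
    for path in ("/", "/index.html", "/index.rdf", "/index.xml"):
        print(path, allowed(ROBOTS_TXT, "Googlebot", BASE + path))
```

This only shows how Python's parser reads the rules; a real crawler may interpret edge cases differently, so it is a quick syntax and intent check rather than a guarantee.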

oshatz

2:57 pm on Sep 18, 2004 (gmt 0)

10+ Year Member



You should also check whether your robots.txt file is malformed. You could try removing it for a while to see if that helps.

encyclo

3:12 pm on Sep 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might also want to try the robots.txt validator [searchengineworld.com] to check the syntax if you're worried that your robots.txt file is causing problems.

wackybrit

5:36 pm on Sep 18, 2004 (gmt 0)

10+ Year Member



I have no robots.txt file at all. Unlike some of my sites, the site in question is very plain: just a bunch of .html pages. The backlinks all point to the front page, and DirectoryIndex lists index.html first.

There is only one thing that is.. 'unusual' about my domain. I looked it up in archive.org, and it used to have some sort of Korean Web site on it. I did a Google Groups search and it was never involved in spamming or the like.. but I'm wondering if this has caused GoogleBot to 'ignore' it for a while. However, if this were the case, I can't see why it's only going to my RSS files..

I shall continue investigations! If I get nowhere, I'll have to send Google a nice e-mail, as this seems to be very unorthodox behavior from GB :-) The site in question has no commercial use or aim, but is still something I want to get indexed to help people searching for information on the topic.. so perhaps they will be nice :) Thanks for your ideas so far.

designaweb

5:41 pm on Sep 18, 2004 (gmt 0)

10+ Year Member



It may sound a bit silly, but have you considered submitting your site to Google manually? It worked for me with one of my sites...

wackybrit

5:58 pm on Sep 18, 2004 (gmt 0)

10+ Year Member



Actually, no, but I will give that a try now! I've got several sites with good SERPs in Google and have never had to submit manually, just relying on backlinks.. but, hey, it's worth a try :-)