It appears that they're only going about 250 characters deep into a page to gather news content, but why isn't our content showing up daily, or hourly, like Reuters or the Chicago Tribune? I have not gotten a second answer as to whether these are two separate spiders or the same one. It seems odd for them to accept us into their News section and then stop taking our content. My regular web pages show up sooner! I can't figure it out.
[edited by: Brett_Tabke at 5:08 am (utc) on Mar. 10, 2003]
[edit reason] no self urls please [/edit]
We were also accepted for Google News, but unlike you we never even had a spider come around. I thought that it might take some time for them to add new sites to the list, but it has now been a couple of weeks and still no sign of our content.
By the way, the same thing happened with another site I submitted many months ago. I got a note saying it had been accepted, but it never appeared!
It's not good news that the same thing happened to you, and to others. TELL ME MORE! What did you do? How did they respond to your inquiries? Surely you inquired?
Brett, thank you for welcoming me. I have taken this up with Google, but the reply I got read as if I had asked about a website being spidered, when I had actually asked about news pages being spidered. I replied to their message and have not heard back. I was just sending a second message when I got an alert about these posts.
It also appears that the spider is not going very deep into the pages, as it only grabbed the story at the top of the page. I must admit I am not too good at page construction; my pages have far too many characters. I cannot afford help at all. I'm just so frustrated!
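If you want to check the "only 250 characters deep" theory against your own page, a rough Python sketch like the one below reports how far into the raw HTML, and how far into the visible text, each headline sits. It is only a diagnostic under my own assumptions: nobody here knows how Google's news spider actually measures depth (raw markup, visible text, or something else entirely), and `headline_offsets` and `TextExtractor` are made-up helpers, not anything Google publishes.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def headline_offsets(html, headlines):
    """Hypothetical helper: map each headline to (raw_offset, text_offset).

    raw_offset is the character position in the HTML source, text_offset
    the position in the extracted visible text; -1 means not found.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text fragments and collapse runs of whitespace to single spaces.
    text = " ".join(" ".join(parser.parts).split())
    return {h: (html.find(h), text.find(h)) for h in headlines}


if __name__ == "__main__":
    page = ("<html><head><style>body{color:red}</style></head>"
            "<body><h1>Top story</h1><p>Second story</p></body></html>")
    for headline, (raw, txt) in headline_offsets(
            page, ["Top story", "Second story"]).items():
        print(f"{headline!r}: raw HTML offset {raw}, text offset {txt}")
```

A big gap between the two numbers for your lower stories would suggest that markup (tables, inline styles, scripts) is pushing the text a long way down the source, which is the "far too many characters" worry above.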
I've got a funny feeling they crawl RSS news feeds too, where the information is provided in a structured way that is easier and quicker for them to crawl. Maybe they do it through certain RSS aggregators or sources (newsisfree? things like blogdex?), or, far more likely, using their own software. I say that because our news items are always a fair way down our news index page, so if they stopped crawling at a certain point they might not get them... but they do. An RSS feed is an obvious, simple and painless thing to spider, with none of the parsing problems of a regular HTML page.
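To show why a feed is so much easier to pick apart than an HTML page, here is a minimal Python sketch that pulls every headline and link out of an RSS 2.0 feed using only the standard library. It assumes a plain RSS 2.0 feed with `<item>` elements containing `<title>` and `<link>`; the sample feed and the example.com URLs are made up, and I'm not claiming this is how Google's spider actually reads feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; in practice this would be fetched from the site.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>Top story</title>
      <link>http://www.example.com/news/1</link>
    </item>
    <item>
      <title>Older story</title>
      <link>http://www.example.com/news/2</link>
    </item>
  </channel>
</rss>"""


def rss_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    # Every story sits in its own <item>, so position on the page is
    # irrelevant: the tenth item is as easy to reach as the first.
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in rss_items(FEED):
        print(title, "->", link)
```

There is no guessing about layout here: each story is a labelled element, which is the point about feeds being "pain free" compared with scraping an index page.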