News Beta - Getting Spidered and Depth of Crawl

Once approved for News beta, how often do they spider the site?

         

Axces2

4:25 am on Mar 10, 2003 (gmt 0)



I provide business news online and got approved by Google News to show my news pages in their News section! Great! They seemed to take only the top two stories (they had excerpts), one on Feb 26 and the other on Feb 28. So I started putting fresh stories up daily and none of them show! It's been over a week, nothing! I sent them an email and they responded talking about spiders crawling the web and that it takes time to index the pages. The answer read more like I was waiting for my website to have its pages listed, which I am also doing with HTML news content, but this is the NEWS section, not the WEB section.

It appears that they're only going about 250 characters deep into a page to gather news content, but why isn't it showing up daily, or hourly like Reuters or the Chicago Tribune? I have not gotten a second answer as to whether these are two separate spiders or the same one. It seemed odd to accept us for their News section and then stop taking our content. My web pages show up sooner! I can't figure it out.

[edited by: Brett_Tabke at 5:08 am (utc) on Mar. 10, 2003]
[edit reason] no self urls please [/edit]

Brett_Tabke

5:08 am on Mar 10, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to the forums and congrats on the news feed.

That's one you will have to take up with Google.

JLindsay

5:22 am on Mar 10, 2003 (gmt 0)

10+ Year Member



Hi Axces, I was wondering the same thing.

We were also accepted for Google News, but unlike you we never even had a spider come around. I thought that it might take some time for them to add new sites to the list, but it has now been a couple of weeks and still no sign of our content.

By the way, the same thing happened with another site I submitted many months ago. I got a note saying it had been accepted, but it never appeared!

Axces2

5:55 am on Mar 10, 2003 (gmt 0)



To JLindsay -

That's not good news, that the same thing happened to you - and others. TELL ME MORE! What did you do? How did they respond to your inquiries? Surely you inquired?

Brett - thank you for welcoming me. I have taken it up with Google and got a message that read more like I was inquiring about a website being spidered, when I'd asked about news pages being spidered. I sent a reply to their message and did not hear back. I was just sending a second message when I got an alert on these posts.

It appears that the spider is not going very deep into the pages either, as it only grabbed the story off the top of the page. I must admit I am not too good at page construction; far too many characters. I cannot afford help - at all. I'm just so frustrated!

danny

5:57 am on Mar 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I got a reply saying the "new content" page of my site had been accepted, but nothing ever appeared on News searches so I mailed them back three months later - they seemed to think it was a technical problem and that they'd look into it, but nothing ever came of it.

chiyo

6:56 am on Mar 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google News put all our news sites on from the start; we didn't ask. It appears they crawl at least every hour, as our news appears in the index within the hour. And all new items appear.

I've got a funny feeling they crawl RSS news feeds too, where the info is provided in a structured way and is easier and quicker for them to crawl. Maybe through certain RSS aggregator methods or sources (newsisfree? things like blogdex?), or far more likely using their own software. I say that because our news items are always a fair way down our news index page, so if they stopped crawling at a certain point, they might not get them... but they do. However, an RSS feed is an obvious, simple and pain-free thing to spider, with no problems in parsing compared to a regular HTML page.
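
To illustrate why a feed is easier to spider than a regular page, here is a minimal sketch in Python of pulling headlines out of an RSS 2.0 document. This is only an illustration, not a description of how Google News actually works. The sample feed, its URLs, and the function name parse_rss_items are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; the titles and example.com URLs are invented.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Business News</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/story1.html</link>
      <description>Excerpt of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/story2.html</link>
      <description>Excerpt of the second story.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of dicts (title, link, description), one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    # Every story sits in the same place in the tree, no matter how far
    # down the feed it appears, so there is no guessing about page layout.
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for story in parse_rss_items(SAMPLE_RSS):
        print(story["title"], "-", story["link"])
```

Every item is found by its tag rather than by its position in the markup, so a story far down the feed is picked up just as easily as the first one.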