Forum Moderators: goodroi


Thousands of New Pages Daily

Dynamic website crawling


placedigit

5:22 pm on Dec 2, 2019 (gmt 0)

5+ Year Member



One website I have come across posts thousands of NEW pages daily. This website is fairly large, serving data for multiple countries.
The challenge right now is to submit these thousands of NEW pages to Google on a daily basis.
What is the right approach to managing this type of website so that Google crawls it properly?

tangor

6:49 pm on Dec 2, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I suspect that sites with this much activity might exceed g's crawl budget per site ... and might need to break site maps into chains (the limit is 50k URLs per site map)...

Does any of this data go evergreen, or is it under constant update?

If there are more than 50k URLs, use a site map INDEX, which can list 50k named sitemaps, for 2.5 billion entries in total.
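A minimal Python sketch of the index approach, assuming the site can produce a flat list of page URLs: it splits the list into sitemap files of at most 50,000 URLs each and builds a sitemap index that lists those files. The base URL `https://www.example.com/sitemaps/` and the `sitemap-N.xml` naming are hypothetical placeholders. The sitemaps protocol also caps each file at 50MB uncompressed, which this sketch does not check.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemaps protocol


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Return a <urlset> document listing the given page URLs."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


def build_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at each child sitemap."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )


def split_into_sitemaps(page_urls, base="https://www.example.com/sitemaps/"):
    """Return ({child sitemap URL: XML body}, index XML). Naming is hypothetical."""
    files = {}
    for n, part in enumerate(chunk(page_urls), start=1):
        files[f"{base}sitemap-{n}.xml"] = build_sitemap(part)
    return files, build_index(list(files))
```

For 120,000 page URLs this yields three child sitemaps (50,000 + 50,000 + 20,000 URLs) and an index with three entries; the index URL is then the one to submit in GSC rather than each child file.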

placedigit

4:55 am on Dec 3, 2019 (gmt 0)

5+ Year Member



Thanks for your reply, Tangor. I have already thought of breaking up the sitemap using the INDEX technique.

The content is not evergreen. Basically, it is useful to readers/customers for 60 days; after that it is no longer useful to them, but we do not delete these pages.

We publish thousands of pages on a day-to-day basis (nothing is updated after publishing). Right now, every day I have to submit the newly added pages to GSC, then again the next day, and so on.
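Because pages never change after publishing, one possible arrangement (a sketch under stated assumptions, not a practice confirmed in this thread) is one dated sitemap per publishing day, with each entry's `<lastmod>` set to its publish date, and one new entry appended to a sitemap index that keeps the same URL. The base URL and the `sitemap-YYYY-MM-DD.xml` naming are hypothetical, and the sketch assumes a single day's output stays under the 50,000-URL limit; a busier day would need its URLs split across several files.

```python
from datetime import date
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemaps protocol


def daily_sitemap(pub_date, urls, base="https://www.example.com/sitemaps/"):
    """Return (sitemap_url, xml) for the pages published on pub_date.

    The base URL and the sitemap-YYYY-MM-DD.xml naming are hypothetical.
    """
    if len(urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("more than 50,000 URLs in one day: split across several files")
    stamp = pub_date.isoformat()  # W3C date form, e.g. 2019-12-03
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc><lastmod>{stamp}</lastmod></url>" for u in urls
    )
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )
    return f"{base}sitemap-{stamp}.xml", xml


def index_entry(sitemap_url, pub_date):
    """One <sitemap> line to append to the sitemap index (whose URL never changes)."""
    return (
        f"  <sitemap><loc>{escape(sitemap_url)}</loc>"
        f"<lastmod>{pub_date.isoformat()}</lastmod></sitemap>"
    )
```

With this layout, older dated files are never rewritten and the index URL stays fixed, so in principle it can be submitted in GSC once and left in place while Google refetches it on its own schedule.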

Given this scenario, what would be the best/ideal practice for this website?

(I hope I have elaborated enough for you to understand)

phranque

7:13 am on Dec 3, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



how is the internal linking to those thousands of new pages per day that are being published?

placedigit

12:35 pm on Dec 3, 2019 (gmt 0)

5+ Year Member



These newly published pages are linked from existing pages. For example, existing category pages have a LATEST ARTICLE section from which the new pages are linked.
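A "latest articles" box like the one described can be sketched as picking the newest n pages per category. The `Page` record and the default `n=20` are hypothetical. One property worth keeping in mind: a fixed-size box only ever exposes the newest n pages of a category, so with thousands of new pages a day, a page can be pushed out of the box before a crawler revisits the category page and so never be reached through that link.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Page(NamedTuple):
    """Hypothetical page record: URL, category, and publish date."""
    url: str
    category: str
    published: date


def latest_by_category(pages, n=20):
    """Return {category: [URLs of the n most recently published pages, newest first]}."""
    grouped = defaultdict(list)
    for p in pages:
        grouped[p.category].append(p)
    return {
        cat: [p.url for p in sorted(items, key=lambda p: p.published, reverse=True)[:n]]
        for cat, items in grouped.items()
    }
```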

tangor

9:48 am on Dec 4, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How will g (or other se's) know these are the same data, just a new day, if they are not collected under an INDEXABLE AND EXPECTED url?

Not saying it can't be done, it just seems like there is nothing PERMANENT to entice the search engine (or the users).

Or am I missing something?

widget_report-date.html, only one of each, only important on that date, then gone the next day...?

placedigit

5:13 am on Dec 5, 2019 (gmt 0)

5+ Year Member



@Tangor, as part of the new pages going live, I do submit the sitemap on a daily basis for all of the thousands of pages. This is one way I am "informing" Google about all the new pages.

My question is: if any of you readers have worked on this type of site, what is the right practice for a website that publishes thousands of new pages daily?

Apart from informing Google via the sitemap, is there anything more one should do?