Adding 50,000 items to directory of plants, with just the name; Avoiding Crawl Budget, Thin Content

         

guarriman3

8:05 am on Nov 12, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi,

I've got a web project showing 200,000+ animal and plant species, coming from public databases. Each species has one URL to highlight the species name in the URL and in the title:
http://example.com/species/pennantia-baylisiana

Now I've found an extra database with 50,000 plant entries, which are "varieties" and "subspecies", so we can consider them to be below the level of the species. This database contains only two fields: "name" and "coordinates" (where the 'species' is most commonly seen). Yes, this is very weak information, but it might be interesting for some users searching for data about the 'species'. The existing 200,000 items show not only the name and the coordinates, but also photos, descriptions (500+ words), academic references, etc.

I do not want to suffer from 'crawl budget' and 'thin content' issues, because this is a website with thousands of high-quality URLs and these new 50,000 URLs with weak information would harm the quality of the content.

This is my plan. I've got the website classified with several levels:
plants > division > subclass > family > species 

I'm considering creating, for each subclass, a single webpage with all the 'varieties' of that subclass (e.g. "Varieties of Asteridae"). This webpage would contain a list with just the names of all the varieties of the subclass, and a map placing them.

My question is: would these single webpages for varieties be enough to help users find each of the names in a single Google search? How could I 'enrich' these webpages?

Thank you very much.

[edited by: engine at 9:01 am (utc) on Nov 12, 2021]
[edit reason] Please use example.com [/edit]

NickMNS

3:12 pm on Nov 12, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Typically the solution to having a lot of thin data is to aggregate it. Basically, as you suggest, creating a table with all the varieties of a subclass. But in this case I doubt that it will be enough, because not enough people care about lists of varieties within a subclass, and those who do care can get the information from Wikipedia, plus a lot more information that you don't have, like images and relationships to other plants and species.

Showing the data on a map sounds like it would provide good value, but one page per subclass with a static map and a table still seems thin, and not enough to compete with Wikipedia and other established websites. I would instead look at making a single page app, with an interactive map that allows you to map different varieties at the same time, so that you can compare them. If you want to get really fancy you could also invert the relationship and allow the user to click the region and then show the varieties within that region.

Be sure to provide the tabular data for each map view and provide a unique url that allows users to link to a specific view and share the links. As such you'll end up with way more than 50k pages and no thin content.
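To illustrate the "unique url per view" idea, here is a minimal sketch of how a map view (a set of selected varieties plus an optional region) could be turned into one stable, shareable URL. The base path, the `v` and `region` parameter names, and both function names are assumptions made up for this example, not anything the site already has. Sorting and de-duplicating the selection means the same view always produces the same URL, which helps avoid many duplicate addresses for identical content.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base path for the interactive map; adjust to the real site.
BASE_URL = "https://example.com/varieties/map"


def view_url(variety_slugs, region=None):
    """Build one canonical URL for a map view.

    The slugs are sorted and de-duplicated so that the same selection,
    in any order, always maps to the same URL.
    """
    params = [("v", slug) for slug in sorted(set(variety_slugs))]
    if region:
        params.append(("region", region))
    return f"{BASE_URL}?{urlencode(params)}"


def parse_view(url):
    """Recover the selected variety slugs and region from a view URL."""
    query = parse_qs(urlsplit(url).query)
    slugs = sorted(query.get("v", []))
    region = query.get("region", [None])[0]
    return slugs, region
```

For example, `view_url(["b-var-x", "a-subsp-y"])` and `view_url(["a-subsp-y", "b-var-x"])` give the same address, and `parse_view` on that address returns the selection needed to render both the map and the table for that view.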

lucy24

5:37 pm on Nov 12, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



not enough to compete with Wikipedia and other established websites

Yes, I was going to say that I manage just fine* between GBIF and Animal Diversity. Adding 50,000 pages of new worthwhile content in one fell swoop is pretty impressive if it can be done.

:: idly wondering if “division > subclass” means “phylum > class > order” ::

* I’m developing a taste for early, say pre-1850, natural history.

guarriman3

11:12 am on Nov 15, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi @NickMNS, @lucy24, thank you very much for your answers.

I would instead look at making a single page app, with an interactive map that allows you to map different varieties at the same time, so that you can compare them. If you want to get really fancy you could also invert the relationship and allow the user to click the region and then show the varieties within that region. Be sure to provide the tabular data for each map view and provide a unique url that allows users to link to a specific view and share the links. As such you'll end up with way more than 50k pages and no thin content.


A cool idea, thank you!

Yes, I could not compete with Wikipedia, GBIF or Animal Diversity, and a different 'product' to show the information could attract the interest of the visitors.

:: idly wondering if “division > subclass” means “phylum > class > order” ::


I just tried to simplify the content tree to help with the question; you're right, it's not exact :-)

tangor

6:37 am on Nov 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is there any reason this new info is not appended to the bottom of the related entry, thus increasing the content of the related page and providing more info for the user?

guarriman3

4:35 pm on Nov 27, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Is there any reason this new info is not appended to the bottom of the related entry, thus increasing the content of the related page and providing more info for the user?


Yes, I could append the list of the "most similar varieties and subspecies" at the bottom of the page of each one of the species :-)

It's a cool idea, thank you @tangor.
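As a rough sketch of how the varieties could be attached to their species pages, the snippet below groups the new records under the slug of their parent species. It assumes that each variety name starts with the genus and species epithet followed by the rank and variety name (e.g. "Genus epithet var. name"), and that species slugs follow the lowercase `genus-epithet` pattern seen in the example URL. The real database may not follow that format, and the function names are invented for illustration. Records that do not fit are returned separately so they can be checked by hand.

```python
from collections import defaultdict


def parent_species_slug(variety_name):
    """Guess the parent species slug from a variety or subspecies name.

    Assumption: the name has at least three words, and the first two are
    the genus and the species epithet. Returns None otherwise.
    """
    parts = variety_name.split()
    if len(parts) < 3:
        return None
    genus, epithet = parts[0], parts[1]
    return f"{genus}-{epithet}".lower()


def group_by_species(varieties):
    """Group variety records (dicts with a "name" key) by parent species slug.

    Returns (grouped, unmatched): a dict of slug -> list of records, and
    a list of records whose name did not match the assumed format.
    """
    grouped = defaultdict(list)
    unmatched = []
    for record in varieties:
        slug = parent_species_slug(record["name"])
        if slug is None:
            unmatched.append(record)
        else:
            grouped[slug].append(record)
    return dict(grouped), unmatched
```

Each species page could then look up its own slug in `grouped` and render the matching names and coordinates at the bottom of the page.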