Forum Moderators: phranque


How to make SEO site audit of large forum? (over 1.5M links)

         

Hogar315

10:17 am on Feb 16, 2022 (gmt 0)



Hey everyone,

I have done tons of SEO audits on small local sites and on eCommerce stores with, let's say, a few thousand pages.

But I have always wondered how to audit forums, especially big ones like this one, with over 1M links and inner pages and over 1M backlinks...

Tools like Screaming Frog crawl every Thanks.php?do= page, and every profile.php, every thread, every page on every thread, every tag, every tag page (seo page 1, seo page 2, seo page 3...)...

So, what would you check on such a big forum?

Duplicated titles (probably tons of them, because 50k members definitely ask the same questions)... so we can do nothing about this?!
404 pages? Mostly in threads where users linked to free image hosting services or to their own websites that no longer exist 2 years later. So we can do nothing about them?
Also duplicated titles because of pagination. How to fix that? E.g. the 1st page of a thread has "Title" and the 2nd page has the same "Title"... should we add something like "Title | Page 1"?
Missing image ALT attributes... users in threads are not adding these.

Did I forget something?

Thanks a lot!
/* And most standard packages of SEO tools like SEMrush, Ahrefs, and others are not capable of crawling such a big forum */

buckworks

9:33 pm on Feb 16, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Could you audit the site in sections rather than trying to crawl it all in one go? Even if a tool could give you a report on the whole million-page shebang at once, it would take you some time in any case to review and act on the issues. You could do a lot of cleanup working section by section.
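
One way to plan a section-by-section audit is to bucket a URL list (from a sitemap or a partial crawl) by its first path segment and see how big each section is. This is only a rough sketch: the `forum.example` URLs are made up, and "first path segment" is an assumed definition of a section that may not match every forum's URL scheme.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_of(url: str) -> str:
    """Return a coarse section label: the first path segment, or '/' for the root."""
    path = urlsplit(url).path
    first = path.strip("/").split("/", 1)[0]
    return first or "/"


def count_by_section(urls):
    """Count how many URLs fall into each section."""
    return Counter(section_of(u) for u in urls)


# Hypothetical sample URLs, for illustration only.
urls = [
    "https://forum.example/showthread.php?t=1",
    "https://forum.example/showthread.php?t=2&page=2",
    "https://forum.example/profile.php?u=7",
    "https://forum.example/tags/seo/page2",
    "https://forum.example/",
]

for section, n in count_by_section(urls).most_common():
    print(section, n)
```

The largest buckets are the ones worth crawling (or excluding) as separate jobs.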

To add to your list of things to check, I'd want to review the meta descriptions to see what kind of impression they might make when they appear in the search results.

phranque

9:44 pm on Feb 16, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], Hogar315!

do you have access to the Google Search Console for this site?

csdude55

5:19 am on Feb 17, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My sites are like what you've described, going back for 20 years!

In all honesty, though, I gave up on SEO several years back and haven't noticed any change at all. Google is lightyears ahead of anything I can think of, so any attempt on my end to manipulate anything is a fool's errand. I spent a LOT of time implementing schema tags, thinking it would solve my problems; spoiler alert! It had no impact :-/

Duplicated Title (tons of them probably because 50k members have definitely same questions)...so we can do nothing about this?!

Assuming that you can change the programming in the forum, you could always modify the title to include the author's username, the date, etc.
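
As a rough illustration of that idea, here is a small sketch that appends the author and post date to a thread title. The function name and the " | " separator are assumptions made for this example; real forum software would do this in its own template language.

```python
from datetime import date


def unique_thread_title(title: str, author: str, posted: date) -> str:
    """Build a page title that stays distinct even when two threads share a subject."""
    return f"{title.strip()} | {author} | {posted.isoformat()}"


print(unique_thread_title("Help with SEO", "Hogar315", date(2022, 2, 16)))
```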

404 pages? Mostly on threads where users used some free image services to upload their image or their websites that after 2 years does not exist anymore. So we can do nothing about them?

For links, just add a rel="nofollow" attribute and forget about it.

For images, on my end I do two things:

1. I only embed images if the reader is logged in; otherwise, it's just another link. Search engines see the link with the "nofollow", so no problem.

2. I'm rebuilding right now to use cURL to fetch the headers before embedding the image for those logged-in users. That might turn into a problem, though, so I'm releasing that slowly on a trial basis.
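
The header check in point 2 could look roughly like this in Python (the post uses cURL, so this is a swapped-in equivalent using the standard library, not the actual script). The function name, the 5-second timeout, and the rule "2xx status plus an image Content-Type" are all assumptions for the sketch.

```python
import http.client
import urllib.request


def image_is_reachable(url, opener=urllib.request.urlopen, timeout=5):
    """Send a HEAD request; True only for a 2xx answer with an image Content-Type.

    `opener` is injectable so the check can be tested without network access.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return 200 <= response.status < 300 and content_type.startswith("image/")
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors (404 etc.), DNS failures, timeouts, and malformed URLs.
        return False
```

A page renderer would call `image_is_reachable(url)` and fall back to a plain link when it returns False. Some hosts refuse HEAD requests, and doing this on every page view adds latency, so caching the result per URL is probably wise.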

Also duplicated title because of pagination. How to fix that? eg. 1st page of thread have "Title" 2nd page also have the same "Title".... should we add something like "Title | Page 1"?


That's pretty much what I do, but I see that when you search on Google and go on to page 2 then the title doesn't change. So maybe it's irrelevant?

I'm also changing my site to use Infinite Scroll by default, but making pagination an option for registered users. This would eliminate your problem entirely, too. I posted a complete script for it, if you're interested; I think it's under the Javascript sub-forum, sometime around 2018

Missing ALT Image tag...users on threads are not adding these,

If you can modify the code, I'd plug in an empty ALT attribute if there's nothing available.
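
A minimal sketch of that, assuming the rendered post HTML is available as a string. It uses a regular expression rather than a real HTML parser, so it will miss odd markup (for example a `>` inside an attribute value, or a URL that happens to contain `alt=`); treat it as an illustration only.

```python
import re

# Matches <img ...> tags that have no alt attribute anywhere inside the tag.
_IMG_WITHOUT_ALT = re.compile(r"<img\b(?![^>]*\balt\s*=)([^>]*?)\s*(/?)>", re.IGNORECASE)


def add_empty_alt(html: str) -> str:
    """Insert alt="" into every <img> tag that lacks an alt attribute."""
    return _IMG_WITHOUT_ALT.sub(r'<img\1 alt=""\2>', html)


print(add_empty_alt('<p><img src="a.png"> <img src="b.png" alt="chart"></p>'))
```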

If you have access to the code but you're not sure how to do any of that, let us know the language and we'll help you figure it out.

Brett_Tabke

11:39 am on Jun 4, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Ran into this one, and I found it an interesting question.

> Duplicated Title

That is a good one. First, the forum software should give moderators and admins easy ways to edit titles. Train mods early and often in editing titles.

Programmatically, you should export all the page titles to a spreadsheet - sort by duplicate status - then sort the dupes by time.
Start editing a dozen a day until you get back six months. Anything over six months is pretty much considered dead content by Google on any site it classifies as a forum.
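
The export-and-sort step can also be done in a few lines of code instead of a spreadsheet. This is a sketch under assumptions: each thread is a `(title, url, last_post_date)` tuple with an ISO date string, and titles count as duplicates when they match after lowercasing and collapsing whitespace. The sample rows are invented.

```python
from collections import defaultdict


def duplicate_titles(threads):
    """Group threads by normalized title; return duplicate groups, newest first.

    Each result is (normalized_title, [(last_post_date, url), ...]) with the
    rows inside a group also sorted newest first.
    """
    groups = defaultdict(list)
    for title, url, last_post in threads:
        key = " ".join(title.lower().split())
        groups[key].append((last_post, url))
    dupes = {k: sorted(v, reverse=True) for k, v in groups.items() if len(v) > 1}
    return sorted(dupes.items(), key=lambda item: item[1][0][0], reverse=True)


# Hypothetical export rows, for illustration only.
threads = [
    ("How to rank?", "/t/1", "2022-05-01"),
    ("how to  rank?", "/t/2", "2021-01-10"),
    ("Unique question", "/t/3", "2022-04-02"),
    ("Sitemap help", "/t/4", "2022-03-01"),
    ("Sitemap help", "/t/5", "2022-06-01"),
]

for title, rows in duplicate_titles(threads):
    print(title, rows)
```

Working down the output from the top gives the "newest dupes first" editing order described above.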

The admin boards I am on have a standing theory that Google classifies sites as forums if they are running specific software (vBulletin, phpBB, XenForo, etc.). After both Panda and Penguin, VB admins saw a 1-to-1 drop in referrals from Google.

> 404 pages? Mostly on threads where users used some free image services

You know, there are a couple of mods for specific software, where you can just pull the urls from back posts after six months. They are still in the db, just not generated.

If you could ID maybe 10 of the services they used to upload images, a well crafted SQL statement could delete them from the database (if using MySQL or MariaDB).
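
The same cleanup can be sketched outside the database, in Python rather than SQL, which makes it easier to test before touching real posts. Everything specific here is an assumption: that posts store images as BBCode `[img]...[/img]`, the host names in `DEAD_IMAGE_HOSTS`, and the replacement notice text. Pass `notice=""` to delete the tag outright.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list of image hosts that no longer serve the files.
DEAD_IMAGE_HOSTS = {"deadhost.example", "gone-images.example"}

_IMG_TAG = re.compile(r"\[img\](.*?)\[/img\]", re.IGNORECASE | re.DOTALL)


def strip_dead_images(post_text, dead_hosts=DEAD_IMAGE_HOSTS,
                      notice="[image no longer available]"):
    """Replace [img] tags that point at a dead host (or a subdomain of one)."""
    def replace(match):
        try:
            host = (urlsplit(match.group(1).strip()).hostname or "").lower()
        except ValueError:
            return match.group(0)  # malformed URL: leave the tag untouched
        if any(host == h or host.endswith("." + h) for h in dead_hosts):
            return notice
        return match.group(0)

    return _IMG_TAG.sub(replace, post_text)


sample = "Look: [img]http://i.deadhost.example/a.jpg[/img] and [IMG]https://good.example/b.png[/IMG]"
print(strip_dead_images(sample))
```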

> Also duplicated title because of pagination.

Honestly, duplicated titles are not that big of an issue from what I have seen in the SEO community. (After all, Google is rewriting most titles these days). Duplicate content is an issue - but not titles.

> Missing ALT Image tag...

No big whoop. What some peeps do is repeat the filename, or the page's meta description, in the alt attributes.

martinibuster

4:57 pm on Jun 4, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You have to set Screaming Frog to focus on canonicals (because some forum software creates dupes), and also to crawl as Googlebot because forum software shows different content to Googlebot and normal users.

You may also want to exclude crawling member profiles.

Do a member profile crawl later as a separate standalone audit, as the only thing you need to focus on for profiles is external links to find spam.

Additionally, set the Screaming Frog settings so that it stores the crawl database on your hard drive and not in RAM, otherwise it won't be able to handle all the pages.

One last thing: you should connect your computer directly to your modem through Ethernet so that you get the full bandwidth of your internet connection. What can take 24 hours to crawl on Wi-Fi will take only several hours on a direct fiber optic connection through Ethernet.
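
If you are working from a URL list rather than a live crawl, splitting it into a main list and a profile list could be sketched like this. The two patterns come from the URLs mentioned in this thread (`profile.php`, `Thanks.php?do=`); other forum software uses different paths, so they are assumptions to adjust. This is plain Python, not Screaming Frog configuration.

```python
import re

PROFILE_URL = re.compile(r"/profile\.php\b", re.IGNORECASE)
SKIP_URL = re.compile(r"/thanks\.php\?do=", re.IGNORECASE)


def split_crawl_list(urls):
    """Return (main_urls, profile_urls); action URLs matching SKIP_URL are dropped."""
    main, profiles = [], []
    for url in urls:
        if SKIP_URL.search(url):
            continue
        if PROFILE_URL.search(url):
            profiles.append(url)
        else:
            main.append(url)
    return main, profiles


# Hypothetical sample URLs, for illustration only.
main, profiles = split_crawl_list([
    "https://forum.example/showthread.php?t=1",
    "https://forum.example/profile.php?u=7",
    "https://forum.example/Thanks.php?do=add&p=9",
])
print(main, profiles)
```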

Good luck!

Roger Montti

Brett_Tabke

5:16 pm on Jun 4, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



minor quibble:

>wifi speed

It isn't as cut and dried as it used to be just a few years ago.

My newest fiber modem has WiFi-AX on it. Wifey is getting north of 700 Mbps on her MacBook Pro (which is faster than the Thunderbolt-to-Ethernet adapter gets).

martinibuster

3:15 am on Jun 5, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Maybe the thunderbolt is a limiting factor?

I connected via Ethernet to the router (not the modem) and got significantly faster speeds than I've ever experienced.

But I'm on municipal fiber, zero limits.

csdude55

9:03 pm on Jun 9, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I hate both of you.

I'm lucky to get 3MB at home!

tangor

2:54 am on Jun 10, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Recently forced to go fiber by my ISP. FORTUNATELY it was ultimately cheaper than what I had been paying. 1 Gbps possible, opted for 300 Mbps for half price, and that was significantly faster than my previous 25 Mbps.

(Large urban area, major internet provider)

All that speed, however, simply means you get the data faster, you still have to PROCESS it!

Brett_Tabke

3:20 am on Jun 10, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I am pretty stunned here - we can get 5gig fiber. It is quite a bit more per month than the 2gig we just upgraded to. Even Google Fiber is forced to up theirs to 2 and 5gig as well since AT&T and Spectrum are starting to offer it.

csdude55

3:53 am on Jun 10, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm in a relatively rural area, and my target demographic is primarily rural. I know we've talked about this before, but there are neighboring counties that still only have dial-up!

I seriously wouldn't know what to do with a 300M connection, it sounds like a fairy tale...

robzilla

9:22 am on Jun 10, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Duplicated Title (tons of them probably because 50k members have definitely same questions)...so we can do nothing about this?!

Not really, unless you feel like specifying all those titles to make them more unique..

404 pages? Mostly on threads where users used some free image services to upload their image or their websites that after 2 years does not exist anymore. So we can do nothing about them?

I'd probably replace those with a notice about the broken link.

Also duplicated title because of pagination. How to fix that? eg. 1st page of thread have "Title" 2nd page also have the same "Title".... should we add something like "Title | Page 1"?

Don't do it for page 1, but I would do it for subsequent pages.
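
In code, that rule is a one-liner. A tiny sketch, with the " | Page N" suffix format taken from the original question and the function name made up:

```python
def page_title(thread_title: str, page: int) -> str:
    """Leave page 1 untouched; append ' | Page N' from page 2 onward."""
    if page <= 1:
        return thread_title
    return f"{thread_title} | Page {page}"


print(page_title("How to audit a big forum?", 1))
print(page_title("How to audit a big forum?", 3))
```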

Missing ALT Image tag

Don't bother. Search engines don't need ALT to figure out what's in an image.

I'd also try to reduce the number of links that don't really need to be links, e.g. Thanks.php?do=. Better off doing that with javascript.

Member profiles aren't all that interesting to non-members, so I would only link to those when a user is logged in.

thecoalman

5:39 am on Jul 25, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The 404s are certainly an issue, but my forum is approaching 20 years old and I have larger concerns about old links going to who knows where because domain ownership changed.

I have a custom script that I use every few years that parses every post for outbound links and compiles them into a list of just the domains. A second script takes a screenshot through the browser and names the file domain.tld.jpg. This is where the work starts. I'll split them into batches and upload them to the moderator forum to get some help. Each image is viewed manually and the good ones are deleted, so only the bad domains remain.

Last but not least, the third script parses the directory listing of the remaining files to get a list of domains, which is used in a MySQL query with REGEXP_REPLACE.
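
The first of those scripts (posts in, list of outbound domains out) might look something like the sketch below. It is not the actual script: the URL regex, the `www.` stripping, and the sample posts are assumptions, and the screenshot and SQL steps are left out.

```python
import re
from urllib.parse import urlsplit

# Stops at whitespace, BBCode/HTML brackets, and quotes.
_URL = re.compile(r"https?://[^\s\[\]<>\"']+", re.IGNORECASE)


def outbound_domains(posts, own_host):
    """Return a sorted list of unique external hostnames linked from the posts."""
    own = own_host.lower().removeprefix("www.")
    domains = set()
    for text in posts:
        for url in _URL.findall(text):
            try:
                host = urlsplit(url).hostname
            except ValueError:
                continue  # skip malformed URLs
            if not host:
                continue
            host = host.lower().rstrip(".").removeprefix("www.")
            if host != own:
                domains.add(host)
    return sorted(domains)


# Hypothetical post bodies, for illustration only.
posts = [
    "See https://www.example.org/page and [url]http://old-site.example/x[/url]",
    "Internal link: https://myforum.example/t/1",
]
print(outbound_domains(posts, "myforum.example"))
```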

csdude55

5:09 pm on Jul 25, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@thecoalman, there might be an easier way. Why not just modify your script to only write the anchor tag if it's posted within XX days? After 30 days, I suspect that it's mostly just seen by search engines, anyway.
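
Roughly what I mean, as a sketch. The 30-day default comes from the suggestion above; the function name and the choice to show old URLs as plain escaped text are assumptions.

```python
from datetime import datetime, timedelta
from html import escape


def render_link(url, posted_at, now, max_age_days=30):
    """Emit a clickable nofollow link for recent posts, plain text for old ones."""
    safe = escape(url, quote=True)
    if now - posted_at <= timedelta(days=max_age_days):
        return f'<a href="{safe}" rel="nofollow">{safe}</a>'
    return safe


now = datetime(2022, 7, 25)
print(render_link("https://example.org/a", datetime(2022, 7, 20), now))
print(render_link("https://example.org/a", datetime(2020, 1, 1), now))
```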

thecoalman

10:40 pm on Jul 25, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Due to the nature of the topic, the posted information and any valid links are just as relevant today as they were 20 years ago. I don't want to remove relevant links. I have a lot of older in-depth discussions that do quite well in long tail search results.

I'm a moderator on phpbb.com and the easier method is what I suggested to the devs. Have a whitelist for domains: moderators would need to approve a domain when it was first posted, and then there would be another list for reapproval at a set interval, like every year. Not only would it help keep old posts tidy but it would also dissuade spammers.
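
To make the idea concrete, here is a minimal sketch of the lookup such a whitelist would need. Nothing like this exists in phpBB as far as this thread says; the status names, the dictionary of approval dates, and the 365-day interval are all hypothetical.

```python
from datetime import date, timedelta


def link_status(domain, approvals, today, recheck_after=timedelta(days=365)):
    """Classify a domain as 'pending', 'recheck', or 'approved'.

    approvals maps a lowercase domain to the date a moderator last approved it.
    """
    approved_on = approvals.get(domain.lower())
    if approved_on is None:
        return "pending"    # never reviewed: hold the link for a moderator
    if today - approved_on > recheck_after:
        return "recheck"    # approval is stale: queue for reapproval
    return "approved"


# Hypothetical approval records, for illustration only.
approvals = {"example.org": date(2022, 1, 10), "old-site.example": date(2019, 3, 1)}
today = date(2022, 7, 25)
for d in ("example.org", "old-site.example", "new-site.example"):
    print(d, link_status(d, approvals, today))
```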