Server logs indicate that for almost every URL Googlebot visits, it also requests the same URL with index.php in it, e.g. example.com/index.php or example.com/index.php/great-content. The logs show it usually hits the index.php version first.
My .htaccess redirects any request containing index.php to the non-index.php equivalent. Needless to say, that's a lot of redirecting and a lot of wasted crawl budget. To compound matters, if I make a page return 404 or 410, Googlebot ends up on a chain of redirects before it sees that status.
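For context, here is a minimal sketch of the kind of redirect described above, assuming Apache with mod_rewrite enabled. It is an illustration, not a copy of the live rule; the exact pattern and its position in the file are assumptions.

```apache
# Sketch only: 301 any client request for /index.php or /index.php/<path>
# to the equivalent URL without index.php.
RewriteEngine On

# Test THE_REQUEST (the raw request line) so that internal rewrites to
# index.php by a front controller do not trigger the redirect and loop.
RewriteCond %{THE_REQUEST} \s/+index\.php(?:/([^\s?]*))?[\s?] [NC]

# %1 is the path captured after "index.php/" (empty for a bare /index.php).
# The query string, if any, is carried over by default.
RewriteRule ^ /%1 [R=301,L]
```

With a rule like this, a request for example.com/index.php/great-content is redirected to example.com/great-content, and only then does the server answer with 200, 404 or 410 for the clean URL.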
Almost no incoming links contain index.php in the URL, and certainly no URLs on my site do either.
For the first 6 months of this site's life the URLs DID contain index.php, because the host gave me only limited access to .htaccess. I then changed hosts and implemented the redirect described above. That was over 6 years ago.
Should I bite the bullet and have any request for the index.php version of a URL return 410? Are there other options?
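To make the 410 option concrete, this is roughly what I have in mind: a sketch assuming Apache with mod_rewrite, untested on the live site, and it would replace the redirect rather than sit alongside it.

```apache
# Sketch only: answer 410 Gone for any client request whose path starts
# with /index.php, instead of redirecting it.
RewriteEngine On

# Match the raw request line so internal rewrites to index.php are unaffected.
RewriteCond %{THE_REQUEST} \s/+index\.php[/\s?] [NC]

# "-" means no substitution; the G flag returns 410 Gone and stops processing.
RewriteRule ^ - [G]
```

The trade-off is that any remaining external links still pointing at index.php URLs would stop passing visitors and link value through to the clean URLs.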