Forum Moderators: Robert Charlton & goodroi


Crawl Budget and Image Folders


mywebguytaylor

6:17 am on Jul 4, 2022 (gmt 0)

5+ Year Member



I have a large website with WordPress image folders going back to 2009.

I am currently redesigning my website, and I am trying to determine whether there is any benefit to shrinking down or deleting the images and image folders I am no longer using.

I really do not have time to go through all of those image folders to see which images I am still using and which ones I am not. I am hoping it does not matter?

Does anyone here know if this matters when it comes to Google's Crawl Budget?

All of the images are fully optimized and compressed. My question is whether it would be worth the time to go through every single folder and thousands of images and delete the ones that are not referenced on any of my pages.

Does anyone have a definitive answer regarding Crawl Budget?

RedBar

9:32 am on Jul 4, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does anyone have a definitive answer regarding Crawl Budget?

I'm not too sure what you mean by this. All search engines crawl for free; are you concerned about Google's power consumption, or something else?

It is estimated that there are about 1.17 billion websites; do you feel that deleting images from your one site will make a big difference to Google's resources?

FWIW, I have some 20+-year-old sites from which I have never deleted a single image, and I also have some 20+-year-old sites where I regularly make sure the images are relevant, used and linked correctly.

This is purely a personal attitude and nothing to do with G. If you have images no longer in use and they're easy enough to delete, why not?

But I do realise that WordPress is a completely different animal for those of us who build our own sites and actually see the local system resources we use, whereas with WP 99+% of users haven't a clue!

Dimitri

1:22 pm on Jul 4, 2022 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



I'm not too sure what you mean by this. All search engines crawl for free; are you concerned about Google's power consumption, or something else?

It is estimated that there are about 1.17 billion websites; do you feel that deleting images from your one site will make a big difference to Google's resources?

LOL. That's not what this is about.

To simplify: Googlebot decides how many hits (and certainly how many MB and how much time) it will devote to a given site per day, week, or month.

lucy24

3:32 pm on Jul 4, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Related question: Search engines know which page(s) a given image belongs to. If the page associated with a particular image ceases to exist, how does that affect the crawling and indexing of the image? Might the image still show up in search results--and if so, does that mean search engines remember its alt text even after that alt text no longer exists?

RedBar

3:49 pm on Jul 4, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@lucy24 - I've been image-intensive since 1993. When Google first started, it did remember who first posted an exclusive image; however, at some point in the noughties Google "lost" this ability, at least for my images.

The same does not apply to URLs, though: I remember seeing URLs I killed at least 10 years earlier appearing in my logs. Whether those requests came from Google or from links on other websites I can't remember, since I simply don't bother looking at the logs these days.

@Dimitri - I think G does "allocate" bandwidth / visits to less-frequented sites, especially if a site has millions of pages of variable quality and is probably just there to earn advertising money. There are many such sites cluttering The Net, so it would not surprise me if many of them have been manually checked and approved or demoted.

buckworks

4:40 pm on Jul 4, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'm no expert on crawl budgets, but intuitively it seems like a good idea to keep unused assets away from search spiders.

When your site redesign is done, use a crawling tool to round up a list of the images that are being used. That would make it easier to identify those that aren't.
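As a rough sketch of that comparison step, assuming your crawling tool can export the image URLs it found as a plain list, the Python below lists image files in a local copy of the uploads folder that no crawled page references. The folder name `wp-content/uploads`, the extension set and the function names are illustrative assumptions, not part of any WordPress or crawler API.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse

# Extensions treated as images (assumption; adjust to suit the site).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"}


def list_local_images(site_root, uploads_dir="wp-content/uploads"):
    """Yield image paths under uploads_dir, relative to site_root, with "/" separators."""
    root = Path(site_root)
    for path in (root / uploads_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            yield path.relative_to(root).as_posix()


def find_unused_images(local_images, referenced_urls):
    """Return the local image paths that no crawled page references, sorted."""
    # Reduce each crawled URL to its decoded path without the leading slash, so
    # "https://example.com/wp-content/uploads/2021/03/a%20b.png" becomes
    # "wp-content/uploads/2021/03/a b.png" and can be compared with local paths.
    referenced = {unquote(urlparse(url).path).lstrip("/") for url in referenced_urls}
    return sorted(p for p in local_images if p not in referenced)


# Small demo with made-up paths; for real use you might call
# find_unused_images(list_local_images("/path/to/site"), crawled_urls).
demo_files = ["wp-content/uploads/2009/05/old.jpg", "wp-content/uploads/2021/03/hero.jpg"]
demo_crawled = ["https://example.com/wp-content/uploads/2021/03/hero.jpg"]
print(find_unused_images(demo_files, demo_crawled))  # ['wp-content/uploads/2009/05/old.jpg']
```

Treat the result as a list to review rather than a list to delete. WordPress stores resized copies (for example `photo-300x200.jpg`) alongside each original, so a page may use only a resized copy while the original still appears "unused", and a crawler may miss images referenced only from CSS or `srcset`, depending on the tool.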

As you figure out which images are no longer in use, you could either delete them or move those worth keeping into an archive folder that's blocked by robots.txt.
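If you go the archive-folder route, it may be worth checking the robots.txt rule before relying on it. A minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical archive folder named `/image-archive/` and `example.com` as a placeholder domain; `build_parser` and `blocked_urls` are illustrative helper names:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/image-archive/" is an example name for a folder of
# retired images that you want to keep but not have crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /image-archive/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def blocked_urls(parser, urls, user_agent="Googlebot-Image"):
    """Return the URLs that the parsed rules tell user_agent not to fetch."""
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)
print(blocked_urls(parser, [
    "https://example.com/image-archive/2009/05/old.jpg",
    "https://example.com/wp-content/uploads/2021/03/hero.jpg",
]))
```

Python's parser does plain prefix matching and does not implement Googlebot's `*` and `$` wildcard extensions, so this check is only trustworthy for simple prefix rules like the one above. Running the list of images still in use through `blocked_urls` should return nothing; anything it does return would be hidden from image crawlers.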

In the meantime, one action to take is to make sure your image folders have index pages of your own creation, to avoid the default behaviour of displaying a list of the files in the folder when someone tries to access it.

It could be extremely simple, something like this:


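<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Empty Page</title>
<!-- keep this placeholder out of search results and caches -->
<meta name="robots" content="noindex,noarchive"/>
</head>
<body>
<!-- an oversized empty block makes the whole blank page a link to the home page -->
<a href="/">
<p style="width:9999px;height:9999px"></p>
</a>
</body>
</html>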


That example gives a blank page that links to the home page but doesn't reveal what's in the folder. Add some styling or other content if you wish, although few humans would ever see it. I like to add a CSS gradient to style the <body> and leave it at that. Using the site's main colours, of course! :-)
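With a WordPress uploads tree that has a folder per month going back years, adding those index pages by hand would be tedious. A minimal Python sketch, assuming you work on a local copy of the folder and upload the results afterwards; `add_placeholder_indexes` is a hypothetical helper name, and the list of index file names is an assumption to check against your server's directory-index setting:

```python
from pathlib import Path

# Placeholder markup based on the example above.
PLACEHOLDER = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Empty Page</title>
<meta name="robots" content="noindex,noarchive"/>
</head>
<body>
<a href="/">
<p style="width:9999px;height:9999px"></p>
</a>
</body>
</html>
"""

# File names a typical server serves as a directory index (assumption).
INDEX_NAMES = ("index.html", "index.htm", "index.php")


def add_placeholder_indexes(uploads_dir):
    """Write index.html into every directory under uploads_dir that lacks an index file.

    Returns the paths of the files created, so they can be reviewed or uploaded.
    """
    root = Path(uploads_dir)
    # Collect the directories first so the files written below don't affect the walk.
    directories = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    created = []
    for directory in directories:
        if any((directory / name).exists() for name in INDEX_NAMES):
            continue  # leave existing index pages alone
        target = directory / "index.html"
        target.write_text(PLACEHOLDER, encoding="utf-8")
        created.append(target)
    return created
```

Calling it as `add_placeholder_indexes("wp-content/uploads")` from the site root would fill in every folder that has no index page yet and skip the rest, so it is safe to rerun after new monthly folders appear.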

lucy24

6:51 pm on Jul 4, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



make sure your image folders have index pages of your creation, to avoid the default situation that displays a list of the files within the folder when someone tries to access it
Eh? MY default is to disable auto-indexing sitewide. Even on WP sites, users do have enough control over their htaccess-or-equivalent to add the appropriate directive.

Anyway, search engines shouldn't be asking for image directories; at least I’ve never known them to do so. And it's not because they are all individually roboted-out.

:: quick detour to logs to ensure I’m not talking out of my hat ::

Nope, nothing from search engines, ever, and only a scattered handful from malign robots (or possibly extra-stupid browsers, but I doubt it).

buckworks

1:12 am on Jul 5, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



search engines shouldn't be asking for image directories


FWIW, I became interested in index pages when I found evidence of requests for some directories I wouldn't have expected.

disable auto-indexing sitewide


Yes, that's smart, but I sometimes like to use a purpose-built page rather than just delivering a default error. Chalk it up to whimsy rather than logic, but either way, it's a good idea to prevent the contents of a directory from being displayed.

lucy24

1:27 am on Jul 5, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



sometimes like to use a purpose-built page rather than just delivering a default error
Funny you should say that. For the longest time--before I had my own website--I interpreted 403 as "no directory". Why? Because, as an ordinary human, the only time I ever met a 403 was when I was manually stepping back through an URL, and stopped at some point in the directory structure that didn't correspond to a page. (Especially common in academic sites for some reason.)

On one site I actually do have URLs in the form example.com/directory/pagename.html where example.com/directory/ doesn't exist; it's just a way of organizing things. To avoid humans making understandable mistakes, I redirect them to the root instead.

But, oops, this thread is about images, isn't it?

buckworks

1:52 am on Jul 5, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Something I'm wondering about ... what would happen if one used robots.txt to block directories which contain images that are actually in use? Any bad effects / unintended consequences?

tangor

3:38 am on Jul 6, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't worry about crawl budget. Theirs to set, mine to ignore---since I can't beg for more. :)