Advanced image management
Introduction
Where is the best place to store images on your Web site? If you have a content-managed Web site and store all your images in a single folder called "images", the odds are you have created a "black hole" on your Web site where images go in but never come out.
This article provides an alternative to the single "images" folder and will help your CMS delete unused images and provide a better user experience for content creators.
Problems with using the single "images" folder
The "black hole"
"Black hole" is an apt name for the "images" folder, because images go into this folder but never come out. Over time, as new pages are added to the Web site or existing content is updated, more and more images are added to the "images" folder. However, when pages are removed or their content changes, images that are no longer needed are left behind in the "images" folder. Why? Mainly because images in the "images" folder are shared by many Web pages, so it is hard to tell whether a given image is still in use. As a result, the "images" folder is left unmanaged and becomes a stale collection of every image that has ever been used on the Web site.
Name conflict
One of the first problems content creators encounter is a file name conflict, which occurs when a user uploads an image whose file name already exists in the image repository. The more images a single "images" folder holds, the greater the chance of a conflict. The approach recommended in this article does not eliminate name conflicts entirely, but it reduces them significantly, because a file name then only needs to be unique among the images of a single document.
Access control
Your CMS may control who can edit each document, but it cannot control who can edit images if all the images are stored in a single "images" folder in which content authors are permitted to delete, rename or replace files. Content creators can then manipulate images used by documents they don't have access to, and can even change the branding of the Web site if the page layout uses images from that folder.
Scalability
Most CMSs provide an image library that displays the images in the library either in a list box or as thumbnails. If the "images" folder contains hundreds or thousands of images, it becomes impractical for the application to present them this way and for content authors to find the image they need.
File systems don't perform well when a large number of files are stored in a single folder. The image library may open slowly for CMS users, and if it also displays metadata such as dimensions, file size or date last modified, reading that metadata for every image slows it down even further. Large image libraries also require more data to be transmitted over the network, which consumes resources and adds delay.
A better way to manage images
In this article, we will limit the discussion to storing images on the file system, although this approach can be adapted to storing images in the database as well.
The problems created by a single "images" folder can be addressed by creating a separate folder on the file system for each document in the CMS. The folder is named after the ID of the document in the CMS, and all the images used by that document are stored in this folder. Although these folders can be created anywhere on your site, the examples in this article assume they are all sub-folders of a folder called "resources" at the root of the Web site.
As discussed previously, file systems don't perform well when a large number of files are stored in a single folder, and the same is true for sub-folders. The solution is to create grouping folders that significantly reduce the number of immediate sub-folders in any given folder. If your CMS uses GUIDs for document IDs, you can use the first 2 characters of the ID as the name of a grouping folder; every document folder whose name begins with these 2 characters is then created as a sub-folder of that grouping folder. Because each character of a GUID is a hexadecimal digit with 16 possible values, your main "resources" folder will contain at most 16 × 16 = 256 grouping folders. If your CMS has 100,000 documents, each grouping folder will contain about 390 sub-folders (100,000 ÷ 256 ≈ 390.6). The following illustrates how these folders would be nested:
- /resources
  - /A7
    - /A7490129-F72E-4843-BCFE-784AC9BF8F97
    - /A764FA83-9F3D-47E7-97EA-12585A6579A8
  - /B5
    - /B56BC454-70B9-47E3-BDC2-530DBC4F154B
    - /B535A81F-1085-4E33-B636-DDA458D8BB67
    - /B5933B6E-1B9C-4F02-8582-C486E8B7ECEE
  - /CE
    - /CE936918-C9CC-4A79-8892-5EF251A9BD29
      - /chart.jpg
      - /product1.jpg
      - /product2.jpg
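The GUID scheme can be sketched in a few lines of Python. This is a minimal illustration, not part of any particular CMS: the function name `guid_resource_folder` is hypothetical, the "/resources" root is taken from the example tree, and converting the GUID to uppercase is an assumption made so that folder names stay consistent on case-sensitive file systems regardless of how the CMS formats its IDs.

```python
import uuid
from pathlib import PurePosixPath

# Assumed root of the per-document folders, as in the example tree.
RESOURCES_ROOT = PurePosixPath("/resources")


def guid_resource_folder(doc_id: str, root: PurePosixPath = RESOURCES_ROOT) -> PurePosixPath:
    """Return /resources/<first 2 chars>/<GUID> for a GUID document ID."""
    # uuid.UUID rejects malformed IDs with ValueError, and str() gives the
    # canonical hyphenated form; uppercasing keeps folder names consistent
    # on case-sensitive file systems.
    normalized = str(uuid.UUID(doc_id)).upper()
    # Two hex digits -> at most 16 * 16 = 256 grouping folders.
    group = normalized[:2]
    return root / group / normalized
```

For example, `guid_resource_folder("ce936918-c9cc-4a79-8892-5ef251a9bd29")` returns the path `/resources/CE/CE936918-C9CC-4A79-8892-5EF251A9BD29`. The result is a URL-style path; a real site would still map it onto the physical Web root before creating the folder.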
If your CMS uses sequentially numbered document IDs, you can use a modulo operation to name the grouping folders. The modulo operation returns the remainder after dividing one number by another, so taking the document ID modulo 256 produces a number from 0 to 255, which you can use as the name of the grouping folder. For example, document 1002 goes into grouping folder 234, because 1002 = 3 × 256 + 234. The following illustrates how these folders would be nested:
- /resources
  - /232
  - /233
  - /234
    - /1002
      - /chart.jpg
      - /product1.jpg
      - /product2.jpg
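The same idea for sequential IDs is sketched below. Again, `numeric_resource_folder` is a hypothetical helper, and the "/resources" root and the group count of 256 are taken from the example rather than from any required convention.

```python
from pathlib import PurePosixPath

# Assumed root of the per-document folders, as in the example tree.
RESOURCES_ROOT = PurePosixPath("/resources")
GROUP_COUNT = 256  # grouping folders are named 0 .. 255


def numeric_resource_folder(doc_id: int, root: PurePosixPath = RESOURCES_ROOT,
                            groups: int = GROUP_COUNT) -> PurePosixPath:
    """Return /resources/<doc_id mod 256>/<doc_id> for a sequential document ID."""
    if doc_id < 0:
        raise ValueError("document IDs are assumed to be non-negative")
    group = doc_id % groups  # remainder of doc_id / groups, always 0 .. groups - 1
    return root / str(group) / str(doc_id)
```

`numeric_resource_folder(1002)` gives `/resources/234/1002`, matching the tree above. Because consecutive IDs cycle through every remainder, documents spread evenly: 100,000 consecutive IDs put 390 or 391 documents in each grouping folder.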
So how does this address the "black hole" problem where images get added but are never deleted? First, when a document is deleted from your CMS, the folder whose name was formed from the ID of the document can be safely deleted as well, since it only contains images used by that document. Second, when content in a document has been updated, your CMS can parse the content and delete from the corresponding image folder any images that are not referenced by the markup. See code examples of how to delete unused images.
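Both cleanup steps can be sketched with Python's standard `html.parser`. This is an illustration under assumptions, not a specific CMS's API: the function names are hypothetical, `folder` is assumed to be the document's folder on disk, and `markup` is assumed to be the document's full saved content. The sketch collects every `src` and `href` value, so attachments stored in the same folder are not mistaken for unused files, and it compares bare file names only. That comparison is deliberately conservative: a file is kept if any reference ends in its name, so a file that is actually in use is never deleted, at the cost of occasionally keeping an unused one.

```python
import posixpath
import shutil
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class _LinkCollector(HTMLParser):
    """Collects every src and href attribute value found in a fragment of markup."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "SRC" arrives as "src".
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


def referenced_file_names(markup: str) -> set[str]:
    """Return the bare file names referenced by the markup, e.g. {"chart.jpg"}."""
    collector = _LinkCollector()
    collector.feed(markup)
    collector.close()
    # Drop scheme, host, query string and directories; keep the last path segment.
    return {posixpath.basename(unquote(urlparse(url).path)) for url in collector.urls}


def delete_unused_images(folder: Path, markup: str) -> list[str]:
    """Delete files in a document's folder that its markup no longer references."""
    if not folder.is_dir():
        return []
    keep = referenced_file_names(markup)
    deleted = []
    # Materialize the listing first so files are not removed mid-iteration.
    for entry in list(folder.iterdir()):
        if entry.is_file() and entry.name not in keep:
            entry.unlink()
            deleted.append(entry.name)
    return sorted(deleted)


def delete_document_folder(folder: Path) -> None:
    """When a document is deleted, remove its whole folder; no other document uses it."""
    shutil.rmtree(folder, ignore_errors=True)
```

A CMS would typically call `delete_unused_images` after saving a document and `delete_document_folder` after deleting one.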
Essentially, what you are creating is a dynamic image library in which content authors can only browse, select from and upload to the image folder associated with the document being edited. A good label for this dynamic image library would be "Images for this document". See how to implement a dynamic image library using XStandard.
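The browse and upload operations of such a library can be sketched as follows. This is not XStandard's implementation; `list_document_images`, `save_upload` and the extension allow-list are hypothetical. The point of the sketch is that both operations are confined to the current document's folder, so a name conflict can only arise among that document's own images.

```python
from pathlib import Path

# Assumed allow-list; adjust to the formats your site accepts.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".svg", ".webp"}


def list_document_images(folder: Path) -> list[str]:
    """Browse: list only the images stored in the current document's folder."""
    if not folder.is_dir():
        return []
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def save_upload(folder: Path, file_name: str, data: bytes, overwrite: bool = False) -> Path:
    """Upload: store an image in the current document's folder and nowhere else."""
    # Reject anything that is not a plain file name, e.g. "../images/logo.png".
    if Path(file_name).name != file_name or file_name in ("", ".", "..") or "\\" in file_name:
        raise ValueError(f"not a plain file name: {file_name!r}")
    if Path(file_name).suffix.lower() not in IMAGE_EXTENSIONS:
        raise ValueError(f"not an accepted image type: {file_name!r}")
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / file_name
    # A conflict here can only involve another image of this same document.
    if target.exists() and not overwrite:
        raise FileExistsError(f"{file_name} already exists for this document")
    target.write_bytes(data)
    return target
```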
Does this mean you should remove the "images" folder from your content-managed site? No. The "images" folder should hold images used by the site layout, or images that content authors can only select, rather than upload, delete, rename or replace. You could label this image library "Common images", for example.
When content authors browse for images, the CMS should then offer them 2 image libraries to choose from: "Images for this document" and "Common images", as shown in the screenshot below. See how to set up multiple image libraries using XStandard.
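The two libraries and their different permissions can be modeled as plain data, as in the sketch below. This is a generic illustration, not XStandard's configuration format: the `ImageLibrary` class, the action names and the default "/images" path are assumptions. The document library allows browsing, selecting and uploading; the common library allows only browsing and selecting.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ImageLibrary:
    label: str                 # what the author sees in the library picker
    folder: PurePosixPath      # where the library's images live
    allowed: frozenset[str]    # actions authors may perform in this library


def image_libraries(document_folder: PurePosixPath,
                    common_folder: PurePosixPath = PurePosixPath("/images")) -> list[ImageLibrary]:
    """The two libraries offered while editing one document."""
    return [
        ImageLibrary("Images for this document", document_folder,
                     frozenset({"browse", "select", "upload"})),
        # Layout and shared images: authors may pick them but not change them.
        ImageLibrary("Common images", common_folder,
                     frozenset({"browse", "select"})),
    ]


def require(library: ImageLibrary, action: str) -> None:
    """Raise PermissionError unless the action is allowed in this library."""
    if action not in library.allowed:
        raise PermissionError(f"{action!r} is not allowed in {library.label!r}")
```

Keeping the permissions on the library rather than on individual images means a new shared image needs no extra configuration to be protected.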
Conclusion
Storing the images used by content-managed documents in separate folders solves many of the problems caused by a single "images" folder. This approach is simple and reliable, and it can also be used for attachments (files such as PDF, Microsoft Word and Zip files). In fact, you can store both images and attachments in the same folder.