Reducing File Bloat

Did you ever wonder why your seemingly simple Excel file is so large (in terms of KB or MB) or why it seems “sluggish”? We recently received a file from a client who was frustrated because it was taking Excel way too long to open the file or process any changes (i.e. a few seconds each time – how annoying!). The file size was over 20 MB, despite the fact that it didn’t contain a lot of data or any complex calculations. It turns out the cause was something we call “Excel bloat”. Bloat means there are likely “hidden” issues with the file that cause performance problems. These issues can be easily corrected if you know where to look and what to do. With a couple of simple techniques we reduced the file size to less than 2 MB.

This article is the first in a series that will help you put some of your Excel files on a diet and get them in tip-top shape.

PART 1: Finding Bloated Tabs

Where do you start looking within a file for the bloat?  How do you know which sheets are responsible for the disproportionate share of the huge file size?  It might be obvious if there are a few sheets that have the most data, but what if it’s not obvious? Here’s a quick way to see how each sheet in your file ranks in terms of file size.

Step 1:  Do a “Save As”, and in the “Save as type” area, choose “Web Page”. This will create a folder in the save location.  You may get a warning that certain features are not compatible with a Web Page. Just ignore this and continue the save.

Step 2: Go to the folder (wherever you saved it) and open it.  You’ll see files for each sheet (they will be numbered sequentially and not with their proper sheet names) as well as a bunch of other files.

Step 3: Simply sort the files by size to see which ones are the biggest.  Note that since the file was converted to a Web Page, the aggregate size of these files will be much bigger than your original Excel file.  Don’t worry about that.  What you’re interested in is the relative size of all the sheets, and this trick will give you a quick sense of that so you know where to start looking.

marqueereducingfilebloat-img1

PART 2: Reducing Tab Size

As you can see, the 5th and 7th tabs in our example file are the biggest. But how do we reduce their size?

The biggest contributor to bloat is not actually the amount of data or the number of cells containing data on a sheet, but rather the size of the area of cells which Excel thinks are being used in some way. This is called the ‘Used Range’.

For example, you might have 10 columns of data spread across 5,000 rows of the sheet. In this case, your sheet size would be based on a Used Range of 50,000 cells. This might seem like a lot, but Excel has 17.2 billion cells available on a sheet.

Quite often, the Used Range itself is bloated because Excel not only keeps track of the contents of a cell, but also assigns cells dozens of other attributes (colour, borders, number formats, data validation, etc.). Many of these are not visible to the user. The assignment of some of these attributes can cause Excel to think a cell is important, and therefore include the cell in the sheet’s Used Range. For example, if you select an entire column and apply formatting, the Used Range now includes all 1,048,576 rows in the file, increasing the file size and causing Excel to operate slower.

Since you typically only care about the cells that contain actual data, you can quickly reduce the Used Range (and therefore the size) of the sheet by doing the following:

Step 1: Select the sheet that you have identified as being larger than expected.

Step 2: Press CTRL + END at the same time. This will bring you to the bottom right corner of the sheet’s Used Range.

Note: In Sheet007 in the example above, this took us to cell IV503 (see graphic below). IV is the 256th column, so the Used Range was 256 columns by 503 rows, or 128,768 cells, in size. However, we only had data in cells A1 to cell Q53, or 901 cells. The Used Range was 143x the intended size.

marqueereducingfilebloat-img2

Step 3: Now that we know the size of the Used Range, we need to shrink it down to the size of the data. To do this we delete unused columns and rows. Where to start deleting depends on the size:

  • Small: If the size of the Used Range is not very large (less than a few hundred columns by a few thousand rows), then you can try deleting all the unused columns and then all the unused rows.
  • Large: If the Used Range is very large, deleting thousands of rows or columns at a time might cause Excel to ‘not respond’, hang, or even crash. If this is a concern, start by deleting one row or column at a time from the bottom or right-hand side of the Used Range. Sometimes an oversized Used Range can be caused by a single stray cell with no other used cells in between that cell and your data. Try deleting just the rightmost column or bottom row of the Used Range and then press CTRL + END again. If you are lucky, that will be enough to reset the Used Range to a more reasonable size.