Normalizing

Tags

Normalizing means to convert data to a predictable format.

Pattern

Normalize a variable

Convert a variable that can have many formats, into one format.

This makes it easier to write validation and other code. Easy is good.

Let's see how you could do normalization while processing a Range.

Goats like sweaters

Goat

​Thanks to Teagan for the photo

Cthulhu is thinking about learning to knit. It tells its minions to survey some goaty friends, asking them which of three styles of sweater they like best:

  • Goatneck
  • Turtleneck
  • Cardigan

Here's some of the data:

Data

As you can see, the minions weren't careful about how they typed in the data. There are extra spaces everywhere. The capitalization is all over the place.

Hiding goat
Hiding goat
Photo: Holly Vickery

Does a goat like goatnecks?

Suppose we want to test whether a goat likes goatneck sweaters. Because of all the variations, we would have to use an If like this:

  • If tGoatFaveSweater = "Goatneck" _
  •    Or tGoatFaveSweater = "goatneck" _
  •    Or tGoatFaveSweater = "GOATNECK" _
  •    Or tGoatFaveSweater = " goatneck " _
  •    ...

Yuck! What a pain!

It would be easier to normalize the data, that is, convert it to one format. The If would be simpler, as we'll see.

The code pattern is something like:

  • For each record:
  •   Grab the record's data from the Range
  •   Normalize the data
  •   Test the data

So, grab data each record's data, and test it. But before testing it, normalize the data, so that testing is easier.

Counting favorite sweater types

Here's code that will count the number of each sweater type.

  1. Set rSweaterData = Range(Cells(11, 1), Cells(11, 1).End(xlDown).End(xlToRight))
  2. sNumberOfRecords = rSweaterData.Rows.Count
  3. sGoatNeckCount = 0
  4. sTurtleNeckCount = 0
  5. sCardiganCount = 0
  6. 'Work through the records, one at a time.
  7. For sRecordNumber = 1 To sNumberOfRecords
  8.     'Grab the current record's fields.
  9.     tGoatName = rSweaterData.Cells(sRecordNumber, NAME_COLUMN)
  10.     tGoatFaveSweater = rSweaterData.Cells(sRecordNumber, FAVE_SWEATER_COLUMN)
  11.     'Normalize
  12.     tGoatName = Trim(tGoatName)
  13.     tGoatFaveSweater = LCase(Trim(tGoatFaveSweater))
  14.     'Add to counts.
  15.     If tGoatFaveSweater = "goatneck" Then
  16.         sGoatNeckCount = sGoatNeckCount + 1
  17.     ElseIf tGoatFaveSweater = "turtleneck" Then
  18.         sTurtleNeckCount = sTurtleNeckCount + 1
  19.     ElseIf tGoatFaveSweater = "cardigan" Then
  20.         sCardiganCount = sCardiganCount + 1
  21.     End If
  22. Next sRecordNumber
  23. 'Output results.
  24. Cells(6, 2) = sGoatNeckCount
  25. Cells(7, 2) = sTurtleNeckCount
  26. Cells(8, 2) = sCardiganCount

Lines 8 to 10 get the current record's data, with one variable for each field. Lines 14 to 21 looks at the sweater type in the current record, and adds to the appropriate count. However, before the counting code starts, lines 11 to 13 normalize the data. The sweater type in the current record is stripped of spaces, and made all lowercase.

So instead of...

  • If tGoatFaveSweater = "Goatneck" _
  •    Or tGoatFaveSweater = "goatneck" _
  •    Or tGoatFaveSweater = "GOATNECK" _
  •    Or tGoatFaveSweater = " goatneck " _
  •    ...

... we can write...

  • If tGoatFaveSweater = "goatneck" Then

M0000t! Much easier.

You can download the workbook, if you want to try it out.

What's stranger than goats in sweaters? Goats in trees!

Up next

We've normalized the data read from the Range. How about checking it for errors?

Attachments