Validate then process

Noms!
Noms!
Photo: Holly Vickery

The pattern

Here's what we did in the previous lesson.

  • For each record:
  •   Grab the record's data from the Range
  •   Normalize the data
  •   Validate the data
  •   Process the data

Let's add to the last step. We'll keep it simple: computing the average of valid data.

Goat cuteness data

Cthulhu has some data on goat cuteness it wants you to analyze. Here's part of the start worksheet:

Data

Cuteness should be a number, from 1 to 10.

When the Run button is pressed, show average cuteness, for valid data only.

Output

Adela, could you give it a try?

Code outline

Adela
Adela

OK.

Let's see... I'll work it backwards, but instead of doing everything backwards, I'll work out the big steps backwards. Then, we'll see.

We want an average for valid data, and the count of valid data records.

You know what... I'll start VBE, and type in code comments as I go.

  • 'Output average cuteness (valid data only)
  •  
  • 'Output number of valid records

Note

I often do this, too. I start by typing comments into the code editor I'm using, for VBA, C#, PHP, JavaScript, whatever language I'm using

Adela
Adela

OK, to work out the average, I'll need the total of valid cuteness, and the count. Oh! The count is in the output as well.

  • 'Compute average cuteness (valid data only) from total and count.
  •  
  • 'Output average cuteness (valid data only)
  •  
  • 'Output number of valid records
Adela
Adela

I know how to total. That's the accumulator pattern...

Pattern

Accumulator

Loop over a data table. For each record, add a field to a total, count, or whatevs.

  • 'For loop
  •  
  •     'Total = Total + Cuteness
  •  
  • 'Next
  •  
  • 'Compute average cuteness (valid data only) from total and count.
  •  
  • 'Output average cuteness (valid data only)
  •  
  • 'Output number of valid records
Adela
Adela

Oh, but only for valid data. So I need to wrap the totaling thing in an If.

  • 'For loop
  •  
  •     'If cuteness is valid
  •  
  •         'Total = Total + Cuteness
  •  
  • 'Next
  •  
  • 'Compute average cuteness (valid data only) from total and count.
  •  
  • 'Output average cuteness (valid data only)
  •  
  • 'Output number of valid records
Adela
Adela

Wait... I'll use the flag pattern, so I'll change the If to show that.

Pattern

Flag

A variable that summarizes the results of a bunch of code.

  • 'For loop
  •  
  •     'Data is OK = yes
  •  
  •     'If cuteness is invalid:
  •  
  •         'Data is OK = no
  •  
  •     'If data is OK
  •  
  •         'Total = Total + Cuteness
  •  
  • 'Next
  •  
  • 'Compute average cuteness (valid data only) from total and count.
  •  
  • 'Output average cuteness (valid data only)
  •  
  • 'Output number of valid records
Adela
Adela

Oh, it needs to say that cuteness is from the current record.

Some more finishing touches. Get cuteness for the current record, make the Range object, oh, and gotta count the valid records, too.

And... done!

  • 'Make a Range
  •  
  • 'For loop
  •  
  •     'Get cuteness for current record
  •  
  •     'Data is OK = yes
  •  
  •     'If cuteness is invalid:
  •  
  •         'Data is OK = no
  •  
  •     'If data is OK
  •  
  •         'Total = Total + Cuteness
  •  
  •         'Count valid = Count valid + 1
  •  
  • 'Next
  •  
  • 'Compute average cuteness (valid data only) from total and count.
  •  
  • 'Output average cuteness (valid data only)
  •  
  • 'Output number of valid records
Georgina
Georgina

Wow, great job, Adela! I really saw your thinking, there.

Ethan
Ethan

Yeah, that was nice!

OK, let's look at that again, after the code is written.

Initialize

To compute an average, we need a total, and a count. Let's make those variables, and initialize them. Also, let's create the Range.

  • Sub run()
  •     Dim rData As Range
  •     Dim sNumberOfRecords As Single
  •     Dim sNumberOfValidRecords As Single
  •     Dim sTotalCutenessValidOnly As Single
  •     Dim sAverageCutenessValidOnly As Single
  •  
  •     'Initialize.
  •     sNumberOfValidRecords = 0
  •     sTotalCutenessValidOnly = 0
  •  
  •     'Create the Range.
  •     Set rData = Range(Cells(4, 1), Cells(4, 1).End(xlToRight).End(xlDown))
  •     sNumberOfRecords = rData.Rows.Count

I named the variables to remind ourselves what we're doing. Instead of sTotalCuteness, use sTotalCutenessValidOnly. This is optional, but if it helps you, do it.

OK, add the loop, and the code that reads in the current record. I'm going to start leaving out code you've already seen. You can download the solution to see it all.

  • Sub run()
  •     ...
  •     Dim sRecordNumber As Single
  •     Dim tUncheckedCurrentCuteness As String
  •     Dim tCurrentName As String
  •     Dim sCurrentCuteness As Single
  •     ...
  •  
  •     Const FIELD_NAME = 1
  •     Const FIELD_CUTENESS = 2
  •  
  •     'Initialize.
  •     sNumberOfValidRecords = 0
  •     sTotalCutenessValidOnly = 0
  •  
  •     'Create the Range.
  •     Set rData = Range(Cells(4, 1), Cells(4, 1).End(xlToRight).End(xlDown))
  •     sNumberOfRecords = rData.Rows.Count
  •  
  •     'Loop over records.
  •     For sRecordNumber = 1 To sNumberOfRecords
  •         'Get fields for current record
  •         tCurrentName = rData.Cells(sRecordNumber, FIELD_NAME)
  •         tUncheckedCurrentCuteness = rData.Cells(sRecordNumber, FIELD_CUTENESS)
  •  
  •     Next sRecordNumber

Notice the Consts. They make the later code easier to read, and maintain.

Validation

We'll only validate the cuteness field. The flag pattern.

  • 'Initialize validation flag.
  • tIsDataOk = "yes"
  • 'Validate
  • If Not IsNumeric(tUncheckedCurrentCuteness) Then
  •     tIsDataOk = "no"
  • Else
  •     sCurrentCuteness = tUncheckedCurrentCuteness
  •     If sCurrentCuteness < 1 Or sCurrentCuteness > 10 Then
  •         tIsDataOk = "no"
  •     End If
  • End If
  • 'Is the data valid?
  • If tIsDataOk = "yes" Then
  •     'Processing.
  •  
  • End If

Processing

We want to have a count and total. If the data is valid, add to total and count variables.

  1. Option Explicit
  2.  
  3. Sub run()
  4.     ...
  5.     'Initialize.
  6.     sNumberOfValidRecords = 0
  7.     sTotalCutenessValidOnly = 0
  8.     ...
  9.     'Loop over records.
  10.     For sRecordNumber = 1 To sNumberOfRecords
  11.         'Get fields for current record
  12.         ...
  13.         tUncheckedCurrentCuteness = rData.Cells(sRecordNumber, FIELD_CUTENESS)
  14.         '(Move into sCurrentCuteness if valid)
  15.         ...
  16.         'Is the data valid?
  17.         If tIsDataOk = "yes" Then
  18.             'Processing.
  19.             sNumberOfValidRecords = sNumberOfValidRecords + 1
  20.             sTotalCutenessValidOnly = sTotalCutenessValidOnly + sCurrentCuteness
  21.         End If
  22.     Next sRecordNumber
  23.     ...
  24. End Sub

Lines 6 and 7 initialize variables for count and total.

Lines 10 to 22 loop over records. Line 13 gets a cuteness value for the current record. If it's valid, line 19 increases the count of valid records by one. Line 20 increases the total, by the new cuteness value.

Average and output

OK, after the loop ends, we'll have the count and total we need to compute the average.

  • 'Compute average.
  • sAverageCutenessValidOnly = sTotalCutenessValidOnly / sNumberOfValidRecords
  •  
  • 'Output.
  • Cells(4, 6) = sNumberOfValidRecords
  • Cells(5, 6) = sAverageCutenessValidOnly

That's it! Remember, you can download the solution, and trace it with the debugger.

Ray
Ray

That backwards planning Adela did, that really helped me understand why the code was written that way.

Aye, I agree.

Notice Adela didn't get bogged down in detailed coding when she was working backwards. I recommend that. If you think about all the deets when you're planning, you'll lose track of the big picture. And it's the big picture you're working on.

Do the planning thing first. Then go back and write the code. Your brain will thank you for it.

Adela
Adela

A question. What if all of the data is invalid? The count will still be zero after the loop ends, right?

Oh, wow! Great thinking! Here's what Adela means:

  • Sub run()
  •     ...
  •     'Initialize.
  •     sNumberOfValidRecords = 0
  •     sTotalCutenessValidOnly = 0
  •     ...
  •     'Loop over records.
  •     For sRecordNumber = 1 To sNumberOfRecords
  •         ...
  •         'Is the data valid?
  •         If tIsDataOk = "yes" Then
  •             'Processing.
  •             sNumberOfValidRecords = sNumberOfValidRecords + 1 Never happens!
  •             sTotalCutenessValidOnly = sTotalCutenessValidOnly + sCurrentCuteness
  •         End If
  •     Next sRecordNumber
  •     sNumberOfValidRecords is still 0
  •     'Compute average.
  •     sAverageCutenessValidOnly = sTotalCutenessValidOnly / sNumberOfValidRecords
  •     Crashy crashy
  •     ...
  • End Sub

sNumberOfValidRecords starts at 0. If all of the data is invalid, it never increases. It's still 0 at the end of the loop.

What happens when you divide by 0? Crashy crashy.

Division by zero error message

You could use an If to allow for this:

  • Next sRecordNumber
  • 'Guard against no valid data.
  • If sNumberOfValidRecords = 0 Then
  •     Cells(4, 6) = "No valid records. Cannot compute average."
  •     End
  • End If
  • 'Compute average.
  • sAverageCutenessValidOnly = sTotalCutenessValidOnly / sNumberOfValidRecords
  •  
  • 'Output.
  • Cells(4, 6) = sNumberOfValidRecords
  • Cells(5, 6) = sAverageCutenessValidOnly

This is the guard pattern at work.

Pattern

Guard pattern

Prevent your code from doing something crazy.

Exercises

Exercise

Goat weight program

Cthulhu thinks its goat friends are looking a little thin, so it sponsors a weight gain program. Goats are weighed at the start and end of a week. The more weight gained, the better!

You can download the start workbook. It looks like this:

Start

Write a program that computes the average start and end weights, and the average gain. Only include valid data. The names are all OK, so you don't need to check them. However, as you can see, there's some bad numeric data. Exclude it from the analysis. Also exclude negative weights. No floating goats!

Here's what the output should look like:

Output

Upload your solution. The usual standards apply.

Exercise

Goat ball spread

Cthulhu is thinking about betting on goat ball point spreads. It wants you to figure out some statistics.

Point spread is the difference between scores in a game. So if:

  • The New York Nannies scored 10 points and the Buffalo Billies scored 7, the spread would be 3 for the game.
  • The Brisbane Bed Bugs scored 11 points and the Yosemite Yolks scores 16, the spread would be 5.
  • The average of the spreads for the games would be (3+5)/2 = 4.

You can download the start workbook with the data. It looks like this:

Start

The scores should all be numbers that are zero or more. There are data errors.

Write a program to compute the two counts and the mean, as well as the score differences. Here's the output:

Output

Differences are always positive, or zero.

You can assume that at least some of the data is valid.

Upload your solution. The usual standards apply.

Exercise

Bouncy boots

Goats love to bounce. Like Tiggers.

Boing!

Cthulhu got its goaty friends bouncy boots. Do the boots work? Write a program to find out.

Your worksheet should start like this:

Start

You can download the start worksheet.

Each data row has three values:

  • Goat name. Should not be blank. A name that is just spaces counts as blank.
  • Height goat jumps, without bouncy boots (meters). Should be a number that's zero or more.
  • Height goat jumps, with bouncy boots (meters). Should be a number that's zero or more.

Your program should validate the data, and show some statistics.

Output

Upload your solution. The usual standards apply.

Exercise

Database lesson

Goats love databases. They like nothing better than writing SQL queries.

Cthulhu teaches an SQL class. It had an idea for improving a lesson. It gave some goats the old lesson, then a test (the "before" test), then the new lesson, then another test (the "after" test). What happened to the scores?

Download the starting worksheet with the data. It's like this:

Start

Notice that scores can go down. That's normal; scores have some randomness to them.

Each record has three values:

  • Name.
  • Before. Number, zero or more.
  • After. Number, zero or more.

Some data might not fit the rules. You'll need to validate it.

Write a program to compute averages and counts. Here's the output.

Output

Your program should warn the user if more than 10% of the records are invalid. Don't forget to remove the message each time the program runs.

Upload your solution. The usual standards apply.

Attachments