# Validate then process

Tags

Noms!
Photo: Holly Vickery

### The pattern

Here's what we did in the previous lesson.

• For each record:
•   Grab the record's data from the Range
•   Normalize the data
•   Validate the data
•   Process the data

Let's add to the last step. We'll keep it simple: computing the average of valid data.

### Goat cuteness data

Cthulhu has some data on goat cuteness it wants you to analyze. Here's part of the start worksheet:

Cuteness should be a number, from 1 to 10.

When the Run button is pressed, show average cuteness, for valid data only.

Adela, could you give it a try?

### Code outline

OK.

Let's see... I'll work it backwards, but instead of doing everything backwards, I'll work out the big steps backwards. Then, we'll see.

We want an average for valid data, and the count of valid data records.

You know what... I'll start VBE, and type in code comments as I go.

• 'Output average cuteness (valid data only)
•
• 'Output number of valid records

Note

I often do this, too. I start by typing comments into the code editor I'm using, for VBA, C#, PHP, JavaScript, whatever language I'm using

OK, to work out the average, I'll need the total of valid cuteness, and the count. Oh! The count is in the output as well.

• 'Compute average cuteness (valid data only) from total and count.
•
• 'Output average cuteness (valid data only)
•
• 'Output number of valid records

I know how to total. That's the accumulator pattern...

Pattern

Loop over a data table. For each record, add a field to a total, count, or whatevs.

• 'For loop
•
•     'Total = Total + Cuteness
•
• 'Next
•
• 'Compute average cuteness (valid data only) from total and count.
•
• 'Output average cuteness (valid data only)
•
• 'Output number of valid records

Oh, but only for valid data. So I need to wrap the totaling thing in an If.

• 'For loop
•
•     'If cuteness is valid
•
•         'Total = Total + Cuteness
•
• 'Next
•
• 'Compute average cuteness (valid data only) from total and count.
•
• 'Output average cuteness (valid data only)
•
• 'Output number of valid records

Wait... I'll use the flag pattern, so I'll change the If to show that.

Pattern

A variable that summarizes the results of a bunch of code.

• 'For loop
•
•     'Data is OK = yes
•
•     'If cuteness is invalid:
•
•         'Data is OK = no
•
•     'If data is OK
•
•         'Total = Total + Cuteness
•
• 'Next
•
• 'Compute average cuteness (valid data only) from total and count.
•
• 'Output average cuteness (valid data only)
•
• 'Output number of valid records

Oh, it needs to say that cuteness is from the current record.

Some more finishing touches. Get cuteness for the current record, make the Range object, oh, and gotta count the valid records, too.

And... done!

• 'Make a Range
•
• 'For loop
•
•     'Get cuteness for current record
•
•     'Data is OK = yes
•
•     'If cuteness is invalid:
•
•         'Data is OK = no
•
•     'If data is OK
•
•         'Total = Total + Cuteness
•
•         'Count valid = Count valid + 1
•
• 'Next
•
• 'Compute average cuteness (valid data only) from total and count.
•
• 'Output average cuteness (valid data only)
•
• 'Output number of valid records

Georgina

Ethan

Yeah, that was nice!

OK, let's look at that again, after the code is written.

### Initialize

To compute an average, we need a total, and a count. Let's make those variables, and initialize them. Also, let's create the Range.

• Sub run()
•     Dim rData As Range
•     Dim sNumberOfRecords As Single
•     Dim sNumberOfValidRecords As Single
•     Dim sTotalCutenessValidOnly As Single
•     Dim sAverageCutenessValidOnly As Single
•
•     'Initialize.
•     sNumberOfValidRecords = 0
•     sTotalCutenessValidOnly = 0
•
•     'Create the Range.
•     Set rData = Range(Cells(4, 1), Cells(4, 1).End(xlToRight).End(xlDown))
•     sNumberOfRecords = rData.Rows.Count

I named the variables to remind ourselves what we're doing. Instead of `sTotalCuteness`, use `sTotalCutenessValidOnly`. This is optional, but if it helps you, do it.

OK, add the loop, and the code that reads in the current record. I'm going to start leaving out code you've already seen. You can download the solution to see it all.

• Sub run()
•     ...
•     Dim sRecordNumber As Single
•     Dim tUncheckedCurrentCuteness As String
•     Dim tCurrentName As String
•     Dim sCurrentCuteness As Single
•     ...
•
•     Const FIELD_NAME = 1
•     Const FIELD_CUTENESS = 2
•
•     'Initialize.
•     sNumberOfValidRecords = 0
•     sTotalCutenessValidOnly = 0
•
•     'Create the Range.
•     Set rData = Range(Cells(4, 1), Cells(4, 1).End(xlToRight).End(xlDown))
•     sNumberOfRecords = rData.Rows.Count
•
•     'Loop over records.
•     For sRecordNumber = 1 To sNumberOfRecords
•         'Get fields for current record
•         tCurrentName = rData.Cells(sRecordNumber, FIELD_NAME)
•         tUncheckedCurrentCuteness = rData.Cells(sRecordNumber, FIELD_CUTENESS)
•
•     Next sRecordNumber

Notice the `Consts`. They make the later code easier to read, and maintain.

### Validation

We'll only validate the cuteness field. The flag pattern.

• 'Initialize validation flag.
• tIsDataOk = "yes"
• 'Validate
• If Not IsNumeric(tUncheckedCurrentCuteness) Then
•     tIsDataOk = "no"
• Else
•     sCurrentCuteness = tUncheckedCurrentCuteness
•     If sCurrentCuteness < 1 Or sCurrentCuteness > 10 Then
•         tIsDataOk = "no"
•     End If
• End If
• 'Is the data valid?
• If tIsDataOk = "yes" Then
•     'Processing.
•
• End If

### Processing

We want to have a count and total. If the data is valid, add to total and count variables.

1. Option Explicit
2.
3. Sub run()
4.     ...
5.     'Initialize.
6.     sNumberOfValidRecords = 0
7.     sTotalCutenessValidOnly = 0
8.     ...
9.     'Loop over records.
10.     For sRecordNumber = 1 To sNumberOfRecords
11.         'Get fields for current record
12.         ...
13.         tUncheckedCurrentCuteness = rData.Cells(sRecordNumber, FIELD_CUTENESS)
14.         '(Move into sCurrentCuteness if valid)
15.         ...
16.         'Is the data valid?
17.         If tIsDataOk = "yes" Then
18.             'Processing.
19.             sNumberOfValidRecords = sNumberOfValidRecords + 1
20.             sTotalCutenessValidOnly = sTotalCutenessValidOnly + sCurrentCuteness
21.         End If
22.     Next sRecordNumber
23.     ...
24. End Sub

Lines 6 and 7 initialize variables for count and total.

Lines 10 to 22 loop over records. Line 13 gets a cuteness value for the current record. If it's valid, line 19 increases the count of valid records by one. Line 20 increases the total, by the new cuteness value.

### Average and output

OK, after the loop ends, we'll have the count and total we need to compute the average.

• 'Compute average.
• sAverageCutenessValidOnly = sTotalCutenessValidOnly / sNumberOfValidRecords
•
• 'Output.
• Cells(4, 6) = sNumberOfValidRecords
• Cells(5, 6) = sAverageCutenessValidOnly

That's it! Remember, you can download the solution, and trace it with the debugger.

Ray

That backwards planning Adela did, that really helped me understand why the code was written that way.

Aye, I agree.

Notice Adela didn't get bogged down in detailed coding when she was working backwards. I recommend that. If you think about all the deets when you're planning, you'll lose track of the big picture. And it's the big picture you're working on.

Do the planning thing first. Then go back and write the code. Your brain will thank you for it.

A question. What if all of the data is invalid? The count will still be zero after the loop ends, right?

Oh, wow! Great thinking! Here's what Adela means:

• Sub run()
•     ...
•     'Initialize.
•     sNumberOfValidRecords = 0
•     sTotalCutenessValidOnly = 0
•     ...
•     'Loop over records.
•     For sRecordNumber = 1 To sNumberOfRecords
•         ...
•         'Is the data valid?
•         If tIsDataOk = "yes" Then
•             'Processing.
•             sNumberOfValidRecords = sNumberOfValidRecords + 1 Never happens!
•             sTotalCutenessValidOnly = sTotalCutenessValidOnly + sCurrentCuteness
•         End If
•     Next sRecordNumber
•     sNumberOfValidRecords is still 0
•     'Compute average.
•     sAverageCutenessValidOnly = sTotalCutenessValidOnly / sNumberOfValidRecords
•     Crashy crashy
•     ...
• End Sub

`sNumberOfValidRecords` starts at 0. If all of the data is invalid, it never increases. It's still 0 at the end of the loop.

What happens when you divide by 0? Crashy crashy.

You could use an `If` to allow for this:

• Next sRecordNumber
• 'Guard against no valid data.
• If sNumberOfValidRecords = 0 Then
•     Cells(4, 6) = "No valid records. Cannot compute average."
•     End
• End If
• 'Compute average.
• sAverageCutenessValidOnly = sTotalCutenessValidOnly / sNumberOfValidRecords
•
• 'Output.
• Cells(4, 6) = sNumberOfValidRecords
• Cells(5, 6) = sAverageCutenessValidOnly

This is the guard pattern at work.

Pattern

Prevent your code from doing something crazy.

### Exercises

Exercise

Goat weight program

Cthulhu thinks its goat friends are looking a little thin, so it sponsors a weight gain program. Goats are weighed at the start and end of a week. The more weight gained, the better!

Write a program that computes the average start and end weights, and the average gain. Only include valid data. The names are all OK, so you don't need to check them. However, as you can see, there's some bad numeric data. Exclude it from the analysis. Also exclude negative weights. No floating goats!

Here's what the output should look like:

Exercise

Cthulhu is thinking about betting on goat ball point spreads. It wants you to figure out some statistics.

Point spread is the difference between scores in a game. So if:

• The New York Nannies scored 10 points and the Buffalo Billies scored 7, the spread would be 3 for the game.
• The Brisbane Bed Bugs scored 11 points and the Yosemite Yolks scores 16, the spread would be 5.
• The average of the spreads for the games would be (3+5)/2 = 4.

You can download the start workbook with the data. It looks like this:

The scores should all be numbers that are zero or more. There are data errors.

Write a program to compute the two counts and the mean, as well as the score differences. Here's the output:

Differences are always positive, or zero.

You can assume that at least some of the data is valid.

Exercise

Bouncy boots

Goats love to bounce. Like Tiggers.

Cthulhu got its goaty friends bouncy boots. Do the boots work? Write a program to find out.

Your worksheet should start like this:

Each data row has three values:

• Goat name. Should not be blank. A name that is just spaces counts as blank.
• Height goat jumps, without bouncy boots (meters). Should be a number that's zero or more.
• Height goat jumps, with bouncy boots (meters). Should be a number that's zero or more.

Your program should validate the data, and show some statistics.

Exercise

Database lesson

Goats love databases. They like nothing better than writing SQL queries.

Cthulhu teaches an SQL class. It had an idea for improving a lesson. It gave some goats the old lesson, then a test (the "before" test), then the new lesson, then another test (the "after" test). What happened to the scores?

Notice that scores can go down. That's normal; scores have some randomness to them.

Each record has three values:

• Name.
• Before. Number, zero or more.
• After. Number, zero or more.

Some data might not fit the rules. You'll need to validate it.

Write a program to compute averages and counts. Here's the output.

Your program should warn the user if more than 10% of the records are invalid. Don't forget to remove the message each time the program runs.