Validating during Range processing

Lesson contents

Goat movie ratings

It's Wilde Week in goatville. The Wildes are like the Oscars, but for goat movies.

You have a worksheet like this:

Movie data

The data starts in row 9. Each record has four fields:

  • Movie title
  • Genre (romance, action, or scifi)
  • Average critic rating, from 1 to 5
  • Average viewer rating, from 1 to 5

The data is not reliable, unfortunately. You can see a few errors in the screen shot.

You want to compute the stuff in rows 4 to 6:

  • Number of valid records
  • Number of invalid records
  • Number of records where the viewer rating is higher than the critic rating.

You can download a solution.

Pattern

In the previous lesson on normalization, the pattern was something like this:

  • For each record:
  •   Grab the record's data from the Range
  •   Normalize the data
  •   Process the data

Let's add one more line. It will be the final shiny on this pattern.

  • For each record:
  •   Grab the record's data from the Range
  •   Normalize the data
  •   Validate the data
  •   Process the data

So, after we normalize the fields in a record, we'll check them for errors.

Goat

​M0000t!

All but validation

Here's the code, except for the validation.

  • 'Set up the Range.
  • Set rMovieRecords = Range(Cells(9, 1), Cells(9, 1).End(xlToRight).End(xlDown))
  • sCountAllRecords = rMovieRecords.Rows.Count
  • 'Initialize
  • sCountValidRecords = 0
  • sCountInvalidRecords = 0
  • sCountViewerMoreThanCritic = 0
  • For sRecordNumber = 1 To sCountAllRecords
  •     'Read the current record
  •     tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
  •     tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
  •     sCriticRating = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
  •     sViewerRating = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
  •  
  •     'Normalize title and genre
  •     tTitle = Trim(tTitle)
  •     tGenre = LCase(Trim(tGenre))
  •  
  •     'Process
  •     sCountValidRecords = sCountValidRecords + 1
  •     If sViewerRating > sCriticRating Then
  •         sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
  •     End If
  •  
  • Next sRecordNumber
  •  
  • 'Output
  • Cells(4, 2) = sCountValidRecords
  • Cells(5, 2) = sCountInvalidRecords
  • Cells(6, 2) = sCountViewerMoreThanCritic

There's a For loop to run through the data:

  • sCountAllRecords = rMovieRecords.Rows.Count
  • ...
  • For sRecordNumber = 1 To sCountAllRecords
  •     ...
  • Next sRecordNumber

Inside the loop, the first thing is to grab the fields for the current record. Here's the pseudocode:

  • For each record:
  •   Grab the record's data from the Range
  •   Normalize the data
  •   Process the data

Here's the real code.

  • For sRecordNumber = 1 To sCountAllRecords
  •     'Read the current record
  •     tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
  •     tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
  •     sCriticRating = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
  •     sViewerRating = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
  •     ...
  • Next sRecordNumber

The next step is to normalize.

  • For each record:
  •   Grab the record's data from the Range
  •   Normalize the data
  •   Process the data

The code:

  • For sRecordNumber = 1 To sCountAllRecords
  •     ...
  •     'Normalize title and genre
  •     tTitle = Trim(tTitle)
  •     tGenre = LCase(Trim(tGenre))
  •     ...
  • Next sRecordNumber

The last step is processing.

  • For each record:
  •   Grab the record's data from the Range
  •   Normalize the data
  •   Process the data

The code:

  • For sRecordNumber = 1 To sCountAllRecords
  •     ...
  •     'Process
  •     sCountValidRecords = sCountValidRecords + 1
  •     If sViewerRating > sCriticRating Then
  •         sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
  •     End If
  • Next sRecordNumber

The flag pattern for validation

Let's add the validation. We have three fields to check:

  • Genre (romance, action, or scifi)
  • Average critic rating, from 1 to 5
  • Average viewer rating, from 1 to 5

We want to do the processing only if all of the fields have valid data. If there's an error in any field of a record, we want to ignore the record.

For that, we can use the flag pattern again.

Pattern

Flag

A variable that summarizes the results of a bunch of code.

Here's what the code will be like:

  1. tDataOk = "yes"
  2. If field 1 is bad Then
  3.     tDataOk = "no"
  4. EndIf
  5. If field 2 is bad Then
  6.     tDataOk = "no"
  7. EndIf
  8. If field 3 is bad Then
  9.     tDataOk = "no"
  10. EndIf
  11. If tDataOk = "yes" Then
  12.     Processing
  13. EndIf

If any of the fields are bad, then tDataOk will be "no" by the time we get to line 11: If tDataOk = "yes" Then.

Validating genre

Genre is a string field with three valid values. Here's code that will validate it:

  • If tGenre <> "romance" And tGenre <> "action" And tGenre <> "scifi" Then
  •     tDataOk = "no"
  • End If
Ray
Ray

What if the genre is "Romance" instead of "romance". That If won't pick it up.

Reflect

Is Ray right? Why, or why not?

(If you were logged in as a student, the lesson would pause here, and you'd be asked to type in a response. If you want to try that out, ask for an account on this site.)
Georgina
Georgina

We already normalized the data, so "Romance" will have been converted to "romance".

  • tGenre = LCase(Trim(tGenre))

Validating ratings

Here's code to validate the critic rating:

  1. tCriticRatingIn = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
  2. ...
  3. 'Check critic rating
  4. 'Is it a number?
  5. If Not IsNumeric(tCriticRatingIn) Then
  6.     'Not a number
  7.     tDataOk = "no"
  8. Else
  9.     'It is a number
  10.     'Copy value to a numeric variable
  11.     sCriticRating = tCriticRatingIn
  12.     'Range check
  13.     If sCriticRating < 1 Or sCriticRating > 5 Then
  14.         tDataOk = "no"
  15.     End If
  16. End If

First, we put the field's value into a string variable (line 1). If we put it straight into a numeric variable, the program will crash if the cell contains something like "fish".

Then we check whether it's numeric (line 5). If it isn't, set the flag, and that's it.

If tCriticRatingIn is a number, then copy it to the numeric variable sCriticRating (line 11). Then do a range check.

All the code

Let's put everything together. Check that this makes sense.

  1. 'Set up the Range.
  2. Set rMovieRecords = Range(Cells(9, 1), Cells(9, 1).End(xlToRight).End(xlDown))
  3. sCountAllRecords = rMovieRecords.Rows.Count
  4. 'Initialize
  5. sCountValidRecords = 0
  6. sCountInvalidRecords = 0
  7. sCountViewerMoreThanCritic = 0
  8. For sRecordNumber = 1 To sCountAllRecords
  9.     'Read the current record
  10.     tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
  11.     tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
  12.     tCriticRatingIn = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
  13.     tViewerRatingIn = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
  14.  
  15.     'Normalize title and genre
  16.     tTitle = Trim(tTitle)
  17.     tGenre = LCase(Trim(tGenre))
  18.  
  19.     'Validate
  20.     tDataOk = "yes"
  21.     'Check genre
  22.     If tGenre <> "romance" And tGenre <> "action" And tGenre <> "scifi" Then
  23.         tDataOk = "no"
  24.     End If
  25.     'Check critic rating
  26.     If Not IsNumeric(tCriticRatingIn) Then
  27.         tDataOk = "no"
  28.     Else
  29.         sCriticRating = tCriticRatingIn
  30.         If sCriticRating < 1 Or sCriticRating > 5 Then
  31.             tDataOk = "no"
  32.         End If
  33.     End If
  34.     'Check viewer rating
  35.     If Not IsNumeric(tViewerRatingIn) Then
  36.         tDataOk = "no"
  37.     Else
  38.         sViewerRating = tViewerRatingIn
  39.         If sViewerRating < 1 Or sViewerRating > 5 Then
  40.             tDataOk = "no"
  41.         End If
  42.     End If
  43.  
  44.     'Process
  45.     If tDataOk = "yes" Then
  46.         'Data is valid.
  47.         sCountValidRecords = sCountValidRecords + 1
  48.         If sViewerRating > sCriticRating Then
  49.             sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
  50.         End If
  51.     Else
  52.         sCountInvalidRecords = sCountInvalidRecords + 1
  53.     End If
  54.  
  55. Next sRecordNumber
  56.  
  57. 'Output
  58. Cells(4, 2) = sCountValidRecords
  59. Cells(5, 2) = sCountInvalidRecords
  60. Cells(6, 2) = sCountViewerMoreThanCritic

Check out lines 51 and 52. They let you count the number of invalid records.

Now you know how to analyze a Range.

  • For each record:
  •   Grab the record's data from the Range
  •   Normalize the data
  •   Validate the data
  •   Process the data

Up next

M0000t! If you understood everything here, give yourself a pat on the back. This is getting to be real data analysis.

Goat

​M0000ty, m0000ty, m0000t, m0000t!

OK, that's it for Ranges for a while. Let's move on to reading data from CSV files.

Attachments