Lesson contents
Goat movie ratings
It's Wilde Week in goatville. The Wildes are like the Oscars, but for goat movies.
You have a worksheet like this:
The data starts in row 9. Each record has four fields:
- Movie title
- Genre (romance, action, or scifi)
- Average critic rating, from 1 to 5
- Average viewer rating, from 1 to 5
The data is not reliable, unfortunately. You can see a few errors in the screen shot.
You want to compute the stuff in rows 4 to 6:
- Number of valid records
- Number of invalid records
- Number of records where the viewer rating is higher than the critic rating.
You can download a solution.
Pattern
In the previous lesson on normalization, the pattern was something like this:
- For each record:
- Grab the record's data from the Range
- Normalize the data
- Process the data
Let's add one more line. It will be the final shiny on this pattern.
- For each record:
- Grab the record's data from the Range
- Normalize the data
- Validate the data
- Process the data
So, after we normalize the fields in a record, we'll check them for errors.
M0000t!
All but validation
Here's the code, except for the validation.
- 'Set up the Range.
- Set rMovieRecords = Range(Cells(9, 1), Cells(9, 1).End(xlToRight).End(xlDown))
- sCountAllRecords = rMovieRecords.Rows.Count
- 'Initialize
- sCountValidRecords = 0
- sCountInvalidRecords = 0
- sCountViewerMoreThanCritic = 0
- For sRecordNumber = 1 To sCountAllRecords
- 'Read the current record
- tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
- tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
- sCriticRating = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
- sViewerRating = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
- 'Normalize title and genre
- tTitle = Trim(tTitle)
- tGenre = LCase(Trim(tGenre))
- 'Process
- sCountValidRecords = sCountValidRecords + 1
- If sViewerRating > sCriticRating Then
- sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
- End If
- Next sRecordNumber
- 'Output
- Cells(4, 2) = sCountValidRecords
- Cells(5, 2) = sCountInvalidRecords
- Cells(6, 2) = sCountViewerMoreThanCritic
There's a For
loop to run through the data:
- sCountAllRecords = rMovieRecords.Rows.Count
- ...
- For sRecordNumber = 1 To sCountAllRecords
- ...
- Next sRecordNumber
Inside the loop, the first thing is to grab the fields for the current record. Here's the pseudocode:
- For each record:
- Grab the record's data from the Range
- Normalize the data
- Process the data
Here's the real code.
- For sRecordNumber = 1 To sCountAllRecords
- 'Read the current record
- tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
- tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
- sCriticRating = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
- sViewerRating = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
- ...
- Next sRecordNumber
The next step is to normalize.
- For each record:
- Grab the record's data from the Range
- Normalize the data
- Process the data
The code:
- For sRecordNumber = 1 To sCountAllRecords
- ...
- 'Normalize title and genre
- tTitle = Trim(tTitle)
- tGenre = LCase(Trim(tGenre))
- ...
- Next sRecordNumber
The last step is processing.
- For each record:
- Grab the record's data from the Range
- Normalize the data
- Process the data
The code:
- For sRecordNumber = 1 To sCountAllRecords
- ...
- 'Process
- sCountValidRecords = sCountValidRecords + 1
- If sViewerRating > sCriticRating Then
- sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
- End If
- Next sRecordNumber
The flag pattern for validation
Let's add the validation. We have three fields to check:
- Genre (romance, action, or scifi)
- Average critic rating, from 1 to 5
- Average viewer rating, from 1 to 5
We want to do the processing only if all of the fields have valid data. If there's an error in any field of a record, we want to ignore the record.
For that, we can use the flag pattern again.
Here's what the code will be like:
- tDataOk = "yes"
- If field 1 is bad Then
- tDataOk = "no"
- EndIf
- If field 2 is bad Then
- tDataOk = "no"
- EndIf
- If field 3 is bad Then
- tDataOk = "no"
- EndIf
- If tDataOk = "yes" Then
- Processing
- EndIf
If any of the fields are bad, then tDataOk
will be "no" by the time we get to line 11: If tDataOk = "yes" Then
.
Validating genre
Genre is a string field with three valid values. Here's code that will validate it:
- If tGenre <> "romance" And tGenre <> "action" And tGenre <> "scifi" Then
- tDataOk = "no"
- End If
Ray
What if the genre is "Romance" instead of "romance". That If
won't pick it up.
Is Ray right? Why, or why not?
Georgina
We already normalized the data, so "Romance" will have been converted to "romance".
- tGenre = LCase(Trim(tGenre))
Validating ratings
Here's code to validate the critic rating:
- tCriticRatingIn = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
- ...
- 'Check critic rating
- 'Is it a number?
- If Not IsNumeric(tCriticRatingIn) Then
- 'Not a number
- tDataOk = "no"
- Else
- 'It is a number
- 'Copy value to a numeric variable
- sCriticRating = tCriticRatingIn
- 'Range check
- If sCriticRating < 1 Or sCriticRating > 5 Then
- tDataOk = "no"
- End If
- End If
First, we put the field's value into a string variable (line 1). If we put it straight into a numeric variable, the program will crash if the cell contains something like "fish".
Then we check whether it's numeric (line 5). If it isn't, set the flag, and that's it.
If tCriticRatingIn
is a number, then copy it to the numeric variable sCriticRating
(line 11). Then do a range check.
All the code
Let's put everything together. Check that this makes sense.
- 'Set up the Range.
- Set rMovieRecords = Range(Cells(9, 1), Cells(9, 1).End(xlToRight).End(xlDown))
- sCountAllRecords = rMovieRecords.Rows.Count
- 'Initialize
- sCountValidRecords = 0
- sCountInvalidRecords = 0
- sCountViewerMoreThanCritic = 0
- For sRecordNumber = 1 To sCountAllRecords
- 'Read the current record
- tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
- tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
- tCriticRatingIn = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
- tViewerRatingIn = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
- 'Normalize title and genre
- tTitle = Trim(tTitle)
- tGenre = LCase(Trim(tGenre))
- 'Validate
- tDataOk = "yes"
- 'Check genre
- If tGenre <> "romance" And tGenre <> "action" And tGenre <> "scifi" Then
- tDataOk = "no"
- End If
- 'Check critic rating
- If Not IsNumeric(tCriticRatingIn) Then
- tDataOk = "no"
- Else
- sCriticRating = tCriticRatingIn
- If sCriticRating < 1 Or sCriticRating > 5 Then
- tDataOk = "no"
- End If
- End If
- 'Check viewer rating
- If Not IsNumeric(tViewerRatingIn) Then
- tDataOk = "no"
- Else
- sViewerRating = tViewerRatingIn
- If sViewerRating < 1 Or sViewerRating > 5 Then
- tDataOk = "no"
- End If
- End If
- 'Process
- If tDataOk = "yes" Then
- 'Data is valid.
- sCountValidRecords = sCountValidRecords + 1
- If sViewerRating > sCriticRating Then
- sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
- End If
- Else
- sCountInvalidRecords = sCountInvalidRecords + 1
- End If
- Next sRecordNumber
- 'Output
- Cells(4, 2) = sCountValidRecords
- Cells(5, 2) = sCountInvalidRecords
- Cells(6, 2) = sCountViewerMoreThanCritic
Check out lines 51 and 52. They let you count the number of invalid records.
Now you know how to analyze a Range
.
- For each record:
- Grab the record's data from the Range
- Normalize the data
- Validate the data
- Process the data
M0000t! If you understood everything here, give yourself a pat on the back. This is getting to be real data analysis.
M0000ty, m0000ty, m0000t, m0000t!