# Range validation

Tags

Lesson contents

### Goat movie ratings

It's Wilde Week in goatville. The Wildes are like the Oscars, but for goat movies.

You have a worksheet like this:

The data starts in row 9. Each record has four fields:

• Movie title
• Genre (romance, action, or scifi)
• Average critic rating, from 1 to 5
• Average viewer rating, from 1 to 5

The data is not reliable, unfortunately. You can see a few errors in the screen shot.

You want to compute the stuff in rows 4 to 6:

• Number of valid records
• Number of invalid records
• Number of records where the viewer rating is higher than the critic rating.

### Pattern

In the previous lesson on normalization, the pattern was something like this:

• For each record:
•   Grab the record's data from the Range
•   Normalize the data
•   Process the data

Let's add one more line. It will be the final shiny on this pattern.

• For each record:
•   Grab the record's data from the Range
•   Normalize the data
•   Validate the data
•   Process the data

So, after we normalize the fields in a record, we'll check them for errors.

​M0000t!

### All but validation

Here's the code, except for the validation.

• 'Set up the Range.
• Set rMovieRecords = Range(Cells(9, 1), Cells(9, 1).End(xlToRight).End(xlDown))
• sCountAllRecords = rMovieRecords.Rows.Count
• 'Initialize
• sCountValidRecords = 0
• sCountInvalidRecords = 0
• sCountViewerMoreThanCritic = 0
• For sRecordNumber = 1 To sCountAllRecords
•     tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
•     tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
•     sCriticRating = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
•     sViewerRating = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
•
•     'Normalize title and genre
•     tTitle = Trim(tTitle)
•     tGenre = LCase(Trim(tGenre))
•
•     'Process
•     sCountValidRecords = sCountValidRecords + 1
•     If sViewerRating > sCriticRating Then
•         sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
•     End If
•
• Next sRecordNumber
•
• 'Output
• Cells(4, 2) = sCountValidRecords
• Cells(5, 2) = sCountInvalidRecords
• Cells(6, 2) = sCountViewerMoreThanCritic

There's a For loop to run through the data:

• sCountAllRecords = rMovieRecords.Rows.Count
• ...
• For sRecordNumber = 1 To sCountAllRecords
•     ...
• Next sRecordNumber

Inside the loop, the first thing is to grab the fields for the current record. Here's the pseudocode:

• For each record:
•   Grab the record's data from the Range
•   Normalize the data
•   Process the data

Here's the real code.

• For sRecordNumber = 1 To sCountAllRecords
•     tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
•     tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
•     sCriticRating = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
•     sViewerRating = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
•     ...
• Next sRecordNumber

The next step is to normalize.

• For each record:
•   Grab the record's data from the Range
•   Normalize the data
•   Process the data

The code:

• For sRecordNumber = 1 To sCountAllRecords
•     ...
•     'Normalize title and genre
•     tTitle = Trim(tTitle)
•     tGenre = LCase(Trim(tGenre))
•     ...
• Next sRecordNumber

The last step is processing.

• For each record:
•   Grab the record's data from the Range
•   Normalize the data
•   Process the data

The code:

• For sRecordNumber = 1 To sCountAllRecords
•     ...
•     'Process
•     sCountValidRecords = sCountValidRecords + 1
•     If sViewerRating > sCriticRating Then
•         sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
•     End If
• Next sRecordNumber

### The flag pattern for validation

Let's add the validation. We have three fields to check:

• Genre (romance, action, or scifi)
• Average critic rating, from 1 to 5
• Average viewer rating, from 1 to 5

We want to do the processing only if all of the fields have valid data. If there's an error in any field of a record, we want to ignore the record.

For that, we can use the flag pattern again.

Pattern

A variable that summarizes the results of a bunch of code.

Here's what the code will be like:

1. tDataOk = "yes"
2. If field 1 is bad Then
3.     tDataOk = "no"
4. EndIf
5. If field 2 is bad Then
6.     tDataOk = "no"
7. EndIf
8. If field 3 is bad Then
9.     tDataOk = "no"
10. EndIf
11. If tDataOk = "yes" Then
12.     Processing
13. EndIf

If any of the fields are bad, then tDataOk will be "no" by the time we get to line 11: If tDataOk = "yes" Then.

### Validating genre

Genre is a string field with three valid values. Here's code that will validate it:

• If tGenre <> "romance" And tGenre <> "action" And tGenre <> "scifi" Then
•     tDataOk = "no"
• End If

Ray

What if the genre is "Romance" instead of "romance". That If won't pick it up.

Reflect

Is Ray right? Why, or why not?

If you were logged in as a student, the lesson would pause here, and you'd be asked to type in a response. If you want to try that out, ask for an account on this site.

Georgina

We already normalized the data, so "Romance" will have been converted to "romance".

• tGenre = LCase(Trim(tGenre))

### Validating ratings

Here's code to validate the critic rating:

1. tCriticRatingIn = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
2. ...
3. 'Check critic rating
4. 'Is it a number?
5. If Not IsNumeric(tCriticRatingIn) Then
6.     'Not a number
7.     tDataOk = "no"
8. Else
9.     'It is a number
10.     'Copy value to a numeric variable
11.     sCriticRating = tCriticRatingIn
12.     'Range check
13.     If sCriticRating < 1 Or sCriticRating > 5 Then
14.         tDataOk = "no"
15.     End If
16. End If

First, we put the field's value into a string variable (line 1). If we put it straight into a numeric variable, the program will crash if the cell contains something like "fish".

Then we check whether it's numeric (line 5). If it isn't, set the flag, and that's it.

If tCriticRatingIn is a number, then copy it to the numeric variable sCriticRating (line 11). Then do a range check.

### All the code

Let's put everything together. Check that this makes sense.

1. 'Set up the Range.
2. Set rMovieRecords = Range(Cells(9, 1), Cells(9, 1).End(xlToRight).End(xlDown))
3. sCountAllRecords = rMovieRecords.Rows.Count
4. 'Initialize
5. sCountValidRecords = 0
6. sCountInvalidRecords = 0
7. sCountViewerMoreThanCritic = 0
8. For sRecordNumber = 1 To sCountAllRecords
10.     tTitle = rMovieRecords.Cells(sRecordNumber, TITLE_COLUMN)
11.     tGenre = rMovieRecords.Cells(sRecordNumber, GENRE_COLUMN)
12.     tCriticRatingIn = rMovieRecords.Cells(sRecordNumber, CRITIC_RATING_COLUMN)
13.     tViewerRatingIn = rMovieRecords.Cells(sRecordNumber, VIEWER_RATING_COLUMN)
14.
15.     'Normalize title and genre
16.     tTitle = Trim(tTitle)
17.     tGenre = LCase(Trim(tGenre))
18.
19.     'Validate
20.     tDataOk = "yes"
21.     'Check genre
22.     If tGenre <> "romance" And tGenre <> "action" And tGenre <> "scifi" Then
23.         tDataOk = "no"
24.     End If
25.     'Check critic rating
26.     If Not IsNumeric(tCriticRatingIn) Then
27.         tDataOk = "no"
28.     Else
29.         sCriticRating = tCriticRatingIn
30.         If sCriticRating < 1 Or sCriticRating > 5 Then
31.             tDataOk = "no"
32.         End If
33.     End If
34.     'Check viewer rating
35.     If Not IsNumeric(tViewerRatingIn) Then
36.         tDataOk = "no"
37.     Else
38.         sViewerRating = tViewerRatingIn
39.         If sViewerRating < 1 Or sViewerRating > 5 Then
40.             tDataOk = "no"
41.         End If
42.     End If
43.
44.     'Process
45.     If tDataOk = "yes" Then
46.         'Data is valid.
47.         sCountValidRecords = sCountValidRecords + 1
48.         If sViewerRating > sCriticRating Then
49.             sCountViewerMoreThanCritic = sCountViewerMoreThanCritic + 1
50.         End If
51.     Else
52.         sCountInvalidRecords = sCountInvalidRecords + 1
53.     End If
54.
55. Next sRecordNumber
56.
57. 'Output
58. Cells(4, 2) = sCountValidRecords
59. Cells(5, 2) = sCountInvalidRecords
60. Cells(6, 2) = sCountViewerMoreThanCritic

Check out lines 51 and 52. They let you count the number of invalid records.

Now you know how to analyze a Range.

• For each record:
•   Grab the record's data from the Range
•   Normalize the data
•   Validate the data
•   Process the data

M0000t! If you understood everything here, give yourself a pat on the back. This is getting to be real data analysis.

​M0000ty, m0000ty, m0000t, m0000t!

Attachments