Hey, Adela!
Adela
What's up?
Can you write a program for us?
Adela
Sure. What's the task?
Cthulhu has data on goat music sales, and wants you to analyze it. Here's some of the data in the worksheet Cthulhu gives you:
The genres are:
- Folk rock
- Throat song
- Chant
- Kpop
- Bleat
- Hoof stomp
Some of the data could be invalid, since goat bands..., well, they are really into grass. Check that:
- The band name is present.
- The genre is present, and is one of the values above (but allow extra spaces, and upper/lowercase differences).
- Sales are numeric, more than 1,000, and less than 1,000,000.
Here's what Cthulhu wants to know:
- How many records are in the data set?
- How many records are valid?
- How many records are invalid?
- What is the percentage of invalid records?
Cthulhu wants some stats about valid data only.
- Total sales
- Average sales per band
- Band with the lowest sales: name, genre, and sales.
- Band with the highest sales: name, genre, and sales.
- Bleat music (Cthulhu's fave genre): bands, total sales, and average sales.
Ray
That's a lot of stuff to work out!
Adela
Hmm... it's not so bad. We've done everything on the list before. Just have to think it through, a bit at a time.
How are you going to start?
Adela
Let me check out data analysis tasks I've already done...
OK, here are some. This exercise was about processing ranges, using the accumulator pattern.
We learned a lot about validation. I can use code from there. Oh, right, the flag pattern. That'll make it easier. Easy is good.
I did this whole example on validating, and doing some stats.
There's filtering. we'll need that to break out the bleat music data.
We did the min/max code as well. Need that for this task, the highest and lowest sales.
OK, there's a lot of code I can copy.
Note
Copying from other programs you've written is a Good Thing. Copying other peoples' code is usually fine, too, if you have permission. You have permission to copy any of the code on this site, so go for it.
Ray
What about copying random code from, like, StackOverflow? Is that OK?
Stock Overflow is the best known site where programmers help each other. Copying the code is usually OK...
Georgina
You have but face!
What?
Georgina
You're about to say but!
Ha! Good one. Yes, there's a big but. Be careful copying code you don't understand. It might not do what you want, and you won't know that.
Remember that problem working out the minimum value in a data set? Georgina figured out we needed to initialize the minimum variable correctly, or it would stay at zero. What if you copied code that had that problem, but you didn't know it? ARGH!
Anyway, let's get back to Adela.
Note
Adela's reminding herself of similar work she's done before.
Adela
OK, I'm ready now.
I'll start with the big picture.
Hmm, I like to do this in VBE.
- 'Create Range object.
- 'Initialize.
- 'Get count of total records.
- 'For each record:
- 'Validate.
- 'Normalize genre (to make filtering easier).
- 'If data is OK:
- 'Accumulate overall totals and counts.
- 'Test for min and max sales.
- 'If genre is bleat: (filtering, use normalized genre).
- 'Add to bleat sales total and count.
- 'End loop.
- 'Compute averages.
- 'Output.
Note
Cool! Adela put in comments for the big picture.
Getting the overall structure right at the start makes a big difference. Because of all the practice in this course, Adela knew what to do.
Adela
OK, let's see...
I'll make the Range
... Now get the number of records... Then set up the loop... And Dim
the variables.
- Dim rMusicDataRange As Range
- 'Total number of records in the data set.
- Dim sCountRecordsInDataSet As Single
- 'Which record is being processed?
- Dim sCurrentRecordNumber As Single
- 'Create Range object.
- Set rMusicDataRange = Range(Cells(26, 1), Cells(26, 1).End(xlToRight).End(xlDown))
- 'Initialize.
- 'Get count of total records.
- sCountRecordsInDataSet = rMusicDataRange.Rows.Count
- 'For each record:
- For sCurrentRecordNumber = 1 To sCountRecordsInDataSet
- 'Validate.
- 'Normalize genre (to make filtering easier).
- 'If data is OK:
- 'Accumulate overall totals and counts.
- 'Test for min and max sales.
- 'If genre is bleat: (filtering, use normalized genre).
- 'Add to bleat sales total and count.
- 'End loop.
- Next sCurrentRecordNumber
- 'Compute averages.
- 'Output.
Note
A few things to notice.
- Adela put in the
For
and theNext
, the outer layers of the data processing onion. - She gave the variables long names, so she knows what the variables are for. She also added comments as she
Dim
ed the variables.
Adela
I'll work on the validation now. I'll use the flag pattern for that.
Start with the flag variable...
Oh! I forgot about counting invalid records.
- 'Flag for data validation.
- Dim tIsRecordOk As String
- ...
- 'For each record:
- For sCurrentRecordNumber = 1 To sCountRecordsInDataSet
- 'Validate.
- tIsRecordOk = "yes"
- 'Normalize genre (to make filtering easier).
- 'If data is OK:
- If bIsRecordOk Then
- 'Accumulate overall totals and counts.
- 'Test for min and max sales.
- 'If genre is bleat: (filtering, use normalized genre).
- 'Add to bleat sales total and count.
- Else
- 'Add to invalid count.
- End If
- Next sCurrentRecordNumber
Adela
Let's see... Each time through the loop, I'm processing the current record. I'll put the current record's fields into variables. That'll make it easier to write the validation.
- Const BAND_NAME_FIELD = 1
- Const GENRE_FIELD = 2
- Const SALES_FIELD = 3
- ...
- 'Data from row, before validation
- Dim tBandNameIn As String
- Dim tGenreIn As String
- Dim tSalesIn As String
- ...
- 'For each record:
- For sCurrentRecordNumber = 1 To sCountRecordsInDataSet
- 'Validate.
- tIsRecordOk = "yes"
- 'Get data for the current record.
- tBandNameIn = rMusicDataRange.Cells(sCurrentRecordNumber, BAND_NAME_FIELD)
- tGenreIn = rMusicDataRange.Cells(sCurrentRecordNumber, GENRE_FIELD)
- tSalesIn = rMusicDataRange.Cells(sCurrentRecordNumber, SALES_FIELD)
Note
Adela broke off one part of the task: getting the values of the current record. She could write the code without these variables, but using them makes things easier for her.
Easy is good.
Adela
The validation for the band is easy. It just has to be there. If it is, I'll put the band name into another variable, showing that the band name is validated.
- 'Data from row, before validation
- Dim tBandNameIn As String
- Dim tGenreIn As String
- Dim tSalesIn As String
- 'Data from row, after validation.
- Dim tBandName As String
- Dim tGenre As String
- Dim sSales As Single
- ...
- 'Loop over records.
- For sCurrentRecordNumber = 1 To sCountRecordsInDataSet
- 'Validate.
- tIsRecordOk = "yes"
- 'Get data for the current record.
- tBandNameIn = rMusicDataRange.Cells(sCurrentRecordNumber, BAND_NAME_FIELD)
- tGenreIn = rMusicDataRange.Cells(sCurrentRecordNumber, GENRE_FIELD)
- tSalesIn = rMusicDataRange.Cells(sCurrentRecordNumber, SALES_FIELD)
- 'Check band name.
- tBandNameIn = Trim(tBandNameIn)
- If tBandNameIn = "" Then
- tIsRecordOk = "no"
- Else
- tBandName = tBandNameIn
- End If
Note
Adela made two sets of variables for the fields of the current record.
Fields before validation:
tBandNameIn
tGenreIn
tSalesIn
Fields after validation:
tBandName
tGenre
sSales
Georgina
That's a good idea, having two sets of variables. I like it!
And you trimmed the band name, just in case.
- 'Check band name.
- tBandNameIn = Trim(tBandNameIn)
Ray
OK, but you have sales as string, then single.
- 'Data from row, before validation
- ...
- Dim tSalesIn As String
- 'Data from row, after validation.
- ...
- Dim sSales As Single
Why?
Adela
That's because there could be an error in the sales data. Someone might have typed something that's not a number. If you tried to put that value into a Single
, the program would crash, with a type mismatch error.
Ray
Doh! Of course! I've done that a bunch of times in exercises.
More validation code.
- 'Check genre name.
- tGenreIn = LCase(Trim(tGenreIn))
- If _
- tGenreIn <> "folk rock" And tGenreIn <> "throat song" _
- And tGenreIn <> "chant" And tGenreIn <> "kpop" _
- And tGenreIn <> "bleat" And tGenreIn <> "hoof stomp" Then
- bIsRecordOk = False
- Else
- tGenre = tGenreIn
- End If
- 'Check sales.
- If Not IsNumeric(tSalesIn) Then
- tIsRecordOk = "no"
- Else
- sSales = tSalesIn
- If sSales < 1000 Or sSales > 1000000 Then
- tIsRecordOk = "no"
- End If
- End If
Georgina
I like the way you laid out that long If
, breaking up the lines. Easy to see the genres. Easy is good.
Ethan
There's the normalization of genre, too. Nice work!
Adela
Thanks!
Now I'll add in some counts, while I'm thinking of it.
- Dim sCountValidRecords As Single
- Dim sCountInvalidRecords As Single
- ...
- 'If data is OK
- If tIsRecordOk = "yes" Then
- 'Current record has valid data.
- sCountValidRecords = sCountValidRecords + 1
- 'Compute counts, totals, and stuff.
- Else
- 'Something in the current record is invalid.
- sCountInvalidRecords = sCountInvalidRecords + 1
- End If
Ray
Do you need to initialize the counts, or did I get that wrong?
Adela
Oh, right! Thanks.
- Dim sCountValidRecords As Single
- Dim sCountInvalidRecords As Single
- ...
- 'Initialize.
- sCountValidRecords = 0
- sCountInvalidRecords = 0
- ...
- 'If data is OK
- If tIsRecordOk = "yes" Then
- 'Current record has valid data.
- sCountValidRecords = sCountValidRecords + 1
- 'Compute counts, totals, and stuff.
- Else
- 'Something in the current record is invalid.
- sCountInvalidRecords = sCountInvalidRecords + 1
- End If
Adela
Total sales, too.
- Dim sCountValidRecords As Single
- Dim sCountInvalidRecords As Single
- Dim sTotalSales As Single
- ...
- 'Initialize.
- sCountValidRecords = 0
- sCountInvalidRecords = 0
- sTotalSales = 0
- ...
- 'If data is OK
- If tIsRecordOk = "yes" Then
- 'Current record has valid data.
- sCountValidRecords = sCountValidRecords + 1
- 'Compute counts, totals, and stuff.
- sTotalSales = sTotalSales + sSales
- Else
- 'Something in the current record is invalid.
- sCountInvalidRecords = sCountInvalidRecords + 1
- End If
Adela
OK, lowest and highest. Let me check the lesson on that... OK, I need some new variables, some If
s, and initialization.
- Dim sLowSales As Single
- Dim tLowSalesBandName As String
- Dim tLowSalesBandGenre As String
- Dim sHighSales As Single
- Dim tHighSalesBandName As String
- Dim tHighSalesBandGenre As String
- ...
- 'Initialize.
- sLowSales = 3.4028235E+38
- sHighSales = -3.4028235E+38
- ...
- 'Loop over records.
- For sCurrentRecordNumber = 1 To sCountRecordsInDataSet
- If tIsRecordOk = "yes" Then
- ...
- 'Check for new lows and highs.
- If sSales < sLowSales Then
- sLowSales = sSales
- tLowSalesBandName = tBandName
- tLowSalesBandGenre = tGenre
- End If
- If sSales > sHighSales Then
- sHighSales = sSales
- tHighSalesBandName = tBandName
- tHighSalesBandGenre = tGenre
- End If
- Else
- 'Something in the current record is invalid.
- sCountInvalidRecords = sCountInvalidRecords + 1
- End If
- Next sCurrentRecordNumber
Adela
OK, now the filtering code for bands in the bleat genre.
- 'Bleat genre results.
- Dim sCountBleaters As Single
- Dim sTotalSalesBleaters As Single
- Dim sAverageSalesBleaters As Single
- ...
- 'Initialize.
- ...
- sCountBleaters = 0
- sTotalSalesBleaters = 0
- ...
- 'For each record:.
- For sCurrentRecordNumber = 1 To sCountRecordsInDataSet
- If tIsRecordOk = "yes" Then
- ...
- 'Just the bleaters.
- If tGenre = "bleat" Then
- sCountBleaters = sCountBleaters + 1
- sTotalSalesBleaters = sTotalSalesBleaters + sales
- End If
- Else
- 'Something in the current record is invalid.
- sCountInvalidRecords = sCountInvalidRecords + 1
- End If
- Next sCurrentRecordNumber
Adela
Nice! That's most of it. Just some calculations, and output.
- 'Compute averages.
- sAverageSales = sTotalSales / sCountValidRecords
- sAverageSalesBleaters = sTotalSalesBleaters / sCountBleaters
- 'Compute percent invalid records.
- sPercentInvalidRecords = sCountInvalidRecords / sCountRecordsInDataSet * 100
- 'Output.
- Cells(4, 2) = sCountRecordsInDataSet
- Cells(5, 2) = sCountValidRecords
- Cells(6, 2) = sCountInvalidRecords
- Cells(7, 2) = sPercentInvalidRecords
- Cells(9, 2) = sTotalSales
- Cells(10, 2) = sAverageSales
- Cells(12, 3) = tLowSalesBandName
- Cells(13, 3) = tLowSalesBandGenre
- Cells(14, 3) = sLowSales
- Cells(16, 3) = tHighSalesBandName
- Cells(17, 3) = tHighSalesBandGenre
- Cells(18, 3) = sHighSales
- Cells(20, 3) = sCountBleaters
- Cells(21, 3) = sTotalSalesBleaters
- Cells(22, 3) = sAverageSalesBleaters
You can download Adela's solution
Up next
Now there are some final exercises for the course.