Nice work summarizing the data. You probably noticed that these summaries don't help us much yet. They would be more useful if we could calculate summaries based on different groups of data, such as by precinct or crime type. In this video you will learn about functions that only include data IF they meet certain conditions. Let's take a look.
To count the number of crimes by precinct or compute the average time by precinct, we first need a list of all the possible values of that column.
This is a perfect use for the UNIQUE() function.
It takes a set of data and returns an unduplicated list of the distinct values in the column.
For example, calling the UNIQUE() function on the C column will give you a list of all the precinct names in our dataset, listing each precinct only once.
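To make the idea concrete, here is a minimal Python sketch of what UNIQUE() does. The precinct values are made up for illustration, and the spreadsheet formula in the comment assumes the data starts in row 2, which the video does not state.

```python
# Hypothetical sample of the precinct column (column C in the video's sheet).
precincts = ["West", "North", "West", "South", "North", "West"]


def unique(values):
    """Return the distinct values in order of first appearance, like UNIQUE()."""
    # dict keys keep insertion order, so duplicates drop out but order is preserved.
    return list(dict.fromkeys(values))


# Roughly what a formula such as =UNIQUE(C2:C) would return (range is an assumption).
unique_precincts = unique(precincts)
print(unique_precincts)
```

Each precinct name appears once in the result, no matter how many rows mention it.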
Now that you have a list of all possible values, you can easily compute summary statistics for groups within the dataset.
For example, you might want to count the number of records for each group, such as the number of crimes by precinct.
You can use the COUNTIF() function to do this. It takes two arguments: the range is the set of cells to check, and the criterion is the condition a cell must meet to be counted.
So you can use the unique list of precincts and the COUNTIF() function to count the number of crimes per precinct.
You could also count the number of crimes that occurred after 8:00 in the evening by changing the criterion to a time greater than 2000. Recall that our data lists times in 24-hour format.
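The two COUNTIF() uses just described can be sketched in Python as follows. The sample rows, the choice of column D for times, and the cell ranges in the comments are assumptions for illustration, not values from the course dataset.

```python
# Hypothetical sample rows: precinct names and times in 24-hour HHMM format.
precincts = ["West", "North", "West", "South", "North", "West"]
times = [2130, 1415, 945, 2210, 2000, 1830]


def countif(values, predicate):
    """Count the cells in one range that satisfy a condition, like COUNTIF()."""
    return sum(1 for value in values if predicate(value))


# Roughly =COUNTIF(C2:C, "West"): crimes recorded in the West precinct.
west_count = countif(precincts, lambda p: p == "West")

# Roughly =COUNTIF(D2:D, ">2000"): crimes later than 8:00 in the evening.
# A crime at exactly 2000 is not counted, because the test is strictly greater.
late_count = countif(times, lambda t: t > 2000)

print(west_count, late_count)
```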
We can also use the AVERAGEIF() and SUMIF() functions to compute group-wise summary statistics. These functions take three arguments instead of the two required by COUNTIF():
They require a criteria range,
a criterion,
and a range to which they apply the function.
For example, you can calculate the average time at which a crime occurred by specifying the criterion range as the column that contains precinct names, the criterion as the name of the precinct of interest, and the column containing times as the range to which to apply the function.
Similarly, you can calculate the sum of crimes in various precincts by specifying the column that contains precinct names, the criterion to evaluate, and the column containing counts of crimes in those precincts as the range to sum.
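Here is a rough Python sketch of how the three arguments fit together. The data, the crime_counts column, and the helper names are hypothetical; the times are treated as plain numbers, as a spreadsheet would treat them.

```python
# Hypothetical sample rows; none of these values come from the course dataset.
precincts = ["West", "North", "West", "South", "North", "West"]
times = [2130, 1415, 945, 2210, 2000, 1830]      # 24-hour HHMM values
crime_counts = [4, 2, 1, 3, 5, 2]                # assumed "count of crimes" column


def sumif(criteria_range, criterion, sum_range):
    """Sum the cells of sum_range whose matching criteria cell equals criterion."""
    return sum(v for c, v in zip(criteria_range, sum_range) if c == criterion)


def averageif(criteria_range, criterion, average_range):
    """Average the cells of average_range whose criteria cell equals criterion."""
    matched = [v for c, v in zip(criteria_range, average_range) if c == criterion]
    if not matched:
        raise ValueError("no cells match the criterion")
    return sum(matched) / len(matched)


# Criteria range first, then the criterion, then the range to summarize.
west_avg_time = averageif(precincts, "West", times)
west_total = sumif(precincts, "West", crime_counts)
print(west_avg_time, west_total)
```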
That will help us get more insight from our data, but what if you want to restrict your counts, averages, or sums based on multiple criteria? Fortunately, spreadsheets have functions that do just that.
Let's start with the COUNTIFS() function.
It takes a practically unlimited number of pairs of arguments, though we typically provide no more than three criteria,
and it allows for multiple criteria.
You can even apply different criteria to the same range, as you might need to if you only wanted to count crimes between 8:00 in the evening and midnight.
Let's consider an example. Say you want to count the number of crimes in the West precinct that occurred after 8:00 in the evening.
You would call the COUNTIFS() function, specify the first range, then the criterion to check against it, then the second range and its criterion.
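As a sketch of that call, the Python below counts a row only when every range and criterion pair matches. The sample data, the column letters in the comments, and the use of 2400 to stand for midnight are assumptions.

```python
# Hypothetical sample rows: precinct names and times in 24-hour HHMM format.
precincts = ["West", "North", "West", "South", "North", "West"]
times = [2130, 1415, 945, 2210, 2000, 1830]


def countifs(*pairs):
    """Count rows where every (range, predicate) pair is satisfied, like COUNTIFS()."""
    ranges = [cells for cells, _ in pairs]
    predicates = [predicate for _, predicate in pairs]
    return sum(
        1
        for row in zip(*ranges)
        if all(predicate(value) for predicate, value in zip(predicates, row))
    )


# Roughly =COUNTIFS(C2:C, "West", D2:D, ">2000"): West crimes after 8:00 in the evening.
west_late = countifs(
    (precincts, lambda p: p == "West"),
    (times, lambda t: t > 2000),
)

# Two criteria on the same range: after 8:00 in the evening and before midnight.
evening = countifs(
    (times, lambda t: t > 2000),
    (times, lambda t: t < 2400),
)
print(west_late, evening)
```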
We can use the AVERAGEIFS() function similarly.
The order of the arguments in the AVERAGEIFS() function is a bit different from that of COUNTIFS(). For example, the first argument of AVERAGEIFS() is the range of cells to include in the average, not the first criteria range.
It also takes pairs of ranges and criteria
and allows for multiple criteria for the same range.
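The argument order is easier to see in a short sketch. In this hypothetical Python version the range to average comes first, followed by the range and criterion pairs; the data and the noon cutoff are made up for illustration.

```python
# Hypothetical sample rows: precinct names and times in 24-hour HHMM format.
precincts = ["West", "North", "West", "South", "North", "West"]
times = [2130, 1415, 945, 2210, 2000, 1830]


def averageifs(average_range, *pairs):
    """Average the cells of average_range on rows where every pair matches."""
    ranges = [cells for cells, _ in pairs]
    predicates = [predicate for _, predicate in pairs]
    matched = [
        row[0]
        for row in zip(average_range, *ranges)
        if all(predicate(value) for predicate, value in zip(predicates, row[1:]))
    ]
    if not matched:
        raise ValueError("no rows match all criteria")
    return sum(matched) / len(matched)


# Roughly =AVERAGEIFS(D2:D, C2:C, "West", D2:D, ">1200"):
# the average time of West precinct crimes that happened after noon.
west_afternoon_avg = averageifs(
    times,
    (precincts, lambda p: p == "West"),
    (times, lambda t: t > 1200),
)
print(west_afternoon_avg)
```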
Now you're armed with multiple functions for creating and counting groups of data. Let's get some practice.