Course materials Github: https://github.com/machinelearningplu...
Join Pandas course on ML+: https://edu.machinelearningplus.com/c...
--------------------
Often you want to create a new column in your data frame based on one or more existing columns. Or it could be any arbitrary transformation.
For example, you might want to take the log of a numeric column, or its square root, or its square. There can be any number of transformations that you want to do. In this one, we are going to work with the tips 100 data set. It contains information about 100 different transactions for a given restaurant.
This is completely fictional data. Even though you have columns like credit card number and payment ID, none of it is real, so there is no private information here. The data also has the total bill, information about the customer, the size (the number of customers covered by that particular bill), and so on.
Now suppose you want to create a new column based on this information, say the tip as a percentage of the bill, that is, tip divided by total bill. Or you might want a column like tip per person. How we would normally do it is with df.loc: open brackets, colon, then the name of the new percentage column, set equal to df.tip divided by df.total_bill. This is the usual way of creating a new column in a data frame.
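As a rough sketch of this one-statement-per-column approach: the small data frame below is a made-up stand-in for the tips data, and the new column names tip_pct and tip_per_person are chosen here only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the tips data; the values are made up.
df = pd.DataFrame({
    "total_bill": [20.0, 40.0, 10.0],
    "tip": [2.0, 10.0, 1.0],
    "size": [2, 4, 1],
})

# One assignment statement per new column.
df.loc[:, "tip_pct"] = df["tip"] / df["total_bill"]

# Bracket access is used for "size" because df.size is a built-in
# DataFrame attribute (the number of cells), not the column.
df.loc[:, "tip_per_person"] = df["tip"] / df["size"]

print(df)
```

Each new column needs its own statement, which is what becomes cumbersome as the number of columns grows.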
But imagine you want to create five or six different columns in a data frame. That would involve writing five or six statements like this, which can make your code a bit cumbersome. What you could do instead is combine all of those statements into one single function, which is df.assign. Within df.assign, you can create as many columns as you want in one single function call. The code for assign is quite straightforward.
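A minimal sketch of such a call, assuming a made-up stand-in for the tips data and illustrative column names. In pandas, a callable passed to assign is called with the data frame itself, so x below is the whole data frame and x["tip"] is a full column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips data; the values are made up.
df = pd.DataFrame({
    "total_bill": [20.0, 40.0, 10.0],
    "tip": [2.0, 10.0, 1.0],
    "size": [2, 4, 1],
})

# Several new columns in a single call. Each lambda receives the
# data frame and returns a whole column (a Series).
df_new = df.assign(
    tip_pct=lambda x: x["tip"] / x["total_bill"],
    tip_per_person=lambda x: x["tip"] / x["size"],
    log_total_bill=lambda x: np.log(x["total_bill"]),
)

print(df_new)
```

assign returns a new data frame and leaves df unchanged, so the result has to be stored in a variable (or assigned back to df) to keep the new columns.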
And it looks something like this. Here you have df.assign, then the name of the column that you want to create, and then you define the logic in a lambda function. And that's it. Now, how you write this lambda function is important. The argument that your lambda function receives is the entire data frame that assign was called on, not one row at a time. So if this is your data frame with a lot of rows, x is that whole data frame.
So the computation happens on whole columns at once. From x you extract the respective columns, x.tip and x.total_bill, each as a series, and do your computation on them. That is how you write an assign-based column assignment.
Alright, the next function is map. map is a very useful function, and there are two very common use cases for it. Let's understand both with an example. First, we have a column called day. This contains the days of the week in text format, and we want to convert them into numbers.
For example, we want Monday to show up as one, Tuesday to show up as two, and so on. To do this, you can use map and pass a dictionary argument that gives the mapping from each value in your column to the value it should change to. For example, Saturday (Sat) should change to six, Thursday to four, and so on. So first you need to create that dictionary, then pass that dictionary to the map function. Let's see how it works.
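A small sketch of the dictionary form of map. The day abbreviations and the name day_to_number are assumptions made for illustration; the actual spellings in the tips data may differ.

```python
import pandas as pd

# Hypothetical day column; the abbreviations are assumed, not taken
# from the actual data set.
days = pd.Series(["Sat", "Sun", "Thur", "Fri", "Sat"], name="day")

# Mapping from each text value to the number it should become.
day_to_number = {
    "Mon": 1, "Tue": 2, "Wed": 3, "Thur": 4,
    "Fri": 5, "Sat": 6, "Sun": 7,
}

# map is a Series method: each value is looked up in the dictionary.
day_numbers = days.map(day_to_number)

print(day_numbers.tolist())
```

Any value that is missing from the dictionary comes back as NaN, so it is worth checking that the dictionary covers every distinct value in the column.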
So here, this is the dictionary: Monday equal to one, Tuesday two, and so on. Then we take that particular column. Now remember, this is a series. The map function applies to a series, not a data frame. To this map function, pass in the dictionary that you want to map against. On running this, you should see the changes reflected in the day column. Now let's see one more use. It's not just a dictionary: you can also use a lambda function inside map. Here we have the credit card number, and we want to mask it and show only the last four digits. You can do that by defining a lambda function that takes x.
This x is not the entire series. Because you are applying map on a series, the function takes each and every cell of that series as its argument. So this particular value becomes x, and from it you extract the last four characters of that particular cell. That's what you're doing here. Let's run this and see the result.
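A minimal sketch of the lambda form of map. The card numbers below are obviously fake placeholders, and converting each cell with str is an assumption made so the slice works whether the column holds text or integers.

```python
import pandas as pd

# Hypothetical credit card column with fake placeholder numbers.
cards = pd.Series(["1234567812345678", "8765432187654321"], name="cc_number")

# The lambda is called once per cell; x is a single value, not the
# whole series. The slice keeps only the last four characters.
last_four = cards.map(lambda x: str(x)[-4:])

print(last_four.tolist())
```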