Welcome, everyone, to this course on Python programming.
My name is Saurabh, and in this tutorial we are going to focus on each and every important concept of Python.
So without wasting any time, let us move forward and have a look at the agenda for today.
So this is what we'll be discussing.
Today, we'll begin by understanding what Python is, how it works, and how we can install Python.
Then we'll understand what sequences and file operations are in Python.
Next up, we are going to focus on functions and OOP concepts.
After that, we'll be working with modules and see how we can handle exceptions in Python.
Then you will be introduced to a couple of very important libraries called NumPy and Pandas.
After that, you'll understand how you can perform data visualization using Matplotlib.
Then you will understand how you can perform data manipulation using all three of the libraries that I've just mentioned.
After that, you are going to develop really cool applications: one is a web map using Folium, another is a motion detector using OpenCV.
I hope you guys found the agenda very interesting.
Varun will be the instructor for this particular session.
He has rich experience in working with Python.
Over to you, Varun.
I will be your instructor for this course.
My name is Varun.
Introduction to Python.
So the question is, which programming language should you start with? I mean, of course, you have already started with one as part of this course, but in Dave's opinion and my opinion, it has to be Python.
Simply because it is very, very simple and easy to learn.
In a few moments, when we start looking at Python code, you will notice how much it looks like English.
It doesn't even look like code.
Many people hear the word code and start thinking of ones and zeros, or they start thinking, oh, it's going to be like a puzzle that you have to solve.
Nothing of that sort, really.
It's just a set of detailed instructions.
If you can write very, very detailed instructions about how to send an email, or about how to send a letter by post, you can probably code.
It's like that.
Next is that it is free and open source.
In the sense that you don't have to pay anybody money for using Python.
Now, one question might be, hey, what's new about that? I don't need to pay anybody to speak English, and English is a language.
But the thing is that English, Spanish, French, these languages have no single creator.
They're developed collectively.
So nobody really owns English as a language.
Nobody has the rights to English.
Whereas, if you look at programming languages, they're created by a small set of people.
10, 15, 50, 100 people.
100 people is not a big group; it's relatively small as compared to the people who developed English.
The people who developed English words would be the authors, the writers, and the regular folks as well.
So that number would be millions of people who developed the language.
Open source is a concept in software programming.
I strongly suggest that you go on YouTube and search for open source.
You'll come across some very neat documentaries; I suggest you have a look at them.
But compare this with software that is paid for: MATLAB is paid software, and some of its extra functionality also needs to be paid for separately.
But Python is completely free.
A lot of Python libraries are available for free as well.
There is a language called Wolfram, W-O-L-F, like a wolf, R-A-M, which is very expensive.
You need to buy a license; you can't just use it, it's not allowed.
It's like Windows: you need to pay a license fee to use it.
Now, it's a high-level language, so believe it or not, the first three lines over here are actually valid Python code.
Forget the fourth one.
Some of you might be able to understand that, but the first three lines are valid Python code.
That's why Python.
You don't have to deal with the right side.
Don't think about this; you're almost never going to look at this and be expected to figure out what it means.
You're only supposed to look at stuff like this and say, okay, what does this mean.
Then it's portable, portable meaning that you can use it on multiple devices, whether that's a Windows device or a Linux device.
Python is very compatible that way.
It will run on different devices.
It supports different programming paradigms.
Paradigm is just a fancy word for styles.
If you are a beginner in programming, don't bother with it until you have, you know, 600 to 1,000 hours of programming under your belt.
Don't bother with it at all.
This is not relevant for you.
However, if you have some experience with programming and you know what a programming paradigm is, just know that Python supports multiple paradigms, and for those of you who don't know about it, don't worry about it.
You will get there eventually.
I'm not saying it's not important, it is important; it's just that right now it's like running.
If writing a line of code is the starting point, which is like stumbling or crawling, then thinking about paradigms and styles of programming is kind of like running.
So I would not recommend thinking about it too much.
Next, Python is extensible in the sense that you can invoke C and C++ libraries from Python.
So you can sort of call code written in other languages using Python.
That way it is very extensible, which increases its usability.
It makes it work with other systems.
Python is also used widely by a lot of companies.
Let us look at a few of the giants.
This would include YouTube, Google, Dropbox, Raspberry Pi, BitTorrent, NASA, Netflix, NSA, and I mean, it's amazing, right.
I mean, look at how varied and different these companies are.
They range from internet companies to, you know, national space agencies, to national security agencies.
Look at the breadth and the depth of it.
It's being used by all of them in some way or other.
So you shouldn't say, hey, Google is built on Python.
No, Google uses 10 languages.
It's a huge company, right.
Same for Dropbox.
Oh, by the way, the creator of Python works for Dropbox.
He actually does work for Dropbox, and he has it in his contract that he will devote a certain percentage of his time, about 30 or 40% of it, to developing Python.
So that is something that he got as part of his job contract.
Of course, he's a very celebrated person that way.
Regular people, or I guess you as well, can't really negotiate something like that.
But yeah, he did, because he's the creator of Python.
But yeah, the point is that the best places in Silicon Valley, and government agencies too, are using Python.
Now the best part, and this is my favorite part: Python really works for most of the things out there. If you want to develop a website, you can use Python.
You want to do data analysis, you want to do data science, you can use Python.
You want to do automation testing, you can use Python.
You want to collect data from the internet, which is web scraping, that is, extracting data from websites, you can use Python.
Now, one word of caution: you don't need to do all of these things to say that you are a Python developer, a starter Python developer.
For example, I've never done this one, testing.
I've never really done scraping.
I know how it is done, but the point to note is that there are only slight differences between these.
It is not that you have to learn an entirely new thing when you are doing this versus this.
You have to learn the concepts related to it, but that's more like business logic; the programming logic is not completely changed.
Right, the golden question on a lot of people's minds: are there Python jobs in the industry? So this is a job trend from Indeed.
Don't focus on the small percentages. I know they're small, but they are percentages of the total number of jobs, okay.
It says 0.85%, but that's of probably 10 million or 15 million job postings.
So you can notice how the trend is good for Python.
It's going a little sideways, and it keeps doing that, but if you compare it to PHP or C++, there is definitely a huge difference.
Then, of course, there's popularity.
This is even more relevant: how much is the community developing? Google Trends would indicate all searches using the word Python, whether by experienced people, by new people, or by industry.
So even if I have to hire Python developers, I might go to Google and search "hire Python developers", and then it will show up here.
So this is an overall accumulation.
This is a summation of how popular Python is compared to other languages.
These index values, 99, 66, and so on, are indicative of that.
So let's look at how to install Python.
We have sort of understood, till this point, what Python is and why it is used, and what programming is and why it is used.
Let's look at how to get started.
Now, to get started, the first thing, of course, is to get Python.
You go to python.org, you go to the Downloads tab, and you download the latest version of Python.
3.X is the latest version.
2.X is also out there.
It's being used by a lot of projects; even my projects these days use 2.X, because they are 12 to 15 months old now, the kind of projects that I'm working on.
So I mean, it isn't that for a big (speech unclear) in Python specifically, you would find a massive amount of difference between 2.X and 3.X.
I mean, it's like, you know, having a car and then the sports version of that car, or the plus version, or the luxury version of that car.
The differences are minor for the regular user, but if you are an advanced Python user, then of course you might find significant differences.
But don't be confused by it.
It's not that companies turn away developers who know Python 2.7 but don't know the latest version.
The job market doesn't work like that.
They will go for Python.
They will ask you if you know Python, and then the version doesn't really make too much of a difference.
You open the link and click on Run.
Click on Install Now.
This is very important.
In case you have already installed Python and you forgot, or you unchecked this for some reason, please go back, uninstall Python, reinstall it, and make sure that you select this box.
Nothing will work if you don't do this.
So please, please, please follow this step.
And that's it, you're done.
As easy as that, you have Python on your system now.
There's a Python GUI developer for you.
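A quick way to confirm which version the installer gave you is to ask Python itself. The sketch below uses only the standard `sys` and `platform` modules. The 3.x check is an assumption added here, motivated by the fact that Python 2 reached end of life on January 1, 2020, so a new install should be a 3.x release. It avoids f-strings so that it still parses if you happen to run it under Python 2.

```python
import platform
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial)
major = sys.version_info[0]
minor = sys.version_info[1]
print("Running Python %s" % platform.python_version())

# Assumption: new work should target Python 3, since Python 2 is end-of-life
if major < 3:
    print("This is Python 2; please install a 3.x release from python.org")
else:
    print("Python %d.%d is ready to use" % (major, minor))
```

From the command line, `python --version` prints the same version number.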
Next is the Python IDE.
One question is, what is an IDE? IDE stands for integrated development environment.
It is used as a code editor, and it includes a set of additional components and add-ons.
So, fancy words aside, if you ask me, hey, can I write code, can I write Python, in Notepad or a Word doc? My answer to that question would be yes, you can.
Now if you ask me, should I write code in Notepad? My answer to that would be no, please don't, never.
So you get the difference: you can do it, but you should definitely not do it.
It's not that your code wouldn't run or something of that sort; it's just that IDEs are tools which are meant for doing this.
They are specialized development environments for you to code in.
They help you in certain ways.
That makes sense, right, because programming can be very difficult without them.
So definitely choose an IDE and stick to it.
One thing that you'll notice: there are line numbers.
These are very, very useful, by the way.
As you will experience, if you're talking to another developer, the best way to talk to them is "line number three", then "edit this particular line".
"Can you start reading from this line to this line?" That's how I deal with a number of situations.
I will talk about a certain line number.
Okay, let's open something called the terminal.
On Mac and Linux, this will be called the terminal.
On Windows, this is called CMD.
Search for CMD.
Once you install Python, what you can simply do is type python on the command line and press Enter.
It opens the Python interpreter.
I can write Python code right here and it will work.
This is valid Python code.
It comes with a help function.
If I type help followed by parentheses, it will launch help for me.
If I type in keywords, right, there are certain things called keywords in programming.
Let me just zoom into it, yeah.
So I can exit from Python.
Look at the definition.
Let's get started.
So this is how you can open the Python interpreter: you just need to go to the command line, type python, and it will open up.
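Typing `help()` and then `keywords` in the interactive help lists Python's reserved words. The same list is also available from a script through the standard `keyword` module; here is a small sketch.

```python
import keyword

# keyword.kwlist holds every reserved word of the running interpreter
print(len(keyword.kwlist), "keywords")
print(keyword.kwlist)

# iskeyword() tells you whether a name is reserved
print(keyword.iskeyword("for"))    # True
print(keyword.iskeyword("hello"))  # False
```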
Let's come to PyCharm for a minute.
In PyCharm, the way it works is that, of course, I already have this, but let's suppose you don't have it.
So let's just remove it altogether.
What you need to do is create a new project and give it a name.
The name could be test, and you click Create.
We open it in the current window.
Once it's in the current window, I can go to New and then select Python File.
I can give it a name.
It automatically adds the .py extension by itself and stores the file.
If you go into your file system, you'll see that it has been created, and you can start writing your Python code over here.
Now, programming has something called comments.
A lot of the time, what happens is that you will write a piece of code but forget what it does.
It's not something that only you go through.
It's a standard thing.
Even if I write code, I can bet my life on it that three days later I will have forgotten about 80% of it.
I would know what it does, but if it's more than 100 lines, I would have to read through it.
Now, what developers do in this situation is add comments.
Comments are nothing but simple English explanations of what you have done, so that you don't need to remind yourself again and again of what this particular thing does.
It just helps you keep track; it's like notes.
They're small notes, and the programming language basically ignores them.
Now there are two ways to write comments.
# This is a new Python file for the class.
Now that's it.
This is not valid Python code.
This is not code; this is English.
This is one way to write it.
The other way to write it is using three single quotes, and you write it like this.
'''This is a new Python file for the class.'''
These are the two ways you can do it.
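Both styles can be seen side by side in the small file below. One nuance worth knowing: only the `#` form is a true comment. A triple-quoted string that isn't assigned to anything is a string literal that Python evaluates and then discards, and when it's the first statement inside a function (or a file) it becomes a docstring. The function name `area_of_square` is just an illustration.

```python
# This is a new Python file for the class.
course = "Python"  # a comment can also follow code on the same line

'''This is a new Python file for the class.'''
# The triple-quoted text above is technically a string literal, not a
# comment: Python evaluates it and discards it because nothing uses it.


def area_of_square(side):
    """Return the area of a square; this docstring is kept on the function."""
    return side * side


print(area_of_square(4))      # 16
print(area_of_square.__doc__)
```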
The purpose is basically to remind yourself, or other developers.
This is one thing a lot of people miss when they're starting off: your code will be viewed by other developers at some point.
You will start sharing your code, either within a company or otherwise.
So when they look at your code, they should be able to understand from your comments what it does.
Not just the whole file, not just a description saying that this file talks to the database, but what each and every step is along the way.
So, for every 10 or 20 lines of code, there should probably be one comment.
That's a best practice.
So next is indentation.
You need to think about this in the way that Python follows indentation like the comment system on social networking websites.
So if you've ever seen Reddit, let me just open it up actually, okay.
If you look at Reddit and go to the comment section, you will see that there's nesting.
You can tell that, okay, Ahah is being responded to by RetardedClownFace, or this guy is being replied to by this one, but this guy is being replied to by this one.
This guy is being replied to by this guy, and so on.
It's nesting.
You can see that there is a division.
You can see that this comment is a reply to this comment.
This is something that we see everywhere.
We see this on Facebook, we see this on Instagram, everywhere.
Now, this is how Python declares blocks of code.
In case you're not familiar with what blocks of code are, it's okay.
You're going to come across them as we go further in Python, but just remember that indentation is a very important concept in Python, and this is how blocks of code are defined.
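Here is what that nesting looks like in code. Each extra level of indentation (four spaces by convention) opens a block nested inside the previous one, much like a reply under a comment. The second part compiles a deliberately mis-indented snippet to show that Python rejects it; `describe` and `bad_code` are illustrative names.

```python
def describe(number):
    # Everything indented under "def" belongs to the function body
    if number > 0:
        # This block runs only when the outer "if" is true
        if number % 2 == 0:
            return "positive and even"
        return "positive and odd"
    return "zero or negative"


print(describe(4))   # positive and even
print(describe(7))   # positive and odd
print(describe(-3))  # zero or negative

# A block with no indentation is a syntax error, caught before anything runs
bad_code = "if True:\nprint('not indented')"
caught = False
try:
    compile(bad_code, "<example>", "exec")
except IndentationError as err:
    caught = True
    print("IndentationError:", err.msg)
```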
So how does Python code execution happen? You write code, and it is saved with a .py extension, like I just showed you.
The code is converted into bytecode, which the Python virtual machine can execute.
That's done by the Python compiler or the interpreter.
You don't need to worry about this step.
It happens automatically as you run the file.
Then it gets executed.
You just need to care about step one; steps two and three are taken care of automatically.
It's not something that you have to do much about.
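If you are curious about that intermediate step, CPython (the standard interpreter from python.org) lets you look at the bytecode with the standard `dis` module. This is only a peek under the hood; the exact instructions differ between Python versions, so no listing is shown here. Modules you import are also cached as compiled `.pyc` files in a `__pycache__` folder next to them.

```python
import dis


def greet(name):
    return "Hello, " + name


# Print the bytecode instructions CPython compiled greet() into
dis.dis(greet)

# The raw bytecode is stored as bytes on the function's code object
code = greet.__code__
print(type(code.co_code).__name__, len(code.co_code), "bytes of bytecode")
print(greet("World"))  # Hello, World
```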
Let's create our first "Hello World" program in Python.
So how do you write hello world, how do we greet the world? Well, this is how we do it, as simple as this.
print, open parentheses, then a string.
This is a string.
You can't write it like this, without quotes; this will not work, okay.
If you see, this is already showing an error, and of course this is where PyCharm is smart.
PyCharm is smart.
If you notice, it has underlined this file.
It has underlined the folder containing it.
This is telling you, before you even run it, that this is not going to work.
So anytime you're printing a string, a string being anything which is like a message, right, it needs to be in quotes.
Use print, with the opening parenthesis and the closing parenthesis like this.
If you miss the parentheses, it won't work; you have to include them.
Then let's run it.
The way to run it is you right-click and then click on Run 'hello', which refers to the name of the file.
This is where it will run.
This terminal is already present.
Hello world is printed two times.
"Hello, welcome to Edureka" gets printed, and then something odd happens.
"Happy learning, welcome to Edureka".
This gets printed on two different lines.
Why is that? We just wrote a single line.
So this is where this comes in: \n.
This is what's called the line break character.
It will break the line.
If you're typing in a Word doc and you press Enter, it goes to the next line.
It's a newline character.
If I type it two times, we should expect two line breaks.
If you notice, it has gone two lines deep.
If you do it again, let's suppose you do it four times, and you save and run.
It goes even further.
The reason why the new line does not start right at the left edge is because there's a space character here.
If I remove the space, then it'll start right at the beginning of the line.
Now see, it adjusted.
So this is how you do it.
It could be single quotes or double quotes, it doesn't matter, but I would suggest sticking to one of them.
This is the output you willget after running the file.
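The runs above boil down to a few lines. This is a sketch of the same idea rather than the exact file from the video: single and double quotes behave the same, `\n` starts a new line, and a space after `\n` ends up at the start of the next line.

```python
print("Hello World")
print('Hello, welcome to Edureka')   # single quotes work the same way

# "\n" is the newline character: the text after it goes to the next line
message = "Happy learning\nwelcome to Edureka"
print(message)

# A space after "\n" is printed at the start of the next line
print("Happy learning\n welcome to Edureka")

# Several "\n" in a row leave blank lines in between
blank_lines = "Line one\n\n\nLine four"
print(blank_lines)
```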
Next is variables. Variables here are pretty much like variables in mathematics.
You take a particular name that you want to give it.
So you want to set a to 10, you want to set b to edureka.
This is an integer in Python.
This is a string.
Notice that now we're using single quotes as compared to double quotes.
You can do print(a, b); notice the comma.
So over here, for example, I could have done this and it would have worked.
So print can separate multiple strings, or things that you want to print, with commas.
So you see both of them get printed.
Now let's come here and let's print this.
The 10 and edureka both get printed.
Of course, they're coming with these parentheses.
So in case you want to print them individually, that's a separate thing.
I mean, you will have to separate them.
But in case you just want to see the values on your terminal, this is the way to do it.
Now, another example over here is this one.
What this will do is move in order.
It will assign x to 10, y to 20, and z to 30.
Let's try it out.
Okay, now of course this is a very convenient way of doing it.
Now, the alternate way of doing it, of course, is like this.
But that increases the vertical length of your file.
I don't find it to be neat, and with experience you will see the advantage of doing it this way.
It's just more convenient.
It all comes in a single line, that's it.
It's just a matter of convenience.
Otherwise, this is also correct and this is also correct.
Both are correct; it's just that line number nine is better than doing it in three lines.
This is more concise.
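Here are the same assignments as a script. One hedged note on the output: in the recording, the values appeared inside parentheses, like `(10, 'edureka')`, which is how Python 2 prints `print(a, b)` (it treats `(a, b)` as a tuple). That suggests the demo ran on Python 2; in Python 3, `print(a, b)` prints `10 edureka`.

```python
a = 10          # an integer
b = 'edureka'   # a string, in single quotes this time
print(a, b)     # Python 3 prints: 10 edureka

# Multiple assignment: values are matched to names left to right
x, y, z = 10, 20, 30
print(x, y, z)  # 10 20 30

# The same result spread over three lines
x = 10
y = 20
z = 30
print(x, y, z)  # 10 20 30
```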
Now, a word about identifiers.
These are identifiers.
When you hear the word identifier: a and b are identifiers.
They identify a particular variable for you.
Now, there are certain rules around them.
An identifier is used to identify a variable, function, class, or any other object.
It starts with a letter, a to z or A to Z, small or capital, or an underscore, followed by any mix of letters, underscores, and digits zero to nine.
So you can't start a variable name with a digit like this.
It shows an error.
What you can do is this.
Or you can do this.
This works, but doing this won't work.
So this, this, _b, __b, all of these are valid identifiers.
But you can't start it with anything apart from a letter, a to z, capital or small, or an underscore.
You can't start with digits.
Python doesn't allow special symbols like @, $, %, *, brackets, or ! within an identifier.
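You can check these rules from Python itself. `str.isidentifier()` applies the naming rules, and the standard `keyword` module catches reserved words such as `for`, which follow the rules but still can't be used as names. The helper `is_valid_name` is hypothetical, written just for this example.

```python
import keyword


def is_valid_name(name):
    # isidentifier(): starts with a letter or underscore, then letters,
    # digits or underscores only; reserved keywords are rejected separately
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["b", "_b", "__b", "b2", "2b", "my$var", "for"]:
    print(candidate, is_valid_name(candidate))
# b, _b, __b and b2 are valid; 2b, my$var and for are not
```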
Now, there are certain naming conventions in Python.
There is something called a class in Python; it's a concept.
Maybe some of you are familiar with it from other programming languages.
Class names start with an uppercase letter, and all other identifiers start with a lowercase letter.
So only the class name starts with an uppercase letter.
Now, there are certain other rules as well.
They are related to classes.
I think it would be better if we covered them then.
All of these things, like a single leading underscore to mark an identifier as private, two leading underscores to mark it as strongly private, and names that begin and end with two underscores.
Now, all of these are concepts that we'll learn later.
Let's leave them for then.
But for now you just need to know that, okay, this is how you name a variable, or an identifier, in Python.
Now, how are variables stored? In reserved memory locations.
What happens is that in the computer's memory, which is the RAM, right, it creates a space, stores that variable in the memory, and reserves some bytes for it, that's it.
That's all that's happening.
It's just remembering them, storing them.
So it's not doing that while I am writing; it does it when I'm running the file.
So it is not going to do that right now, as soon as I create it.
So don't think that if you create too many variables, your typing speed will slow down or your laptop will start hanging.
No, I mean, it might do that when you're running the file, but that might be for other reasons than simply having too many variables.
So just notice that this is just an instruction.
This is nothing but instructions.
When you run it, that's when it really does something on the computer.
Right now you're just typing it like a Notepad document and pressing Ctrl+S.
But when I run it, and by run I mean I go ahead and do this, that's when it goes to memory and does all of these things.
Yeah, so we have already covered this, right, where we run it and the values get printed.
Now let's look at some standard datatypes available in Python.
Even before that, let's try to understand what is meant by a datatype.
A datatype is basically a way to say what kind of data something is: whether it is an integer, a string, a Boolean, or something else.
So Python has two different kinds of datatypes, as do many other programming languages.
One is immutable and the other one is mutable.
The words mutable and immutable come from, you need to think of, mutation.
The kind that you might have studied in biology.
Mutation fundamentally means change.
So a mutable datatype is one whose data can be changed.
Immutable means it cannot be changed.
So lists, dictionaries, and sets, these are things that you're going to study.
They can be changed, but immutable datatypes, numbers, strings, and tuples, cannot be changed.
Now, one source of confusion might be this: if I do a = 121 and then, let's suppose, I do a = 3, then by this logic that should not be allowed.
Because a is an integer and you're changing it.
I'm first setting it to a random value, 121, and then I'm setting it to three.
But that is not what it means.
What it means is, see, this is just an identifier.
This is not the data.
The data is on the right side of this.
So when it says that it is immutable, what it means to say is that whatever you do, you cannot turn 121 into three.
Which, if you think about it, makes sense, right? So this is the level of depth, by the way, that happens in programming.
Something as obvious as: one is not equal to two, or 121 is not equal to three and cannot become three.
It is just that.
This is one thing in the world, and this is a different thing in the world.
There is no way that it can convert this into this.
Like you cannot convert gold into silver.
Gold is immutable.
Gold is immutable in that you cannot turn it into silver.
You cannot turn it into any other element.
But then there are certain things in the world which are mutable, which you can take and turn from one thing into another.
So for example, you can take a seed and turn it into a tree.
But if you're trying to mutate gold into a tree or gold into silver, it will not happen.
It will remain gold, even if you mix impurities into it.
It will be impure gold, but it will still be gold.
So you need to think of it like that.
It's a very basic concept, but it's something which is there across programming: the entity itself cannot change.
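A short sketch of the distinction. Reassigning `a` just points the name at a different object, which the built-in `id()` (an object's identity) makes visible. Actually trying to change an immutable object in place, like a character of a string, raises a `TypeError`, while a list, being mutable, can be changed in place. The variable names are illustrative.

```python
a = 121
first_id = id(a)
a = 3                      # the name a now refers to a different object
print(id(a) == first_id)   # False: 121 was not turned into 3

word = "gold"
string_error = None
try:
    word[0] = "G"          # strings are immutable: no in-place change
except TypeError as err:
    string_error = str(err)
print(string_error)

metals = ["gold", "silver"]
metals[0] = "copper"       # lists are mutable: same list, new contents
print(metals)              # ['copper', 'silver']
```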
Now let's talk about the immutable datatypes: numbers, strings, and tuples.
Python supports three different kinds of numeric values. There are integers, which are signed.
So you can add a plus or a minus in front of them.
There are floats, which are real numbers, decimals.
Then there are complex numbers.
When it comes to representing integers, besides the usual decimal form, you can write them in binary, octal, or hexadecimal.
However, please note that you typically will not be using these representations.
Typically, you will be using numbers the way you use them normally.
So to define an integer you set it to 10, to define a float you set it to 10.65, and then there are complex numbers.
In case you remember this or you're familiar with it, this is 10+6j.
The 6j is the imaginary part, and it will operate like this.
Let's quickly have a look.
We have the same file here; let's run it, and it outputs 10+6j.
If I do d = 5+4j, where c is 10+6j, and I print d - c, let's see what I get.
So if you see, it has done the subtraction: 5 - 10 is minus five, 4 - 6 is minus two, and we get the result, -5-2j.
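Here are the numeric types from this part as code. The values `c = 10+6j` and `d = 5+4j` follow the demo; the binary, octal and hexadecimal literals are added only to show the other representations mentioned above.

```python
i = -10          # int: signed whole number
f = 10.65        # float: real number with a decimal part
c = 10 + 6j      # complex: real part 10, imaginary part 6
d = 5 + 4j

print(type(i).__name__, type(f).__name__, type(c).__name__)  # int float complex
print(d - c)           # (-5-2j)
print(c.real, c.imag)  # 10.0 6.0

# The same integer, ten, written as binary, octal and hexadecimal literals
print(0b1010, 0o12, 0xA)  # 10 10 10
```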
Next is strings: a continuous set of characters represented within quotation marks is called a string.
It can be within single or double quotes.
Now, Python does not support a character type.
This comes from other languages, which have something called char, a character type.
Python doesn't have that.
Python just has strings.
In case you're not familiar with the character type, it is okay.
The character datatype is no different from a character in the generic sense of a letter of the alphabet.
It's just referring to it in the programming context, where there is a character datatype, char, in Java, C++, and so on.
Python doesn't have it.
Python just has strings.
It does not differentiate between single and double quotes.
As you've already tried it, this works for us, right.
We have already done the print statements using strings.
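A couple of checks make both points concrete: the quote style doesn't change the value, and a single letter is still a `str`, because there is no separate `char` type.

```python
single = 'edureka'
double = "edureka"
print(single == double)        # True: quote style makes no difference

letter = 'e'
print(type(letter).__name__)   # str, not a separate char type
print(len(letter))             # 1

# Indexing a string also gives back a one-character str
print(type(single[0]) is str)  # True
```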
Next is the tuple. A tuple is a fixed list.
It comes within parentheses.
For example, even right now, the way it is printing, it's a tuple.
If you look at this, this is a tuple.
I can do this; now notice this.
I can print this and this will run.
Now, I can also do the following with a tuple: I can print an individual element.
I can print the element at index zero.
These are zero-indexed, meaning... or let me change it; let's suppose these are four, five, six, right.
You'd start off saying that this is the first, the second, and the third.
But it works with a zero index.
It says this is the zeroth, this is the first, and this is the second.
Now I can do this as well.
So this is me reading an individual element of the tuple (4, 5, 6).
However, notice if I try to do this.
PyCharm has already highlighted this.
It has a problem with it.
If you hover over it, it's just saying that tuples don't support item assignment, which basically means that, hey, you can't do this, you cannot change it.
I cannot change this; let me run it.
It throws an error.
It's throwing the error "'tuple' object does not support item assignment".
So I'm trying to change an immutable datatype.
It's like I'm trying to change gold into silver, and it's saying sorry, can't do that, that's not possible.
You cannot modify this.
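The tuple demo can be reproduced like this, using the values four, five, six from above. The exact wording of the error message comes from CPython 3 and could differ slightly in other interpreters.

```python
t = (4, 5, 6)
print(t)       # (4, 5, 6)
print(t[0])    # 4: indexing starts at zero
print(t[2])    # 6

error_message = None
try:
    t[0] = 10  # tuples are immutable, so item assignment is refused
except TypeError as err:
    error_message = str(err)
print(error_message)  # 'tuple' object does not support item assignment
```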
Now let's look at the mutable datatypes.
The first one is a list.
So we just saw a tuple; it seemed like a list of items, right, but it was called a tuple.
But Python also has something called a list.
Now, lists can contain various elements, and any number of elements. If you want to create an empty list, this is how you create one.
Square brackets, open and close: this is an empty list with zero things in it.
This is a list with one, two, three, and four things in it.
Now if you notice, these are all different types of data.
This is a decimal, this is an integer, and this is a string.
You can print the list, and pretty much like a tuple, you can access any element within it.
Let's run this.
What you can also do is update it.
So we can set the second element to "changed".
Because this is, again, zero-indexed, by the way.
Like the tuple was zero-indexed, this is also zero-indexed.
Zero-indexed means that this is referred to as zero.
It doesn't count it as one.
That is generally how things work in programming.
Most programming languages have it this way.
So if you see, I changed this second element.
The same thing was throwing an error with the tuple, but a list allows it.
It is allowing me to modify it.
Now, a list can contain other lists as well.
So, you know, you can also do this.
And you can also do, you know, four and then a third element, and it can also contain tuples.
It can also contain a tuple inside it.
This is a particularly complicated example; well, not complicated, but not something that you would normally encounter. I'm just trying to showcase it to make a point.
That a list can be of anything.
Think about the English word list: a list can be of anything.
What I've done here is that I've said: the zeroth element, which is this, this is a list, give me the second one.
So it gives me four.
If I say one and then I run it, it'll give me two.
I can run it on the first one as well.
Yeah, it will give me four.
If I need to access the tuple, think about which element it is: zero, one, two, three, right.
I want to print five, so I'll do this.
b[3][2]; I'm just counting.
I'm counting zero, one, two, three.
Okay, I've got to the tuple at this point.
After that, I want to access five.
It is zero, one, two, so I go to two.
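Here is a sketch of the list demo. The on-screen values aren't all in the transcript, so the contents of `a` and `b` are hypothetical; `b` is built so that the indexes used above (`b[0][2]` gives 4, `b[0][1]` gives 2, `b[3][2]` gives 5) work out.

```python
empty = []                       # an empty list
a = [1.5, 2, "three", 4]         # mixed types are allowed
a[1] = "changed"                 # lists are mutable; index 1 is the 2nd item
print(a)                         # [1.5, 'changed', 'three', 4]

# A list can hold other lists and tuples
b = [[1, 2, 4], "two", 3.0, (3, 4, 5)]
print(b[0][2])   # 4: item 2 of the inner list at index 0
print(b[0][1])   # 2
print(b[3][2])   # 5: item 2 of the tuple at index 3
```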
Now, the slight differences: lists are within square brackets, tuples are within parentheses.
Lists are mutable, tuples are immutable.
Tuples can be faster than lists, owing to their immutable nature, but that's not the real reason why you will use a tuple.
You will use a tuple if you want to have fixed values.
So let's suppose you are making a simple calculator, you know, which can do scientific calculations.
Let's suppose it uses the value of pi and scientific constants to calculate the area of a circle or a sphere or a cone or whatever.
It needs to use 3.14, the value of pi, and other constants, and you need to maintain them and ensure that they are not modified accidentally by you or your program.
That is when you will use a tuple.
Or you would use a tuple if you had to maintain a list of countries, because you don't want the list of countries to get changed.
You don't want to accidentally change an element within that list while you're coding, because, you know, it will cause a lot of problems.
So basically, to understand why, you also need to understand why immutable datatypes are needed, because if you look at lists and tuples, the only difference really here is... So, by the way, I can do this with a tuple as well.
I can do the very same thing.
I can simply do this, see, and I can run it.
It will work as it is.
It just did; I just printed the element, right? Now, the only difference is that lists are slower, but more importantly, lists are mutable.
So mostly you will find yourself (speech unclear) dealing with lists rather than tuples.
Tuples are needed for very specific scenarios.
So don't get confused about why there are two, and whether you can use either one of them.
Focus on lists for now.
Tuples will naturally come to you when you are presented with a scenario that needs them.
Tuples are more niche; they are meant for particular cases.
Next are dictionaries.
Dictionaries, as the name suggests, work like an ordinary dictionary that you would go through.
What is the meaning of the word naive, what is the meaning of the word precarious, what is the meaning of the word apple, and so on.
Against every word you will have a meaning.
So the word is like a key and the explanation is like a value.
Here, "age" is a key and this is its value.
This is the key, this is a value.
If you look at it, this is very readable.
Each key and its value are separated by a colon, and the reason to use this is that it makes your data very readable.
Key-value pairs are something that comes naturally.
So if you look at a form, any sign-up form that you have ever filled, it asks for an email and a password.
Imagine that being stored here.
So let's open this up; this is how it is handled.
Now, you could also argue: hey, I can also store this in a list, so why shouldn't I store it like that? By the way, to read a value from the dictionary, what you would have done over here is use square brackets and then type the key.
Let's run this once.
Right, same output.
So you can argue, hey, why wouldn't I use a list or a tuple, why would I use this? One primary reason to prefer the dictionary is that it is way more readable.
Imagine many other lines of code: imagine this was declared on line one and you're using it on line 100, and you don't know what is at index zero.
You have to constantly refer back to what is being stored at zero.
But if you look at the dictionary version, you immediately see: oh, okay, you're trying to print the age.
So for developer readability, dictionaries are way better.
You will find that you will use these a lot as well.
One important rule: the keys have to be immutable, hashable types, such as strings, numbers, or tuples; strings are just the most common choice.
Lists can't be used as keys, but the values can be anything.
The values can be anything in the sense that a value could be a list as well, or it could be another dictionary.
A dictionary within a dictionary and a list within a dictionary: all of them can be mixed and matched.
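A minimal sketch contrasting a list with a dictionary, plus the key rule described above, written for Python 3. The name, age, and city values are hypothetical:

```python
# The same hypothetical record stored two ways
person_list = ["Varun", 30]               # what does index 1 mean? you must remember
person = {"name": "Varun", "age": 30}     # the key says what the value is

print(person_list[1])   # 30
print(person["age"])    # 30, and the intent is obvious

# Keys must be hashable (strings, numbers, tuples); values can be anything
mixed = {
    "scores": [90, 85],             # a list as a value
    "address": {"city": "Pune"},    # a dictionary inside a dictionary
    7: "an int key",
    (10, 20): "a tuple key",
}
print(mixed["address"]["city"])     # Pune

try:
    bad = {[1, 2]: "a list key"}    # lists are mutable, so not allowed as keys
except TypeError as err:
    print("Not a valid key:", err)
```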
The last one is sets: a set is an unordered collection of items.
Every element is unique, and you write the elements within curly braces, separated by commas.
If you try it out, you might assign duplicate values, but the set will automatically keep only the unique ones.
That's it; it doesn't bother with the duplicates.
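A quick Python 3 sketch of that de-duplication; the email addresses are placeholders:

```python
numbers = {1, 2, 2, 3, 3, 3}
print(numbers)        # {1, 2, 3}
print(len(numbers))   # 3

# A common use: removing duplicates from a list (placeholder emails)
emails = ["a@example.com", "b@example.com", "a@example.com"]
unique_emails = set(emails)
print(len(unique_emails))   # 2

empty = set()         # note: {} on its own creates an empty dict, not a set
```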
Next are Python operators, operators being any sort of operation on your data types.
So in two plus three equals five, two and three are operands and plus is called an operator.
Arithmetic, assignment, comparison, logical, bitwise, identity and membership.
These are the different kinds of operators.
Let's go through them.
The first one is arithmetic, quite simple, which we have done in maths: a+b, a-b, a*b, a/b, and a%b, which gives you the remainder.
This is called the modulus operator; let me quickly show it to you.
4 % 5 gives us four.
Then of course there's 4+5 and then 4-5, right.
The exponent operator is 2**3 or 2**2; sorry, I mistyped that.
Yeah, 2**2 gives four, and then there is division: a/b always returns a float, even when both numbers are integers.
And a//b is floor division, which divides and rounds the result down to a whole number.
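A Python 3 sketch of the arithmetic operators, using arbitrary example values a = 7 and b = 2 so that the remainder and the rounding are visible:

```python
a, b = 7, 2

print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5  (/ always returns a float)
print(a % b)    # 1    (remainder)
print(a ** b)   # 49   (exponent)
print(a // b)   # 3    (floor division rounds down)
print(-a // b)  # -4   (rounds down, not toward zero)
print(4 % 5)    # 4, as in the demo
```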
Next, the assignment operators: assignment works from right to left.
So first this will be calculated and then this will be assigned.
So if this seems confusing to you, remember it starts from the right.
It doesn't start on the left.
So a += b will do a+b and then reassign the result to a.
a -= b will do a-b and then reassign it to a.
The same goes for a *= b and a /= b.
What is written over here is a shorthand notation.
So in Python, let's suppose I had a=2, b=3; now I could do a = a+b, and then a is five.
Or I can also do a += b, and then a is eight.
That is because I added b again.
Let me start over, sorry: a, b = 2, 3, and then I can do a += b.
These are shorthand notations which give the same effect: take the left operand, apply the operation using the second operand, and assign the value back to the left one.
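The shorthand forms can be checked step by step in Python 3, starting from the same a, b = 2, 3:

```python
a, b = 2, 3

a += b      # same as a = a + b
print(a)    # 5
a -= b      # a = a - b
print(a)    # 2
a *= b      # a = a * b
print(a)    # 6
a /= b      # a = a / b, which produces a float
print(a)    # 2.0
```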
Next are comparison operators; these give you a result that's true or false.
These give you Boolean values.
So if you do a == b, it says False.
If you do 1 == 1, it says True.
If I do a != b, "a is not equal to b", it'll give me a True value.
If I do a > b, it gives me a True value.
If I do a < b, it'll give me a False value.
In case you're getting confused: a is now five, after the += we just did, and b is three.
So a < b asks whether five is less than three, which is False.
Five is greater than three is True.
And I can also do a >= b and a <= b.
By the way, True and False are keywords; these are not strings.
So these are not strings; these are Python keywords.
Let me show it to you here; I can also assign them like this.
If you notice the color, the color is different.
It's not like just another value.
It's treated as a special value in Python.
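A short Python 3 sketch of the comparison operators with a = 5 and b = 3, including a check that the result is a real Boolean rather than a string:

```python
a, b = 5, 3

print(a == b)   # False
print(a != b)   # True
print(a > b)    # True
print(a < b)    # False
print(a >= b)   # True
print(a <= b)   # False
print(1 == 1)   # True

result = a > b
print(type(result))       # <class 'bool'>
print(result == "True")   # False: the keyword True is not the string "True"
```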
Yeah, so then there are logical operators as well: a and b, a or b, and not a.
With a as five and b as three, a and b gives me three.
It looks like it gives me the lower value, but that's not really the rule.
The rule is that and returns the first operand if it is falsy; otherwise it returns the second operand. Five is truthy, so we get b, which is three.
Similarly, or returns the first operand if it is truthy; otherwise it returns the second one. So a or b gives me five, and not a gives me False.
Because a is truthy, not a gives False; not 1 gives False too.
Not 0 gives True.
So not 0 and not False give True, while not True gives False.
In general, any value that is not zero, not False, and not empty is truthy, so applying the not operator to it gives False.
Going back to why it seemed to give the lower value and the higher value:
it has nothing to do with comparing sizes, and Python doesn't convert the values to binary here.
Python evaluates from left to right and stops as soon as it knows the answer; this is called short-circuit evaluation.
For and, if the first value is truthy, the answer depends on the second one, so the second one is returned.
For or, if the first value is truthy, the answer is already known, so the first one is returned.
So 3 and 5 gives five, not three, and 3 or 5 gives three.
This is just the way you do it in Python.
You simply use it like English.
So if you notice this, this might not seem like English to you right now, this might not seem like code to you right now, but all of this is valid code.
I can do this as well, right.
And I can run this.
See? I'm getting a result, False.
This is all valid Python code, and a or False returns a's value.
So or returns the first truthy value it finds, or the last value if none is truthy; and returns the first falsy value it finds, or the last value if none is falsy.
If I have 5 and 5, and returns five.
If I have 5 and 4, it returns four, which is the last value, not because it is the lowest.
And if I flip it to 4 and 5, it returns five.
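A quick way to check the and/or rules described above in Python 3; the operands are arbitrary small integers:

```python
print(5 and 3)   # 3: 5 is truthy, so `and` returns the second operand
print(3 and 5)   # 5: not "the lower value"
print(5 or 3)    # 5: 5 is truthy, so `or` returns it right away
print(3 or 5)    # 3
print(0 and 5)   # 0: 0 is falsy, so `and` stops there
print(0 or 5)    # 5: 0 is falsy, so `or` moves on
print(not 5)     # False
print(not 0)     # True
print(5 and 4)   # 4
print(4 and 5)   # 5
```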
Next are bitwise operators; these work on bits: binary AND, binary OR, binary XOR, binary NOT, binary left shift, and binary right shift.
Good to know about them, but as I said earlier, you will rarely be dealing with bitwise operators in Python.
Working directly with binary numbers is a very rare case.
And when you do deal with it, it's pretty much like mathematics.
This has got very little to do with Python; it's got more to do with maths.
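For reference, here is what each bitwise operator does in Python 3 with a = 5 (binary 101) and b = 3 (binary 011), chosen arbitrarily:

```python
a, b = 5, 3          # 0b101 and 0b011

print(bin(a), bin(b))  # 0b101 0b11
print(a & b)   # 1   binary AND: 001
print(a | b)   # 7   binary OR:  111
print(a ^ b)   # 6   binary XOR: 110
print(~a)      # -6  binary NOT: -(a + 1) for Python integers
print(a << 1)  # 10  left shift: 1010
print(a >> 1)  # 2   right shift: 10
```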
Then Python also has identity operators: is and is not.
Again, very, very readable.
So 5 is 4, of course, gives you False.
Again, 5 is not 4 gives you True.
One caveat: is checks whether two names point to the very same object, not whether the values are equal; for comparing values, use ==.
I can do it using variables as well.
So you might be asking, hey, what is the use of this, why is it printing True, False, True, False? These comparisons are useful.
You can later set up logic, something like: if c is d, do this, otherwise do something else.
So suppose today is equal to Wednesday.
This is yoga day; it's Thursday.
Now you can build a logic: if today is yoga day, and I can run it. When you are comparing string values like these, == is the reliable choice.
So now notice how readable this is.
It's reading like English, right.
That's why I love Python; that's why a lot of people love Python, because it's very, very readable.
It's very easy to just read through the code and understand, okay, what does this mean.
It reads like English.
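A minimal Python 3 sketch of the difference between identity and equality. The day names reuse the yoga example, with hypothetical values:

```python
x = [1, 2, 3]
y = [1, 2, 3]
z = x

print(x == y)   # True: same contents
print(x is y)   # False: two separate list objects
print(x is z)   # True: z is just another name for x

# Hypothetical values from the yoga example; compare strings with ==
today = "Thursday"
yoga_day = "Thursday"
if today == yoga_day:
    print("Go to yoga class")
else:
    print("Chill at home")
```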
Next are the membership operators, in and not in: they check whether a certain thing exists within another thing.
This applies to dictionaries, it applies to lists, and it applies to tuples.
So I can do something like this; let me run this.
4 not in list, False, right.
Over here what I can do is comment all of this out, right; so it's membership checking.
It's checking whether an item is a member of this list or not.
This works on lists, and it works on dictionaries as well.
In a dictionary it will work like this.
print('age' in a).
We're just trying to check whether a key is present or not; in checks the keys of a dictionary, not its values.
It says False.
So wait, it says True here.
Let me comment this out.
Let me comment all of this out.
Now if I were to do this by mistake, in this case it will say False.
Again, for comparison's sake, let me run this: in the case of a dictionary, if I use a key which is not part of it and run it, I'll get an error, KeyError: 'age'.
It doesn't recognize the key.
So in case I need to check for a key before I use it, I can use something of this sort.
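A Python 3 sketch of membership checks, including the key check that avoids the KeyError shown above. The dictionary contents are hypothetical:

```python
numbers = [1, 2, 3]
print(4 in numbers)        # False
print(4 not in numbers)    # True

a = {"name": "Varun"}      # hypothetical dictionary with no "age" key
print("age" in a)          # False: `in` checks the keys
print("Varun" in a)        # False: values are not checked...
print("Varun" in a.values())   # True: ...unless you ask for them

if "age" in a:             # check first to avoid a KeyError
    print(a["age"])
else:
    print("No age stored")

try:
    a["age"]
except KeyError as err:
    print("KeyError:", err)    # KeyError: 'age'
```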
Now let's look at conditional statements; this is what we have sort of been building up to. They are if, elif (short for else if), and else.
If a condition is true, do something.
If it is not, do something else.
We do it every day.
If today is Wednesday, go to yoga class; otherwise, no yoga class, just chill at home.
That's just something that comes very naturally to us as human beings, and naturally we need it in programming languages as well.
The way it works is that you start a program, and the path it follows is called the control flow.
If x is less than y, do a bunch of stuff.
If it is not, then check again for a certain condition.
If that is true then do a bunch of other stuff; otherwise do this.
That is just how a lot of programming works.
So a lot of the business logic, the logic of how you want your application to work, gets built in over here.
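A minimal if/elif/else sketch in Python 3. The function name compare is hypothetical; it only wraps the x and y comparison described above so it can be called with different values:

```python
def compare(x, y):
    if x < y:
        return "x is less than y"
    elif x > y:            # only checked if the first condition was False
        return "x is greater than y"
    else:                  # no condition: runs when everything above was False
        return "x and y are equal"

print(compare(3, 7))   # x is less than y
print(compare(7, 3))   # x is greater than y
print(compare(5, 5))   # x and y are equal
```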
Okay, now let me talk about indentation once more.
Let me first comment all of this out.
So the thing is, when you type instructions in Python, how do you tell Python that this bunch of instructions needs to be followed under that if, and this other bunch of instructions under this elif? Essentially, each of these is a block of code.
This is definitely a block of code, where the instructions won't just come on a single line, right.
Naturally, you write more complex things over here.
You might need two lines, you might need to write a hundred lines if this condition is met.
You might need to do a hundred things.
Now how does Python know that this is within this? One way to do it, of course, would be to write it like this. I can, why not? But then, does the elif come under the instruction set that I've been writing, or does it map to this if? Because not all of them are on the same line.
So should my statement go here or here? There is no enclosure.
There is no clear path defined.
The way Python does it is through indentation, which we talked about earlier: the block needs to be indented, and the convention is four spaces.
The IDE does it by itself as soon as you start typing.
If I'm typing, let's suppose, if x > y and I press colon (it's always followed by a colon, by the way, so notice the colon) and I press Enter,
it will automatically go to four spaces.
It doesn't even ask me.
Tab in PyCharm inserts four spaces by default.
Otherwise you can type four spaces by yourself.
So there's one, two, three, four.
But your code will not work if you don't indent the block.
Your code will not work.
It's not arbitrary; there is a very strong reason why it is like this.
The reason it is like this in Python is that the creator of Python wanted Python code to read beautifully.
It may seem like an odd reason, but if you look at production-level code, by which I mean code written by good software developers, not even great ones, just good enough to be considered good, they write code which is very readable and very clean.
Guido van Rossum, the creator of Python, wanted to enforce that.
He said it should not be a choice.
Let me show you how it typically gets done in other languages.
So in other languages, you would have something like this.
Okay, and then you would have something like this, and then you would have curly braces, right.
Then you would have, let's suppose, a bunch of lines, let's suppose print x, right.
Then this would have print y a bunch of times.
Now this looks messy; I could have indented it, and it would become a little more readable, like this.
Ignore the underlining, because this is just not valid Python code; I'm just trying to show an example of how other programming languages do it.
So they do it with curly braces.
They indicate what is part of a block of code using curly braces.
But Python doesn't have that.
Python doesn't have curly braces enclosing a bunch of instructions.
What Python has is indentation, this.
Based on that, it makes a decision.
So if it is true then it goes inside.
If this is true it goes inside it.
Otherwise, the else doesn't check any condition and just executes.
So let's run this.
It prints that x < y, which comes from line six.
If you just wanted to do an if else statement, you can do that as well.
You can add conditions over here and simply run it.
Here the conditions are: a greater than 25, a less than 10, a between 10 and 25, and a greater than or equal to 10.
Pretty much like how you would write it in mathematics.
If this is true then do this, otherwise do this.
Please note that these and these are not related.
Both of them will always be executed.
So if I do this and I make a as 10, right, we will see both statements.
So less than 10, in between 10 and 25.
But if I put this one under that one, it's now an if-else block nested inside an if block,
because of the tab.
So to determine what is inside what, you need to really look at the indentation level relative to the parent statements.
So for this line to be printed, both this and this need to be true.
For this line to be executed, only this needs to be true.
For this to be executed, this line needs to be false, but this line needs to be true.
Otherwise the control wouldn't go inside this; this if-else belongs to the outer block.
So in terms of comments, think of this as a parent comment and this one as a reply to that parent comment.
It is not an individual comment by itself; it's a nested comment.
If this is like this, then both of these are parent comments.
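A minimal sketch of the difference between two independent if/else blocks and a nested one, written for Python 3. The thresholds 10 and 25 follow the conditions mentioned in the demo, but the function names and exact layout are assumptions, since the on-screen code isn't fully recoverable:

```python
def independent(a):
    """Two separate if/else blocks at the same level: both always run."""
    results = []
    if a >= 10:
        results.append("at least 10")
    else:
        results.append("less than 10")
    if a > 25:
        results.append("greater than 25")
    else:
        results.append("at most 25")
    return results


def nested(a):
    """The inner if/else is indented under the outer if, so it only runs when a >= 10."""
    results = []
    if a >= 10:
        results.append("at least 10")
        if a > 25:
            results.append("greater than 25")
        else:
            results.append("between 10 and 25")
    else:
        results.append("less than 10")
    return results


print(independent(5))   # ['less than 10', 'at most 25']
print(nested(5))        # ['less than 10']
print(nested(15))       # ['at least 10', 'between 10 and 25']
```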
Next are loops; there are three kinds of loops covered here: the while loop, the for loop, and nested loops.
The purpose of a loop is to do the same thing again and again and again, like on repeat.
It checks for a condition and keeps repeating according to that condition, a number of times.
So a while loop keeps running as long as its expression is true.
It checks; if the condition is true, it goes into the body of the loop, otherwise it goes out.
Please note that the while loop, again, uses a colon followed by an indented block.
Now in this case, what we are doing is printing the count.
We are going to increment the value one by one.
We are going to count till four.
Let's run this.
One, two, three, four and then goodbye.
As soon as count becomes five, the condition will not be true anymore.
It'll come out of the loop.
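A minimal Python 3 version of that counting loop; the variable name count matches the narration, and the rest is an illustrative sketch:

```python
count = 1
while count <= 4:      # checked before every pass through the body
    print(count)
    count += 1         # without this the loop would never end
print("Good bye")      # runs once count reaches 5
```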
Let's comment this out and instead run this.
So while the number is not equal to 12, this loop keeps running.
This is a little guessing game that you can try out.
Input number 10: the number is too small.
Input number 15: the number is too large.
If the input number is 12, you exit, I mean, you end the loop.
Congratulations you made it.
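A sketch of the guessing game in Python 3. The secret number 12 comes from the demo, but the function name and the idea of passing a list of guesses instead of typing them are hypothetical, so the loop can run without a keyboard:

```python
def guessing_game(guesses, secret=12):
    """Replay the guessing loop using a list of guesses instead of input().

    In an interactive script you would get each guess with
    int(input("Input number: ")).
    """
    messages = []
    guesses = iter(guesses)
    guess = next(guesses)
    while guess != secret:
        if guess < secret:
            messages.append("The number is too small")
        else:
            messages.append("The number is too large")
        guess = next(guesses)
    messages.append("Congratulations, you made it!")
    return messages

for line in guessing_game([10, 15, 12]):
    print(line)
```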
Now for the for loop: this is by far the most important one that you will be using.
It iterates over a range or a sequence.
Now that sequence could be a list.
So if you wanted to print a list, or a tuple, or even a dictionary's items, anything which is a collection, the for loop takes it over and steps through each individual item, one at a time.
So let's look at the statement here.
So we have a list of items, fruits, and what we can do is... ignore this.
This is not relevant.
for fruit in fruits: print fruit.
Again, simply run this.
So it prints bananas, apples, oranges, in that order.
This is the list; then again it will do the same thing.
It prints the entire list: one, two, three, Python.
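The same for loops in Python 3, plus a range and a dictionary's items; the dictionary contents are hypothetical:

```python
fruits = ["bananas", "apples", "oranges"]
for fruit in fruits:
    print(fruit)

for item in [1, 2, 3, "Python"]:
    print(item)

for n in range(1, 4):          # range(1, 4) yields 1, 2, 3
    print(n)

person = {"name": "Varun", "course": "Python"}   # hypothetical dictionary
for key, value in person.items():
    print(key, "->", value)
```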
Right, so you can use a for loop to calculate the factorial of a number.
All you need to do is keep reducing the number by one and keep multiplying, again and again and again.
So you subtract one, multiply, subtract one, multiply, and then you get the outcome.
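A minimal factorial sketch in Python 3 that counts down and multiplies, as described; the function name factorial is my own, and the result is cross-checked against the standard library's math.factorial:

```python
import math

def factorial(n):
    """Multiply n by n-1, n-2, ... down to 2, as described above."""
    result = 1
    for i in range(n, 1, -1):   # n, n-1, ..., 2 (empty for 0 and 1)
        result *= i
    return result

print(factorial(5))                        # 120
print(factorial(6) == math.factorial(6))   # True: the standard library has one too
```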
Of course you can have nested loops.
So you can have a loop inside a loop.
Again, the same principle of indentation applies.
You can have a while loop inside a for loop, or a for loop inside a while loop.
Purely depends on what you're building, on what you are trying to do.
So it depends on the logic, right.
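A small Python 3 sketch of nested loops: a for loop inside a for loop builds a multiplication table, and a for loop inside a while loop repeats a short sequence of hypothetical "steps":

```python
table = []
for i in range(1, 4):          # outer loop: rows
    row = []
    for j in range(1, 4):      # inner loop runs fully for every i
        row.append(i * j)
    table.append(row)

for row in table:
    print(row)
# [1, 2, 3]
# [2, 4, 6]
# [3, 6, 9]

# A for loop inside a while loop works the same way
attempt = 0
while attempt < 2:
    for step in ["left foot", "right foot"]:
        print(attempt, step)
    attempt += 1
```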
Let's suppose the logic is as simple as this: you are writing a robot, and you say, while it is Thursday, for each step from zero to one million, move the left foot and the right foot forward.
Left foot forward, right foot forward.
It will keep doing that as long as it is Thursday.
So it will take as many as a million steps. Or let's suppose the instruction is something like this:
while day equals Thursday, keep walking.
So you are basically telling the robot, hey, while it is Thursday, just keep doing this all the time.
That really depends on what you are trying to do and what your application requires.
So again, this is something that you could try out: you could write code in Python simulating an ATM, where you ask for the four-digit PIN.
Then you show a menu with options such as balance, withdrawal, pay, and return card.
Then the user selects a value, and it keeps repeating a number of times, or you could set a limit on it.
While the number of tries is not five, keep asking the user to make a selection.
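One possible starting point for that ATM exercise, written in Python 3: a loop that allows at most five PIN attempts. The function name, the PIN "1234", and passing a list of attempts instead of calling input() are all hypothetical choices for illustration:

```python
def verify_pin(attempts, correct_pin="1234", max_tries=5):
    """Return True if the correct PIN is entered within max_tries attempts.

    `attempts` stands in for repeated input() calls. The PIN "1234" is a
    hypothetical value for illustration.
    """
    tries = 0
    attempts = iter(attempts)
    while tries < max_tries:
        pin = next(attempts, None)   # None once there are no more attempts
        if pin is None:
            break
        tries += 1
        if pin == correct_pin:
            return True
    return False

print(verify_pin(["0000", "1234"]))                     # True
print(verify_pin(["1", "2", "3", "4", "5", "1234"]))    # False: locked out after 5 tries
```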
Now, there are certain control statements in a loop.
If you want to break out of a loop, let's suppose you're looping over something and a certain condition is met, so you want to break out of it.
You want to finish the loop right then and there.
Then you would use a break statement.
If a certain condition is met and you do not want to process that item, then you would use a continue statement, which skips the rest of the current iteration.
It skips from where it is in the loop and goes on to the next item. And then, of course, there's the pass statement.
This is used not just in loops; it's used in classes, functions, and multiple other places as well.
It is used to define an empty body.
So let's suppose: for i in 1, 2, 3, and I just write pass.
That's because I have not written any code yet and I'll worry about it later, but if I don't put pass there, this is not correct code.
If you notice, I can't just leave the loop body empty, even if the rest of my code is legitimately correct: Python expects an indented block after the colon, so it raises an error.
So I will use pass.
pass, again, goes at the indentation level of the body.
It is a way to tell Python: hey, this block is over, just go to the next line, which is 24, and don't treat that line as part of the for loop.
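A minimal Python 3 sketch of pass as a placeholder body; the function name not_written_yet is hypothetical:

```python
def not_written_yet():
    pass                    # placeholder body; the function does nothing yet

for i in [1, 2, 3]:
    pass                    # an empty loop body would be an IndentationError

print("After the loop, i is", i)   # After the loop, i is 3
```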
So, as I was saying, if you're printing numbers between 10 and 50, but as soon as you encounter 30 you want to break out of the loop,
break will stop the loop.
If you're printing numbers up to 10, and you say that if j == 5 you want to continue to the next one, it will skip the five.
It will skip over it and go straight to six.
Let's try out this one.
Let's suppose the range is one to 11.
So let's run this.
Wait, I need to comment all of this out, and this as well.
Yeah, so if you notice, this time five is not being printed, because on five we just skipped.
If I move the print two lines above, before the continue, and run it,
then it will print everything, because you're printing anyway. So essentially, the use case of continue is when, for some reason, you do not want to process a certain value that you're going to encounter.
If a certain condition is met, you don't want the rest of the loop body to run for that item.
You want it to go to the next one.
But if you want to break out altogether, you use break, and it will just stop.
Continue will go to the next item.
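Both examples from above in Python 3, collecting values into lists so the effect of break and continue is easy to see:

```python
printed = []
for number in range(10, 50):
    if number == 30:
        break                  # leave the loop completely
    printed.append(number)
print(printed[-1])             # 29: 30 and everything after it are never reached

kept = []
for j in range(1, 11):
    if j == 5:
        continue               # skip the rest of this iteration only
    kept.append(j)
print(kept)                    # [1, 2, 3, 4, 6, 7, 8, 9, 10]
```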
The next thing is command line parameters.
When you're executing a Python file, you might want to pass certain arguments to it.
Arguments being certain values.
That happens through Python command line arguments.
So Python has a module for this that is built in, straight out of the box; you don't have to install anything.
A module is like a library, essentially.
Python has the sys module; sys stands for system.
It gives you access to system-specific things, and it allows you to access the arguments that you are passing.
sys.argv is the list of command line arguments, and len(sys.argv) is the number of command line arguments.
So let's look at one of the files for this.
Let's actually create a file.
The first thing that you need to do in the file is import sys, and then you want to print len(sys.argv) and then print sys.argv.
Now, one thing that we haven't covered until now is how a Python file is run.
What I'm doing right now is going to the folder which contains this folder, DS mod one, okay.
It contains something with the letters there.
Okay, then all the files, including test.py.
To execute the Python file, all I need to do is run python test.py.
Then I need to provide the arguments.
Now notice that the length of the arguments is six.
It's not just the five values one, two, three, four, five.
The name of the file is an argument in itself.
So this can be confusing at first.
The first thing it takes as an argument is the file name itself, and then it is followed by whatever you give it.
This is one, two, three, four, five.
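A minimal sketch of the test.py described above, written for Python 3. The conversion step at the end is an addition that shows every argument arrives as a string; it skips anything that isn't a plain number so the script doesn't crash on other arguments:

```python
# test.py: a minimal sketch; run it as  python test.py 1 2 3 4 5
import sys

print(len(sys.argv))   # 6 for the command above: the script name plus 5 values
print(sys.argv)        # ['test.py', '1', '2', '3', '4', '5'] for that command

# Every argument arrives as a string, so convert before doing maths
numbers = [int(arg) for arg in sys.argv[1:] if arg.isdigit()]
print(sum(numbers))    # 15 for the command above
```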
This is what (voice cuts out).
Before we do any of that, let's suppose one thing that you want to do with a Python file is ask the user for an input.
Let's suppose we want to write a simple script which will ask the user for two numbers and add them, or multiply them, or do some other operation with them.
So how can we do that? The answer is that Python provides you with an input function, which is a built-in function in Python.
You call the input function and pass it a string.
The string is the message or the question that you want to ask the user.
So let's suppose you want to ask the user: give one number to multiply with, or give one number to add.
That message goes in the string.
Now whatever the user types in as the input gets stored in the variable on the left side.
What the function does for you is prompt the user with a line, in a language that the user will be able to understand.
The user types in something, and then it gets that value and gives it to you in a variable.
Let's look at a quick example.
Let me comment this out and run this file.
So it's asking for my input.
Let me enter three, and it says the received input is three, which means the input was stored in the variable str.
Let's look at another example.
Suppose you want to ask the user for their name and their age.
You provide the question as a string.
The answer gets stored in this variable here, and then you can just print it.
Then of course, if you want to perform operations on it, you can go ahead and do that.
Let's quickly run this as well.
So I just faced an error.
That is how input behaves in Python 2: it evaluates whatever you type as Python code. So if I'm giving it a string like my name rather than a number, I have to provide it in quotation marks.
So it will work if I do it like this.
For numbers I don't need to provide any quotes.
Let's run it again.
Suppose I give my name as Varun without quotes; that won't work.
If you're inputting a string then it has to be within quotes.
It can't be without quotes.
If it's an integer then it doesn't need any quotation marks.
In Python 3 this is no longer the case: input always returns whatever you type as a string, with no quotes needed, and you convert it with int() when you need a number. Python 2's raw_input behaves the same way.
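A Python 3 sketch of asking for input. The helper names ask_number and ask_name are hypothetical; the read parameter defaults to the built-in input() and exists only so the functions can be tried without typing:

```python
def ask_number(prompt, read=input):
    """Ask the user for a number.

    In Python 3, input() always returns a str, so it is converted with int().
    """
    text = read(prompt)
    return int(text)


def ask_name(read=input):
    name = read("What is your name? ")   # no quotes needed in Python 3
    return "Hello, " + name

# Interactive use (not run here):
#   a = ask_number("Give one number to multiply with: ")
#   b = ask_number("Give another number: ")
#   print(a * b)
```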
Next, let's look at files in Python.
So if you remember, we discussed that, let's suppose, you have to build software which would manage files for you: put them in folders, sort them, and so on.
Python provides very neat and nifty tools to work with files.
So let's look at that.
When it comes to files, there are a couple of typical things that you will need to do.
You need to open and close a file, even if it's a Notepad file or a Python file that you have just created.
Opening and closing are the first things you do with any file.
The next things are that you need to write to a file or read from a file.
Then there are maintenance operations, such as renaming files, moving files, or deleting files, stuff like that.
So before we do anything, you need to understand how opening and closing a file works in Python.
The thing is that opening and closing a file works with modes.
It's not as simple as opening a physical file, in the sense that there are different modes in which you can open a file in Python.
There's a read-only mode, there is a write mode, there is a w+ mode which can both write and read and creates the file if it doesn't exist, and all these exist for the sake of protection: if a file was always open in a mode where you can both write to it and read from it,
you might end up doing certain operations on the file that you didn't intend, and you might end up messing up the data inside it.
It is basically a best practice to open the file with the correct mode based on what you're going to use it for.
If you're not going to write to it, don't open it in a write mode; open it in a read mode.
Then of course, there is closing a file.
So suppose you are opening a huge file.
It's a file which is 500 MB in size.
It contains 10 million addresses or names.
If you do not close the file properly, that is, you do not do that management bit properly, the file stays open and keeps holding on to system resources.
Data that you've written may also sit in a buffer and not actually reach the disk, and if your program keeps opening files without closing them, those leaked resources add up.
So it's one of the precautions to keep in mind: no matter how you're doing it, opening and closing of the file need to be kept in mind.
To open the file, you will be using Python's open function.
It takes the file name as the first parameter and the access mode as the second parameter.
The file name is the name of the file, and the access mode is what I was just talking about.
There's read, write, or append.
So here are the different access modes.
r, this is the default mode, opens the file for reading only.
There is a mode called rb as well.
By the way, you pass them as strings over here.
So these go in quotes: the string 'r' and the string 'rb'.
rb opens a file for reading only, in binary format.
It is rare that you will use this, but it's something good to know about.
For ordinary text files you would not really need rb; binary mode is for non-text data such as images,
or for something specific that you're doing where you need the raw bytes.
Then there's the r+ mode.
It opens the file for both reading and writing.
Then there's the rb+ mode; it opens the file for reading and writing in binary format.
There's the w mode; it opens the file for writing only.
It overwrites the file if the file exists.
So this is the significant difference between w and r+.
w will basically overwrite the file, just like creating a new file, irrespective of whether it exists already.
Whereas r+ will not overwrite the file; r+ also requires the file to exist already.
wb opens the file for writing only, in binary format, and like w, it overwrites the file as well.
Another mode is a; it opens the file for appending, which means it only allows you to add to the end of the file.
It doesn't allow you to modify the part that is already there.
So this makes sense if you're doing updates or appends on a system where you do not want the previous records to be manipulated in any way.
ab does the appending in binary format.
a+ opens the file for both appending and reading.
ab+ opens the file for both appending and reading in binary format.
w+ and wb+ open the file for both writing and reading, in text and binary format respectively; like w, they overwrite the file if it exists and create it if it doesn't.
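A minimal Python 3 sketch comparing w, a, r, and r+ on one file. The file name modes_demo.txt is hypothetical and the file lives in the system temp folder:

```python
import os
import tempfile

# Hypothetical file in the system's temp folder
path = os.path.join(tempfile.gettempdir(), "modes_demo.txt")

with open(path, "w") as f:      # w: create the file, or overwrite it
    f.write("first\n")
with open(path, "a") as f:      # a: only add to the end
    f.write("second\n")
with open(path, "r") as f:      # r: read only (the default mode)
    print(f.read())             # first, then second

with open(path, "r+") as f:     # r+: read and write, without truncating
    f.write("FIRST")            # overwrites the first five characters in place
with open(path) as f:
    after_r_plus = f.read()
print(repr(after_r_plus))       # 'FIRST\nsecond\n'

with open(path, "w") as f:      # w again: the old contents are gone
    f.write("third\n")
with open(path) as f:
    final = f.read()
print(repr(final))              # 'third\n'

os.remove(path)                 # clean up
```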
Now when it comes to writing files, you take the file object, which is obtained over here.
You open the file, and you get a file object.
This basically represents your file in the code, and then you call file.write(),
passing the string that you want to write.
And because we were talking about the binary modes earlier, where you can open a file in binary format:
write doesn't just take strings; in binary mode it takes binary data instead of text.
The read method reads a string from an open file.
And again, in binary mode it reads binary data.
Notice the count.
If you give it a count, that is the maximum number of characters it will read in that call.
It's an optional parameter; if we do not give it a count, it will read everything.
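A minimal Python 3 sketch of write() and read(), with and without a count, plus the binary variants. The file name is hypothetical:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "read_write_demo.txt")  # hypothetical name

with open(path, "w") as f:
    written = f.write("Hello, welcome to Python\n")  # returns the number of characters written

with open(path, "r") as f:
    first_five = f.read(5)   # read at most 5 characters: 'Hello'
    rest = f.read()          # no count: read everything that is left

with open(path, "wb") as f:  # binary mode takes bytes, not str
    f.write(b"\x00\x01\x02")
with open(path, "rb") as f:
    raw = f.read()           # b'\x00\x01\x02'

os.remove(path)
print(written, repr(first_five), repr(rest), raw)
```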
Let's look at renaming files.
So for renaming a file, you need to give the current file name and then the new file name.
os.rename takes those two arguments. If you're wondering what os is, it's a module available in Python which deals with the operating system.
We are going to cover it later in depth, but just know that os represents your operating system.
That's because renaming is really done with the help of the operating system.
Not that you need to know how operating systems are built or written; this is just a nifty utility available in Python.
For deleting files, we will again use os:
os.remove, with the file name as the argument.
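A Python 3 sketch of os.rename and os.remove, working inside a fresh temporary folder so nothing else is touched. The file names test.txt and new.txt are hypothetical:

```python
import os
import tempfile

folder = tempfile.mkdtemp()                    # a fresh, empty temp folder
old_name = os.path.join(folder, "test.txt")    # hypothetical file names
new_name = os.path.join(folder, "new.txt")

with open(old_name, "w") as f:
    f.write("some text\n")

os.rename(old_name, new_name)    # current name first, new name second
print(os.path.exists(old_name), os.path.exists(new_name))   # False True

os.remove(new_name)              # delete by file name
print(os.path.exists(new_name))  # False
os.rmdir(folder)                 # remove the now-empty temp folder
```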
So let's quickly look at a couple of examples.
So let's start with this.
We open the file; notice the mode is w+.
So the file doesn't exist right now, but w+ will create the file.
Now I'm going to loop over numbers one to 10 and write these 10 lines.
Let's see if I'm successful or not.
Finished: I have a file that says "hello, welcome to Python".
It's written 10 times because the range starts at one.
If I start it at zero instead, we get one more line.
So let me run it again.
Now it has done it 11 times.
It will simply overwrite the file each time I run it, because it's in w+ mode.
Now, let's suppose we want to read the file, and we run it.
What did we do wrong? What we did wrong here was that it's still in w+ mode, so it had already emptied the file.
So let's run this one again and then rerun this.
Now we have a filled file; let's comment this out, open the file in r mode, and run this.
Now we have this appearing 10 times.
We have a new file.
As I said, you could pass it a count as well, so I could pass it five, and then it will only read five characters at a time.
You get a very weird output.
Don't be confused by it; it's just that it's reading only five characters at a time, and that's why it comes out like this.
If it was reading 500 characters at a time, the output would have been different.
With 50 it works similarly, just reading 50 characters at a time.
So I think the first read would have returned all of this, and then the next one this.
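A Python 3 sketch of the same demo: write the line 10 times in w+ mode, then read it back five characters per call until read() returns an empty string. The file name is hypothetical:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "python_lines.txt")   # hypothetical name
line = "Hello, welcome to Python\n"

with open(path, "w+") as f:
    for i in range(1, 11):     # 1 to 10, so 10 lines; range(0, 11) would give 11
        f.write(line)

chunks = []
with open(path, "r") as f:
    while True:
        chunk = f.read(5)      # at most 5 characters per call
        if not chunk:          # an empty string means end of file
            break
        chunks.append(chunk)

os.remove(path)
print(len(chunks))   # 50 (250 characters / 5)
print(chunks[:3])    # ['Hello', ', wel', 'come ']
```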
Now let's try the os functions as well, in case you want to remove the file.
Let's first rename the file to new.
We don't really need to open the file, so we can just remove that line and run this.
If you notice, this has been renamed.
That file disappeared and we now have new.
Now we can simply go ahead and remove it.
We run it and that gets removed.
Did you notice it got removed from here? Next, the file.close() method closes the open file.
A closed file, of course, cannot be read.
Now let's suppose I was doing this, and then I take this variable and assign it some other value.
Python (at least CPython, the standard interpreter) will then automatically close the old file, because nothing refers to that file object anymore.
So it will automatically close the file if I reassign this variable.
That is where Python is really helpful, because, if you remember, Python is supposed to be a very friendly language to learn.
Now let's try another one, where we close the file and then try to write to it.
Let's see what happens.
It gives us an error.
Yes, I'm trying to open it in read mode, and the file does not exist.
Let me run it again.
Now there it is: I/O operation on closed file.
It says the file is closed.
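A minimal sketch of writing to a closed file, assuming a hypothetical test.txt in a temporary directory. Python raises a ValueError whose message mentions the closed file, and the file object's closed attribute tells you its state.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "test.txt")  # hypothetical file name
newfile = open(path, "w")
newfile.close()
was_closed = newfile.closed  # True once close() has been called

message = ""
try:
    newfile.write("hello")   # writing to a closed file is not allowed
except ValueError as err:
    message = str(err)
print(message)

os.remove(path)
os.rmdir(workdir)
```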
Now there are various attributes and methods available on a file.
One is file.close(); we just saw that.
There is file.mode as well.
Let's see what we get.
So we get w+.
In case you need it, this is used for checking.
You can use it for any sort of checks.
For example, if you're writing a large program, at some point there are multiple files and you may not remember how each one was opened.
Or you may want strict checking before you do something with a file: maybe you don't want to write to it, or you want to add a constraint, and that is when this will help you out.
Now file.name will, of course, return the name of the file.
Let's do that as well.
We have done the mode, and then there's file.softspace as well; note that softspace exists only in Python 2, and Python 3 file objects don't have it.
Let's run all three of these.
So we got the name of the file as well.
Softspace returns a Boolean, zero or one, telling whether a space character needs to be printed before another value when using the print statement.
So for newfile we have the mode, the name, and the softspace value.
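A short sketch of reading the mode and name attributes in Python 3, assuming a hypothetical test.txt. It also checks for softspace, which (as noted above) only Python 2 file objects had.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "test.txt")  # hypothetical file name
newfile = open(path, "w+")

mode_value = newfile.mode                      # the mode string passed to open()
name_value = os.path.basename(newfile.name)    # name holds the path we opened
has_softspace = hasattr(newfile, "softspace")  # Python 2 only; False in Python 3

newfile.close()
os.remove(path)
os.rmdir(workdir)
print(mode_value, name_value, has_softspace)   # w+ test.txt False
```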
A few other methods available are seek and tell.
seek lets you move through a file.
It doesn't search for content; it moves the current position to a particular place in the file.
Just as read takes a number of characters to read, seek takes the number of characters to move past.
So you can tell it: hey, start reading after 40 characters have passed.
tell gives you the current position in the file.
Each character here in the file has a position.
The position is counted in characters from the start of the file, not in lines.
So let's try this out and see what we get.
Let's open the file in read mode and run this.
Right, so it seeks to zero.
Now let's make it 100 and see if that changes anything in the output.
We see that the second output changes based on how far we seek.
So if I make it 1,000, it goes to the 1,000th character, and we can also do a third read at this point.
This says that I'm going to skip the first five characters, then do a read, and then do a tell.
Here, of course, newfile.read() reads the rest of the file.
So this is not just going to read one line; it starts reading after the first five characters.
When you count those five characters, remember that an empty space also counts as a character.
So does the newline character at the end of each line.
That's the enter character, which makes the text come to the next line.
So those are the five characters we skip.
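A minimal sketch of seek, read and tell, using io.StringIO as a stand-in for an open text file and hypothetical two-line content. Notice that the newline after "hello" is counted as a character, which is why the text read after seek(5) starts with it.

```python
import io

# A stand-in for an open text file, so the positions are easy to follow.
newfile = io.StringIO("hello\nwelcome to Python\n")

newfile.seek(5)            # skip the first five characters: "hello"
rest = newfile.read()      # read from position 5 to the end
position = newfile.tell()  # where we are now: the end of the text

print(repr(rest))  # '\nwelcome to Python\n'
print(position)    # 24
```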
We have now gone through the different operations performed on a file.
Now, a lot of the time in programming you will hear the word data.
Essentially, that's a lot of what programming is about: logic, data, and dealing with data.
If you have different types of data, how do you arrange them? How do you organize them? That is very, very important in programming and in Python, and Python provides some excellent ways to do it.
So Python has something called sequences.
Sequences are nothing but ordered collections of data
that allow different operations on them,
whether that's indexing, slicing, or something else, as you will see.
Think of a sequence like you would think of a sequence in real life, like a row of bikes standing outside an office or at a traffic light.
Or the bikes lined up in a showroom: you need to think of that as a sequence.
So let's look at sequence operations.
The most common sequence operations are concatenation, repetition, membership testing, slicing and indexing.
Let's look at concatenation.
Concatenation is nothing but gluing two things together.
So you have four bikes over here and four other bicycles over here.
Concatenation joins the sequences together to create a new sequence.
Sequence repetition is basically taking a sequence and sort of multiplying it by a number.
That creates a new sequence with twice the number of items, or three or four times as many, depending on the number.
Next is membership testing.
What this means is that you want to check whether a certain type of bike is a member of this collection of bikes or not; for example, this one is not a member,
because you cannot find this particular bike style over here, but this one is a member: it's over here and over here.
So there are two occurrences of this member in this sequence of bikes.
Then there's the concept of sequence indexing, where items are zero-indexed: the first item is referred to as index zero.
So the first, second, third, fourth and fifth items are at indices zero, one, two, three and four, and so on and so forth.
It is not indexed like one, two, three, four.
A few programming languages are one-indexed, but they are the exception.
Most programming languages work on a zero-index basis.
Slicing is basically taking a portion, which is saying: hey, I want index one to four, which gives you indices one, two and three.
So I want to slice starting from this one and going up to this one, but not including the last one.
So this does not include four; it is one, two, three.
So it goes up to four minus one, which is three, starting with one.
One is included, four is not.
The end index is not included; the one just before it is included when you're slicing.
Slicing is basically taking a sub-portion of any list.
Think of it like cutting a slice out of a fruit, like an apple.
You're taking a slice out of this thing, which looks like this.
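The five sequence operations above can be sketched on a list of bikes. The bike names are hypothetical; the point is that concatenation and repetition build new lists, membership gives True or False, indexing starts at zero, and a slice stops before its end index.

```python
bikes = ["road", "mountain", "bmx", "city"]            # hypothetical bikes
more_bikes = ["cruiser", "bmx", "touring", "folding"]

joined = bikes + more_bikes     # concatenation: a new, longer list
repeated = bikes * 2            # repetition: every item appears twice
has_bmx = "bmx" in joined       # membership testing -> True
bmx_count = joined.count("bmx") # two occurrences of this member
first = bikes[0]                # indexing starts at zero -> 'road'
middle = bikes[1:4]             # index 1 up to, but not including, 4

print(middle)  # ['mountain', 'bmx', 'city']
```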
Let's look at a couple of code examples.
The first one is really simple.
You have a list of courses, let's suppose programming technologies: Hadoop, Python, Android.
You can print the item at index one.
You can slice from zero to two, which includes Hadoop and Python but not Android.
Or you could do list[-1].
So let's just try this out.
So it took Hadoop and Python.
Index one prints Python, not Hadoop, and negative one takes the last element in the list.
Negative indices count from the end, so I could also do negative two and negative three.
Let's look at that.
We can always do this: plus React, Angular, Data Science, and we can look at the output over here.
See, we got a concatenated list, with React, Angular, Data Science at the end.
This got added to it.
We can do list repetition as well.
Let me comment this out and instead print the list multiplied by a number.
I'll just use a different example, where I multiply by three instead of two.
If you see, now the same thing is appearing three times.
This was appearing once, Hadoop, Python and Android, and now it's appearing Hadoop, Python, Android, and again Hadoop, Python, Android, and once more.
Membership testing gives a true or false answer.
So I can do Hadoop in list and Angular in list.
Some of you might remember that we did this in the last class.
The answer is True or False.
You've already done indexing and slicing.
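The course-list demo can be sketched like this. The variable is called courses rather than list, an assumption made so the built-in list() isn't shadowed.

```python
courses = ["Hadoop", "Python", "Android"]  # named "courses" to avoid shadowing list()

second = courses[1]      # 'Python': index 1 is the second item
first_two = courses[0:2] # ['Hadoop', 'Python']: stops before index 2
last = courses[-1]       # 'Android': negative indices count from the end
combined = courses + ["React", "Angular", "Data Science"]  # concatenation
tripled = courses * 3    # repetition: nine items
found = ("Hadoop" in courses, "Angular" in courses)        # (True, False)

print(second, first_two, last)
print(combined)
print(found)
```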
So let's look at the types of sequences in Python.
There are lists, tuples, strings, sets and dictionaries available in Python.
Let's go through these one by one.
Lists are the most versatile datatype available in Python.
A list is basically comma-separated values within square brackets.
Some of the operations you can do on a list: you can update a list element, get the length of the list, concatenate it, as I just showed, slice it, delete elements in a list, and repeat elements in a list.
That is by multiplying it.
So when do you want to use lists? When you have a collection of data that does not need random access.
What that means is that you want to access it sequentially, one item after another, rather than jumping to particular positions.
Because if you talk about random access,
like list[4] or list[3], you really don't know what this means unless you know the list.
You don't know what it represents.
When you're writing code, unless you know what the second element in the list is, you don't really know what it is going to be.
You might know it's going to be one of the courses, but which course, or whether it's a course name or the course price, you don't know that.
A list is useful if you want to just access it sequentially and process it.
Lists also help when you have to deal with values that can change; as you might remember, tuples are immutable while lists are mutable.
So when you want to change items, that's when lists are helpful.
This is something that I just showed you.
You can index the list, slice it, and access it in reverse as well.
Let's talk about updating the list.
Let's suppose I want to put Java instead of Python, at index one.
This is how I will do it.
It's as simple as picking that position and assigning it a new value.
Let's run this.
Now we have an updated list.
It says Hadoop, Java and Android, as compared to Hadoop, Python, Android.
To delete an element, you use the keyword del; in Python it's a built-in statement, not a function.
You write del and give it the item in the list that you want to delete, for example list[2].
Note that we are not passing a bare index; we are passing the item itself, written as the list name with its position.
del then removes the element at that position from the list.
See, so we removed this one.
Let's look at pop. What does pop do? With a list of this sort, it gives us four.
What pop does is that it takes the last element and returns it to you.
If you notice the list, it has also removed that element.
Do not think that pop is the same as doing this.
It might look like that, because you could say: hey, it's returning four, so why can't I just do it like this? But pop removes the element from the list altogether.
It pops it out of the sequence.
remove just removes the element; it doesn't return it.
So there's also list.remove(), and let me do list.remove(3) instead and see the output for this one.
This doesn't return anything; it just removes the element.
There's a difference between pop and remove.
Please don't get confused about it.
pop will actually give you the item that it removed, so that you can use it.
remove is more or less used when you are done with that item and don't want it in the list anymore.
pop is more like going to the showroom and taking a bike out of the collection of bikes, whereas remove is basically taking a bike from the collection and just destroying it.
There's a difference.
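A minimal sketch contrasting updating, del, pop, indexing and remove. The numbers list is a hypothetical example: pop removes and returns the last item, indexing with -1 only looks at it, and remove deletes by value and returns None.

```python
courses = ["Hadoop", "Python", "Android"]
courses[1] = "Java"          # update: ['Hadoop', 'Java', 'Android']

numbers = [1, 2, 3, 4]       # hypothetical list
del numbers[0]               # delete by position -> [2, 3, 4]

last = numbers.pop()         # removes AND returns the last item -> 4
peek = numbers[-1]           # only looks at the last item -> 3, nothing removed

result = numbers.remove(3)   # removes the value 3 and returns None
print(courses, numbers, last, peek, result)
# ['Hadoop', 'Java', 'Android'] [2] 4 3 None
```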
You can also use the type function on a list.
It'll just return that it's of type list.
Then what we have here is something very interesting, called list comprehension.
Let's look at the example given here.
This is one of Python's most talked-about and well-known features among Python users.
You can basically create a list by doing this.
Forget the print statement for a second; I can do this.
To understand it, you need to start reading it from here:
for x in [1, 2, 3, 4, 5].
I know the syntax looks a bit weird.
First and foremost, it needs to be within square brackets, so let's remove this for a second.
Then the square brackets.
It's already a list, but it doesn't have anything in it.
How would I have started writing this expression? I would have said for x in [1, 2, 3, 4, 5].
Which is valid, right?
It's just like writing a for loop: for x in [1, 2, 3, 4, 5].
It is kind of like that.
I just sort of dropped the colon here.
Then I say: do this for every x, which is x to the power of two.
Now I could have also done the same thing like this.
Using list.append, I could have done this as well.
But this is the equivalent of this.
Line 29 and lines 31 to 33 will give the same desired result.
It's just that this is a bit neater to write; it's called list comprehension, and Python developers love using it,
just because it's a single line compared to three lines of code.
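Here is the comprehension from the demo next to its three-line loop equivalent, as a small sketch; both build the squares of 1 through 5.

```python
# One line: the list comprehension.
squares = [x ** 2 for x in [1, 2, 3, 4, 5]]

# The equivalent three-line loop with append.
squares_loop = []
for x in [1, 2, 3, 4, 5]:
    squares_loop.append(x ** 2)

print(squares)                  # [1, 4, 9, 16, 25]
print(squares == squares_loop)  # True
```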
Okay, we already did append, which adds an item to the end of the list.
extend is used with another list: it extends this list by adding all of the other list's items at the end.
So let me show you an example.
If I have this, and I have this list over here, and then I run this, we get one, two, three, g, h.
append is for a single element.
extend is when you have two lists and you want to join them together.
This works like concatenation.
insert, as the name suggests, inserts one item at a particular position in the list.
So it inserted it.
If you had to do this yourself, you can imagine it's doable, but it would have been very troublesome.
So it takes the index and says: okay, at this index you want this value.
At this index, which is one, you want this value.
So let me do that for you.
Because had you done it with an assignment like this instead, it would have overwritten the two here.
But insert adds the value without deleting or overwriting any of the existing elements.
You've already looked at remove; it removes that element from the list.
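A small sketch of append, extend and insert, plus the plain assignment that insert avoids. The lists are hypothetical; note that append adds its argument as one element even when that argument is a list.

```python
appended = [1, 2, 3]
appended.append(["g", "h"])   # one element, even if that element is a list
# appended == [1, 2, 3, ['g', 'h']]

extended = [1, 2, 3]
extended.extend(["g", "h"])   # every element of the other list
# extended == [1, 2, 3, 'g', 'h']

inserted = ["a", "b", "c"]
inserted.insert(1, "x")       # existing items shift to the right
# inserted == ['a', 'x', 'b', 'c']

overwritten = ["a", "b", "c"]
overwritten[1] = "x"          # plain assignment replaces instead
# overwritten == ['a', 'x', 'c']
```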
So there is another built-in function, called sorted.
It will take a list and sort it for you.
Let's try it out.
It automatically sorts it in alphabetical order here.
Okay, so if you look at the original list and at this one, this one is sorted now.
Another thing is reversing the list: let's suppose for some reason you want to print it in reverse order.
You want it sorted most of the time, but maybe just for printing the names, or let's suppose there's a feature in your application where the user can switch between A to Z and Z to A.
For that, the syntax is list[::-1]: colon, colon, and then negative one.
So if you look at this, it has printed the list in reverse order, from here to here.
But the original list is still the same.
It's a neat little trick: you get the list in reverse order without actually reversing the original list.
Strictly speaking, the slice builds a new reversed copy; if you only want to loop backwards without making a copy, the built-in reversed() gives you an iterator.
So you can just reverse the list and use it just like that.
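A minimal sketch of sorted, the [::-1] slice, and reversed(), using a hypothetical course list. Neither sorted nor the slice touches the original list.

```python
courses = ["Python", "Android", "Hadoop"]  # hypothetical order

in_order = sorted(courses)     # new sorted list; courses itself is unchanged
backwards = in_order[::-1]     # new list in reverse order (Z to A)

print(in_order)   # ['Android', 'Hadoop', 'Python']
print(backwards)  # ['Python', 'Hadoop', 'Android']
print(courses)    # ['Python', 'Android', 'Hadoop'] (original order kept)

# reversed() walks backwards lazily, without building a reversed copy first.
lazy = [name for name in reversed(in_order)]
```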
Now, another interesting thing is that a list itself can contain dictionaries, tuples, and other lists.
It can go on and on, and whatever operations you need, you can keep chaining them on top of it.
So if you look at this over here, it selects the tuple first and then does the slicing on the tuple.
So whatever it is that you want to do, you can chain it together.
A tuple can contain more tuples.
There is no real limit on the amount of nesting you can do.
It is really up to your needs.
If you think this tuple will contain further tuples, you can just keep adding square brackets, one after the other, and it will work.
So this is a more visual representation.
Zeroth element, first element; this one has zero, one, two, and this one has zero, one.
Now you can add more and more under it, and it can keep going like that repeatedly.
Now the thing is that even though it's inside a list, a tuple is still a tuple: you cannot modify it.
So an immutable item does not suddenly become mutable just because it's inside a mutable sequence.
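A short sketch of nesting and chaining, assuming a hypothetical list that holds a tuple and a list. Chained indexing reaches inside, the inner list can change, and the inner tuple still refuses item assignment.

```python
data = [1, ("a", "b", "c"), [10, 20]]  # hypothetical nested list

sliced = data[1][0:2]  # chaining: pick the tuple, then slice it -> ('a', 'b')

data[2][0] = 99        # fine: the inner list is mutable

try:
    data[1][0] = "z"   # not fine: the inner tuple stays immutable
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(sliced, data, tuple_changed)
```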
Let's look at tuples.
A tuple is a sequence of immutable Python objects.
Tuples are sequences just like lists, except that they are written with round brackets, and their elements are likewise separated by commas.
So a tuple supports slicing, length, repetition, concatenation, and deleting the whole tuple, but not updating its elements in place.
All of those things are available as operations on a tuple.
A tuple is somewhat faster than a list, and you might also want to use a tuple when you have to deal with values which cannot be changed.
As we talked about last time: immutable scientific constants, things you do not want to change, like the value of pi, just to refresh your memory.
If there's a set list of countries in your application that you do not want to be modified, that is when you will use a tuple.
So basically, if you have a constant set of values and you want to iterate through them, that is when you would use a tuple.
Let's open the file that I have ready for this.
Let's run this and see the output.
Now the length is three, which makes sense.
The maximum is Python.
The way it chooses a maximum is by comparing the strings character by character, starting with the first letter.
It's alphabetical order, and then there is a minimum, which is Hadoop.
Now what if it had another value, like three or two, inside it? What will happen? Let's see.
What will be the min and the max? The min becomes two.
What if this was a big number like this? What will happen? The string will still win.
That's Python 2 behavior, where numbers always compare as smaller than strings; in Python 3, mixing numbers and strings in max or min raises a TypeError instead.
So again, sorting and reversing work exactly as they do for lists.
Multiplication and membership checking, again, work exactly as you have seen until now.
You can do repetition and membership testing just the same.
Now of course, you cannot update a tuple.
That will throw an error.
So this will not work at all.
You can, however, add two tuples together to create a new tuple, tuple three.
And of course, this will throw an error, because the tuple is not defined over here.
You cannot delete a variable which has not been defined.
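A minimal sketch of the tuple operations above, using a hypothetical tuple of course names. It also shows that Python 3 rejects comparing a string with a number inside max.

```python
courses = ("Python", "Hadoop", "Java")  # hypothetical contents

length = len(courses)        # 3
biggest = max(courses)       # 'Python': strings compare character by character
smallest = min(courses)      # 'Hadoop'
doubled = courses * 2        # repetition gives a new, longer tuple
present = "Java" in courses  # membership -> True

# Assigning to an element is not allowed...
try:
    courses[0] = "Spark"
    updated = True
except TypeError:
    updated = False

# ...but adding two tuples builds a brand-new tuple.
tuple3 = courses + ("React",)

# Python 3 refuses to compare a number with a string.
try:
    max(("Python", 2))
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(length, biggest, smallest, present, updated, mixed_ok)
# 3 Python Hadoop True False False
```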
This is something we covered in the last chapter as well, and we went through a couple of examples showing that lists are mutable while tuples are not.
Now, as you saw, lists can contain tuples, and a tuple can contain lists as well.
It cuts both ways, and of course you can chain it together.
So the lists inside it can in turn contain more lists.
Same visualization again: zero, one, two, and then within zero, you have zero, one, two.
Within one, you have all these elements.
The list can extend further and just go on and on like that.
So don't be confused about this.
You're actually updating the list here.
You're not updating the tuple.
Updating the tuple would mean assigning to index zero itself.
So let me show you via an example.
Let's suppose you have a tuple here, with lists such as one, two, three and four, five, six inside it.
If I try to do this, let me run it.
This will not work.
However, if I try to do this, it will work.
Do not be confused about this: what I've done here was done on this particular element, which is mutable.
This is not mutable.
This is mutable.
This is basically a tuple element, which is what I've selected right now.
Now this is a tuple element.
The tuple element itself cannot be changed, but if what's inside the tuple element is mutable, then that can be changed.
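A small sketch of the difference, assuming a hypothetical tuple whose elements are lists: replacing an element of the tuple fails, while changing the contents of a list stored inside it works.

```python
t = ([0, 1, 2], [1, 2, 3], [4, 5, 6])  # hypothetical tuple whose elements are lists

replaced = True
try:
    t[0] = [7, 8, 9]  # replacing a tuple element fails
except TypeError:
    replaced = False

t[0][0] = 7           # changing the list *inside* the tuple works
print(t)              # ([7, 1, 2], [1, 2, 3], [4, 5, 6])
```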
So you can convert your tuple into a list as well, or a list into a tuple, by using the built-in list() or tuple() function.
You just pass it the tuple or the list, and it will convert it and give it back to you.
So I can do this: convert the tuple, then print the tuple and print the converted item.
Notice the difference? Just the brackets, but of course one is immutable and the other one is mutable.
I can of course always do this.
I can do this, then run it, and now this entire thing is a list.
So it's just helpful to know that these conversions are possible, not that you will necessarily need to use them.
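A minimal sketch of converting back and forth with list() and tuple(), using a hypothetical three-item tuple. Only the list copy can then be changed.

```python
item_tuple = ("a", "b", "c")     # hypothetical tuple
item_list = list(item_tuple)     # tuple -> list: ['a', 'b', 'c']
back = tuple(item_list)          # list -> tuple: ('a', 'b', 'c')

item_list.append("d")            # the list copy can now be changed
print(item_list, back)           # ['a', 'b', 'c', 'd'] ('a', 'b', 'c')
```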
Next, let's look at strings.
As for strings, you can create them by enclosing text within single quotes, double quotes, or even triple quotes.
The operations available on strings are, again, slicing, updating (which really means building a new string), concatenation, repetition, membership, and reversing a string.
I know it's a little difficult to imagine a string as a sequence, but you really need to think of it pretty much like a tuple, whose individual elements cannot be updated.
The difference, of course, is that the members of a string cannot be a list.
So it cannot contain another list or another tuple, naturally, but it can contain any characters: letters, digits, symbols and so on.
You need to think of it really like a tuple, where each individual element is immutable.
Of course, strings are among the most popular types in Python.
Single, double or triple quotes.
Let's try out a couple of operations on strings, then.
So you have a string, the length of the string, string[1:3], and this is membership checking.
Let's run this.
Okay, so string one gets printed, string two, string three, pretty standard stuff.
The length of the string here is six.
string[1:3] is y, t: it starts from index one and doesn't go up to index three.
't' in string is True.
If it was 'z' in string, then it's False.
We can check further individual characters.
Now, if you notice this behavior, of course it seems like a sequence.
It seems like a tuple.
I can do this.
So let's suppose I do this and replace the string with this.
Let's see what will be the outcome.
Of course, this one is not going to be represented like a string.
The length will be the same.
But the slicing representation here is going to be somewhat different.
In principle it's similar, so you can use it as a close analogy: similar, but not the same.
Because when it slices the string, it gives you y, t, but over there you got a tuple in return, even with the slice.
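A small sketch of the string demo, assuming the string is "Python" (consistent with the length of six and the slice y, t above). Converting it to a tuple shows the similar-but-not-the-same slicing behavior.

```python
s = "Python"                 # assumed demo string
length = len(s)              # 6
piece = s[1:3]               # 'yt': index 1 up to, not including, index 3
has_t = "t" in s             # True
has_z = "z" in s             # False

as_tuple = tuple(s)          # ('P', 'y', 't', 'h', 'o', 'n')
tuple_piece = as_tuple[1:3]  # ('y', 't'): slicing a tuple gives back a tuple

try:
    s[0] = "J"               # strings, like tuples, cannot be changed in place
    changed = True
except TypeError:
    changed = False
```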
Next are placeholders in Python.
Now these are very, very powerful.
What they allow you to do is, think of them like fill in the blanks.
This is like a blank, and you're filling it with a string from the outside.
A better way to do this would be like this, where you have a variable.
Now what is the alternative to this? The alternative is "Welcome to" plus course name.
Now this is messy; I'll tell you why.
If I have to write this, notice that the variable is right in the middle of the sentence.
There's a space here, I have to take care of the period to end the sentence, and I have to give another space.
It's not very readable.
If I do the same thing here, I just do this, which is more readable.
Compare it for yourself.
Now you know exactly what you're going to print.
Where placeholders are even more helpful is when you have different datatypes.
Because if you have to do it without them, the only way to get it done is to use str() to convert the number, because otherwise this won't add.
Python won't add a string and a number.
It cannot concatenate a string and a number.
It can add two numbers, and it can concatenate two strings.
This is, again, very messy.
Compare this and this.
Which is more readable to you as a developer? That matters, it matters quite a bit.
The output is the same, but there's a difference.
This is more readable.
Think of these like fill in the blanks, and these are the values.
Now, this way you have a lot of options, %s, %i and so on, and they are pretty exhaustive.
So %c is a single character.
%i is a signed decimal integer, and the list just goes on.
There's, of course, a floating-point one as well, %f.
I would suggest that you go and explore, but most of the time you'll be using %s, %d or %f.
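A minimal sketch comparing concatenation with % placeholders. The hours and price values are hypothetical; %s takes a string, %d an integer, %.2f a float with two decimals, and %c a single character.

```python
course_name = "Python"
hours = 40     # hypothetical value
price = 19.5   # hypothetical value

concat = "Welcome to " + course_name + "."   # spaces and period handled by hand
filled = "Welcome to %s." % course_name      # same sentence with a placeholder

concat_num = "The course takes " + str(hours) + " hours."  # str() is required here
filled_num = "The course takes %d hours." % hours          # %d formats the integer

mixed = "%s costs %.2f, grade %c" % (course_name, price, "A")
print(mixed)   # Python costs 19.50, grade A
```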
Now there are certain built-in string methods.
One is capitalize: as it sounds, it capitalizes the first letter of the string, not the other ones (in fact, it lowercases the rest).
Then there is string.count(): it takes the sequence of letters, or the single letter, that you give it and counts the number of times it occurs in the string.
Next is encoding.
Encoding is basically the process of putting text into a certain format so that different computers can support it.
The most popular encoding right now is UTF-8.
In case you don't know about it, go ahead and please explore it.
It's not exactly within the scope of this course, but just know it's the way that computers across the world represent characters.
Because if you look outside the English language, at the European languages with their accents, or at the Indian languages, which have all sorts of characters, how is it that different computers sitting across the world are able to transmit them correctly and represent them correctly on the screen? Or the Chinese characters? That is through UTF-8.
So whenever you type something and transmit it over the internet, it gets encoded and decoded.
That's what encode is all about.
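A small sketch of encoding and decoding with UTF-8, using a hypothetical word with an accented letter. The é takes two bytes in UTF-8, so the encoded form is one byte longer than the text.

```python
text = "café"                      # hypothetical text; é is outside plain ASCII
encoded = text.encode("utf-8")     # str -> bytes
print(encoded)                     # b'caf\xc3\xa9'
decoded = encoded.decode("utf-8")  # bytes -> str
print(decoded == text)             # True
```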
max and min basically work as per the ASCII values.
So max here would give you u, which has the highest ASCII value, and min would give you a, which has the lowest, in edureka.
So what does replace do? replace searches for a particular letter and replaces its occurrences with the replacement given as the second parameter.
This is the first parameter and this is the second one.
And this third one is the number of times it should replace.
If you notice the output, it did not replace the second e; it just replaced the first one.
upper capitalizes all the letters.
So capitalize capitalizes only the first one, but upper capitalizes all of them.
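The string methods above, sketched on "edureka" as in the demo. The replacement letter i is a hypothetical choice; the third argument to replace limits how many occurrences change.

```python
s = "edureka"

capitalized = s.capitalize()      # 'Edureka': only the first letter upper-cased
e_count = s.count("e")            # 2
highest, lowest = max(s), min(s)  # 'u' and 'a', compared by character code
once = s.replace("e", "i", 1)     # 'idureka': only the first "e" is replaced
everywhere = s.replace("e", "i")  # 'idurika': every occurrence is replaced
shouted = s.upper()               # 'EDUREKA'
```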
Next is index: it's the same as find, except that it raises an exception if the value is not found.
So it finds you the index of the letter k in the entire sequence.
It's kind of like a search operation in a list or a tuple, where you are trying to find the position of a particular element in the sequence.
So if you look for k: zero, one, two, three, four, five, and there's k.
That's where k occurs in edureka.
As for more string methods, reversal and slicing you've already done.
Now find is similar to index; however, if it doesn't find anything, it will not give you an error.
So let's see: it gives you negative one, meaning it did not find anything.
Whereas the other one, index, will throw an error saying the substring was not found.
So index basically works under the assumption that you, the developer, or whoever wrote the logic, have already checked whether the value exists.
It is not going to help you with that check.
So find helps you determine whether it is present or not and, if present, where.
index is just going to tell you where.
It is going to throw an error if the value doesn't exist.
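A minimal sketch of find versus index on "edureka": both return 5 for k, but for a missing letter find returns -1 while index raises a ValueError.

```python
s = "edureka"

pos_index = s.index("k")  # 5
pos_find = s.find("k")    # 5
missing = s.find("z")     # -1: not found, but no error

try:
    s.index("z")          # index raises when the substring is missing
    raised = False
except ValueError:
    raised = True
```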
String concatenation is something that you have already seen, and string1 * 2 will just multiply it.
So: happy learning happy learning.
If you notice this,
it gets multiplied, and it prints the string.
It gives you the string just two times over.
Next are sets.
A set is basically an unordered collection of unique items.
Please notice that the items have to be unique, and a set is defined by values separated by commas inside braces.
You can also create a set by calling the built-in function set, pretty much like how we saw with lists and tuples.
Let's quickly run this and see what we get.
If you see, we have got a set here, containing the various values.
They may look ordered here, but a set does not guarantee any particular order.
So the question is, when would you use sets? The answer is: if you want to collect unique strings or integer values from a sequence, because a set doesn't repeat them.
If you notice over here, it is not repeating any of the elements: even though e appeared multiple times, we only see one small e and one capital E.
So if there are multiple duplicates, it will give you just the unique ones.
One example: let's suppose a college administration is facing problems because, while filling in information, students are entering the same password and ID.
Since sets only hold unique elements, we can convert the list of IDs and passwords into a set and get only the unique ones; it collects the unique ones and makes sure the duplicates aren't entered into the database.
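A small sketch of de-duplication with sets. The word "Excellence" and the student IDs are hypothetical; note that a lowercase e and a capital E count as different elements.

```python
letters = set("Excellence")  # hypothetical word with repeated letters
print(letters)               # each character once; display order not guaranteed

# Hypothetical example: de-duplicating IDs entered by students.
entered_ids = ["s101", "s102", "s101", "s103", "s102"]
unique_ids = set(entered_ids)
print(len(unique_ids))       # 3
print(sorted(unique_ids))    # ['s101', 's102', 's103']
```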
Now there are various operations available on sets.
One of them is the union operation.
One way to do it is with this pipe operator, which is available on your keyboard.
It's in different locations on different keyboards, but usually it's on the backslash key, and you need to use it with Shift.
Now, if you have this sort of set, which is one, two, three, four, and another set, which is three, four, five, six, and you do print(A | B), you'll get the union of the two.
But again, no duplicates.
It will join the two together without any duplication.
Look at the example.
So it joined them together.
Similarly, you have A & B.
That is the intersection.
It will give you the common ones between the two.
So the common ones between these two are three and four.
That is what it is giving you, the common ones, three and four.
That's the intersection, so you could also imagine it like a Venn diagram: it's the common area between the two.
Then there is the difference as well, which is A - B.
That's all of the values in A that are not in B.
So let's try this out as well.
Just try to think of what the answer might be, and let me run it.
So A minus B is one, two, three, four, five, six, seven, eight.
This is B, and this is the intersection.
You can also think of A minus B like this:
A minus B is the same as A minus the intersection of A and B; both of these give the same result.
The common ones between A and B get removed.
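A minimal sketch of union, intersection and difference with the sets A and B described above, including the check that A minus B equals A minus the intersection.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B          # {1, 2, 3, 4, 5, 6}: everything, no duplicates
common = A & B         # {3, 4}: the intersection
only_a = A - B         # {1, 2}: in A but not in B
same = (A - (A & B)) == (A - B)  # True: removing the common part gives the same result

print(union, common, only_a, same)
```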
Something very similar to what we've been seeing in sequences is membership testing.
You can also do a subset check, where you check whether a set is within another set.
There you need to imagine that B contains A.
So A is a subset of B.
Then, of course, with membership checking you can also do the opposite, not in.
You can also check for a superset.
So for subset and superset, let's try it using this.
Let's see if set one is a subset of S or not.
The answer is true, it is a subset.
But if you add the value C, then it is not a subset.
Being a subset means that all the elements of set one are found among the elements of set S.
The next check is whether S is a superset of set one, meaning it contains all the elements of set one as well, which is true.
Now if we add C to set one and run it, then it is false, because S doesn't contain all the elements of set one.
However, if we add C to S as well and run it, then it's true again, because now the set S contains all the elements that are contained in set one.
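A small sketch of the subset and superset checks, with hypothetical contents for set1 and S. It follows the demo's steps: subset at first, not a subset after set1 gains C, and a subset again once S gains C too.

```python
set1 = {"A", "B"}        # hypothetical sets
S = {"A", "B", "D"}

before = (set1 <= S, set1.issubset(S))          # (True, True)
superset = (S >= set1, S.issuperset(set1))      # (True, True)

set1.add("C")            # now set1 has an element that S lacks
after_set1 = set1.issubset(S)                   # False

S.add("C")               # S catches up
after_both = set1.issubset(S)                   # True again

print(before, superset, after_set1, after_both)
```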
Now, similar to append for lists, a set has something called add, which adds an element.
So, let's try it out here.
The element has been added. The output may look sorted, but sets are unordered, so don't rely on any particular order.
To remove an element, you call remove with the element itself; sets have no indexes.
Here we'll try removing 1 from the set S.
Then there is discard as well, where again you provide the element.
The difference between remove and discard is what happens when the element is missing.
remove raises a KeyError if the element is not present, while discard removes it only if it is present and otherwise does nothing, without an error.
pop works similarly to list.pop:
it removes an element and gives it back to you. Because a set is unordered, you can't choose which element it removes.
clear removes all the elements from the set,
which leaves you with an empty set.
So let's try s.pop() as well.
It removes one of the elements; in this case it happened to be the first one shown, but that isn't guaranteed.
You can print the set as well.
Right, it removed A, and the set that remains is 1, 2, 3, B.
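A short sketch of add, remove, discard, pop, and clear. The set contents are illustrative, and because pop removes an arbitrary element, the popped value can differ from run to run:

```python
s = {1, 2, 3, "A", "B"}

s.add(4)               # add an element; there is no append because sets have no positions
s.remove(1)            # remove by value; raises KeyError if the value is missing
s.discard(99)          # discard by value; silently does nothing if the value is missing

try:
    s.remove(99)
except KeyError as err:
    print("remove raised KeyError:", err)

item = s.pop()         # removes and returns an arbitrary element
print("popped:", item)
print("remaining:", s)

s.clear()              # empty the set
print(s)               # set()
```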
Next is dictionaries; let's go through them in depth.
A dictionary is a collection of key-value pairs. Before Python 3.7, dictionaries were unordered; since 3.7 they remember insertion order, but you still look values up by key, not by position.
It is generally used when your data is mixed, rather than a run of similar values that you would store in a list.
For example, details about a person, where the data about a person can take various forms.
Just think of your Facebook profile, or any social media profile.
You can have your first name, your last name, your location, and your interests.
So the data is wide and varied, very heterogeneous.
You could have your age, your birth year, your jobs, the number of jobs you've had, even your current job.
So for all of those things you would use a dictionary: heterogeneous data representing a real-life situation or a real-life entity.
When would you use a dictionary? One example is any arrangement of data where you want to store a name, an Aadhaar card number, or a social security number.
Any Excel sheet that you've ever created is kind of an example of a dictionary, where each column header is a key and the entries in that column are its values.
The name column would have multiple values, and its key is name.
So the questions you ask are: what is the name of the person in the 10th row, or what is the social security number in the 10th row?
The answer depends on the header: social security number is the key. You can go row by row, but to know what a particular sequence of characters represents,
you look at the header it sits under.
A sheet is filled with data, and the context for that data comes from the header.
So, very simply, this is how you create a dictionary.
Here the keys are 1 and 2, so keys can be integers as well, but typically they will be strings, because integer positions are something you already get with lists,
so integer keys don't add that much.
The point of a dictionary is that it should make readable sense immediately.
For example, if I were to create a person dictionary, I would use keys like first name, last name, and age.
I would add city too, and then the dictionary lets me access the elements by key within square brackets, which is much more readable than integer indexes.
Now suppose the same data were stored in a list instead, accessed by position.
Compare the two: accessing person by key is much more readable to you as a developer than accessing the list by index.
You wouldn't know what person[2] actually means, whether it is the age or not.
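To make the readability point concrete, here is a sketch with a hypothetical person record stored both ways (the field values are made up):

```python
# The same hypothetical record, stored as a dictionary and as a list
person = {"first_name": "John", "last_name": "Doe", "age": 30, "city": "Delhi"}
person_list = ["John", "Doe", 30, "Delhi"]

print(person["age"])    # 30: the key says what the value means
print(person_list[2])   # 30: but you have to remember that index 2 is the age
```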
That causes all sorts of problems.
So it's very, very important to understand which data structure to use. With lists, tuples, and sets you can have your way, in a way,
but think about it like this: if you have to put a nail in a wall, you can use a rock, or so many other things.
You can use a hammer, but then you could also use a screwdriver.
But your hammer can't do what a screwdriver does,
and a screwdriver can't do what a hammer does.
Using the right tool for the right job: that is why the different kinds of sequences exist.
They do different kinds of jobs, depending on where you're using them and what you are using them for.
Updating is very similar to lists: instead of an index, you just use the key.
So, for example, let's update the age.
I assign a new value to the age key, and you will see that this works.
Yeah, now the age is updated.
So it's a mutable data structure.
I can delete using, again, the del keyword.
With del, I tell it which particular key to delete.
So I will say, delete the city key, I don't need it anymore.
Print person and run it.
That's it, the city disappears; I've deleted it.
Now some of the built-in functions. len gives the length of the dictionary, which is the number of keys; it counts keys, not the values stored against them.
str gives you a string representation of the dictionary.
This is handy for printing and logging. Note that it is Python's own notation, not JSON: for web development, where you deal with JSON data, convert the dictionary with the json module instead.
type is a built-in function in Python that gives you the type of any variable: a string, a tuple, a list, a set, a dictionary (in this case), an integer, a float, or something else.
Given any object, it just gives you its type.
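A minimal sketch of updating, deleting, and the len, type, and str built-ins on a made-up record. The json.dumps line is an addition for contrast with str; it is not part of the original demo:

```python
import json

person = {"first_name": "John", "last_name": "Doe", "age": 30, "city": "Delhi"}

person["age"] = 31      # update: assign a new value to an existing key
del person["city"]      # delete: remove the key and its value

print(person)           # {'first_name': 'John', 'last_name': 'Doe', 'age': 31}
print(len(person))      # 3: len counts keys
print(type(person))     # <class 'dict'>
print(str(person))      # Python notation, with single quotes
print(json.dumps(person))  # {"first_name": "John", "last_name": "Doe", "age": 31}
```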
There is one thing to watch for: if I try to access something which is not present, let's suppose I deleted person's city and now I'm trying to access it,
it will throw an error.
You will see this very commonly in Python: KeyError.
It says that this particular key doesn't exist.
What are you trying to access? I don't know what you're talking about.
Now, in case you don't want an error in this instance, you can use get instead. It's not a way to avoid errors in general, but it's a safer approach here.
When you use get and the key is missing, it returns None.
You can still put checks against this.
You can say: if person.get('city'),
print "You live in" followed by person.get('city').
You can't do the same check with square brackets, because the moment Python evaluates person['city'] it will throw an error.
With get, the missing key just gives None and the check fails quietly.
Then you can add an else that prints city not found:
"City for this person was not found."
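A sketch of the difference between square-bracket access and get, again on a made-up record. The second argument to get (a default value) is an extra that the demo doesn't use:

```python
person = {"first_name": "John", "last_name": "Doe", "age": 31}

try:
    print(person["city"])               # missing key -> KeyError
except KeyError as err:
    print("KeyError:", err)

print(person.get("city"))               # None instead of an error
print(person.get("city", "unknown"))    # optional default value

if person.get("city"):
    print("You live in", person.get("city"))
else:
    print("City for this person was not found.")
```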
Now, there are certain very useful methods. One is items.
It returns all the items in the dictionary.
So I can call person.items(),
and then another method, keys, which returns all the keys in the dictionary.
So let's run this.
items returns a view of the key-value pairs, each one a (key, value) tuple; wrap it in list() to get a list of tuples.
keys just returns the keys.
Here they are first name, last name, and age,
shown inside a dict_keys view.
Then there's, of course, dict.copy(),
which creates a copy of the dictionary, and dict.clear(),
which empties all the elements.
You could always just assign the dictionary to another variable, but that doesn't copy anything: both names refer to the same dictionary. copy is the safer way, because it creates a new dictionary. Note that it is a shallow copy, not a deep copy: nested lists or dictionaries inside it are still shared. For a fully independent copy, use copy.deepcopy from the copy module.
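The sketch below, on a made-up record, shows items and keys, and why copy is shallow. The nested skills list is an assumption added to make the shared-reference behavior visible:

```python
import copy

person = {"first_name": "John", "last_name": "Doe", "age": 31, "skills": ["python"]}

print(list(person.items()))  # list of (key, value) tuples
print(list(person.keys()))   # ['first_name', 'last_name', 'age', 'skills']

alias = person                    # no copy: both names refer to the same dict
shallow = person.copy()           # new dict, but nested objects are shared
deep = copy.deepcopy(person)      # fully independent copy

shallow["age"] = 40               # top-level change: person is unaffected
shallow["skills"].append("sql")   # nested change: person sees it too

print(person["age"], person["skills"])  # 31 ['python', 'sql']
print(deep["skills"])                   # ['python']
print(alias is person)                  # True

shallow.clear()                   # empties shallow only
print(shallow, len(person))       # {} 4
```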
One thing to note is ordering. In Python versions before 3.7, dictionaries were not ordered.
So you could not predict that, if you created one, its keys would come out in a certain order.
So let's just see what we get here.
Of course, iterating over a dictionary goes over the keys, not the values.
In this run there's no order, if you look at it.
It's city, first name, last name, age.
There's no discernible order to it. From Python 3.7 on, you get the keys in insertion order, but that is still not sorted order.
If you want to iterate in the sorted order of the keys,
this is how you would do it.
You get the keys, sort them with sorted, and then iterate over them.
So this is how you go through a dictionary in sorted fashion.
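A sketch of both loops, assuming the keys are inserted in the order shown (the values are made up). On Python 3.7+ the first loop follows insertion order; the second always follows sorted key order:

```python
person = {"city": "Delhi", "first_name": "John", "last_name": "Doe", "age": 31}

for key in person:            # iterates over keys, in insertion order on 3.7+
    print(key, person[key])

for key in sorted(person):    # sorted() returns the keys as a new sorted list
    print(key, person[key])
```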
Your dictionary can contain dictionaries as values.
It can contain tuples.
It can contain lists.
It can go on and on like that.
You access a nested value with square-bracket notation, one level at a time, until you get what you want.
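A sketch of a nested dictionary with made-up values, showing chained square-bracket access:

```python
person = {
    "name": "John",
    "address": {"city": "Delhi", "country": "India"},  # dict as a value
    "jobs": ["analyst", "developer"],                   # list as a value
    "coordinates": (28.6, 77.2),                        # tuple as a value
}

print(person["address"]["city"])   # Delhi
print(person["jobs"][-1])          # developer
print(person["coordinates"][0])    # 28.6
```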
Why do we need functions? Let's suppose you need to write a program which calculates the factorial of a number.
That is, if you're given the number four, it calculates four factorial.
That's four into three into two into one.
So you keep multiplying by the next smaller number until you reach one.
Four minus one is three, three minus one is two, two minus one is one.
You do four into three into two into one.
That's four factorial.
Now the logic to implement factorial is given here.
You input a number from the user.
You start factorial at one and, if the number is positive, you keep multiplying factorial by each number in turn, as shown in this particular line.
The loop runs over range(1, num + 1), and you keep calculating this again and again.
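Here is a runnable version of that loop. The course reads the number with input(); this sketch fixes num = 4 so it runs without typing anything:

```python
num = 4  # the course uses int(input(...)) here

factorial = 1
if num < 0:
    print("Sorry, factorial does not exist for negative numbers")
elif num == 0:
    print("The factorial of 0 is 1")
else:
    for i in range(1, num + 1):   # i takes the values 1, 2, ..., num
        factorial = factorial * i
    print("The factorial of", num, "is", factorial)  # The factorial of 4 is 24
```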
Now, think about it in this way.
Let's suppose you need the factorial in multiple places in your program.
You want the factorial of one, two, three, four, five, different numbers.
It won't be a fixed number.
You have to do it in multiple places, not just in one file but in 10 files.
What should you do? Should you copy and paste the code again and again? You can do that, but it will not be the best approach.
As a developer, your job is to write code or logic.
Your job is not to copy and paste the same thing again and again.
I mean, that's the job of a clerk.
So keep that in mind.
If you start thinking this way, asking why you are giving your computer similar instructions again and again, see if you can combine them together.
What are functions? A function is a block of organized, reusable code.
As I said, you want to use the same code again and again,
and that's where this definition comes from.
Typically, when you create a function, it should do one thing and one thing only.
You put the logic to implement factorial in a box, and then use that box with an input and an output, like we did in math.
For example, y = f(x) = x + 2, where y is a function of x.
In a similar way, you give an input and you get an output.
Now this is a very powerful idea.
As I said, wherever you find yourself writing the same kind of code multiple times, repeating yourself, put it in a function.
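As a sketch of that idea, the same loop can be wrapped in a function so any part of a program can reuse it. The name factorial and the ValueError for negative input are choices made for this example:

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for i in range(2, n + 1):   # 0! and 1! skip the loop and stay at 1
        result *= i
    return result

for n in (1, 2, 3, 4, 5):
    print(n, factorial(n))      # 1 1, 2 2, 3 6, 4 24, 5 120
```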
How do you define and call a function? Let's dive into some code.
The way to define a function is the def keyword followed by the name of the function.
Here we want to create a function called add.
Next, you can pass parameters to this function; these are its input values.
So a is an input value, and b is an input value.
This function calculates their sum.
Of course, you could have done it without a function as well.
a = 1, b = 2, and then you could have written a + b.
That's it, you have the sum.
But why wrap it up in a function? Because if you do it like this, the next time you need the sum for a new value of a or b, you have to write it again.
So it is better to put it inside a function like this.
Then you can just call it for the different values you need.
Now this is the function definition.
The last statement is the return statement, which returns the output.
You have to explicitly ask it to return whatever it is that you need.
It will not assume it by itself; a function without a return statement returns None.
The reason is, suppose you have another variable inside here, say a difference as well as the sum.
Python cannot know by itself whether you want the sum or the difference.
A function can contain a lot of instructions,
so it wouldn't know which value you wanted back.
So you have to be explicit that you want to return sum.
Now to call a function, all you need to do is write the name of the function, followed by parentheses containing the input values.
So you see, I'm reusing this logic again and again.
Let's run this.
As you just saw, I can simply type print(add(1, 3)) and then print(add(4, 5)).
I just ask it to print the output of each call.
Let's run this.
It works for each set of arguments.
Let me try with a different number.
We get the output again.
So this is how you call a function.
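A minimal version of the add example in Python 3 syntax. The add_without_return function is an extra, added only to show what happens when the return statement is left out:

```python
def add(a, b):
    """Return the sum of a and b."""
    total = a + b
    return total        # hands the value (and control) back to the caller

def add_without_return(a, b):
    total = a + b       # computed, but never returned

print(add(1, 3))                 # 4
print(add(4, 5))                 # 9
print(add_without_return(1, 3))  # None: Python does not guess what to return
```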
Now why do you need the return statement? As I was saying, it terminates the execution of the function.
It also returns an output.
When you look at the flow of control, where the code is and where the interpreter is, think about it this way: you called print(add(...)).
Control passes to the add function.
Inside add, the statements execute one by one.
Then control returns to the line that made the call, and the program goes on to the next line.
For the next call, it goes in again, then comes out again.
That is what the return statement does.
Think about the word return: it's not just returning a value, it's also returning control to the caller, the main part of the program, which needs to keep executing after that call.
It needs to go on to the next line.
Next, there's the concept of main.
Let's assume that you have two files.
One of them is one.py
and another one is two.py.
Let's suppose one.py has this function, and it also prints "top-level in one.py".
It has this condition, which checks for something that looks very weird:
__name__ (underscore underscore name underscore underscore). It's like an invisible variable present in every file.
Python sets this variable for each file, and the condition checks whether its value is "__main__". If it is, the first message is printed; otherwise the second one is.
Let's take this for a run.
So as you can see, "top-level in one.py"
gets printed, of course.
Then "one.py is being run directly" is printed, because we executed one.py ourselves.
So __name__ was "__main__".
Now let's see with two.py.
In two.py, you're importing one as a module.
Then you print "top-level in two.py", call the function from one, and check __name__ in the same way,
so it can report whether two.py is being run directly.
So let's run two.py
and see what we get.
Before "top-level in two.py" appears,
we first get "top-level in one.py", even though we executed two.py.
Why is this? Because as soon as you import a Python file into another one, whatever code is at the top level of the imported file gets executed.
If a file has something written at its top level, it will be executed when you import it.
That doesn't mean it will necessarily cause any sort of problem.
The reason this happens is that certain variables need to be defined.
Variables and functions need to be loaded into the computer's memory for the imported module to be usable.
It also means people can declare certain defaults:
they may want certain things to happen when a file is imported.
So this design is deliberate.
It's not a flaw that importing a file executes it, and importing is not a bad feature.
So the file gets executed.
But this time, the check in one.py prints "one.py is being imported into another module".
When a file is imported, Python sets its __name__ to the module's name, here "one", not to "__main__".
Since it was imported from two.py,
one.py is not the main module.
The main module is two.py,
which is the one that imported it.
Then we print "top-level in two.py",
then we call the function from one.py, which prints its own message,
and finally, for two.py,
we get "two.py is being run directly".
So the behavior is the same: whenever we import something,
whatever is at the top level of the other file gets executed and printed.
So, as the slide says, if you run one.py directly,
only the statements from one.py get executed. If you run two.py, first
the statements from one.py get executed,
then the statements from two.py.
Rather than thinking about it only as statements from one.py versus statements from two.py,
think about where the import statement is placed. Here it is at the very top of two.py, so one.py runs first; if the import sat in the middle, the lines above it in two.py would run first,
then the lines from one.py, and then the rest of two.py.
So it's all about when the importing happens,
that is, at which place in the code the import statement sits.
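The two-file example can be sketched in a single runnable script. This is a reconstruction under assumptions: the message strings and the function name func are illustrative, and to stay self-contained the script writes one.py into a temporary folder and imports it, while its own remaining lines play the role of two.py:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of one.py, mirroring the example described above
ONE_SOURCE = '''\
def func():
    print("func() in one.py")

print("top-level in one.py")

if __name__ == "__main__":
    print("one.py is being run directly")
else:
    print("one.py is being imported into another module")
'''

# Write one.py to a temporary folder so it can be imported like a real module
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "one.py").write_text(ONE_SOURCE)
    sys.path.insert(0, folder)
    sys.modules.pop("one", None)       # make sure a fresh import happens
    importlib.invalidate_caches()
    try:
        one = importlib.import_module("one")   # runs one.py's top-level code now
    finally:
        sys.path.remove(folder)

# The rest plays the role of two.py
print("top-level in two.py")
one.func()

if __name__ == "__main__":
    print("two.py is being run directly")
else:
    print("two.py is being imported into another module")
```

Saved as two.py and run directly, this prints, in order: top-level in one.py, one.py is being imported into another module, top-level in two.py, func() in one.py, two.py is being run directly.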
Next let's look at some of the built-in functions.
Built-in functions are functions that Python gives you out of the box.
These are common utilities that you will use no matter what sort of program you are writing.
Almost all programming languages provide something like this, and so does Python.
So built-in functions are functions already built into Python, available for anyone to use.
They are available to you as soon as you start Python.
For example, you saw the print statement.
print is actually a built-in function too.
You did not have to write print yourself or import it from anywhere.
It was available to you as soon as you installed Python.
Some of the common built-in functions are given here:
sorted, abs, all.
We will look at these one by one now.
Talking about sorted, it returns a new sorted list from the items in an iterable, which is to say any sequence, such as a list, a tuple, or a string.
That's what it returns.
Let me show you an example here.
So you see, it gave us a sorted list.
Next is all. It checks whether all the elements of the iterable (again, the sequence, the list, or whatever it is) are true.
So it checks each of them for being true.
That doesn't mean they all have to literally be True, True, True.
It can be True, 1, 2, 3.
Even 1, 2, 3 count as true.
That's just how Python labels them:
it's what Python considers to be a truth value.
The elements could be another data type.
They could be strings; anything which is not False, None, zero, or empty counts as true.
These are called truthy values.
Suppose I change one of them to None; now they are not all truthy.
all returns True only if everything is truthy, so this time it returns False.
Truthy doesn't mean the value is literally True;
it's more like a positive value, so think of it as positive versus negative.
If I change this to zero, it's again False.
If I change it to False, of course it's False.
any, on the other hand, is the converse of all.
It gives you True in the opposite situation:
if even one of the elements is truthy, it returns True.
If you go back to the very first example, it returns True there too, because at least one element is truthy.
But if we give it only False and None, it returns False.
Next is the bool function.
It converts a value to a Boolean using the Python truth testing procedure.
That truth testing procedure is the same one that all and any apply to each element of an iterable.
Think of it as internally calling bool on each value: bool(0) and bool(None) are False.
bool(1) is True.
bool("bool") is also True,
because it is a non-empty string.
So, as you can see, this is what any and all rely on.
They effectively use something like bool to determine whether each value passed to them is true or false.
So bool is really the atomic version of any or all.
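A quick sketch of sorted, all, any, and bool with illustrative values:

```python
print(sorted([3, 1, 2]))          # [1, 2, 3]
print(sorted((3, 1, 2)))          # [1, 2, 3]: a tuple goes in, a new list comes out
print(sorted("cab"))              # ['a', 'b', 'c']

print(all([True, 1, 2, 3]))       # True: every element is truthy
print(all([True, None, 2, 3]))    # False: None is falsy
print(all([True, 1, 0, 3]))       # False: 0 is falsy
print(any([True, 1, 2, 3]))       # True
print(any([False, None, 0]))      # False: nothing truthy

print(bool(0), bool(None), bool(""))   # False False False
print(bool(1), bool("bool"))           # True True
```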
Next is chr.
It returns a string representing the character whose Unicode code point is the given integer.
Let's see what it does.
So it takes an integer and returns the corresponding character, an ASCII character or any other Unicode character.
It goes up to 255 in Python 2; in Python 3, chr accepts any value from 0 up to 1,114,111 (0x10FFFF).
Next is open, which opens a file and returns a corresponding file object.
We will cover it in file handling.
Then there is the abs function, which returns the absolute value of an integer, floating point, or complex number.
So you give it a negative integer, float, or complex number,
and it returns the absolute value; Python does have complex numbers, by the way, and for those abs returns the magnitude.
enumerate returns an enumerate object that pairs each item with its index.
We will look at enumerate in a little while.
It's further along in the slides, so we'll come back to it.
Then there is the int function, which returns an integer object constructed from a number or a string x.
So int takes the string "123" and turns it into the integer 123.
For "ABC", it will not work: it raises a ValueError.
So it only works when a number is written as a string of digits.
It will not work on letters, special characters, or anything else.
It only works on this kind of input.
You will often need this to convert a string, such as user input, into an integer.
Then len returns the length of an object.
Length here means the number of items in a list, string, tuple, dictionary, or other collection.
For example, the length of "ABCD" or the length of a tuple.
All of these.
globals is something we are going to cover in a little while, so let's leave it for that point.
bin converts an integer into a binary string prefixed with 0b.
If you try it, you get a binary string that starts with 0b.
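A sketch of chr, abs, int, len, and bin, assuming Python 3 (so chr accepts code points above 255):

```python
print(chr(65), chr(97))        # A a
euro = chr(8364)               # the euro sign: code point 8364 is above 255

print(abs(-7), abs(-2.5))      # 7 2.5
print(abs(3 + 4j))             # 5.0: the magnitude of a complex number

print(int("123") + 1)          # 124
try:
    int("ABC")
except ValueError as err:
    print("ValueError:", err)  # invalid literal for int() with base 10: 'ABC'

print(len([1, 2, 3]), len("ABCD"), len((1, 2)), len({"a": 1, "b": 2}))  # 3 4 2 2
print(bin(5), bin(10))         # 0b101 0b1010
```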
Now eval is a very interesting thing.
It takes a string and executes whatever is written inside it as if it were Python code.
So if I type eval with print four plus five, it tries to run that as Python code.
Here it throws invalid syntax. That isn't a security setting: eval only accepts a single expression, and print four plus five written as a statement (as in Python 2) is not an expression.
In Python 3, print is a function, so eval("print(4 + 5)") does run and prints 9.
The real concern is that eval executes arbitrary Python code, which is why it can be dangerous: suppose you allow a user to type input that gets fed into eval, and that input deletes all the files from your machine.
The user was a malicious user.
They end up executing their own code on your server,
doing whatever they want through your Python program.
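A sketch of what eval accepts, assuming Python 3. Only ever pass it strings you control, never untrusted user input:

```python
print(eval("4 + 5"))             # 9: eval evaluates a single expression

x = 10
print(eval("x * 2"))             # 20: names are looked up in the current scope

result = eval("print(4 + 5)")    # Python 3: print is a function, so this prints 9
print(result)                    # None: print itself returns None

try:
    eval("y = 1")                # an assignment is a statement, not an expression
except SyntaxError:
    print("SyntaxError: eval only accepts expressions")
```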
Next is sum.
It takes an iterable, such as a list or a tuple, and sums up all the numbers.
So I can do a sum of one, two, three, four, five.
reversed, on the other hand, will, as the name suggests, just reverse it.
But you have to consume it with a loop, or convert it with list.
You can't use it directly as a list.
It returns a reverse iterator object.
It doesn't give you an actual list.
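A sketch of sum and reversed:

```python
print(sum([1, 2, 3, 4, 5]))      # 15

r = reversed([1, 2, 3])
print(r)                         # something like <list_reverseiterator object at 0x...>
backwards = list(r)              # consume the iterator into a real list
print(backwards)                 # [3, 2, 1]

for letter in reversed("abc"):   # or consume it with a loop
    print(letter)                # c, then b, then a
```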
Let's look at the enumerate function.
The enumerate function returns an enumerate object, which you can iterate over.
You pass it an iterable, and when you go through the result you get tuples containing the index and the item from the original list.
Notice what we have done here: we have passed the list grocery, with bread, milk, and butter.
Printing enumerate(grocery) directly just shows an enumerate object.
If you convert it into a list, you get (0, 'bread'), (1, 'milk'), (2, 'butter').
Let's take it for a run.
So first, enumerate(grocery)
returns an object whose type is enumerate.
So enumerate is a function returning an object of type enumerate.
Then you can convert it into a list using the list function.
list takes the enumerate object and gives you a list of tuples.
Each tuple contains an index and the value at that index.
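A sketch of the grocery example; the start=1 argument in the loop is an extra that the demo doesn't use:

```python
grocery = ["bread", "milk", "butter"]

e = enumerate(grocery)
print(type(e))            # <class 'enumerate'>
pairs = list(e)
print(pairs)              # [(0, 'bread'), (1, 'milk'), (2, 'butter')]

for index, item in enumerate(grocery, start=1):   # numbering from 1 instead of 0
    print(index, item)
```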
Let's look at a couple more examples of functions.
When it comes to function definitions, you define a function with def and a name; it takes the input, does some operations, and returns an output.
To call a function, you just use the function's name followed by parentheses.
Finally, you use the result in whatever form you want.
Let's look at lambda functions.
Lambdas are very small functions for a quick operation where you do not want to define an entire function.
So it is a shorthand notation.
It is a shorthand way of writing a function whose body is a single expression.
So lambdas cannot contain statements, and they cannot contain more than one expression.
Once you look at the syntax, things will become a little more clear.
They can take any number of arguments and return the value of a single expression.
The syntax goes like this: lambda is a keyword, followed by the input parameters, a colon, and the expression.
Notice that there are no parentheses around the parameters here.
So the way to read this expression is lambda, then z.
This is not the complete syntax, but I'm just trying to show it to you.
Then comes what it must do to z, which is returned at the same time.
This part, after the colon, is what the lambda function returns.
This is the input and then the output: we are saying that for this input, give this output.
Now notice that you've written this in one single line.
You could also have written it as a regular function with def.
Even that would work.
You could still call it the same way; nothing changes.
But the difference is that the lambda is just a more concise way of writing it.
Then there is the variable the lambda is assigned to.
What we're doing is taking a function and assigning it to a variable, just as a def statement does.
If you defined a function called answer with def, it would be used by that name from then on.
So to call this function, you would call answer in this particular way.
Likewise, to use a lambda, I take the lambda expression and assign it to a variable.
Now suppose you need to print the answer; this is how we execute it.
It's pretty much calling it like a function again.
We got our answer as 28.
If we run it again with a different input, the answer is eight: two multiplied by four.
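A sketch of the same function written with def and as a lambda. The transcript doesn't show the expression, so this assumes it multiplies its input by four, which would match the results 28 (7 * 4) and 8 (2 * 4); the name answer_def is hypothetical:

```python
def answer_def(z):
    """Regular function: multiply the input by four."""
    return z * 4

answer = lambda z: z * 4   # the same logic as a one-expression lambda

print(answer_def(7))   # 28
print(answer(7))       # 28
print(answer(2))       # 8
```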
Now this is still a named lambda function, but one of the biggest uses of lambda, the reason it exists apart from being concise, is that it allows you to pass a function as a parameter to another function.
Take map, for example, one of those typical functions that exist in many programming languages.
map takes a list of items and maps each of them to something else.
To show it to you, let's first look at the result and then work out what map is doing.
What map has done is take each value in 1, 2, 3, 4, 5.
If you compare the two lists,
it has multiplied each value by itself.
One multiplied by one.
Two multiplied by two.
Three by three, four by four, and five by five.
So it squared them.
Each output came from the element at the same position in the items list.
You can match them up by index.
Because they're at the same index in both lists, you can see which input mapped to which output.
Now this is what the map function essentially does.
The first parameter to map is how you want each item to be mapped. Does it have to be a square every time? What if you wanted a cube? lambda lets you express that very neatly.
If you notice, this is very concise compared to writing a function def squared(x) that takes the items,
with a loop such as for item in items.
So either you write all of that, or you write it like this.
There's a definite neatness here, and it works because map accepts a function as an object.
Doing it with a lambda is much better than writing a separate function like this every time.
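A sketch of map with a named function and with lambdas; the cube variant is the one the narration asks about. In Python 3, map returns an iterator, so list() is used to see the results:

```python
items = [1, 2, 3, 4, 5]

def squared(x):
    return x * x

squares = list(map(squared, items))              # named function
squares_lambda = list(map(lambda x: x ** 2, items))  # same mapping, inline
cubes = list(map(lambda x: x ** 3, items))       # a different mapping, no new def

print(squares)         # [1, 4, 9, 16, 25]
print(squares_lambda)  # [1, 4, 9, 16, 25]
print(cubes)           # [1, 8, 27, 64, 125]
```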
The next of these functions is filter.
filter checks each item in your list against a condition.
It removes the items from the list which do not match the condition.
We took range(-5, 5), which gives the ten numbers from negative five up to four.
We passed it to filter.
filter takes the list and applies the lambda to each and every element.
Every element goes through this lambda function.
This lambda returns either True or False.
filter keeps the items for which it returns True.
So for whatever items in this list this expression turns out to be true,
the filter function takes those elements only.
The condition is x < 0, which holds only for negative five, negative four, negative three, negative two, and negative one.
That's what it returns.
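A sketch of the filter example, assuming Python 3 (filter returns an iterator, hence list()):

```python
numbers = range(-5, 5)                            # -5, -4, ..., 4: ten numbers
negatives = list(filter(lambda x: x < 0, numbers))  # keep items where the lambda is True
print(negatives)                                  # [-5, -4, -3, -2, -1]
```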
Finally, there is reduce (in Python 3 you import it from the functools module).
reduce takes a list, applies an operation to it cumulatively, and gives you a single final output.
It keeps applying the same function again and again, feeding each result back in.
Think of it like a running total:
the previous output becomes an input to the next step, and this repeats until it reaches the end of the list.
So let's see what this is going to do.
We get 24. So how did this happen? It started with one, and initially this value was one.
So the first pair was one and one.
One times one is one.
Then one and two came in.
One was multiplied by two.
Now the running value is two.
Then three comes in, and the value turns into six.
Now six and four come in as inputs, giving 24.
So basically it multiplied all of them together.
Now we can do an interesting thing.
Let's see the result of this.
We're going to square the first number and multiply it by y.
So let's see what the value of y is, because our output depends on it.
This tells us what is being squared.
When we square this, we get this value.
But if you look at this one, the value is different.
So as you can see, a lambda can take two inputs as well.
Any number of inputs is possible, but only one expression and only one output.
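A sketch of the reduce example. It assumes the starting value of one mentioned in the narration is passed as reduce's optional third argument; the result is 24 either way:

```python
from functools import reduce   # Python 3: reduce lives in functools

numbers = [1, 2, 3, 4]

# Without a starting value: ((1 * 2) * 3) * 4
product = reduce(lambda x, y: x * y, numbers)

# With a starting value of 1: (((1 * 1) * 2) * 3) * 4
product_with_start = reduce(lambda x, y: x * y, numbers, 1)

print(product, product_with_start)   # 24 24
```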
Let's look at the scope of variables in Python.
Here we'll look at two scopes of variables: global and local.
Global variables are defined at the top level of the file.
They get assigned when the file runs.
Local variables, on the other hand, are defined within the scope of a function.
So if you look at this example, a is available everywhere in the code.
So a can be used even inside this function, because it is global: it is defined outside the function.
But if I use b outside the function, it will throw an error.
As the slide says, the local variable here is b, which is inside the function and can only be used within that function.
Here's another example: when you print a, it works, because a is outside.
But if you try to print c, which is inside add, it is not available to you, because it is a local variable, not a global one.
So everything defined within add is local to add.
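A sketch of global and local scope with illustrative values:

```python
a = 10                # global: defined at the top level of the file

def add():
    b = 5             # local: exists only while add() runs
    c = a + b         # reading the global a inside the function is fine
    return c

print(add())          # 15
print(a)              # 10

try:
    print(c)          # c is local to add(), so this name doesn't exist here
except NameError as err:
    print("NameError:", err)
```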
Next is the memory address of a variable.
If you have two variables, a and b, you can find out where either one lives in this way.
If you want to know the memory address of a variable in Python, all you need to do is call the built-in id function. It returns a unique identity for the object the variable refers to; in CPython, the standard interpreter, that number is the object's memory address.
So let's try it out in the terminal.
Let's say I have a = 1, b = 23, and then I do id(a) and id(b).
As you see, the numbers are different.
Where you would use this really depends on what you are working on or building.
Usually it is not something you need to deal with day to day. Things like id(a) and id(b) are more relevant when you're dealing with lower-level things in Python, not if you are doing data science or similar work.
You're unlikely to need it unless you are working on memory management in particular, which Python already handles for you.
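A sketch of id on the two variables from the demo. The extra variable c is added here to show that two names for one object share the same id:

```python
a = 1
b = 23

print(id(a), id(b))      # two different integers (the exact values vary)
print(id(a) == id(b))    # False: a and b refer to different objects

c = a                    # c now refers to the same object as a
print(id(c) == id(a))    # True
```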
Next are function arguments.
The number of arguments in a call should exactly match the number of parameters the function expects.
These are the required arguments.
If you look at this function, a string has to be passed to it for it to run.
Otherwise, it throws a TypeError saying that the function is missing one required positional argument, str.
That's the name of the argument it expects, which is missing from the call.
It's saying that it needed an input you did not provide, so it cannot run.
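A sketch of a required argument going missing. The demo names its parameter str; this sketch uses text instead, because a parameter named str hides the built-in str inside the function. The function name print_name is hypothetical:

```python
def print_name(text):
    """Print the text passed in."""
    print(text)

print_name("Annie")           # works: the required argument is supplied

try:
    print_name()              # the required argument is missing
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
    # print_name() missing 1 required positional argument: 'text'
```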
Next we have something called a keyword argument.
When you use keyword arguments in a function call, the caller identifies each argument by the parameter name.
So a keyword argument works like this: I pass this particular argument by saying that Annie is the value of str.
Now, the benefit isn't obvious here, so let me show you another example of why this is useful.
Suppose you have a function add, which takes four parameters.
Now suppose it is not just a simple function.
It does something more than just a sum.
Instead, let's define a new function called random_function that takes a, b, c, and d.
It computes a**b + a**d + c**b and returns the sum.
When you're calling random_function, you need to remember which value goes where. Do I need to pass five first? You have to map each value to its position and check what is being used where.
Instead, we can just label each argument, so we say d equals this value.
c is three, a is two, and b is five.
Now you're telling Python exactly how the values map to the parameters.
So the call no longer depends on the original order of the parameters.
Python takes the values as you have labeled them.
This is super useful because, as you can see, I can now pass the arguments in any order, as long as I name the function parameters clearly.
The order doesn't matter anymore. In fact, I would recommend this over calling the function with bare positional values, even though that would also work.
But I would not recommend calling it that way.
I think everybody should strictly follow the keyword approach.
It is much cleaner and has a lot of benefits.
It makes your life as a developer much easier.
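A sketch of random_function called both ways. The lecture doesn't state the value passed for d, so 4 is a placeholder; the point is that the positional call and the keyword call, with the arguments in a different order, give the same result.

```python
def random_function(a, b, c, d):
    """Return a**b + a**d + c**b, the expression described above."""
    return a**b + a**d + c**b

# Positional call: values are matched to parameters by position only.
positional = random_function(2, 5, 3, 4)

# Keyword call: values are matched by name, so the order does not matter.
keyword = random_function(d=4, c=3, a=2, b=5)

print(positional, keyword)  # 291 291
```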
Now, the default argument.
Suppose I want d to have a value of three even if it is not supplied.
I could just do this and remove the value of d from the call altogether.
I can still pass a value for d like this.
This is also correct, but what I've done now in the signature is give d a default value of three.
That means d gets the value three whenever it is not passed; implicitly, it is three.
Here is another example, where age has been predefined.
Unless the caller explicitly provides a value, age takes the value 50, which was supplied when the function was defined.
So you set the default value of a parameter when you define the function.
Another example is salary.
Let's suppose there is a basic salary of 10,000 rupees that is given out regardless of who the person is.
What if the salary is not specified? Default arguments exist because certain parameters either have a natural default value or, in most cases, a standard value of some sort.
You only change that value in certain scenarios.
So people use default arguments for various reasons.
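A sketch of the default values mentioned above: d = 3 on random_function, plus age = 50 and a basic salary of 10,000 rupees. The employee_info function and its output format are hypothetical, written only to show the defaults in action.

```python
def random_function(a, b, c, d=3):
    """d defaults to 3 when the caller does not pass it."""
    return a**b + a**d + c**b

print(random_function(a=2, b=5, c=3))       # uses d=3: 32 + 8 + 243 = 283
print(random_function(a=2, b=5, c=3, d=4))  # overrides d: 32 + 16 + 243 = 291

def employee_info(name, age=50, salary=10000):
    """age and salary fall back to their defaults unless passed explicitly."""
    return f"{name}, age {age}, salary {salary} rupees"

print(employee_info("Annie"))                       # Annie, age 50, salary 10000 rupees
print(employee_info("Dave", age=30, salary=25000))  # Dave, age 30, salary 25000 rupees
```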
Sometimes you may not know how many arguments a function will need. In that case you can use a variable-length argument list, which lets you pass any number of arguments.
Let's suppose you're building a very general mathematical calculator in Python.
If you think about multiplication, you don't know whether the user is going to multiply three, four, or five numbers at a time.
They might just input seven times five times four times three, or it could be like the factorial example, where we had to multiply repeatedly.
For any operation that is typically done repeatedly, the function receiving the data cannot define in advance how many arguments it needs.
To handle those cases, we have the star operator.
As soon as you place it in front of a parameter over here, you can pass any number of values to this function.
You might think that this second call is a weird way to call it.
But no, even the first one is correct.
The starred parameter can take zero or more arguments; here, for example, we pass info, Annie, and Dave.
The call could also have a third argument, and the extra arguments would be collected here into a tuple that you can iterate over.
Let me show you a real quick example.
I've defined a function print_users, similar to the one shown here.
Let's suppose edureka is passed as the first user.
Then the other users are, let's suppose, admin, ceo, cto, and manager.
Let's call this.
Once you run it, as you can see, the first user argument is edureka, and then the variable-length arguments are admin, ceo, cto, and so on.
They are received in the same order in which they were passed.
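A sketch of the print_users example, plus a hypothetical multiply function for the calculator case. The starred parameter collects the extra positional arguments into a tuple, in the order they were passed.

```python
def print_users(first_user, *other_users):
    """first_user is required; any extra positional arguments land in other_users."""
    print("First user:", first_user)
    for user in other_users:  # other_users is a tuple
        print("Other user:", user)
    return first_user, other_users

result = print_users("edureka", "admin", "ceo", "cto", "manager")

def multiply(*numbers):
    """Multiply any number of values together (returns 1 for no values)."""
    product = 1
    for number in numbers:
        product *= number
    return product

print(multiply(7, 5, 4, 3))  # 420
```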
Another option is passing arbitrary keyword arguments.
So far we've seen star arguments; there are also keyword arguments, where you can pass the same kind of thing but as key-value pairs.
Look at this: arg1, arg2, arg3.
One, two, three, four, five, six, and seven are the args, and then you have these kwargs, which are named, sort of like named parameters. Because they are collected by the double-star parameter, they are available as a dictionary over here.
The non-keyword arguments, by contrast, are available as a tuple.
Again, the purpose of this is to make functions flexible enough to receive different kinds of input; this mechanism gives a lot more flexibility than fixing exactly what the function accepts.
This also relates to the OOP concept of polymorphism, where you can call a function in multiple ways.
The output depends on the way you call it.
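A sketch showing where each kind of argument ends up. The function name show_arguments and the keyword names are assumptions for illustration.

```python
def show_arguments(*args, **kwargs):
    """Collect extra positional arguments in a tuple, keyword arguments in a dict."""
    print("args:", args)
    print("kwargs:", kwargs)
    return args, kwargs

args, kwargs = show_arguments(1, 2, 3, 4, 5, 6, 7, name="Annie", role="admin")
# args: (1, 2, 3, 4, 5, 6, 7)
# kwargs: {'name': 'Annie', 'role': 'admin'}
```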
Now let's look at the difference between procedure-oriented programming and object-oriented programming.
In procedure-oriented programming, the program is divided into functions.
Functions are the central concept in procedure-oriented programming.
In object-oriented programming, the program is divided into objects.
POP follows a top-down approach: it views the entire problem from the very top and builds downwards.
It says, "I need something that gives me an output of this sort, which can be further divided: this thing needs to do these five things, so let's make functions out of those."
You start from the very top and keep adding smaller and smaller blocks.
OOP follows a bottom-up approach: it looks at the different building blocks that are needed, defines them as objects, defines the relationships between the different entities, and then starts building the system.
In procedural programming, data moves freely from function to function in the system.
In OOP, objects communicate with each other through member functions.
Some examples of POP languages are C, Visual Basic, FORTRAN, and Pascal.
Examples of OOP languages are C++, Java, and Python.
But the thing to notice here is that even Python allows procedural programming, for example through plain functions, lambda functions, and other mechanisms.
It's not that Python disallows procedural programming; it is more a matter of style.
Ultimately, these are styles of programming.
Languages generally don't force one style on you; rather, they give you features that make certain styles easier.
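To make the two styles concrete, here is a small sketch of the same task written both ways in Python. The rectangle example is an assumption chosen for illustration: the procedural version passes data from function to function, while the object-oriented version keeps the data and the functions that use it together in a class.

```python
# Procedural style: data is passed explicitly from function to function.
def area(width, height):
    return width * height

def describe(width, height):
    return f"{width}x{height} rectangle with area {area(width, height)}"

# Object-oriented style: the data and its behavior live together in a class.
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def describe(self):
        return f"{self.width}x{self.height} rectangle with area {self.area()}"

print(describe(3, 4))               # 3x4 rectangle with area 12
print(Rectangle(3, 4).describe())   # 3x4 rectangle with area 12
```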
Think about it this way: a car has many features, but you might not use all of them.
The car can drive fast, drive slow, or drive in reverse, but you may or may not use all of those features.
Some people don't like to drive fast at all.
They will never hit the maximum speed, even though the car can actually do that.
In a similar way, programming languages have features built in.
They may make certain things easier to do, or they may be more oriented towards a certain style.
That doesn't mean, however, that you cannot make them do something else.
Of course, you need to be experienced enough to make that work.
Let's look at a few object-oriented concepts.
Python is very much object-oriented.
It supports OOP and has an extensive class mechanism that we're going to look at in just a few minutes.
Object-oriented programming makes your code reusable and lets you write less code.
It also relates very well to real-world scenarios: when you think about the word object, a number of things come to mind.
So everything is an object.
A car is an object, a jet is an object, a table is an object.
Given that everything is an object, it is very easy for you to imagine and model the real world in your application.
Naturally, if the code is reusable and all of these nice features are there, programmers will produce applications faster, more accurately, and better written.
However, OOP in Python is different from OOP in other languages.
Naturally; that's why it's a different language to start with.
For example, in Java you cannot write a program without defining a class.
So there are differences, and Python is an interpreted language, as we know.
It executes code line by line.
That's not the case with Java.
Java is a compiled language, whereas Python is an interpreted language.
So OOP in Python is different from OOP in other languages.
Let's look at a use case.
SRT is a multinational company.
This company wants to create an employee information sheet.
It should include the name, the employee ID, and the progress.
For each employee, the sheet has weekly, monthly, and progress columns.
But it would be difficult to create a separate sheet for every employee.
So instead, they decided to create one class for employee information, create an object for every employee, and use the same class for new employees.
This is basically the archetype, where they defined the blueprint of a class.
This is what an employee has.
An employee has a first name, a last name, a salary, an age, a gender, a role, and a department.
All of these things are called the employee's attributes.
If you think about it, every employee would typically have those attributes in the real world as well.
So a class is nothing but a blueprint of a real-world object, where we try to model that object using classes.
When I say model, I mean we try to describe it in code.
When the word model comes in, people are trying to be a little formal, especially in the context of science and programming.
When you're modeling a certain scenario, situation, person, or object, in plain English that is called a description.
In everyday language you'd say "describe a ball to me", but in programming we're fancy people, so we say "let's model the ball".
Modeling the ball really involves describing it: its height, its weight, its diameter, its color, its shape, its material.
The same thing goes here when you're trying to model an employee.
What is it about an employee that is relevant to us as an organization? Every employee is defined by the same set of parameters, but the values of those attributes change from person to person.
Because at the end of the day, every person is unique.
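A sketch of the employee blueprint described above: one class, two objects, the same attributes but different values. The attribute subset, names, and salaries are hypothetical, and the __init__ constructor used here is covered later in the course.

```python
class Employee:
    """Blueprint: every employee has the same attributes, with different values."""

    def __init__(self, first_name, last_name, salary, role, department):
        self.first_name = first_name
        self.last_name = last_name
        self.salary = salary
        self.role = role
        self.department = department

    def full_name(self):
        return f"{self.first_name} {self.last_name}"

annie = Employee("Annie", "Smith", 50000, "Analyst", "Data")
dave = Employee("Dave", "Jones", 65000, "Engineer", "Platform")
print(annie.full_name(), annie.role)  # Annie Smith Analyst
print(dave.full_name(), dave.role)    # Dave Jones Engineer
```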
Similarly, a class is just a blueprint.
Look at a cycle, a bicycle, or a two-wheeler.
A two-wheeler is basically the blueprint of having two wheels and a brake.
That is the basic blueprint.
From that blueprint you can get various objects, whether it is a motorized vehicle like one of these or a non-motorized vehicle like this.
It could be red, green, or pink; that can keep changing, but it doesn't change the fact that each one is still a particular instance of the cycle blueprint.
Take a human being, for example: the anatomy of a human being is the same across all seven billion people in the world.
It's the same blueprint: you have a certain number of muscles and a certain number of bones in your body.
But still, each and every person is different.
So this is the anatomy of a human being.
And each person is a variation that comes from this anatomy.
Now let's talk about the relationship between classes and objects.
A class is a template for objects.
It contains all the code for the object's methods.
A class describes the abstract characteristics of a real-life thing.
In other words, it describes something we see in the real world, but in programming terms.
An instance is an object of a class created at run time.
Just creating the blueprint does nothing; you have to create an object from it and use it.
Just as a function does nothing until you call it, declaring the blueprint of a class does nothing in Python by itself.
You need to actually create an object using that blueprint.
You need to say, "Let me bring it to life."
"This is the blueprint; let me actually construct it and set it up in memory."
Until then, I just have the blueprint with me.
It doesn't hold actual values.
Of course, there can be multiple instances of the same class.
To declare a class, you use the class keyword.
The class keyword is followed by the name of the class and, optionally, parentheses.
The parentheses can contain certain arguments, and we'll see what Python does with those later, but this is just a very basic definition.
You write class ClassName:, where ClassName is the name of the class, followed by the statements within that class.
To create an instance of a class, you take a variable and assign to it the name of the class followed by parentheses, which creates the instance of the class for you.
If you look at this, print(x) says that it's an object
at such-and-such memory location.
So x works like a variable.
At the end of the day, it is nothing more than a variable that refers to the object.
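The smallest possible version of what is described above: declare an empty class, create two instances, and print one. The class name Edureka follows the lecture. The memory address in the printed output differs on every run.

```python
class Edureka:
    """A minimal class with no attributes or methods yet."""
    pass

x = Edureka()   # calling the class creates an instance
y = Edureka()   # a second, independent instance of the same class
print(x)        # something like <__main__.Edureka object at 0x7f...>
```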
Now, the definition of a method.
self refers to the object the method is called on.
From outside the class, instead of self.hello(), we write ob.hello().
Now, what is happening over here? A class can contain methods inside it.
The first parameter of a method, self, is supplied automatically by Python.
So when you call ob.hello(), internally Python calls this hello method and passes the object to it as self.
What is self? self is this object, ob, that you defined.
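A sketch of a method with self, following the ob.hello() call from the lecture. The method returns self only so the test can confirm that self really is ob. The second call form, which passes the instance explicitly, shows what Python does behind the scenes.

```python
class Edureka:
    def hello(self):
        # self is the instance the method was called on
        print("Hello from", self)
        return self

ob = Edureka()
returned = ob.hello()        # Python passes ob as self automatically
same = Edureka.hello(ob)     # the equivalent explicit call
```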
Next, let's talk about the scope of variables when it comes to classes.
Variables that are not declared within the scope of a class are global variables.
It's just like what we saw with functions, where certain variables were defined inside the function.
Those were local to the function, or to the enclosure created by the function.
The same concept applies to classes.
Variables defined inside the class are local to it, and variables defined outside are global.
Global variables, as usual, are usable everywhere.
But variables defined inside the class are local to it, and they cannot be accessed without going through the class.
For example, if a variable is defined inside the class and you try to print it directly from outside, it will not work.
Let me show you.
I'll just comment this out.
Suppose I declare a class demo that has a variable inside it.
Outside the class, I have a variable b, which is 45.
I can use b anywhere.
Even inside this class, I can print b, just like inside a function.
But if I try to access the class's own variable directly from outside, it will not work.
Let's try to run this.
It says name 'a' is not defined, even though the variable has been defined over here.
However, let's see if we can access it this way:
through the class, as demo.a, since the variable has been defined inside the class.
It is not available unless you refer to it in this way.
So if you get an error saying a variable is not defined, but you can see that it is present in the code and it is not a spelling mistake, check where the variable is defined.
You need to check whether the scoping is correct.
Please be very careful.
These are some of the initial mistakes beginners make.
So don't be demotivated whenever you see this.
It can be very confusing at times, but just check whether you can access the variable from that particular location.
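A sketch of the scoping demo, assuming the class variable is called a as in the error message from the lecture. The class is named Demo here, in the usual capitalized style. The class body can read the global b, but a is only reachable through the class.

```python
b = 45  # global variable: usable everywhere in this module

class Demo:
    a = 60        # class variable: lives in the class's namespace
    seen_b = b    # the class body can read the global b

try:
    print(a)      # a is not a global name, so this raises NameError
except NameError as err:
    error_message = str(err)
    print(error_message)  # name 'a' is not defined

print(Demo.a)     # 60: refer to it through the class instead
```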
Let's talk about attributes.
As we said, classes are nothing but blueprints of real-world objects.
Real-world objects have certain properties.
An employee, for example, has properties such as a name and an employee ID.
Let's suppose also their progress percentage in a certain project, or something else.
That is what really defines the blueprint.
When we model an object from the real world in the programming world, we define its attributes, or properties, because that is how we represent it.
Now, you may say, "Hey, this is not a complete representation of an employee."
In the sense that you know something about this person now, but you don't know everything.
Which is fine.
You know the things about a particular person that are relevant to you.
With a colleague, you may know a lot about their professional life but little to nothing about their personal life.
Or you might know about their personal life as well because they're your friend.
In a similar way, depending on what you're trying to do with that object in your software, you define the attributes you need, not everything that exists.
There are two types of attributes: built-in attributes and user-defined attributes.
Built-in attributes, like built-in functions, are made available on a Python class object by Python itself.
So there are certain properties you need to think about.
One question could be: if the blueprint of the class is defined by me, according to my needs as a software developer, why do built-in attributes exist? How can Python know anything about my class? If I am declaring an employee class, how could the developers of Python know anything about it or imagine its built-in attributes? The answer is that these built-in attributes usually hold the class's name and other meta-information about that class or object.
They're not really about what the class represents for you.
They are information that Python and its tools need to run or inspect the class, rather than things you would want to describe.
Then, naturally, there are user-defined attributes.
The user here is you, the developer.
So you can replace the word user here with the word developer.
Developer-defined attributes are the ones you use to model the class.
First, let's look at built-in class attributes.
I did nothing except define a class edureka, and certain built-in attributes are already available.
Now, don't be confused by the names.
Why the double underscores? They don't change how you use the attribute.
We'll cover this in a little bit.
It's just the way Python names certain special attributes.
"Special" in the sense that it's a visual cue.
The value itself is an ordinary value, such as a string, a dictionary, or another object.
It is just that, as a Python developer, the moment you see this, you know it is not your usual attribute: it is either built-in or something else special.
So it creates a context where you can look at it and recognize it as something different from a user-defined attribute.
Let's run this and see what the output is going to be.
What does __dict__ do? It contains the namespace of the class.
The namespace is a very important Python concept.
When it comes to finding Python files and executing code, you need to know this if you're architecting a big piece of code with Python.
For now, we don't need to get into the details of what a namespace is or exactly how namespaces work; it's just good to know that we can get the namespace of the current class.
The next one is __doc__.
Every Python class can contain a docstring, a comment-like string explaining the class.
If you want to print that documentation for some reason, you use this attribute.
The docstring needs to be in a particular place.
It has to be written as a string literal, usually in triple quotes, as the first statement inside the class.
If you notice over here, it has printed this particular docstring.
Notice that if you place this docstring anywhere else, for example outside the class, this will not work.
Let me show it to you.
With the string outside, there's no docstring, but the moment I place it inside over here and run it, the docstring is present.
The next one is pretty simple.
__name__ gives you the name of the class.
So edureka.__name__ gives edureka as the class name.
What I want here is the name of the class.
With edureka.__name__, I can do the same thing.
I need to change it over here as well.
If you're wondering where you might use this, it is used when you are building some sort of framework or library with Python.
It's used by large systems, because that is when these things are needed far more than when you're just building a website or doing data prediction, visualization, or analysis.
You won't necessarily use all of this, but it is required if you are architecting a big solution using Python.
Next is __module__, the module that we are in, which is the __main__ module.
__main__ is something we covered earlier as well, if you remember.
We used if __name__ == "__main__" and called the function from two different files.
__module__ is simply the name of the module where the class was defined; when you run a file directly, that module is __main__.
Let's look at __bases__.
What are bases? __bases__ holds the classes that this class derives from; for a class with no explicit parent, that is just object.
A class can derive from another class; this is called inheritance, much like each of us inherits from our parents.
So if a class inherits from other classes, this is where you get the list of those classes; a class can inherit from multiple classes as well.
That's what you get here in the code with edureka.__bases__.
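A sketch of the built-in class attributes covered above. The docstring text and the course attribute are placeholders, and the Child subclass is added only to show a non-trivial __bases__. The __module__ value depends on how the file is run, so its comment is hedged.

```python
class Edureka:
    """This is the docstring of the Edureka class."""
    course = "Python"

class Child(Edureka):
    pass

print(Edureka.__dict__)    # the class namespace, which includes 'course'
print(Edureka.__doc__)     # This is the docstring of the Edureka class.
print(Edureka.__name__)    # Edureka
print(Edureka.__module__)  # __main__ when this file is run directly
print(Edureka.__bases__)   # (<class 'object'>,)
print(Child.__bases__)     # a one-element tuple containing Edureka
```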
Okay, let's come to user-defined attributes.
Attributes are created inside the class definition.
Pretty obvious, right? We can also dynamically create new attributes for existing instances of a class.
You can create dynamic attributes for an existing class as well.
There's no restriction on that.
You can keep attaching them as you go along.
There are three kinds of attributes.
There are private attributes, which should be accessed only inside the class definition.
Then there are public attributes, which can and should be used freely.
Then there are protected attributes, which should be accessed only from within the class or its subclasses.
What are subclasses? Again, that relates to inheritance, or derivation.
We will cover that soon enough.
For now, just know that there are private, public, and protected attributes available in a class.
If you want to declare a variable as public, you don't need to do anything special.
You just declare a normal variable on self,
like self.pub, which is the public attribute over here.
If you want to define it as protected, its name needs to start with a single underscore, and if it has to be private, it has to start with a double underscore.
Now, if you want to access them, you can try it like this.
Here we have ob.pub, ob._pro, and ob.__pri.
Let me just show it to you.
Let's see what the output of this is going to be.
"I'm public" ran.
"I'm protected" ran, but "I'm private" did not.
That one did not run, even though it is declared.
Python threw an error at us, and that's it.
That's all you need to do to make a variable private.
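A sketch of the three access levels, reusing the pub, _pro, and __pri names from the lecture. It also attaches a new attribute to an existing instance, which is the dynamic creation mentioned earlier; the nickname attribute is a hypothetical example. Accessing ob.__pri from outside the class raises AttributeError.

```python
class Edureka:
    def __init__(self):
        self.pub = "I'm public"        # public: no prefix
        self._pro = "I'm protected"    # protected by convention: one underscore
        self.__pri = "I'm private"     # private: two underscores

ob = Edureka()
print(ob.pub)    # I'm public
print(ob._pro)   # I'm protected

try:
    print(ob.__pri)
except AttributeError as err:
    private_error = err
    print("AttributeError:", err)

ob.nickname = "ob"   # attributes can also be attached to an existing instance
```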
Now, _pro is protected, but how is that working out? I told you that a protected attribute can only be accessed from within the class or from a child class.
Here's the thing: Python is a little fluid, or relaxed, with these rules.
This is more of a convention.
For Python developers, the rule that the variable can only be accessed from a deriving class or within the class is a convention rather than a strictly enforced rule.
You will find that to be true even for the private one, as I will show you that there is a way to access even the private variable of a class.
Some people find it very weird that you can access all three, and they ask why the distinction exists at all.
It's more of an honor system, where Python developers are just signaling to others, "Hey, don't touch this thing." You can if you really want to, but if you do, people will notice.
Experienced Python developers will immediately notice that someone is making this particular call, and they're going to question it.
If you're working with a team of developers, they will ask you why you are doing it like this.
So you'd better have a good reason for it.
So as you saw, Python uses two underscores to hide a variable, and two underscores can also be used to hide a method.
We saw it for the variable, and the same principle applies to a method as well.
So I can declare a method over here.
And if I try to call it as ob.__pri(),
see, the editor has already highlighted this.
There's a problem with it.
It's telling me that I can try to run this, but it will not work.
This will most probably not work.
So let's try it out.
See, it did not work.
So now, the big reveal:
how you can access the private method.
You can access the private method, but you need to do this.
You need to use this really weird syntax of prefixing an underscore and the name of the class, Edureka.
You put this prefix in front of the name of the attribute or method and then call it.
So let's try and call it like this.
See, this runs.
So even private methods and attributes are accessible.
If I now try ob._Edureka__pri, the casing has to be exact.
So write _Edureka exactly as it is; in fact, copy and paste it if you're just beginning with this.
Then add __pri and run it.
"I'm private" and "I'm a private method" are printed.
Both get executed.
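A sketch of name mangling. Inside the class, a name that starts with two underscores is stored as _ClassName plus that name. That is why the _Edureka prefix works from outside. The private method name __pri_method is a stand-in, because the lecture's method name isn't shown.

```python
class Edureka:
    def __init__(self):
        self.__pri = "I'm private"           # stored as _Edureka__pri

    def __pri_method(self):                  # stored as _Edureka__pri_method
        return "I'm a private method"

ob = Edureka()

try:
    ob.__pri_method()                        # no attribute by this name outside the class
except AttributeError:
    blocked = True

print(ob._Edureka__pri)            # I'm private
print(ob._Edureka__pri_method())   # I'm a private method
```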
Now, class variables and instance variables.
We've already done this, where we defined a class.
We set the blueprint for a class.
Over here, we create an instance of the class Edureka.
You can set the course and you can print it.
A class variable is shared by both instances, object one and object two.
There are two things here: if there is a variable like this domain, which is defined at the class level itself,
it will remain the same across different instances.
If you look at ob1.domain and ob2.domain,
both of them are data science, but the names of ob1 and ob2 can be different.
Think of this as a common property shared across the different instances of a particular class.
Take human beings: if the class variable is the number of eyes,
the default value will always be two.
Genetically, human beings are not born single-eyed.
Yes, there are people who are blind.
There are people born with neither eye working, but genetically, all human beings are supposed to have two eyes, not three, not one.
That value is fixed.
So think of it in that way.
A class variable is an attribute shared across different instances, no matter what the instance is.
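A sketch contrasting a class variable with an instance variable, using the domain attribute from the lecture. The instance names Annie and Dave are placeholders. Changing the class variable on the class is visible through every instance, while each instance keeps its own name.

```python
class Edureka:
    domain = "Data Science"    # class variable: shared by every instance

    def __init__(self, name):
        self.name = name       # instance variable: one per object

ob1 = Edureka("Annie")
ob2 = Edureka("Dave")
print(ob1.domain, ob2.domain)   # Data Science Data Science
print(ob1.name, ob2.name)       # Annie Dave

Edureka.domain = "Machine Learning"   # change it once, on the class...
print(ob1.domain, ob2.domain)         # ...and both instances see the new value
```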
Okay, next let's talk about constructors and destructors.
The constructor is implemented using the __init__ method.
You can define parameters that follow the self parameter.
The destructor is defined using the __del__ method.
It is called when the object is destroyed, for example after the last reference to it is deleted.
Think of the constructor as a building constructor, really: it does the construction of your object.
You've declared a class, which is basically the blueprint of your object.
Now, to create an instance of that class, the constructor swings into action to actually make the object.
Similarly, the destructor comes into play when the object is deleted.
Think of a skyscraper being constructed: its construction needs certain things to happen.
Demolishing it would need certain things to be cleaned up and removed.
Now, init is basically short for initialization.
__init__ handles initialization, and __del__ handles deletion.
Let's look at an example.
What we have here is the constructor, which sets the private, protected, and public attributes on the object.
Similarly, I can define a destructor, where, let's suppose, I set those attributes to None.
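A sketch of a constructor and a destructor. An events list is added purely to make visible when each one runs. One caveat: __del__ runs when the object is actually destroyed. In CPython that happens immediately after the last reference is removed, as it is here, but other Python implementations may delay it.

```python
events = []  # records when the constructor and destructor run

class Edureka:
    def __init__(self, value):
        # Constructor: runs when the instance is created.
        self.pub = value
        self._pro = value
        self.__pri = value
        events.append("constructed")

    def __del__(self):
        # Destructor: runs when the object is destroyed.
        events.append("destroyed")

ob = Edureka(10)
del ob          # in CPython, removing the last reference triggers __del__
print(events)   # ['constructed', 'destroyed'] on CPython
```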
Now let's talk about multiple constructors.
Multiple constructors are used when you want to create objects from the same blueprint, that is, the same class, in different ways.
You just initialize it in different ways.
The way you do it is by applying a decorator, @classmethod, to a particular method.
This decorator makes Python pass the class itself, cls, to the method, instead of the instance as self.
For example, let's suppose I want to create another constructor.
I would declare a class method.
It takes cls, which is much like self but refers to the class.
If you notice over here, you need to call that particular method at the time of creating the instance.
I'll create an object with ob = Edureka.MyCustomConstructor(400),
passing it a value of 400.
Normally, to create a default object, I would have created it in this way.
If you need a custom object, you create it in this way.
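A sketch of an alternative constructor using @classmethod, keeping the lecture's MyCustomConstructor name. PEP 8 would normally use snake_case for this. The default value of 100 and the source attribute are hypothetical, and are there only to show that the two creation paths initialize the object differently.

```python
class Edureka:
    def __init__(self, value=100, source="default"):
        self.value = value
        self.source = source

    @classmethod
    def MyCustomConstructor(cls, value):
        # cls is the class itself (Edureka here), so cls(...) builds a new instance.
        return cls(value=value, source="custom")

default_ob = Edureka()                  # the default way to create an object
ob = Edureka.MyCustomConstructor(400)   # the custom way
print(default_ob.value, default_ob.source)  # 100 default
print(ob.value, ob.source)                  # 400 custom
```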
Now, certain key concepts within object-oriented programming are abstraction, encapsulation, inheritance, overriding, and polymorphism.
Let's look into these one by one.
Abstraction is simplifying complex reality by modeling classes.
What this means is that if you look at a fan, you don't know how it works internally.
Most of you wouldn't know how a car, an engine, or a horn works internally, but you know how to interact with it and make use of it.
Take something as simple as the statement print("Hello World").
How does it work? How does it print something on the console? I'm pretty sure most of you will not be able to answer that, and even I would not be able to answer it offhand.
It would require me to look at how it is executed.
The details of how this happens have been abstracted away from me.
I would need to open up the Python libraries and the Python code base to understand how this makes something print over here.
When it comes to learning Python, if we had to do that for each and every thing, we would never finish learning Python.
For that matter, we would never be able to learn anything or learn to use anything.
So abstraction needs to happen, and it happens in every programming language: certain things work in a certain way, but you don't know how.
You don't need to know until you have to do something out of the ordinary with that particular method or object, or whatever it is you're presented with.
If you want to repair a fan because there is no repair person around, or it's just too expensive to get a fan or a car repaired, that is when you will end up learning about it.
Not when you just need to turn on the fan.
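To connect the fan analogy to code, here is a small sketch: the caller only uses turn_on and turn_off, and the motor details stay behind a protected helper. The Fan class and its methods are hypothetical, invented for this illustration.

```python
class Fan:
    """Callers only need turn_on and turn_off; the internals stay hidden."""

    def __init__(self):
        self._speed = 0

    def _spin_motor(self, speed):
        # Internal detail the user of the class never needs to know about.
        self._speed = speed

    def turn_on(self, speed=3):
        self._spin_motor(speed)
        return f"Fan running at speed {self._speed}"

    def turn_off(self):
        self._spin_motor(0)
        return "Fan off"

fan = Fan()
print(fan.turn_on())   # Fan running at speed 3
print(fan.turn_off())  # Fan off
```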
Encapsulation refers to combining code into interfaces.
This includes writing methods, creating variables, using classes, using enumerations, and creating lists.
When we do this, we are encapsulating.
We are packaging the code in a more transferable way, so it can be reused flexibly in multiple scenarios.
Inheritance refers to the fact that there is a parent-child relationship between most of the things you see around you. If you look at vehicles, there's a parent-child relationship: bikes belong to the category vehicle, cars belong to the category vehicle, and buses belong to the category vehicle.
Cars, buses, and trucks could have another subcategory under vehicle, such as four-wheeled vehicles,
with further subcategories underneath.
The same goes for bikes.
Inheritance occurs very naturally, and the same idea is available for us to use in programming to model the real world, because that's what software is all about, after all.
Now, there are multiple types of inheritance.
One is single inheritance, where there is one parent class and one child class. Then there is multiple inheritance, where a class C inherits from both A and B.
This is like a child who has both a father and a mother.
Then there is multilevel inheritance.
This is like grandfather, father and grandson:
A is the father of B and B is the father of C; in other words, C is the son of B, and B is the son of A.
Let's talk about single inheritance.
Basically, inheritance allows us to extend the properties and methods of one class to another class.
One blueprint has certain properties and behaviors that model the world around us.
Take an object, for example: in the real world, an object definitely has a weight.
There is no object in the real world that doesn't have a weight, or a breadth.
Those are common properties, but there are also properties that are unique to certain objects.
Color, for example, might be different for different objects, and may or may not exist at all for certain objects.
If we consider a mass of water as an object, it will not have any color, because water is essentially colorless.
So the color property doesn't exist for water,
but the other properties still exist for this particular object.
The way this works is thatyou can define a base class and you can define itsattributes and its method.
Then you can define another class, which inherits from the base class.
The way inheritance works isthat you just need to supply the name of the class to the child class now, when you create aninstance of the child class and you call the function fun,even though it is not defined over here, but it is definedover here, this gets executed.
So let's try this out andsee if this works or not.
So in class base one.
Now I can always haveanother fun class over here and it can run this one instead.
Now I'm in class sub.
So the way this works isthat it will first look for the class namewithin the image of class and then it will go upthe chain of finding it for the parent classes.
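Here is a minimal sketch of the demo just described, assuming the class names Base and Sub and a method named fun; the returned messages are illustrative. The last line prints the lookup chain Python walks (the method resolution order).

```python
class Base:
    def fun(self):
        return "in class base"


class Sub(Base):
    # No fun() here: Python finds it on the parent class instead
    pass


class SubWithOverride(Base):
    def fun(self):
        # Defined on the child, so it is found first and wins
        return "in class sub"


print(Sub().fun())              # in class base
print(SubWithOverride().fun())  # in class sub
print([cls.__name__ for cls in SubWithOverride.__mro__])  # ['SubWithOverride', 'Base', 'object']
```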
Let's talk about multiple inheritance, where one class inherits from two classes.
So let's look at this one first.
There is a class First and a class Second.
Then there is a third class, Third, which inherits from both Second and First.
When the init method is called, let's see what gets executed.
When we do this, we are also calling the super method in each class.
super basically says "call my superior":
call the next class up the inheritance chain, in the order Python resolves methods (the method resolution order).
Because Third inherits from both Second and First, both of their init methods get called.
"first" gets printed, and then "second".
And, of course, "third" is printed as well.
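A sketch of the multiple-inheritance demo, assuming the class names First, Second and Third and that each init prints after calling super(). For a Third instance, super() inside Second continues to First rather than to object, which is why all three lines are printed.

```python
class First:
    def __init__(self):
        super().__init__()  # hand off to the next class in the MRO
        print("first")


class Second:
    def __init__(self):
        super().__init__()  # for a Third instance, this goes to First, not object
        print("second")


class Third(Second, First):
    def __init__(self):
        super().__init__()
        print("third")


Third()
# first
# second
# third
print([c.__name__ for c in Third.__mro__])
# ['Third', 'Second', 'First', 'object']
```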
Then, of course, there is multilevel inheritance.
In multilevel inheritance, a class inherits from a class that itself inherits from another one: the grandfather, the father and the child.
So you have an Animal class, which can eat.
You have a Dog class, which can bark.
Then you have a BabyDog class, which can weep.
You can create an instance of the BabyDog class, d,
and call d.eat(), d.bark() or d.weep() on it.
Notice that we wrote less code; otherwise we would have had to define these pieces of code again and again.
This is a very powerful concept in programming: you can reuse pieces of code again and again by wrapping them within classes and creating these sorts of relationships between them.
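A minimal sketch of the three-level chain described above; the method names eat, bark and weep follow the narration, and the return strings are illustrative.

```python
class Animal:
    def eat(self):
        return "Eating..."


class Dog(Animal):
    def bark(self):
        return "Barking..."


class BabyDog(Dog):
    def weep(self):
        return "Weeping..."


d = BabyDog()
print(d.eat())   # Eating...   (found on Animal, two levels up)
print(d.bark())  # Barking...  (found on Dog)
print(d.weep())  # Weeping...  (defined on BabyDog itself)
```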
Okay, so method overriding.
This is something we did in the earlier example, when I showed you classes.
I showed you there that I could override a method, but let's look at another example.
Here there is a Parent class with a method myMethod, and then there's a Child class, which has its own myMethod.
Now I can create an instance of Child in a variable and call myMethod on it.
Even though myMethod is present in Parent, the Child one will get called.
Let me run this for you.
So the child method was called.
See how we did that? Here is another example.
You have a class Rectangle and a class Square, which inherits from Rectangle.
Now, when you call getArea on a square, we want the version inside Square to run, because we want to print the result in a certain way.
Without an override, Python would naturally call the getArea of Rectangle.
It would print the area of the rectangle, which is not incorrect, because every square is also a rectangle.
But we don't want that.
We want to call the getArea of Square, and that is where overriding helps us out:
the child class overrides a parent class method.
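A sketch of the Rectangle/Square example, assuming a snake_case method name get_area in place of the narrated getArea and illustrative output strings. Square reuses Rectangle's constructor through super() and only overrides the method whose behavior should change.

```python
class Rectangle:
    def __init__(self, length, breadth):
        self.length = length
        self.breadth = breadth

    def get_area(self):
        return f"area of rectangle: {self.length * self.breadth}"


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)  # a square is a rectangle with equal sides

    def get_area(self):
        # Overrides Rectangle.get_area; Python finds this one first
        return f"area of square: {self.length * self.breadth}"


print(Rectangle(2, 3).get_area())  # area of rectangle: 6
print(Square(4).get_area())        # area of square: 16
```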
Polymorphism is the ability to use the same interface for different underlying forms, such as data types or classes.
Think of a millionaire and a student: both have to pay bills, but the way they pay may be different, so what paying the bill does depends on who it is called on.
So polymorphism says that the same operation can behave differently in different scenarios.
If you are a student, you may pay a bill with money from your parents, but if you're a millionaire paying your bills, you might pay by credit card or by debit card.
So paying a bill is something common to every person, but the way it is done might change based on who is paying.
Look at this one.
The class Cat and the class Dog are both derived from the Animal class.
If you observe, both of them have a talk method, so c.talk() and d.talk() will behave differently.
Of course, you have used the concept of method overriding here as well, as earlier, but talk() is present on all animals.
And a cat is essentially also an instance of Animal.
This is polymorphism because both Cat and Dog are animals, but when you call talk on them, each behaves differently.
The same method is being called, but the behavior is different, and that is polymorphism: multiple forms of the same thing.
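A minimal sketch of the Animal/Cat/Dog example; the sounds returned by talk are hypothetical. The loop makes the same call on every object, and each class supplies its own behavior.

```python
class Animal:
    def talk(self):
        raise NotImplementedError("each animal talks in its own way")


class Cat(Animal):
    def talk(self):
        return "Meow!"


class Dog(Animal):
    def talk(self):
        return "Woof!"


for animal in [Cat(), Dog()]:
    # Same call, different behavior depending on the object's class
    print(type(animal).__name__, animal.talk())
# Cat Meow!
# Dog Woof!
```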
There are also what are called getter and setter methods.
These are more or less a programming convention.
You can directly access the attributes of a class, but you can also use getters and setters, through which you set values and get values.
These methods are so widely used because a lot of times you may need to modify your data before you use it.
You may need to add symbols to it, or do some other modification before setting it, and you don't want just anybody to be able to do that.
You want it to be done in a safe and secure way.
Maybe this value is set from the database, and you don't want just anybody to access the database like that.
You want access to happen through a properly defined channel, because otherwise it gets complicated.
That is where you will use setters and getters.
So let's have a look at an example.
In the class Edureka, we are setting the course name.
We create an object of Edureka.
We print getCourseName, then we set the course name.
We could also have done this by accessing the attribute directly, by the way.
That could have been the alternative, but suppose you had to convert the name to upper case:
every single time, you had to print only the upper-case version.
That is a transformation you have to do on the data.
If it has to be done every single time you update the value, it would be cumbersome to write .upper() every single time.
The same applies to the getter: when you return the course name, it is always in upper case.
So that is why you would use getters and setters.
See how we did that.
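A sketch of the Edureka example under stated assumptions: the method names get_course_name and set_course_name are hypothetical stand-ins for the ones in the demo. The second class shows the same idea with @property, which is the more idiomatic Python spelling; callers use plain attribute syntax but still go through the setter.

```python
class Edureka:
    def __init__(self, course_name):
        self.set_course_name(course_name)

    def get_course_name(self):
        return self._course_name

    def set_course_name(self, name):
        # The transformation lives in one place instead of at every call site
        self._course_name = name.upper()


class EdurekaProp:
    """Same idea with @property."""

    def __init__(self, course_name):
        self.course_name = course_name  # goes through the setter below

    @property
    def course_name(self):
        return self._course_name

    @course_name.setter
    def course_name(self, name):
        self._course_name = name.upper()


e = Edureka("python")
print(e.get_course_name())   # PYTHON
e.set_course_name("data science")
print(e.get_course_name())   # DATA SCIENCE

p = EdurekaProp("python")
p.course_name = "machine learning"
print(p.course_name)         # MACHINE LEARNING
```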
Now we are done with loops.
We have had a basic introduction to loops.
Let's move on to the standard library that comes with Python.
The standard library is a collection of tools that ship with Python.
It includes built-in functions, modules and packages.
Some of them we have already been using.
Let's look at some others.
Modules in Python.
A module basically allows you to logically organize your code.
It can contain classes and functions.
One simple example is a utilities module.
Or you can have a DB module, which does all the database interactions for you, or something else.
So any sort of logical grouping of code, logical not in the sense of ones and zeros, but logical to you:
"Okay, this chunk of code should go in this file."
This file deals with the database, this file deals with, let's suppose, PDFs, this file deals with file handling, and so on.
You divide the code into logical pieces, and each piece is called a module.
Modules are nothing but Python files with a .py extension.
They're no different from any other Python file; they're just called modules because they're organized in a certain way.
As I said, a module is a file containing Python definitions and statements.
Whenever you use import, you tell Python to load a module.
When the Python interpreter encounters an import statement, it imports the module if the module is present in the search path.
So it can locate the module if it is in the same directory or in a path that has been given.
Same directory meaning the same folder as the current file.
It will then import that module.
Now, there are two cases. One is your own file, a module you have written yourself, which you're importing.
The other is importing something from the Python standard library, which will always be available in the search path.
dir is a built-in function which returns a sorted list of strings containing the names defined by a module.
It gives the list of variables and functions defined in that module.
Every file is a module in this way, and what dir does is give you all the variables, everything that is available in that module.
Certain hidden names, such as __doc__ and __loader__, are also available by default.
Let's have a look at this.
You can close all of these.
So let's suppose I do import math.
Then I set a equal to 1, b equal to a list, and c equal to another value, and I call dir().
Let's see what the result will be.
See, a, b, c and math are present.
If you specifically want to check everything that is available inside math, you can call dir(math).
That gives you a lot of functions that are available in the math module.
A lot of functions, and a lot of variables, and these can be a little confusing.
What are these double-underscore names? You just need to know that they're internal to Python, and Python uses them to organize and run code.
They are metadata about the code that Python uses.
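A small sketch of the dir() demo; the variable values are illustrative. Called with no argument at module level, dir() lists the names in the current namespace; called with a module, it lists that module's names, including the double-underscore metadata.

```python
import math

a = 1
b = [1, 2, 3]

names = dir()           # names in the current (module-level) namespace
print("math" in names, "a" in names, "b" in names)  # True True True

math_names = dir(math)  # everything the math module defines
print([n for n in math_names if not n.startswith("_")][:5])
print("__doc__" in math_names)  # True: dunder metadata is always there
```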
The from ... import statement allows you to import specific attributes from a module into the current namespace.
It works like import, but it lets you import specific things.
If you look at this, we imported so many of these functions.
What if you do not need all of them? What if you only need cosine, or tan, or square root? Can we do that? Yes: you can write from math import sqrt, or, say, from math import cos.
Now let's do this:
call dir() at this level.
Let's see what the result is.
If you look at it now, it has not imported the entire math module.
It just imported square root.
If you want to use any other function from the math module, you won't be able to.
If you want all the functions in the math module, you need import math, which imports the entire module, but here we are being specific.
import math is like saying, "Go to the library and get this book."
from math import sqrt is like saying, "From this book, get this particular page."
And the math book is, let's suppose, big and fat.
So you don't always want to import everything.
This form is more precise: you import only the specific things you want.
So it is recommended, and it's actually better practice, to import only the things you need.
Strictly speaking, Python still loads the whole module into memory either way; what changes is which names end up in your namespace, and having a lot of unnecessary names lying around can cause problems and make your code harder to deal with.
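A short sketch of what from ... import actually changes: only the requested names are bound here, while the math module object itself is still loaded and cached in sys.modules.

```python
import sys
from math import cos, sqrt  # binds only these two names in this namespace

print(sqrt(16))  # 4.0
print(cos(0))    # 1.0
print("floor" in globals())   # False: floor was never imported by name
print("math" in globals())    # False: the name math itself is not bound either
print("math" in sys.modules)  # True: the module was still loaded behind the scenes
```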
There is also import *.
So you can also do this:
from math import *, which brings in everything from math, much like import math, except that all the names land directly in your namespace.
However, this is not recommended.
It's not recommended because you're importing everything, name by name.
Consider the scenario where the math module defines a variable with the same name as a variable you are using.
I know the chances are small, but the more files and libraries you're working with, the higher the chance of a collision, where a module such as math, which is another file, has a variable with the same name as one of yours.
So you really want to be careful about what you're importing, as it might cause unexpected errors.
These are not errors that you will see immediately.
You will find them at run time, because the output will not be correct.
So just be careful about how you use the import statement.
The most responsible way of using it is to import only the specific names you need, as in from math import sqrt.
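A tiny sketch of the name collision described above, assuming a hypothetical variable of your own that happens to be called pi. No error is raised; the value is silently replaced, which is exactly why this kind of bug shows up only as wrong output.

```python
pi = "my own variable"  # hypothetical name that happens to clash with math.pi

from math import *      # silently rebinds pi (and e, sqrt, ...) in this namespace

print(pi)  # 3.141592653589793, not "my own variable"
```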
The reload function is a very nifty tool.
What it does is reload a particular module.
Suppose you're working on two files: you have made modifications to one file, and the other file is already running.
You want to reload the module so that it runs the fresh code.
reload simply re-imports the particular module it is asked to reload. In Python 3, it lives in the importlib module, as importlib.reload.
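A self-contained sketch of reloading: it writes a throwaway module (hypothetically named greet) into a temporary folder, imports it, edits the file while the program keeps running, and then calls importlib.reload to pick up the change.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temporary folder (hypothetical module name "greet")
folder = tempfile.mkdtemp()
path = os.path.join(folder, "greet.py")
with open(path, "w") as f:
    f.write('MESSAGE = "hello"\n')

sys.path.insert(0, folder)
importlib.invalidate_caches()  # make sure the import system sees the new file
import greet

print(greet.MESSAGE)  # hello

# Edit the file while this program is still running
with open(path, "w") as f:
    f.write('MESSAGE = "hello again"\n')

importlib.reload(greet)  # re-executes greet.py inside the existing module object
print(greet.MESSAGE)  # hello again
```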
Now, these are some of the very important modules available in Python:
the sys module, the os module, the math module, datetime, random and the json module.
Let's learn about some important modules in Python now.
First, the sys module. What it does is provide access to some variables used or maintained by the interpreter, and to functions that interact strongly with the interpreter.
"Strongly" meaning they're closely tied to the Python interpreter.
The way it works is that you import sys, and one very common thing you can use is sys.argv.
It stores any command-line arguments passed when you start Python.
So let me show you an example.
Let's suppose I go to the terminal and execute this file.
I pass some arguments.
If you notice, sys.argv takes in the name of the file that has been executed, as its first element.
One, two and three are strings.
Any parameters that have been passed arrive as strings.
So this is useful whenever you are executing a Python file through the command line and want to pass some arguments to it.
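A minimal sketch of reading sys.argv, with hypothetical helper names parse and total. The first element is the script name and every other element is a string, so numbers have to be converted explicitly.

```python
import sys


def parse(argv):
    script, *args = argv  # argv[0] is always the script name
    return script, args   # every remaining item is a str


def total(args):
    return sum(int(a) for a in args)  # convert explicitly when you need numbers


if __name__ == "__main__":
    script, args = parse(sys.argv)
    print(script, args)
```

Running it as, say, `python args_demo.py 1 2 3` (hypothetical file name) would print the script name followed by `['1', '2', '3']`.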
What sys.exit() does is stop execution:
it exits the Python interpreter.
That's nothing different from what happens at the end of any program, but sys.exit() is what you use when you want to exit
at a particular point.
Suppose you are running a Python file through the command line and you want to end execution at a certain point, based on some error condition that has occurred.
That's where it tells the interpreter to quit.
Another thing I can show you: if I do import sys
and then sys.exit(),
see, it has closed the session.
It closed the Python interpreter.
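A sketch of exiting on an error condition, assuming a hypothetical check_config helper. sys.exit() works by raising SystemExit; passing a string prints it to stderr and gives a non-zero exit status if nothing catches it. It is caught here only to show what happens.

```python
import sys


def check_config(value):
    if value is None:
        # A non-zero exit status tells the shell that something went wrong
        sys.exit("error: no configuration given")
    return value


try:
    check_config(None)
except SystemExit as exc:
    # sys.exit() raises SystemExit, so it can be caught like any exception
    print("caught:", exc.code)  # caught: error: no configuration given
```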
Now, some other important things in sys. Take sys.getwindowsversion().
This will not work on my machine because I'm on a MacBook, but if you're on a Windows system, it will give you the version of Windows.
Let's run it and see what output I get.
Yes, it throws an error for me, because I'm not on a Windows system.
So this will not work here, but you can try out the other ones.
Let's see what sys.flags
is all about.
It gives us some flags.
What are these flags? They're mostly set to zero or one.
sys.flags exposes the status of the command-line flags Python was started with.
These flags essentially reflect how Python was invoked on the command line, and it gives you those values.
Let's look at sys.prefix.
It basically tells you where the working environment, or the Python installation, lies.
If you have set up an environment, or if you have Python installed, it tells you where the Python that is being used to run this program is installed on your system.
Okay, now, the sys module is more useful when you're interacting with the operating system or with the command line.
You will not use it a lot for data science or web development purposes.
It is more useful when you're dealing with the operating system you're working on.
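A short sketch of the sys attributes mentioned above. Because sys.getwindowsversion() exists only on Windows, it is guarded with hasattr instead of letting it fail on macOS or Linux.

```python
import sys

print(sys.platform)        # e.g. 'darwin' on macOS, 'win32' on Windows, 'linux' on Linux
print(sys.prefix)          # where the running Python (or virtual environment) lives
print(sys.flags.optimize)  # 0 unless Python was started with -O or -OO

# getwindowsversion() only exists on Windows, so check before calling it
if hasattr(sys, "getwindowsversion"):
    print(sys.getwindowsversion())
else:
    print("getwindowsversion() is not available on", sys.platform)
```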
Let's go to the os module. os is also related to the operating system.
It lets you do certain things, such as directory and file manipulation.
So it allows you to create a directory
and to make new directories
and to remove a directory.
So let's try these things out.
Okay, let's start with some of the things available to us.
First is os.name.
I'm on a posix system, which is what MacBooks run on.
The name of the operating system here is not "MacBook"; that's the marketing name. It is posix.
Then os.environ: there are certain environment variables stored on any computer.
This is true even on Windows: the environment, meaning the entire operating system, has certain variables set.
os.environ gives you the entire mapping, a dictionary-like object, of the environment variables.
Then there's os.getlogin(), and then there is the getppid function, which gives the process ID of the parent process.
If you look at the output, it tells me the name of the operating-system-dependent module imported, returns the mapping object representing the string environment, and returns the name of the user logged in on the controlling terminal of the process.
The user logged in on this terminal is _mysql; it has nothing to do with databases, it's just the name of a user on my system.
getppid returns the parent's process ID:
the ID of the process that launched this particular Python file.
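A sketch of the os calls from this part. os.getlogin() raises OSError when there is no controlling terminal (for example under some IDEs, cron jobs or containers), so it is wrapped in try/except here.

```python
import os

print(os.name)                    # 'posix' on macOS/Linux, 'nt' on Windows
print(os.environ.get("HOME"))     # os.environ behaves like a dict of environment variables
print(os.getpid(), os.getppid())  # this process, and the parent process that started it

try:
    print(os.getlogin())          # user logged in on the controlling terminal
except OSError:
    print("no controlling terminal")
```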
Now the other ones you saw on the previous slide: we can also get the current working directory, with os.getcwd().
Let's do that.
As you can see, I'm in this particular directory right now: Python Edureka, DS mod three, DS mod three.
I can also make a directory, using os.mkdir().
I don't need to print it; I know it is shown with print here, but let me make a directory without using print.
I'll create a directory here with the name test.
If you notice, a test directory has been created.
Now I can change the directory to test, with os.chdir().
I'll just comment out the mkdir line so that we don't try to create the directory again.
We have changed the directory.
I can print os.getcwd();
cwd stands for current working directory.
Python knows which directory it is in at any given point.
Now you see that the current working directory is this one.
If I now try to import any of these files over here, let's suppose, it will not work, because I'm in a different directory.
Finally, I can remove a directory as well, with os.rmdir().
For that, I'll just give it the name test.
If you notice, this particular directory disappeared.
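A sketch of the mkdir/chdir/rmdir sequence. It works inside a scratch folder from tempfile so no real folders are touched, and it changes back out of test before removing it.

```python
import os
import tempfile

start = os.getcwd()
base = tempfile.mkdtemp()          # scratch area so we don't touch real folders
target = os.path.join(base, "test")

os.mkdir(target)                   # create the directory
os.chdir(target)                   # move into it
print(os.getcwd())                 # now inside .../test

os.chdir(start)                    # step back out before removing it
os.rmdir(target)                   # rmdir only removes empty directories
print(os.path.exists(target))      # False
os.rmdir(base)
```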
Right, so the os.path module is very helpful.
It lets us do a number of things with the path we are on.
Why is this needed in Python? One place Python uses this kind of path handling internally is to find files.
Whenever you're doing any sort of module import, trying to import functions, built-in functions or anything of that sort,
Python relies on this kind of path handling internally.
It's definitely good to have an idea of how Python deals with these things internally; it's not just magic.
It's just code written by other developers, which finds things out and does all of this step by step.
So let's look at os.path.join.
What it does is take one or more paths, join them using the current operating system's path separator, whatever it is, and return the result.
So let's try this out.
I'll change it to users/apple and users/apple/shared.
Let's run it once again.
Right, so it joined the two paths together.
Now let's suppose they were completely different.
Let's see what the output will be then.
Okay, I hope this is clear to everyone.
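A quick sketch of os.path.join, with illustrative folder names. The second call shows a behavior that often surprises people: if a later part is an absolute path, everything before it is discarded.

```python
import os

print(os.path.join("users", "apple", "shared"))  # users/apple/shared on macOS/Linux
print(os.path.join("users/apple", "/shared"))    # /shared on macOS/Linux: an absolute part resets the result
```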
Let's look at what os.path.abspath does.
It takes a relative path name and returns the corresponding absolute path.
What we can do is just give it DS_mod.
Where we are is DS_mod3.
Let's see what we get by running it.
If you notice, it took DS mod three, which was present within the current directory, and figured out the full path leading up to it.
This is its absolute path.
You can also try it on a file name.
Let's suppose we try it on this file, condr.
We run it.
It automatically found the full path of this file.
Now let's try it for something that is not within this folder.
Let's try it here, where we use info.
What abspath does is take the current working directory
and add to it the string you are passing in.
That's all it does; it does not check whether the path actually exists.
It gets the current working directory, like we did earlier, and joins it with whatever you pass to it, normalizing the result.
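A sketch that checks the claim above, using a hypothetical name "info" that need not exist: abspath is just the current working directory joined with the argument and normalized, without touching the disk.

```python
import os

relative = "info"  # hypothetical name; it does not have to exist
absolute = os.path.abspath(relative)

print(absolute)
print(absolute == os.path.normpath(os.path.join(os.getcwd(), relative)))  # True
print(os.path.exists(absolute))  # False unless something called "info" happens to exist here
```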
Now, normpath converts path names from a nonstandard format to a standard format.
So what is the standard format? Let's have a look.
Let's again reference the os module .py file.
Okay, this is the normalized path.
Now let's take this.
This is the normalized path for this.
Now let's suppose we give it this; let's see the output then.
Okay, this is the output.
This one is for this. In short, normpath collapses redundant separators and up-level references, so A//B, A/./B and A/foo/../B all become A/B.
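A sketch of normpath on a deliberately messy, hypothetical path. It uses posixpath, which is what os.path is on macOS and Linux, so the result is the same on any machine; on Windows, os.path.normpath would also switch to backslashes.

```python
import posixpath  # POSIX rules, so the output is the same on any OS

messy = "users//apple/./docs/../shared/"
print(posixpath.normpath(messy))  # users/apple/shared
```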
split takes a path name and returns it in two parts: the directory part and the file name part.
This is very useful.
Let's try this out.
Let's take the entire path and give it to os.path.split.
You can see that the first part is the entire directory and the second part is the actual file name.
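A quick sketch of split on a hypothetical file path, again using posixpath so the output does not depend on the operating system.

```python
import posixpath

head, tail = posixpath.split("/Users/apple/class_demos/os_module.py")
print(head)  # /Users/apple/class_demos
print(tail)  # os_module.py
```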
Then os.path.exists
very simply tells you whether a path, meaning that particular folder or file, exists or not.
We can try it on this, and the value will be False, because I don't have a folder like that.
But if I try it on this particular thing, which does exist,
let me try it with this, class demos,
this should exist, and the value is True: yes, it actually exists on this system.
The next one, os.path.isdir, checks whether a path is a directory.
So if I pass it a file name, it should say that it's not a directory.
It is not a directory, because it's a Python file.
Now if I pass this folder and run it, yes, this one is a directory.
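A self-contained sketch of exists, isdir and isfile. It creates a temporary folder and a hypothetical demo.py inside it so the results are predictable.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "demo.py")
with open(file_path, "w") as f:
    f.write("print('hi')\n")

print(os.path.exists(folder))                        # True
print(os.path.exists(os.path.join(folder, "nope")))  # False
print(os.path.isdir(folder))                         # True
print(os.path.isdir(file_path))                      # False: it's a file
print(os.path.isfile(file_path))                     # True
```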
Now os.walk: what it does is walk, or traverse, the path you give to it.
Let's see what it returns.
It gives you a generator, which is a kind of iterator.
So let's use it in a better way, by looping over it, and see what it actually gives us.
Right, so it gives us a bunch of things.
For each directory, it gives the directory path, the list of subdirectories in it, and the list of files in it.
It lists the entire set of directories present in class demos/python/edureka.
That's probably a bit too much, so let's go back a little, try it over here, and see what we get.
Just DSmod3: what is available in this directory.
It should only be the files. The reason we got so much output earlier is that there is a lot under Python Edureka.
That folder is huge.
So instead, let's try this.
Now, this is a little better.
.idea is just an editor settings folder; ignore it.
There's something called __pycache__.
That's where Python caches compiled bytecode.
But if you look at this, this is the list of all the files in here.
So os.walk
allows you to go into folders and navigate through the different folders and Python files present on your system.
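A sketch of os.walk on a tiny, hypothetical tree built in a temporary folder so the output is predictable. Each iteration yields a (dirpath, dirnames, filenames) tuple, and the comprehension at the end collects every .py file in the tree.

```python
import os
import tempfile

# Build a tiny hypothetical tree
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "pkg", "sub"))
for rel in ("main.py", os.path.join("pkg", "util.py"), os.path.join("pkg", "sub", "notes.txt")):
    open(os.path.join(root, rel), "w").close()

for dirpath, dirnames, filenames in os.walk(root):
    # One tuple per directory: where we are, its subfolders, its files
    print(os.path.relpath(dirpath, root), sorted(dirnames), sorted(filenames))

py_files = [
    os.path.join(dirpath, name)
    for dirpath, _, filenames in os.walk(root)
    for name in filenames
    if name.endswith(".py")
]
print(len(py_files))  # 2
```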
Let's talk about the math module.
It provides access to mathematical functions.
Let's look at a couple of important math functions.
Okay, so math.ceil; let's run this one.
Right, so ceil rounds up to the smallest integer greater than or equal to the number, which is 11 in this case.
That's what ceil does.
copysign copies the sign of whatever is in the right-hand argument onto the left-hand argument.
This is a better way of doing it than just multiplying by negative one.
It takes care of tricky cases, such as negative zero,
where getting the sign right is non-trivial.
Next, math.fabs gives you the positive value of a negative number.
If the number is positive, let's see what the output is.
The output does not change. And since this is fabs, which stands for floating-point absolute value, if you pass an integer,
it converts even 19 to 19.0,
as you can see.
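A few calls that show the behaviors just described; the input values are illustrative. Note that ceil rounds toward positive infinity, and that fabs always returns a float while the built-in abs keeps an int.

```python
import math

print(math.ceil(10.3))           # 11: smallest integer >= 10.3
print(math.ceil(-10.3))          # -10: "up" means toward positive infinity
print(math.copysign(5, -1))      # -5.0: magnitude of 5, sign of -1
print(math.copysign(1.0, -0.0))  # -1.0: it even sees the sign of negative zero
print(math.fabs(-19))            # 19.0: always returns a float
print(abs(-19))                  # 19: built-in abs keeps the int type
```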
Then we have math.exp,
which gives you e raised to the power of whatever you pass to it.
Then we have expm1, which returns e to the power x, minus one, where x is two in this example.
This is log: math.log10(10)
returns the base-10 logarithm of 10, which is 1.0.
In case you're not familiar with e or logarithms, they're just a bit of mathematics, nothing complicated at all.
You just give the input and get the power and logarithmic values you want.
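A sketch of the exponential and logarithm functions. The last two lines show why expm1 exists: for very small x, computing exp(x) - 1 directly loses precision to rounding, while expm1 stays accurate.

```python
import math

print(math.exp(2))     # e**2, about 7.389
print(math.expm1(2))   # e**2 - 1, about 6.389
print(math.log10(10))  # 1.0
print(math.log2(8))    # 3.0

tiny = 1e-10
print(math.expm1(tiny))    # about 1e-10, computed accurately
print(math.exp(tiny) - 1)  # close, but loses digits to rounding
```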
Similarly, we have sine and cosine.
You can get acos, the arc cosine, which gives you the angle, in radians, for a given cosine value.
Then you get asin, atan, cos.
You can get all of these values here.
As I said, Python has a wide variety of use cases, which include scientific use cases.
That is why all of this ships with Python, whereas in some other languages you might need a special module.
All of this comes built into Python.
There's built-in support for all of this because Python is used in domains such as science, particularly.
A lot of scientists use it, which is why these things exist out of the box in Python.
Similarly, in the math module there are angular and hyperbolic functions; you can convert to degrees with math.degrees,
or to radians with math.radians; you can get the constant value of pi, and you can get the value of e as well.
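A short sketch of the trigonometric helpers. The key detail is that math's trig functions work in radians, so degrees have to be converted with math.radians, and results from acos or asin converted back with math.degrees.

```python
import math

angle = math.radians(60)             # trig functions work in radians
print(math.cos(angle))               # about 0.5
print(math.degrees(math.acos(0.5)))  # about 60.0: acos returns radians
print(math.sin(math.pi / 2))         # 1.0
print(math.pi, math.e)               # 3.141592653589793 2.718281828459045
```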
Okay, let's talk about the random module.
Sometimes you might need a random value in a certain range, or a random number for whatever reason.
This is where the random module comes in.
So let me show you an example.
You import the module and call randrange.
randrange gives you a number in the range zero to 100, where the upper limit, 100, is excluded.
So let's run this.
Okay, it gave a value of 52.
Every time you run it, you'll get a different number.
randrange can also give you a number between zero and 100 in steps of 20.
So: start, stop and step.
It starts from zero and moves in steps of 20.
So if I need a wider spread of results, I should increase the upper limit.
All of these values are going to be multiples of 20.
More precisely, it moves in steps of 20 from the start value, and because we start at zero, every result is a multiple of 20.
Then there is randint, between zero and 30, where both ends are included.
So let's try this out as well.
Which one to use really depends on the use case; the main difference is that randint includes the upper limit and randrange does not.
You can go ahead with either one.
So what does random.getstate()
do? It's another method available to us.
It returns an object capturing the current internal state of the generator.
Python's random numbers come from a pseudo-random generator: it is not truly random, and its internal state determines which value comes next.
getstate captures that state in its current form, as an object, which you can later pass to random.setstate() to reproduce the same sequence.
Next is random.uniform.
It returns a floating-point number N such that N is between a and b.
a and b are the values you pass to it.
So this number would be between three and six.
It's like saying: generate any decimal value between three and six and give it back to us.
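A sketch of the random functions just covered; the printed numbers change on every run. The getstate/setstate pair at the end shows that the generator is deterministic: restoring the same state replays the same values.

```python
import random

print(random.randrange(0, 100))      # 0..99: stop is excluded
print(random.randrange(0, 100, 20))  # one of 0, 20, 40, 60, 80
print(random.randint(0, 30))         # 0..30: both ends included
print(random.uniform(3, 6))          # a float between 3 and 6

state = random.getstate()            # snapshot of the generator's internal state
first = [random.random() for _ in range(3)]
random.setstate(state)               # rewind to the snapshot
second = [random.random() for _ in range(3)]
print(first == second)               # True: same state, same "random" numbers
```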
What are the real-life applications of generating random numbers? One could be generating a random passphrase for an online banking system, although for security-sensitive values like that, Python's secrets module is the right tool, because random is not cryptographically secure.
It could be for video games you're developing in Python, where monsters or spaceships should appear randomly for the player to shoot down.
Or it could be for simulating a random or seemingly random event, like rainfall.
This is where it comes in.
There are several other real-life applications like these where you might need random numbers.
Let's suppose you're building a casino website, and there's a random number to be drawn or a dice to be thrown.
Or you're building a roulette.
A roulette is one of those things you see in a casino, where a ball is thrown and lands on a particular number.
If you're building any sort of game of chance with Python, you will need random numbers.
Or if you're building something based on probability.
Let's suppose you're building a machine learning model, implementing an algorithm.
Machine learning often relies on random numbers, and that's where you might need the random module.
Next is the datetime module.
The datetime module includes tools for working with dates, times, and combinations of the two.
So let's have a look at some of what is available.
MAXYEAR: what does it do? Let's run it and see.
It gives us the maximum year available to us, and MINYEAR gives the minimum year.
datetime.time gives us a time object.
MAXYEAR is 9999 and MINYEAR is 1.
The time object holds an hour, minute, second and microsecond.
This one returns a time zone object.
But this might not work everywhere; as you have just seen, it did not work on my particular setup:
the module has no attribute timezone.
That is because datetime.timezone was only added in Python 3.2, so an older Python, such as Python 2, raises this error.
Just be careful about which Python version you are running
when you use it.
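A sketch of the datetime pieces mentioned above, assuming Python 3.2 or later for datetime.timezone. The +05:30 offset and the sample date are arbitrary illustrative choices.

```python
import datetime

print(datetime.MINYEAR, datetime.MAXYEAR)  # 1 9999
t = datetime.time(14, 30, 15, 500)         # hour, minute, second, microsecond
print(t)                                   # 14:30:15.000500

# datetime.timezone needs Python 3.2+
ist = datetime.timezone(datetime.timedelta(hours=5, minutes=30))
stamp = datetime.datetime(2024, 1, 1, 9, 0, tzinfo=ist)
print(stamp)                               # 2024-01-01 09:00:00+05:30
```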
Next is a Json module.
The Jason module provides aneasy way to encode and decode data in Json format.
Let's have a look at this.
So Json is basically, it lookslike a dictionary in Python but essentially it stands forJava script object notation.
It is something that isused a lot on the internet these days through transferdata out, to store data.
It's very, verypopular these days.
It's not that you need to knowJava script before using it, it's just a format for data.
Like MP3 is a format for storing music.
Similarly, Json is aformat for storing data in.
Yeah, let me just find the file.
Okay, and just write it here.
So first you need toimport the module for Json.
Then let's suppose we create the data, which is a dictionary, and thenI can create a Json string, which is json.
Dumps isthe method available.
I need to pass the data to it.
Then I can print the Json string.
Let's run this.
So you might say hey, this looks no different from what I printed over here.
But trust me, this is actually different.
It is different because if I check its type, the way Python looks at it is very different.
Right, so if you look at this, the type is string.
This is not a dictionary.
It can look like a dictionary, but it's not a dictionary.
If I try to assign to a key like this, it will not run, because a string object does not support item assignment.
Okay, so it's a string object.
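A minimal sketch of the dumps-versus-dictionary point above, using a made-up dictionary. It also shows json.loads, which turns the string back into a dictionary you can modify.

```python
import json

data = {"name": "Alice", "age": 30}  # example dictionary

json_string = json.dumps(data)       # encode: dict -> JSON text
print(json_string)                   # {"name": "Alice", "age": 30}
print(type(json_string))             # <class 'str'>

# Looks like a dict, but it is a string, so item assignment fails
try:
    json_string["name"] = "Bob"
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

parsed = json.loads(json_string)     # decode: JSON text -> dict
parsed["name"] = "Bob"               # works, because this is a real dict again
print(type(parsed), parsed)
```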
Let's talk about regular expressions next.
Let's suppose you want to find particular data in a large body of text.
So you want to find, let's suppose, all the email IDs that exist in this entire text.
You want to find the names of certain colors.
You want to find certain words.
You want to find how many times the word flag appears over here and replace it with the word flags.
Any of these things, where you need to match certain patterns or words and either replace them or count the number of occurrences, can be handled very easily with regular expressions.
Or let's suppose you want to verify email addresses.
This is something that you see very commonly: when you sign up for a website, it tells you automatically whether the email ID that you have provided looks like an email ID or not.
The thing is that there are so many patterns, so many ways in which an email ID can look, because there are websites with many different endings, such as .in, and so on and so forth.
So how do you make sure that an email ID actually looks like an email? How did the company figure that out? That's where regular expressions help us.
What are regular expressions? A regular expression is a special text string for describing a search pattern.
It describes a search pattern, the pattern by which we will look for a match in a given string.
You can do that very well as a human being, but how do we tell the computer to do it? The way we tell the computer to do it is by giving it a certain expression and asking it to match that expression against a given string.
So to find a word in a string, the engine internally iterates over the string and matches one or several of the letters based on the pattern that you have given.
Or it matches a series or a range of characters.
Then you can replace the matched text, or you can match just a single character.
Python has the re module available for this.
It has several methods available on it.
One of them is sub.
Let's have a look at this.
Okay, so what happens over here is re.sub. Now this over here is what is called a regular expression.
This is what we meant by describing a search pattern.
This is a search pattern, and what it says is: match A or D.
The way it is written is that whenever you put characters between square brackets, you are giving it a set of characters to match.
Within square brackets, any one of the characters can match.
So here we're going to match A or D within this given string.
So it's going to find A or D.
It is going to replace them by stars.
So let's run it and see if we get the right output or not.
Okay, so if you look at this, all the A's got replaced by a star and all the D's also got replaced by a star.
Let's do another one.
Now you want to match the pattern abc.
So this is not a or b or c; this one says that it has to be abc in that sequence, and it replaces all the abc's.
So it replaced abc with a star here and here.
Let's compare it to the first one.
You see the difference? Let's suppose we write A and D without square brackets, and let's see what happens then.
Nothing changes, because A and D never appear next to each other in that sequence.
But if we put them in square brackets, we get the same output as the first example, right.
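The comparison above can be sketched with a made-up string (lowercase letters here, whereas the lecture's example used its own text): a character set matches any single listed character, while plain letters must appear as an exact sequence.

```python
import re

text = "abcdef abc xyz abc"  # example string

set_result = re.sub("[ad]", "*", text)  # any single 'a' OR 'd'
seq_result = re.sub("abc", "*", text)   # the exact sequence 'abc'
no_match = re.sub("ad", "*", text)      # 'a' immediately followed by 'd': never occurs here

print(set_result)  # *bc*ef *bc xyz *bc
print(seq_result)  # *def * xyz *
print(no_match)    # abcdef abc xyz abc
```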
Now this is another one.
This says A or B or C, and then one or two or three.
So let's try this out and see what output we get.
A1 gets matched, B2 gets matched, and the last pair has no match, because the pattern contains neither D nor four.
Let's suppose we add D to the first set.
Even then it would not be matched, because there is no four.
But if you add a four over here, yes, then you get a match.
So first, it says that the first character should be either an A or B or C or D.
The second character in the string should be either one or two or three or four.
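Here is a sketch of the two-character-set pattern, using an invented string of letter-digit pairs. The first set constrains the first character, and the second set constrains the character right after it.

```python
import re

text = "a1 b2 c3 d4 e5"  # example string

step1 = re.sub("[abc][123]", "*", text)    # d4 fails: neither d nor 4 is allowed
step2 = re.sub("[abcd][123]", "*", text)   # d is allowed now, but 4 still is not
step3 = re.sub("[abcd][1234]", "*", text)  # now d4 matches too
pairs = re.findall("[abcd][1234]", text)   # list the matches instead of replacing them

print(step1)  # * * * d4 e5
print(step2)  # * * * d4 e5
print(step3)  # * * * * e5
print(pairs)  # ['a1', 'b2', 'c3', 'd4']
```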
Let's do another one.
The thing is that this is again one of the simpler ones, but then there are complicated ones as well.
There is this dot. Regular expressions have their own notation for describing a pattern.
Let's see what we get with A.B.
Okay, what is A.B doing? It says: match any sequence which starts with capital A, then any character, and then B.
So the dot stands for any single character (except a newline), followed by B.
So A2B gets matched.
AXB gets matched and gets replaced by a star.
AXXB does not get matched.
A$B gets matched.
If I were to add another dot, what will happen? Then none of the first three match; only AXXB matches.
This could literally be any character.
It could be the letters A or B themselves as well.
It could be a dollar sign, as in this one, and it will still match.
Square brackets, dots, all of these are notation. Now what is AB+? It says A and then B one or more times.
Let's run this.
It matches A and then B once.
So that gets replaced by a star.
Then A and all of these B's get replaced by a single star.
Then this one, see, is left untouched because it doesn't match.
Got it? So this says: find any substring within this string where the first character is an A and it is followed by B+, that is, one or more B's, and sub replaces each match.
What does this do? This says A followed by B between three and six times, written as AB{3,6}.
When it is curly braces, it is saying that B has to appear three to six times.
If it is less than three times, then it will not replace.
So ABB doesn't get replaced.
If there are more than six B's, it will replace only up to the sixth B, that is, A plus the first six B's, and the last three B's get left out.
It matches this one and, naturally, it matches this one too.
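The dot, plus, and curly-brace notations above can be sketched together. The strings are invented to mirror the lecture's cases, and the nine-B run is built with "B" * 9 to avoid miscounting.

```python
import re

# Dot: exactly one character of any kind (except newline) in that position
dots = "A2B AXB AXXB A$B"
one_dot = re.sub("A.B", "*", dots)     # one character between A and B
two_dots = re.sub("A..B", "*", dots)   # exactly two characters between A and B

# Plus: B one or more times; a whole run of B's collapses into one star
plus = re.sub("AB+", "*", "AB ABBBB AC")

# {3,6}: B between three and six times; greedy, so it takes up to six B's
braces_text = "ABB ABBB " + "A" + "B" * 6 + " " + "A" + "B" * 9
braces = re.sub("AB{3,6}", "*", braces_text)

print(one_dot)   # * * AXXB *
print(two_dots)  # A2B AXB * A$B
print(plus)      # * * AC
print(braces)    # ABB * * *BBB
```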
This is a very important one.
This is for matching anything that starts with ABC.
The starting portion of the string, if the string starts with ABC, gets matched.
Notice that this is not matched.
Whenever you have this caret (^) operator over here, it will look at the start of the string and match only from there.
It will not look at the end of the string.
If you want to match at the end of the string, you need to write it as ABC$.
Naturally, if you want to match it anywhere, you just keep it as ABC, and then it will replace it at both the start and the end.
To match only at the end, you put in a dollar sign, which anchors the match to the end of the string.
With the caret, the match is anchored to the front of the string.
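A short sketch of the caret and dollar anchors, using an example sentence that has ABC at both ends.

```python
import re

text = "ABC is at the start and the end is ABC"  # example string

start_only = re.sub("^ABC", "*", text)  # ^ anchors to the start of the string
end_only = re.sub("ABC$", "*", text)    # $ anchors to the end of the string
anywhere = re.sub("ABC", "*", text)     # no anchor: every occurrence

print(start_only)  # * is at the start and the end is ABC
print(end_only)    # ABC is at the start and the end is *
print(anywhere)    # * is at the start and the end is *
```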
Similarly, there are many more of this sort.
There are even complicated ones like this one.
This one is a little involved in terms of everything that is present.
The best way to get better at it is by finding more and more regular expressions and practicing with them.
Then, we just did a substitute, but there are also search and match, which tell you whether it can find a certain thing or not.
Search just gives you a match object.
If this was, let's suppose, abc, and you did this, it would give you None, because it could not find abc within the string.
So you use search to find a match, but it does nothing to the string.
You just confirm that, okay, this pattern exists within this string.
Then there is match.
Okay, so what does match do? Match looks for a match starting at the beginning of the string.
Search looks anywhere, but match looks at the beginning.
So match is similar to, in fact I'll prove it, searching with a caret at the start of the pattern. It depends on you how you want to write it, whether you want to write it using re.search or re.match.
Search is usually used when the word could occur anywhere in the text.
Match is more for the start of the string.
So if you were matching an email ID, you would use re.match, but if you're looking for the word blue in a particular text document, then you would use re.search.
So you can use it to match email IDs or mobile numbers.
This is a very, very typical application of regular expressions.
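A sketch contrasting search and match. The email pattern here is a deliberately simplified, hypothetical one (EMAIL_PATTERN is just an illustrative name); real-world email validation is considerably more involved, so treat it as a demonstration of re.match, not a validator.

```python
import re

text = "the sky is blue and the sea is blue too"  # example sentence

found = re.search("blue", text)        # looks anywhere: returns a match object
at_start = re.match("blue", text)      # only at the start: None here
missing = re.search("abc", text)       # not present anywhere: None
all_blues = re.findall("blue", text)   # every occurrence, as a list

# For single-line text, match behaves like search with a leading caret
same_thing = (re.match("the", text) is not None) == (re.search("^the", text) is not None)

print(found.group(), at_start, missing, all_blues, same_thing)

# Hypothetical, simplified email check: something@something.something, anchored at both ends
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+\.[\w.]+$"
checks = {c: bool(re.match(EMAIL_PATTERN, c))
          for c in ["someone@example.com", "someone@example.co.in", "not-an-email"]}
print(checks)
```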
Let's talk about packages in Python.
A package is a collection of Python modules.
We learned about modules, which were nothing but Python files.
But packages are basically a bunch of modules logically combined together.
A Python package allows you to break down large systems and organize their modules in a consistent way that you and other people can use and reuse efficiently.
Again, if you look at programming, there is a lot of stress on reusability.
Whether it's functions or classes or modules or packages, a lot of things are about building blocks, combining them together, and making them work together.
So we have done this, right.
We have from backpack import pen, papers, calculator.
We have sort of covered this particular bit, where we learned about the import statement and the from ... import form.
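To make the backpack example concrete, here is a sketch that builds a hypothetical backpack package (with made-up pen, papers, and calculator modules) inside a temporary folder and then imports from it. In a real project you would simply create these files in your project folder.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: backpack/__init__.py plus three modules
root = tempfile.mkdtemp()
pkg = os.path.join(root, "backpack")
os.makedirs(pkg)

with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("")  # an __init__.py marks the folder as a regular package

modules = {
    "pen": "def write(text):\n    return 'pen: ' + text\n",
    "papers": "COUNT = 10\n",
    "calculator": "def add(a, b):\n    return a + b\n",
}
for name, source in modules.items():
    with open(os.path.join(pkg, name + ".py"), "w") as f:
        f.write(source)

# Make the temporary folder importable, then import modules from the package
sys.path.insert(0, root)
importlib.invalidate_caches()
from backpack import pen, papers, calculator

print(pen.write("hello"))
print(papers.COUNT)
print(calculator.add(2, 3))
```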
Let's talk about exception handling.
An exception is a signal that occurs when an error or unusual condition has occurred.
A very classic example of an exception is when you try to divide by zero.
Programmatically, you haven't written anything incorrect, but mathematically this can't be done.
So what Python, or any other programming language, does is throw an error, a ZeroDivisionError, saying that this is not allowed, which is an exceptional scenario.
It is not that the code is incorrect; it is an exception which has occurred because something failed at a really fundamental level.
Fundamental level meaning something which is just impossible, or an unexpected scenario.
Unexpected in the sense that an earthquake is an exceptional scenario, right.
Think about exceptional scenarios when you think about exception handling.
Now, you need to kind of take care of these exceptional scenarios.
You need to anticipate them as a programmer and make sure that you are handling them within your code base.
This is very important for graceful degradation of the software.
It is very important for showing the correct kind of error to the user, and overall it is a very, very necessary programming practice.
So the syntax in Python starts with try.
In the try block, you write whatever you are trying to do.
Let's suppose you're trying to divide.
So the division goes under the try block, as defined over here, with indentation, of course.
Then you will handle certain exceptions.
Let's suppose when you're trying to divide two numbers, you expect three kinds of exceptions.
One exception could be that the user tries to divide by zero; or, let's suppose, the user tries to divide very large numbers, or the denominator is so small that the result is too large to fit in your computer's memory.
That is possible.
Like, computers have their limits.
Our computers could store numbers in the hundreds of thousands of trillions, but eventually they reach a limit where they cannot store a number larger than a given number.
Now, all of these things are different exceptions.
It is better that you deal with different exceptions in different ways.
For example, dealing with a very large number would be very different from dealing with a zero input from the user.
If the number is too large, the program could still try to figure out how to represent it in a different way and still give the user a solution.
But for division by zero, you might just tell the user, hey, you gave a bad input.
So don't confuse this with an if-else at all. I have seen people use this instead of if-else at times when things could have been done with an if-else statement, but try/except is to be used more when you are expecting something unexpected.
Yeah, so I know it's a head-spinner, but you are expecting, you're anticipating.
So rather than expecting, I think I should be using the word anticipating.
We are anticipating that, hey, this thing might go wrong.
I might get a bad input, so let me make my program malleable enough to handle that scenario.
A try clause can have any number of except clauses to handle exceptions differently, but only one will be executed when an exception occurs.
So if a ValueError is raised, only one handler will be executed: the first matching one that it finds.
You can also handle multiple errors in the same except clause.
Here, TypeError and ZeroDivisionError are being handled in the same way.
Let's suppose the user inputs a string, and in both cases you want the user to be told that, hey, you have given a wrong input, as compared to a ValueError, where something else has gone wrong, right.
As we discussed, let's suppose this one is about the large number error, and this one handles the type error or the zero division error.
Then there is a bare except, which is for handling any unexpected or unanticipated exceptions.
This is the one for cases you can't even anticipate, but you still know the code might break, for example because you're building it for mission-critical applications.
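A sketch of the structure described above, with a hypothetical safe_divide function. It uses a tuple to handle TypeError and ZeroDivisionError together, a separate clause for OverflowError (dividing very large integers), and a catch-all at the end. The sketch uses `except Exception:` rather than a bare `except:`, because a bare except also swallows things like KeyboardInterrupt.

```python
def safe_divide(a, b):
    """Hypothetical helper: divide a by b, handling anticipated errors differently."""
    try:
        result = a / b
    except (TypeError, ZeroDivisionError):
        # Same handling for both: the user gave a bad input
        return "bad input"
    except OverflowError:
        # The result is too large to represent as a float
        return "number too large"
    except Exception:
        # Anything we did not anticipate; only the first matching clause runs
        return "something unexpected went wrong"
    return result


print(safe_divide(10, 2))         # normal case
print(safe_divide(1, 0))          # ZeroDivisionError
print(safe_divide("1", 2))        # TypeError: str / int
print(safe_divide(10 ** 400, 3))  # OverflowError: too large for a float
```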
So let's quickly look at an example.
So here we're trying to get a number as input from the user.
Now, Python gives us the raise keyword, which we can use to raise an exception.
Here we are artificially raising an exception.
This is not how it is usually done, but we're artificially raising an exception just so that it can go into this block.
I think we should look at this example instead.
Let's suppose, for each entry in a random list, we are trying to divide.
As I said, you can get unexpected input.
So let's see what happens when we run this.
Right, so if you look at this, the first time, it tries to compute one divided by the entry.
The entry is 'a', and an exception occurs.
It prints Oops along with the exception info, then Next entry, and it goes on to the next entry.
The next entry is a zero, and nothing can be done with it, because dividing by zero raises another exception.
When it is two, you get a result out of it.
If you notice what has happened now: otherwise, your program would have stopped at the very first step, but it executed to the end because you were handling the exceptions as they occurred.
Imagine that you're getting random input and iterating over it; this is how you are going to deal with it.
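A minimal sketch of the loop being described, assuming a list like ['a', 0, 2] (the exact list in the lecture isn't shown in this transcript). Each bad entry raises an exception that is caught, so the loop keeps going to the end.

```python
random_list = ["a", 0, 2]  # assumed mix of bad and good entries
results = []

for entry in random_list:
    try:
        print("The entry is", entry)
        reciprocal = 1 / int(entry)  # int("a") -> ValueError, 1 / 0 -> ZeroDivisionError
        print("The reciprocal of", entry, "is", reciprocal)
        results.append(reciprocal)
    except Exception as err:
        print("Oops!", type(err).__name__, "occurred.")
        print("Next entry.")
        results.append(type(err).__name__)

print(results)
```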
There is also the finally block in the try statement.
A try statement can have an optional finally clause.
This clause is executed no matter what and is generally used to release external resources.
For example, let's suppose you're reading a file, some exception occurs, and you're not able to handle it; still, regardless of what happened, once the file has been dealt with, you want it to be closed, because it's a very, very large file.
You do not want to leave it to chance and leave it in memory, because that will cause a lot of problems.
So that is where you will do any sort of cleanup.
Finally is mostly used for cleanup, whether a DB connection needs to be closed or a file needs to be closed.
That's when you will use the finally block.
So this is an example.
We are trying to write to a file.
Let's suppose an error happens that you can't write to a file.
Ultimately, you need to close the file.
You do not have permission to write to the file, and an exception occurred.
In the finally block, you handle it gracefully by actually closing the file.
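A sketch of the write-then-close pattern, writing to a file in a temporary folder and simulating a failure with a raised OSError (the file name and error message are made up). The finally block runs either way, so the file always gets closed. In everyday code, `with open(...)` gives the same guarantee automatically.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # example file location

f = open(path, "w")
try:
    f.write("first line\n")
    # Simulate something going wrong halfway through the work
    raise OSError("simulated write failure")
except OSError as err:
    print("Could not finish writing:", err)
finally:
    # Runs whether or not an exception occurred, so the file is always released
    f.close()
    print("File closed:", f.closed)

with open(path) as check:
    saved = check.read()
print(repr(saved))
```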
There are also user-defined exceptions.
You can define your own exceptions as well.
Till now we have been looking at TypeError, NameError, ValueError, and ZeroDivisionError.
These are in-built exceptions.
But as a programmer, you can fit exceptions to your application.
You can define your own kinds of exceptions.
The reason you would do this is that it lets you throw errors which are relevant to your application.
This is something large companies and large projects usually do, where it's a very, very large system. Why? Because what does an error do? Think about it this way: when you see an error on the screen, it gives you a piece of information, a clue that something has gone wrong and how you can probably fix it.
If you look at errors, they're supposed to be informative.
The Python interpreter doesn't just say, hey, something is wrong, figure it out.
It tries to tell you, as specifically as it can, what kind of error has occurred.
Otherwise, it would be very, very difficult to find problems in your code.
Similarly, if you're working on a very complicated system, you as the developer know, when you're creating that piece of code, why an exception will occur, and that is very dependent on your own use case.
That is where you will define your own user-defined exceptions.
Now how do you do this? There is a built-in Exception class, and you create your own error class that derives from it.
It can derive either directly from it, or you can have multi-level inheritance as well.
Really up to you.
And then you just use it.
Over here, the class is declared in the same file, but you could also import it from another module, and then raise it as an error and catch it with except.
Of course, you can add multiple things within the class itself, and that will help you do certain things.
For example, how the stack trace gets printed, all of that. But I would say that should be left for later, when you're actually working on bigger systems.
For now, you can just keep in mind that, hey, creating my own exceptions is also something that I can do.
You can implement methods, or override methods in the exception class, so that a certain message or explanation gets printed every time.
All of those things are possible.
You can make it as customized as you need.
That's what programming languages tend to be a lot about.
Good programming languages allow you to customize them in the way that you want to use them.
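A sketch of a user-defined exception with two levels of inheritance and an overridden `__str__`. The class names, the check_age function, and the minimum age of 18 are all hypothetical, chosen only for illustration.

```python
class ApplicationError(Exception):
    """Hypothetical base class for all errors raised by this application."""


class ValueTooSmallError(ApplicationError):
    """Hypothetical error for input below an allowed minimum (multi-level inheritance)."""

    def __init__(self, value, minimum):
        super().__init__(value, minimum)
        self.value = value
        self.minimum = minimum

    def __str__(self):
        # Overridden so the printed message explains the problem in the application's terms
        return f"Value {self.value} is too small; it must be at least {self.minimum}."


def check_age(age, minimum=18):
    """Hypothetical validation function that raises the custom error."""
    if age < minimum:
        raise ValueTooSmallError(age, minimum)
    return age


try:
    check_age(12)
except ApplicationError as err:  # catching the base class also catches its subclasses
    caught_message = str(err)
    print("Caught:", caught_message)
```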
(pleasant instrumental music)
Let's suppose we get data on unemployed youth across the globe from 1947 to 2014.
This has various information about regions which have a high unemployment rate.
For example, Afghanistan and its unemployment rate over that time period.
But again, it's data from all over the world.
This data size is going to be huge.
There are seven billion people in the world right now.
Even if you collect only 10% of that data, it's a huge number.
That's 700 million people.
Even at 1%, that's a huge number.
That is 70 million.
So this data size is huge.
How do you deal with it? How do you draw insights from such a large dataset? Excel will not work out at all.
Let's look at use case number three.
We want to look at the geographical distribution of posts.
Let's suppose there is a protest going on at Wall Street, like Occupy Wall Street, or there's something going on in Syria or in Libya, where a revolt or a revolution is happening, or, let's suppose, any other event is happening.
Let's suppose multiple sporting events are happening across the globe, and you want to figure out the geographical distribution of posts with a particular hashtag.
Say a marathon is happening around the world, and everywhere the participants or the supporters are using the same hashtag. You want to see which geography the posts are coming from, because, let's suppose, that tells you where the momentum is.
If you know the geography of the posts, you know the location where it is happening, and you can see the momentum changing from location to location.
Let's suppose it's the Olympics happening, and within a country it is happening in multiple stadiums at multiple times.
Then you know where the action is, based on the geography of the posts.
Now, a large amount of data, lots of insights to be drawn.
How do you do it without using programming? What is data analysis? All of us have heard this term over and over again.
There are so many jobs out there with titles like business analyst, data analyst, and all the titles in between.
But what really is data analysis? Data analysis is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.
Let's break it down piece by piece.
You have your raw data here.
This raw data can be something as simple as the name, age, and salary of a person.
Even that is your raw data.
But when you collect this raw data from the world, a number of times it might have discrepancies in it.
Discrepancies such as, let's suppose, the last name being the same in certain cases.
Let's suppose the age is missing in certain cases.
The salary is missing, or something else is going wrong.
There are various things that can be wrong with the data, or it might not be exactly in the format that you want.
Let's suppose the salary is given only as floats.
The decimal values have been included, but you don't care about the decimal values.
Or you need to transform age into integers properly.
Where age is missing, you don't want a null value; you want a zero value, because null would throw off your calculations and give you errors.
All of this comes under data pre-processing, where you transform data into the desired format and clean the transformed data.
Transforming data into the desired format also covers how rows and columns are arranged; maybe you don't want certain columns.
Maybe you want to label certain columns a certain way.
So these are all the things where you arrange the data in a clean-cut way before you can even start any sort of analysis on it.
That way you know, predictably, that there are no null values, there are no empty values, and there are no extreme values in the data.
Extreme meaning, let's suppose, salaries are given and you find that the CEO of the company has a very large salary.
That will drive the average salary up to a particular point.
Let's suppose every employee in the company had a salary of $50,000.00, but the CEO has a salary of $500,000.
What will happen ultimately is that you might get an average salary of $125,000.00 or $150,000.
But that is not correct.
What that does is give you a false impression of what a typical employee earns.
The typical salary might very well be 50, 55, or 60 thousand.
So those kinds of things need to be handled, deciding how to manipulate the data to get it into proper shape, before you can even start anything with it.
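A worked version of the CEO-salary point, with assumed numbers: four employees at $50,000 and one CEO at $500,000 (the lecture doesn't state the headcount, so its averages differ). The mean is pulled far up by the single outlier, while the median stays at the typical salary.

```python
import statistics

# Assumed salaries: four employees at $50,000 and a CEO at $500,000
salaries = [50_000, 50_000, 50_000, 50_000, 500_000]

mean_salary = statistics.mean(salaries)      # (4 * 50,000 + 500,000) / 5 = 140,000
median_salary = statistics.median(salaries)  # middle value once sorted: 50,000
mean_without_ceo = statistics.mean(salaries[:-1])

print(mean_salary, median_salary, mean_without_ceo)
```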
Then comes the final step, which is modeling the data.
Modeling is basically drawing insights from the data, which could be just simply taking an average or a variance, something where you say, oh, the variance is this much, and it gives you a trend.
The average age is this much.
For an average age of 24, the average salary is 60 thousand, which can enable decision making.
Now, decision making is not done by a computer; it is done by you as a human being, because you know the context and whether this is correct or incorrect.
Not correct or incorrect exactly, but basically what to draw from it: whether to take any action or not, and what action to take in what direction.
So that comes from the analysis of trends.
Trends are brought forward by the model that you prepare on the data.
Honestly, these are the things that you do in your daily life.
In the sense that, let's suppose, you see a discount on a website, right.
You see a discount on Amazon, and you see a discount on Flipkart.
You would compare it, right.
Sometimes when we are comparing items, on one website it is listed at 799.
On another website it is listed at 800.
So we say, you know, let's consider it to be 800 on both.
One rupee doesn't make a difference.
That's transforming and cleaning the data.
That's pre-processing the data in your head.
Then you prepare a model.
You prepare a model: one website gives 10% off, and another website gives 20% off if I buy another item with it.
Then the model is prepared, where you do the math and figure out the total discount you will get on both, and whether you want the second item or not.
How much money do you have, how much do you want to spend; you analyze, and then you make a decision.
There, you do not involve a computer, because the dataset was very small and you were just comparing two items.
But make that 2,000 items or 3,000 items.
With 3,000 items, the dataset becomes too difficult to analyze mentally.
But the basic idea of data analysis is something that we do on a daily basis.
Next: why Python for data analysis? Python provides various libraries for data analysis, manipulation, and visualization.
For data analysis and manipulation we have NumPy and Pandas, and for data visualization we have Matplotlib.
We will cover all of these in the subsequent classes, in this module and the next module.
But as I've already told you, Python is very, very useful in this respect, and it is widely accepted and widely used.
Now let's do a short introduction to NumPy.
NumPy is a package for scientific computing and, I would say, even data science computing.
It has certain features, like multidimensional arrays.
So far we've only used a single-dimensional list in Python: a = [1, 2, 3].
But what if you need a multidimensional array? For a multidimensional array, imagine a matrix.
A row and a column setup like an Excel sheet.
That's a multidimensional array.
Then, once you have the multidimensional array, naturally you want to do certain things to it.
You want to read from it, write to it, update it, delete from it: the typical CRUD operations.
You want to calculate the sum of the columns, the sum of the rows, the average of the rows.
Those are methods for processing arrays.
You might want to do an element-by-element operation, such as an element-wise sum.
You might want to do mathematical operations like those in linear algebra.
Which things you do or don't do depends on what you're trying to do, but NumPy has multiple features which enable you to do multiple things.
In NumPy you can basically divide the operations into three categories.
One is mathematical and logical.
Average, sum, median, variance, all of that.
Second is more scientific.
If you're familiar with something called the Fourier transform, good; if you're not from an engineering background and are taking this course, it's okay.
You don't need to know it to be able to work with NumPy; it's just one of the things available in it. It's a mathematical function.
It's used in signal analysis and so on and so forth.
Third is linear algebra.
It has a lot of use cases, even for things like social media posts and everything.
If you end up doing machine learning after this course, you will see that linear algebra, the one that you studied in 7th or 8th grade, has a lot of real-world applications.
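A sketch touching each of the three categories above. The matrix, the signal, and the 2x2 equation system are small made-up inputs chosen so the results are easy to check by hand.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# Mathematical and logical: aggregates along rows/columns and element-wise operations
col_sums = m.sum(axis=0)    # sum of each column
row_sums = m.sum(axis=1)    # sum of each row
row_means = m.mean(axis=1)  # average of each row
doubled = m + m             # element-by-element addition
print(col_sums, row_sums, row_means, np.median(m), np.var(m))
print(doubled)

# Scientific: discrete Fourier transform of a short signal
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(spectrum)

# Linear algebra: solve 2x + y = 5 and x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)  # expected x = 1, y = 3
print(solution)
```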
It's just not theory.
Installing NumPy is very simple.
You just need to go to the terminal, and I hope you already have it installed.
You need to run pip install.
I can't show this to you because I already have it installed, but this is the command: pip install numpy.
As simple as that, nothing more.
It's all lowercase, by the way, no caps, and that's it.
You will end up installing NumPy.
Let's look at the NumPy array.
Now, NumPy has something called ndarray.
It's a multidimensional array object consisting of two parts: the actual data, and some metadata which describes the stored data.
There are two key differences here.
First and foremost, it differs from a normal Python list in that it is typically multidimensional.
An ndarray is always treated as a multidimensional structure.
But the clincher, the big, big difference, is the metadata, which describes the stored data.
Now, let's suppose I have to describe a multidimensional array in plain Python.
If I had to describe a multidimensional array without using NumPy, I could do it like this, with a nested list.
How is it a multidimensional array? Because I can do this.
This is me saying row zero, which is the first row, and the second column.
Two is in the second column, so you can imagine it being arranged like this.
If you look at this, this is multidimensional, right.
So if I can do this in Python, why do I need NumPy? Because apart from a lot of other functionality that you'll see, one key difference here is the metadata.
So what is metadata? Metadata is informationor data about data.
So let me give you an example.
When you store a file on your computer, any file,
you can right-click and go to properties or details or something of that sort, which gives you extra information about the file, such as when it was created, when it was last modified, and what the length of the file is in minutes.
Let's suppose it's a song file.
It tells you about the format of the file, and it tells you certain other things.
It tells you who the author is, if there is a description.
If you have ever used iTunes or any other media player, it automatically shows the name of the album and everything.
Where does that come from? Where does the name of the album or the artist come from? It comes from the metadata that is stored within that file.
So the file in itself is data.
It is song data, right.
It contains ones and zeroes, which make up a song for us to listen to.
But then this data has information about itself.
Even its name is metadata.
So there is the song itself, the format of the file, and then more information like when it was created, who created it, which user on the system created it, who the original author is, and what the copyright on it is.
Several, several metadata points about it.
That is one of the key differences with the way NumPy arrays are created.
They're not like regular arrays.
They can look like regular arrays, as you see, but keep in mind that there are a lot of additional methods and additional functionality available on them.
Plus, they have metadata about themselves, which can be very, very useful and is very much needed when you're dealing with large amounts of data.
Because it tells you about the data, and unless and until you know about the data, you can't do anything with it.
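A sketch of the nested-list versus ndarray comparison, using an example 2x3 grid. Both can be indexed by row and column, but only the ndarray carries metadata such as shape, ndim, and dtype. The exact integer dtype (for example int64) depends on the platform.

```python
import numpy as np

# A nested Python list can be indexed like a 2-D grid...
nested = [[1, 2, 3],
          [4, 5, 6]]
print(nested[0][1])   # row 0, second column

# ...but an ndarray also carries metadata describing its data
arr = np.array(nested)
print(arr[0, 1])      # same element, NumPy-style indexing
print(arr.shape)      # (rows, columns)
print(arr.ndim)       # number of dimensions
print(arr.size)       # total number of elements
print(arr.dtype)      # element type, e.g. int64 (platform-dependent)
print(arr.itemsize)   # bytes per element
print(arr.nbytes)     # total bytes used by the data
```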
Next is that each elementin an ndarray is an object of data type object called dtype.
So every dtype issomething that you will see even in Pandas and allacross NumPy documentation and even in these slides.
Dtype basically tellsthe type of the data.
Now ndarray contains a header.
This is an entire ndarray over here.
It contains a header, which iswhat information eventually.
The first piece of metainformation that it contains about the data that isstored inside these cells, so think of these individualcells as the array items.
It contains the data type.
Now when you extract anitem out of an ndarray, okay, where you select aparticular item to read it or to manipulate it or whatever, the data type gets attachedand it becomes an array scaler.
So this is ndarray, whichis an actual computer skill, there's nothing but just oneitem from that entire array.
Now even this one item in this extracted, it is not just the data.
The data type comes in andattaches itself to it internally.
This is not something thatyou have to do explicitly.
This is just for your understanding sake.
Where there is a header again,which contains information about this array scaler.
So okay, why, what is scaler? Basically just the raw data.
This might be four, but thehead might contain that it's a float, or it's an integer,or it's a complex number.
You might have values as one, two, three, four, five, six, seven, but is it a float? Is it an integer? Is it a scaler? Sorry, is it a complex number? That can be determined by the data type and whenever you detachone of the elements that data type getsattached to it as well.
Why is this important? Because even if the values are one, two, three, four, five, six, seven, if you're applying some sort of mathematics to them that involves complex numbers, or you are using them for division, you may need to treat them as floats.
Division can give you a different result depending on whether you are working with floats or integers.
I recommend that you go back and try this yourself.
If you divide by a float, you get a float in the result, but with integer (floor) division, which is what the / operator did on two integers in Python 2 and what the // operator does in Python 3, one divided by two gives you zero.
You will not get the fractional value unless you convert to a float first.
So that is why it is very, very important to know the data type, especially when dealing with large amounts of data.
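To make the division point concrete, here is a minimal sketch with a small integer array (the values are just an illustration). It shows the array's dtype, the float result of true division, and the integer result of floor division.

```python
import numpy as np

# A small integer array; its dtype is stored in the array's header.
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr.dtype)          # an integer dtype, e.g. int64 on most 64-bit Linux/macOS systems

# True division (/) produces floats in Python 3 and in NumPy.
halves = arr / 2
print(halves.dtype)       # float64

# Floor division (//) keeps integers and drops the fractional part,
# which is how / behaved on two integers in Python 2.
floors = arr // 2
print(floors)             # [0 1 1 2 2 3 3]

# Converting the dtype explicitly before doing float maths.
as_float = arr.astype(np.float64)
print(as_float.dtype)     # float64
```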
So let's create our first NumPy array.
The first thing is that you import NumPy, and this over here is an alias.
import numpy as np, and then you have its methods available.
The reason is that numpy in itself is a long name, so when you see code online, you will see a lot of people use np as a short form.
I suggest that you become familiar with it, because this is something that you're going to see in a lot of places.
Now, you can call np.array and pass it a normal Python list.
This is the simplest way to do this.
a = np.array([1, 2, 3]) and print(a).
So let's run this.
Now the output doesn't look much different; it looks just like an array, but if you notice, there is one thing.
Let me show it to you.
Maybe some of you have already noticed it.
Maybe some of you haven't, so I'm going to keep the original list as well and print both.
Right, notice the difference? This one is without commas.
This one is with commas.
That is the general Python list, as you have known it all this time.
Not that it makes too much of a difference for you when it comes to accessing it, once you go down that road, but just notice that there are subtle differences, and if we print type(a), it shows numpy.ndarray.
See? And this a has metadata about it.
Okay, so this is the simplest way in which you can create a NumPy array.
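The comparison above can be reproduced with a short sketch: print a plain list next to the ndarray built from it, then check the type and a couple of the metadata attributes.

```python
import numpy as np

original = [1, 2, 3]          # a plain Python list
a = np.array(original)        # converted into a NumPy ndarray

print(original)   # [1, 2, 3]   (commas between items)
print(a)          # [1 2 3]     (no commas)
print(type(a))    # <class 'numpy.ndarray'>

# Metadata stored alongside the data.
print(a.dtype, a.shape)
```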
Let's look at creating a multi dimensional NumPy array.
This is quite simple.
We just pass in the values, one comma two comma three comma four, grouped into rows as lists.
Now, one thing is that these rows need to come within an outer pair of square brackets as well.
Don't make the mistake of missing those square brackets.
In the beginning people tend to miss them.
The number of inner lists within this outer list decides the number of rows.
Let's print it.
Again, see, the commas are missing, and this is written in a certain way, and even the output comes out in a certain way.
Whereas if we had a plain nested list, with lists inside it, it would print with commas and as a flat structure.
Let's see what happens if we add another value to just one of the rows, as if we were adding another column to it.
Then it is not able to represent it like earlier.
It still converts it into an array, but not a grid.
When the rows have different lengths, it will just treat it like a single dimensional array whose items are the lists themselves (recent NumPy versions raise a ValueError here unless you pass dtype=object).
This is a single dimension.
If the rows are the same length, then it will make a multi dimensional array out of it.
It will not assume anything: it doesn't know what the missing values next to five should be, so it just creates a single dimensional array.
It will not create a multi dimensional one.
That is all there is to creating it.
I pass it a list, and it creates an ndarray object.
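A minimal sketch of both cases, with illustrative values: equal-length rows give a 2D grid, while rows of different lengths only work as a 1D array of list objects when dtype=object is passed. The try/except covers newer NumPy releases (1.24 and later), which raise ValueError for ragged input without dtype=object.

```python
import numpy as np

# Each inner list becomes one row; all rows have the same length.
grid = np.array([[1, 2], [3, 4]])
print(grid)
print(grid.ndim, grid.shape)       # 2 (2, 2)

# Rows of different lengths cannot form a grid. With dtype=object,
# NumPy builds a 1-D array whose items are the lists themselves.
ragged = np.array([[1, 2], [3, 4, 5]], dtype=object)
print(ragged.ndim, ragged.shape)   # 1 (2,)

# Without dtype=object, recent NumPy versions raise ValueError instead.
try:
    np.array([[1, 2], [3, 4, 5]])
except ValueError as err:
    print("ValueError:", err)
```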
Another way to create an array over here is to use the arange function.
Note the spelling: arange, with a single r.
I say zero to 1,000.
And I run this.
See, again, no commas.
This is flat; it is not nested like the previous one.
This is not a multi dimensional array.
This is a single dimensional array.
It excludes the last value, so 1,000 is not included.
It starts from zero, and the first value is included.
The second one, the stop value, is not included.
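A quick sketch of the start-included, stop-excluded rule, plus the optional third argument, which sets the step size.

```python
import numpy as np

values = np.arange(0, 1000)   # start (0) is included, stop (1000) is excluded
print(len(values), values[0], values[-1])   # 1000 0 999

# A third argument sets the step size.
evens = np.arange(0, 10, 2)
print(evens)                  # [0 2 4 6 8]
```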
Another important method over here creates an array of zeros.
Now, some of you may wonder what the usefulness of creating an array of zeros is.
Well, the use cases are many, in scientific computing and machine learning, again.
Sometimes you need things like this.
What it takes is a tuple of the shape that you want, and it gives you an array of that shape.
So here it's five by five.
If you look at it, it's a multi dimensional array.
Let's try five comma three.
So five rows and three columns.
Let's try it with something random, like five comma 15.
15 columns, five rows.
We can put 15 in the first position as well, to get 15 rows.
So that is all there is to it.
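A minimal sketch of np.zeros with the shapes used above. The shape tuple is read as (rows, columns), and the default dtype is float64.

```python
import numpy as np

square = np.zeros((5, 5))      # shape tuple: (rows, columns)
print(square.shape)            # (5, 5)

tall = np.zeros((5, 3))        # five rows, three columns
wide = np.zeros((5, 15))       # five rows, fifteen columns
print(tall.shape, wide.shape)  # (5, 3) (5, 15)

print(square.dtype)            # float64 is the default dtype
```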
Another method available is linspace.
linspace creates a linearly spaced vector.
Now, this is relevant if you are familiar with the concept of a linearly spaced vector, which is sort of a fancy way of saying that these numbers are separated evenly along a line, in the sense that, in this particular dimension, there are equal steps between them.
What it takes is three parameters: start, stop (the point it needs to go up to), and then the number of values to generate.
So think of the x-axis that you draw on a graph.
When you plot something you need evenly spaced points along that axis, and this is where linearly spaced vectors come in handy.
So I'm going to call np.linspace with zero comma 20 comma five.
Let me run this.
I get this output: zero, five, 10, 15, 20.
I can do it like this as well.
It'll have to be a shorter one, yeah.
So this third argument is the number of values.
So don't confuse it with the difference between them.
It will not be two, four, six, eight.
It is the number of values you want in going from zero to 20.
So if it is two, you just get the two endpoints, zero and 20.
If it is 10, that is where you will get numbers that look random but are not.
It's going to create 10 values, which ultimately reach 20.
To work out the gap between them, take the stop value minus the start value, which in this case is 20, and divide it by the number of values minus one.
So with 10 values the gap is 20 divided by nine, roughly 2.22, because zero itself counts as one of the 10 values.
Or we can ask for eight values, so it reaches 20 in exactly eight values, or six values, where zero is always going to be the first value and 20 is always going to be the last value.
So with six values, NumPy figures out the four values in between.
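The spacing rule can be checked with a short sketch. The gap is (stop - start) / (num - 1) because both endpoints are counted among the num values.

```python
import numpy as np

# linspace(start, stop, num): num evenly spaced values, both ends included.
five = np.linspace(0, 20, 5)
print(five)            # values 0, 5, 10, 15, 20

two = np.linspace(0, 20, 2)
print(two)             # just the endpoints, 0 and 20

# With 10 values the gap is (20 - 0) / (10 - 1), about 2.22,
# so the numbers look irregular but are evenly spaced.
ten = np.linspace(0, 20, 10)
print(np.diff(ten))    # nine equal gaps of about 2.222

six = np.linspace(0, 20, 6)
print(six)             # 0 and 20 plus four values in between
```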
Okay, so we did this even earlier, right, where we took a list and converted it into an array for our use case.
This is pretty much the same: you can use the asarray method as well to convert an existing sequence into an ndarray.
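Here is a small sketch of np.asarray. One useful detail is that when the input is already an ndarray with a matching dtype, asarray hands back the same object instead of copying, whereas np.array copies by default.

```python
import numpy as np

from_list = np.asarray([1, 2, 3])        # a list becomes a new ndarray
from_tuple = np.asarray((4.0, 5.0))      # tuples work too

existing = np.array([7, 8, 9])
same = np.asarray(existing)              # already an ndarray: no copy is made
print(same is existing)                  # True

copied = np.array(existing)              # np.array copies by default
print(copied is existing)                # False
```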
Let's talk about restructuring a NumPy array.
Restructuring, what do we mean by that? It means taking a linear array of, say, eight elements and reshaping or resizing it into whatever shape you want.
So let's try this out.
Let's take this and create an array of eight zeros, okay, simple.
Then let's reshape this into two comma two comma two.
We will print A first here, and then we're going to print it again after reshaping.
First, it prints flat like this, and then we reshape it into this shape.
Let's see if it can convert it into four comma four or not.
Or four comma two.
Now why can't I do four comma four? It says it cannot reshape the array into four comma four.
Four comma four basically means multiplying.
Four times four is 16.
So if you have eight elements, you cannot make 16 elements out of them, naturally, right.
You need to choose a shape whose dimensions multiply to exactly eight elements.
It can be transformed to two comma four.
That is two rows and four columns.
Or you can transform it into four comma two, or one comma eight, which also works, right.
Or eight comma one, which is eight rows in one column.
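A minimal sketch of the reshape rule: any target shape whose dimensions multiply to the element count works, and anything else raises ValueError.

```python
import numpy as np

a = np.zeros(8)              # eight elements in a flat array
print(a.shape)               # (8,)

cube = a.reshape(2, 2, 2)    # 2 * 2 * 2 = 8, so this works
print(cube.shape)            # (2, 2, 2)

# Any shape whose dimensions multiply to 8 is allowed.
for shape in [(2, 4), (4, 2), (1, 8), (8, 1)]:
    print(shape, a.reshape(shape).shape)

# 4 * 4 = 16 elements would be needed, so this raises ValueError.
reshape_failed = False
try:
    a.reshape(4, 4)
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```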
Now, about the use cases: let me just give you a little bit of background about where you might end up using this.
When you're dealing with machine learning, machine learning is ultimately nothing but a mixture of maths, statistics, and programming.
That's the overlap.
Think of a Venn diagram where maths (statistics, linear algebra and probability) and programming all meet together.
That is where machine learning happens.
Now, when you're talking about linear algebra, it uses a lot of matrices.
Matrices are multi dimensional arrays.
So a matrix is nothing but what you see in Excel sheets, rows and columns, and that is what we're doing here with the NumPy array, where we had a normal array but wanted something more.
In that domain, there are a lot of mathematical computations that need to happen.
Transforms need to happen.
Arrays need to be reshaped to perform certain mathematical functions or operations on them.
That is where all of these are used by Python developers, either for building the libraries that do machine learning or for writing their own machine learning code, where these transformations are really, really helpful.
Just good to know for now.
If you're wondering where you are going to use it, when we get into the application part you will see that we're going to reuse these things again and again.
It's very, very important that you know about these.
There is another one called ravel, and what ravel does is flatten the array out.
So let's reshape this into two comma two comma two.
Then let's print the ravel of it.
What will it do? On an already flat array, it would basically just give you the same one back.
So we do a reshape and then a ravel.
It will take this reshaped, multi dimensional array and flatten it out.
That's all that ravel does.
You can take a flat one and reshape it, or you can take a shaped one and flatten it out.
It's a two way transformation.
ravel is the converse of what we did earlier with reshape.
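A short sketch of the round trip: reshape a flat array into (2, 2, 2), then ravel it back into one dimension.

```python
import numpy as np

flat = np.arange(8)
cube = flat.reshape(2, 2, 2)     # flat -> shaped
print(cube)

back = cube.ravel()              # shaped -> flat again
print(back)                      # [0 1 2 3 4 5 6 7]

# ravel on an already flat array just gives the same values back.
print(np.array_equal(flat.ravel(), flat))   # True
```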
Let's talk about indexing NumPy arrays.
NumPy array indexing is essentially identical to Python list indexing.
If you want to select an element, let me remove all of this and write arr = np.
Now, as you saw earlier, we want to select a particular element out of it.
Let's run this.
Okay, now what we have done is simply access the sixth element, the one at index five.
Let's look at the type and see what it is, because over here it looks like an integer.
Is it a plain integer? No, it's not; it's a numpy.int64.
This is one of the gotchas where you might initially get stuck: you are looking at it like a simple number eight, but it is not simply the number eight.
Let's try another thing on it.
This is what we've got, and we picked out the sixth element.
Let's see if we can print element plus seven.
We can, okay.
But still, understand that it is NumPy that allows you to add a plain Python integer to a NumPy array element.
This data type and that data type are not the same.
element has one data type, and seven has a different data type.
This one is an object belonging to the NumPy library, whereas seven is the normal Python integer value that we have.
Typically, you cannot just add different types together; this is like adding a cat to a dog.
The real world doesn't give you a cat dog, right.
What is happening is that NumPy is handling it internally.
That is why we need to be conscious about what we are doing.
Let's see if there is a type conversion or not.
What is the type of element plus seven? Is it giving us a NumPy object, or is it giving us an integer object? See, what happens internally is that the seven is first converted into numpy.int64 and then added.
Because if this was simply eight, if this was not the NumPy one, you'd get the type int.
You see, element plus seven gives us numpy.int64, while a plain eight plus seven gives a different type, int.
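The type check can be reproduced with a sketch like this. The array values are hypothetical, chosen so that index five holds eight. The dtype is set explicitly to int64 so the printed types match on every platform.

```python
import numpy as np

# Hypothetical values; dtype is fixed so results match across platforms.
arr = np.array([3, 1, 4, 1, 5, 8, 2], dtype=np.int64)

element = arr[5]              # the sixth element (index 5)
print(element)                # 8
print(type(element))          # <class 'numpy.int64'>, not a plain int

total = element + 7           # NumPy converts 7 and returns a NumPy scalar
print(total, type(total))     # 15 <class 'numpy.int64'>

plain = 8 + 7                 # two plain Python ints
print(plain, type(plain))     # 15 <class 'int'>

# .item() converts a NumPy scalar back to a plain Python value.
print(type(element.item()))   # <class 'int'>
```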
Okay, what is slicing? We covered it a little bit when we were covering Python in the first few classes.
Python's concept of list slicing is extended to NumPy.
Over here, we have a more sophisticated way of doing slicing, where we can create something called a slice object.
It is constructed by providing start, stop and step parameters to the built-in slice class.
So this is the slice class, and this is its constructor.
We covered this in the last class.
I hope you guys remember.
We create a slice object and just pass it to the NumPy array as the index.
Let's try this out.
So we have an array, np.arange, and let's make it of 20.
Then, we are going to cover it as np.
Okay wait, sorry.
Here we slice: slice of one comma 10 comma two, that is start, stop and step.
Then we are going to print arr indexed with that slice.
Let's print arr first as well.
So it started from index one and went up to 10, not including 10, in steps of two.
One, three, five, seven, nine.
With a step of four instead, it would be one, five and nine.
Now, there are various ways of slicing.
You could also have done it this way, by the way, using the colon notation, where we want to start from two and, let's suppose, go to nine.
You could have done this as well.
This is something that we covered with lists.
Even that works; the slice object is just a more sophisticated way of doing it, and you can skip the slice object and do it directly like this as well.
You can also write it so that it starts from 10, the element at index 10, and goes up to the end.
Or you can start from the beginning and go up to the element that you want to stop at.
If I want to go to 12, starting from the top, it goes up to 12 minus one.
So it gives the first twelve elements.
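The slicing forms above side by side, as a minimal sketch. A slice object and the equivalent colon notation select the same elements.

```python
import numpy as np

arr = np.arange(20)

s = slice(1, 10, 2)          # start=1, stop=10 (excluded), step=2
print(arr[s])                # [1 3 5 7 9]

print(arr[1:10:2])           # the same selection with colon notation
print(arr[slice(1, 10, 4)])  # step of four: [1 5 9]
print(arr[2:9])              # index 2 up to, not including, 9
print(arr[10:])              # index 10 to the end
print(arr[:12])              # the first twelve elements, index 0 to 11
```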
Now, here again is something very, very cool that we can do, where we can extract specific rows and columns using slicing.
We can slice the first two rows and the first two columns like this.
Let's try it out.
Zero to two and zero to two.
That fails because we do not have a multi dimensional array yet.
So let's create a multi dimensional array.
Okay, let's try this, which basically brings in everything; the first part is for the rows that we want.
We can ask for rows zero to three.
Let's do zero to four and see if it gives us an error.
No, it won't.
It doesn't give us an error; it just ignores the rows that don't exist.
The first part is rows, and then come the columns that you want with it.
Let's see what happens if we give it a column index that doesn't exist.
Yeah, it will still work, but it would be confusing for other developers in terms of why you have used a higher index, so don't do that.
Just keep in mind the number of rows and columns that you actually have.
You could also tell it, hey, just select the rows in this range.
So we could say every row after the first one, or we could tell it, hey, ignore the first column, or, hey, include everything up to the second column but not after that.
These give us results of the same kind, and we can go over them like this.
This one says everything from row index one onwards, so the first row, which starts with one, is ignored.
Everything from the second row onwards.
With a zero here instead, which is the traditional start, the first column would include one, four, seven.
Starting at one, it becomes just four and seven.
This one says all columns up to, but not including, column index one, so only the first column, one, four, seven, is kept.
The columns holding two, five, eight and three, six, nine get dropped.
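A sketch of the 2D slices discussed above, using a 3 by 3 array of the numbers one to nine. Inside the brackets, rows come first and columns second.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)
# rows: [1 2 3], [4 5 6], [7 8 9]

top_left = arr[0:2, 0:2]     # rows first, then columns
print(top_left)              # rows [1 2] and [4 5]

print(arr[0:4].shape)        # (3, 3): an out-of-range stop is simply clipped

print(arr[1:])               # every row from index 1 on: [4 5 6], [7 8 9]
print(arr[:, 1:])            # ignore the first column
print(arr[:, :1])            # keep only the first column: 1, 4, 7
print(arr[1:, 0])            # first column from row 1 on: [4 7]
```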
Let's look at a few NumPy array attributes.
These are very, very helpful.
We can get the shape of the array by printing arr.shape.
It's three comma three, which makes sense.
Next, we can get the dimensions of the array with ndim, which returns the number of array dimensions.
Because it's a 2D array, it will return two, right.
Next is itemsize.
This is the length of each element of the array in bytes.
It looks at the array's data type and gives you the number of bytes for each element.
Here the integers are typically stored as int64, which is 64 bits, or eight bytes.
So that is the size it gives you.
Not very useful, but something good to know.
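A minimal sketch of these attributes. The dtype is pinned to int64 so that itemsize is eight bytes regardless of platform; size, the total element count, is included for comparison.

```python
import numpy as np

arr = np.arange(1, 10, dtype=np.int64).reshape(3, 3)

print(arr.shape)      # (3, 3)  rows, columns
print(arr.ndim)       # 2       number of dimensions
print(arr.itemsize)   # 8       bytes per element (int64 = 64 bits)
print(arr.size)       # 9       total number of elements

# A smaller dtype means fewer bytes per element.
print(np.zeros(3, dtype=np.int32).itemsize)   # 4
```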
Right, there is something called numpy.empty.
It creates an uninitialized array of a specified shape and dtype.
Uninitialized means its values are not set; the array has just been created.
You call it with the shape and the data type.
So you need to give it a tuple, which defines the shape, and the data type.
It may look like it gives you zeros, but essentially it's uninitialized; whatever happened to be in that memory shows up.
Let's try this out.
So we give it a shape of, let's suppose, three comma five, and a dtype of int.
Let's run this.
Okay, so what it is doing right now is picking up arbitrary leftover values and populating those in the rows and the columns.
But the shape is being maintained.
Let's try with float and see what we get.
So again, arbitrary values; np.empty is useful that way because it skips setting the values, so use it when you are going to fill them in yourself.
It can also give zeros, but that just depends on whatever values it picks up.
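A sketch of np.empty. Only the shape and dtype are predictable; the contents are whatever was in memory, so the typical pattern is to allocate first and then fill every slot yourself.

```python
import numpy as np

# np.empty allocates the array but does not set its values, so the
# contents are whatever was already in that memory.
ints = np.empty((3, 5), dtype=int)
print(ints.shape, ints.dtype)

floats = np.empty((3, 5), dtype=float)
print(floats.shape, floats.dtype)

# Typical use: allocate first, then fill every slot yourself.
filled = np.empty((3, 5))
filled[:] = 1.5
print(filled)
```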
Next is reading from and writing to files using NumPy.
NumPy provides the option of importing files directly into an ndarray using the loadtxt function.
The savetxt function can be used to write data from an array into a text file.
So let's try this out.
Let's call np.savetxt with test.txt and pass it an array.
Sorry, I got the spelling wrong.
Let's try this out.
Let's see if a file got created.
So yeah, test.txt got created, and this is what was written to it.
Of course, by default it writes each value as a float in scientific notation, so let's try creating an array in some other ways.
Let's create an array using this particular notation.
Let's see what the output is then.
Right, so of course if you need it written in a different way, you will need to specify that with additional options such as fmt, but for now this is sufficient.
Similarly, you can load it up using np.loadtxt and give it the name of the test file.
So you can load it up from the file; now notice that what is in the file and what is being loaded look slightly different.
It's just that when it stores the data, it stores it in a certain way.
When it reads it back, it reads it in a different way, and there are minor differences, but ultimately the objective here is not to have the file in a human readable format as such.
It's more about storing the data on the hard drive and using it in a NumPy array.
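A round trip through savetxt and loadtxt, sketched here with a temporary folder so no test.txt is left behind. The file name and array values are only illustrations. The default format writes floats in scientific notation, so the data comes back as float64 even though integers were saved.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Work in a temporary folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.txt")

    np.savetxt(path, arr)            # default format: floats, space separated
    with open(path) as f:
        first_line = f.readline().strip()
    print(first_line)                # 1.000000000000000000e+00 2.0... 3.0...

    loaded = np.loadtxt(path)        # read it back into an ndarray
    print(loaded)
    print(loaded.dtype)              # float64, even though we saved ints
```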
Now, another thing we could do is save it as a CSV file, using something called the delimiter.
What a delimiter does is separate the values within each row, here using a comma.
So the delimiter is something that you specify.
I'll just show it to you quickly.
First, let me save the file in a certain way.
We can use the same savetxt method, and this time we specify a delimiter.
So the first delimiter that we specify is going to be a comma.
We are going to separate our data using commas.
Let's try this out.
Let's see what is there in the CSV file.
Now you see the data is comma separated.
We could also have separated it using any other character.
Let's suppose you want to separate it using the semicolon.
Let's check the CSV file now.
Now the commas have been replaced by semicolons.
You can also use anything else.
We can use something like a dollar sign; we can use any sort of delimiter that we want.
It is purely up to us which delimiter we want.
Of course, it is best to use a comma, because that way it is clear.
A reason that you would not use a comma is when your data itself contains commas.
So if you have strings or sentences, let's suppose a Facebook post or a Twitter post, which has commas inside it, then using commas is probably not a good option, because the library or your Python program will get confused between the commas that act as the delimiter and the commas that come from the data itself.
So it will likely break the text up at the commas coming from the data as well, and you get more fields than you stored.
So please be careful about what delimiters you're using, but typically if it's numeric data, just numbers, you can use a comma and it will work fine for you.
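The delimiter option can be sketched without touching the disk by writing into in-memory text buffers (io.StringIO stands in for a file such as data.csv). fmt="%d" is an added option so the integers are written without decimals.

```python
import io

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# An in-memory text buffer stands in for a file such as "data.csv".
comma_buf = io.StringIO()
np.savetxt(comma_buf, arr, fmt="%d", delimiter=",")
print(comma_buf.getvalue())     # 1,2,3 and 4,5,6 on separate lines

semi_buf = io.StringIO()
np.savetxt(semi_buf, arr, fmt="%d", delimiter=";")
print(semi_buf.getvalue())      # 1;2;3 and 4;5;6 on separate lines

# Reading back requires the same delimiter.
comma_buf.seek(0)
back = np.loadtxt(comma_buf, delimiter=",")
print(back)
```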
Another thing we can do is use the genfromtxt function.
Over here we need to specify the delimiter.
It's pretty much the same thing as we did earlier with the loadtxt function.
genfromtxt does something similar, and it additionally lets you deal with data in different formats, including files with missing values.
Maybe you need to pass the file to somebody else, and that is why you would use these file operations.
Or you're receiving the file from somebody else, say from the sales manager.
The sales manager asks you to go through it, or something else, and you need to create a report on it.
That is where you would end up dealing with files, because a lot of the time data in companies is maintained in Excel sheets and files, and you will export it to a CSV or something else of that sort which is compatible with NumPy, and use it.
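Here is a sketch of where genfromtxt goes beyond loadtxt. The CSV text is hypothetical and has one empty field, which genfromtxt fills with nan instead of failing.

```python
import io

import numpy as np

# Hypothetical CSV content with one missing value in the second row.
csv_text = io.StringIO("1,2,3\n4,,6\n")

data = np.genfromtxt(csv_text, delimiter=",")
print(data)
print(np.isnan(data[1, 1]))    # True: the empty field became nan
```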
Now let's go on to Pandas.
Pandas is an open source library providing efficient, easy to use data structures and data analysis tools.
Now, Pandas is basically built on top of NumPy.
The name Pandas is derived not from the animal but from "panel data", an econometrics term for multi dimensional data.
Pandas is well suited for tabular data with heterogeneously typed columns, which basically means any sort of row and column based data where different columns contain different data types.
It is not homogeneous: one column could be name, another age, sex, or city, holding different strings, numbers and Booleans.
For all of that heterogeneous data, it is very, very well suited.
It is also suited for ordered and unordered time series data.
Time series being anything which is time stamped.
So let's suppose there is a sequence of events, like in the geographical use case we looked at, where each event is marked by a time stamp.
Any sort of ordered and unordered time series data is also something Pandas is very, very helpful for.
Now, the third thing is any sort of arbitrary matrix data with row and column labels.
Which is pretty much similar to the first one.
Matrix data with row and column labels is, I mean, like an Excel sheet.
It's describing an Excel sheet in technical terms.
Then there is any other form of observational or statistical dataset.
The data doesn't need to be labeled at first.
Pandas is very powerful.
It gives you labeling tools as well, where you can label your data.
When I say label your data, I mean give a column name to columns which don't have one originally.
It will accept them and allow you to add your own column names as and when you want.
Now let's look at installing Pandas.
Same command: you need to pip install pandas.
I'll just have a look at the virtual environment I'm in.
So I'm at.
So this doesn't have Pandas, and I'll have to install it.
Okay, so now Pandas is installed.
We should expect that it will be available to us.
After the Pandas installation, let's look at the data structures within Pandas.
Kind of like how the ndarray, or NumPy array, was available to us, Pandas has some data structures of its own as well.
The first one is Series.
This is a labeled, homogeneous array of immutable size.
Immutable size means that you cannot change the size of this array after you have created it.
It is a labeled homogeneous array, where the data is not heterogeneous.
Now, of course, the first thing we said about Pandas was that it could handle heterogeneous data.
For that you need DataFrames.
A DataFrame is a two dimensional data structure.
It can have heterogeneous types, and it is size mutable, meaning you can change its size.
So as compared to a series, for this one you can change the size of the tabular data structure.
So it's like saying that you can add rows and columns even after you've created an Excel sheet.
With a series, by contrast, you cannot modify the number of values inside it after you have created it.
The third one is Panel.
It is a labeled, size mutable array, but the dimensions are three in number.
So it's a three dimensional structure, where each item in it is a dataframe in itself.
That is where the three dimensions come from.
Panel is the most complicated of the three, because it's three dimensional, so it can be a little difficult to deal with.
Note, though, that Panel has since been deprecated and removed from newer versions of pandas, so in practice you will mostly work with series and dataframes.
One thing to note: all the above data structures are value mutable.
So what does that mean? Value mutable means that we can change the individual values, but in the case of a series, you cannot change the size.
So you can have four elements in that homogeneous array and change the values of all four elements, but you cannot change the size of the array itself.
Please, please notice these slight differences when it comes to mutation.
Mutation meaning change.
So what kind of changes are allowed on these data structures? You need to be familiar with that when you're using them; otherwise there's always a chance that you might get stuck when you're programming.
Let's look at series.
A series is a single dimensional array that contains homogeneous data, that is, data of a single type.
All the elements of a series are value mutable, but the series itself is size immutable.
So the values can change, while the size cannot.
All the elements share one data type, and the size of the entire series is fixed.
You cannot change it.
The data can be of multiple types, such as an ndarray.
So you can create a series from an ndarray.
Or from lists, constants, series or dictionaries, et cetera.
The index values need to be unique and hashable, and the index must have the same length as the data.
If you don't pass an index, it defaults to zero, one, two and so on.
We'll look at this in a short while.
As for the data type, if none is mentioned, it will be inferred automatically.
So there is always a data type for the series.
It will determine it by itself if you do not give it explicitly.
When it comes to copying the data, there is a copy parameter.
So what is a deep copy? A deep copy copies as much as it can about the thing that it is trying to copy.
copy is set to False by default, but it can do a deep copy as well.
So deep copy means it does not share any references, not even memory references, with the original; it's not a Xerox copy, it is like a second actual document.
So it is kind of creating a duplicate which is just as valid as the original one.
It is not simply a Xerox.
It's like getting a duplicate driver's license, which is as good as the original one.
This is very important to know when you are copying your series and passing it around: with a deep copy, if you make changes to your series variable in one place, they do not reflect in the other place.
So you need to know where you're deep copying and where you're shallow copying.
The opposite of deep copying is shallow copying.
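The difference between sharing and copying can be sketched with Series.copy(deep=True). Plain assignment makes no copy at all, so a change through the second name shows up in the original, while a deep copy has its own data. Shallow copies are left out here because their behaviour changed with pandas' Copy-on-Write mode.

```python
import pandas as pd

original = pd.Series([1, 2, 3])

alias = original            # no copy at all: both names point to one object
alias.iloc[0] = 100
print(original.iloc[0])     # 100, the change shows up in the original

independent = original.copy(deep=True)   # a deep copy with its own data
independent.iloc[1] = 200
print(original.iloc[1])     # 2, the original is untouched
```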
Let's create a series.
What we need to do is import pandas.
Then we create a series with pd.Series().
Let's run this.
If you see, it has given us a series, an empty one, as there is nothing present in it.
The dtype is float64 (newer versions of pandas use object for an empty series instead).
Now, let's suppose you want to create a series with some data inside it.
One of the typical ways to do it is by using an ndarray.
Let's also import NumPy over here.
Let's create an ndarray.
Let's pass it to Series.
Now let's see what the output is.
Okay, sorry, it needs to be one dimensional.
Right, okay, this can be a little confusing, so let me instead give it in reverse order, with somewhat random values, so that the values don't look exactly like the index.
Right, so nine has an index of zero.
Eight has an index of one.
The second eight has an index of two, and so on.
This index on the left, this column of zero, one, two, three, four, five, is generated automatically by Pandas.
It is not present in the original array.
Now, you can define the index yourself as well.
You need to pass an additional parameter, but if you do not, it will automatically index it.
Indexing meaning that it automatically sets up an index next to the values, through which you can access them, as I will show you later in the slides.
But just note that this column over here has been auto generated.
This is something that a Pandas series will always do for you.
Now you can also create it using a dict.
So let's suppose you want to create a data dict with A as one, B as two, C as three.
We pass the data dict and then we run it.
See, now the index that it has picked up has, in a way, been given by you.
When you give it a dict, it will automatically create the index using the keys, and the second column will be the values.
So when you give it an array, the indexing is done by position.
You can think of it like this: when an array is given, the NumPy array's elements are already indexed anyway.
So they are zero, one, two, and that is what becomes the index.
In this case, with a dictionary, the index is A, B, and C.
So think of the index of a book.
You use the index to go to a certain page.
The pages here are one, two, and three.
The index is A, B, and C.
If you want to get to one, you have to look up where it is in the index, and the index here is A.
That is how the series uses A, B and C as its index.
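Both ways of building a series can be sketched like this. The values and the w/x/y/z labels are illustrations only. From an ndarray, pandas generates the index 0, 1, 2 and so on; with index= you supply your own labels; from a dict, the keys become the index.

```python
import numpy as np
import pandas as pd

# From an ndarray: pandas generates the index 0, 1, 2, ...
s = pd.Series(np.array([9, 8, 8, 2]))
print(s)
print(s.index.tolist())      # [0, 1, 2, 3]

# Passing index= sets your own labels instead (hypothetical labels).
labelled = pd.Series(np.array([9, 8, 8, 2]), index=["w", "x", "y", "z"])
print(labelled["y"])         # 8

# From a dict: the keys become the index, the values become the data.
d = pd.Series({"a": 1, "b": 2, "c": 3})
print(d)
print(d["a"])                # 1
```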
Next is accessing data from a series.
Slicing works as usual.
In the sense that, let's suppose it had more values.
Let me just do that.
It will access two up to four minus one.
Of course, the last one is not included in any slicing in Python.
So four minus one is three over here, which is the value of B.
That is how it will work.
We can always slice till the end like this as well, or like this.
What happens if we give it a value that doesn't exist? Just to see, what if we give it seven? So yeah, it will just stop at F and not go further.
So yes, definitely, there is a similarity between this and arrays, but as you proceed with using Pandas in future classes and future modules, you will find that the normal functionality is present, plus extra things on all of these libraries and all of these new data types that we're learning about.
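A sketch of slicing a six-element series labelled a to f. It uses .iloc, the explicit positional form, so the stop-is-excluded rule reads exactly as with lists. The last line shows one of the extra Pandas features: label slices include their end label.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=["a", "b", "c", "d", "e", "f"])

print(s.iloc[2:4])    # positions 2 and 3 (labels c, d); stop 4 is excluded
print(s.iloc[2:])     # from position 2 to the end
print(s.iloc[:3])     # the first three values
print(s.iloc[2:7])    # a stop past the end is fine: it stops at label f

print(s["b":"d"])     # label slices include the end label: b, c, d
```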
Next is dataframes.
A DataFrame is a 2D data structure in which data is aligned in a tabular fashion, consisting of rows and columns.
Now, the constructor for the DataFrame object is pandas.DataFrame.
The first value is data, then index for the row labels and columns for the column labels, then the data type, and finally copy, which can be true or false.
Again, the deep copy concept.
Data can be of multiple types, as I said: arrays, lists, constants and so on, as you have seen with the series.
index and columns are the row and column labels of the dataframe.
It defaults to np.
Similarly, like here we have a default value.
In this case as well, it will have a default value depending on the data that you have passed to it.
dtype is the data type to use.
You can pass a single dtype to force it for the whole dataframe; if you don't, the data type of each column is inferred from the data in that column.
Let's create a DataFrame.
We'll create a list, 10, 20, 30, 40, then create a pd.DataFrame and pass the list to it.
Then we'll print the table.
You see, the index was created automatically.
Notice also that the column was given a name by itself, even though you did not provide one.
If you're wondering what the zero is, it's the name of the column where the values are.
Why did it pick zero? Because we had not given the column a name.
When we don't give it one, pandas assumes a default, following its own rules for naming columns; here it simply uses the position, 0.
This name is also called the label, and I will use "label" and "column name" interchangeably.
In the next few slides you will see how we can change the label.
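Here is the same example as a short sketch, plus a preview of naming the column yourself with the columns argument ("marks" is just an illustrative label).

```python
import pandas as pd

marks = [10, 20, 30, 40]
table = pd.DataFrame(marks)
print(table)
#     0
# 0  10
# 1  20
# 2  30
# 3  40

# Passing columns= replaces the default label 0 with a name of your choice
named = pd.DataFrame(marks, columns=["marks"])   # "marks" is a hypothetical label
print(named.columns.tolist())  # ['marks']
```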
Now let's use dictionaries. If you change the data to a list of dictionaries, where the first has A as 1 and B as 2, the second has A as 23 and B as 41, and the third has A as 45 and B as 47, look at the result.
A and B: where did they come from? They came from the keys.
Don't be confused about where the labels come from.
Whenever pandas is given a dictionary, like the dictionaries we saw earlier, the right side, the value, always ends up as the value in your cells.
If you think of this as a sort of Excel sheet, these are the values: 1, 2, 23, 41, 45, 47.
These are the row numbers on the left: 0, 1, 2.
And these are the column names.
When a column name is not explicitly provided, pandas tries to work out what it should be.
So A is the column name for one set of values, and B is the column name for the others.
Please notice that this is how it infers the names when you don't provide them explicitly.
What happens if, say, one of the dictionaries has a key that is not defined in the others? We get something called NaN.
NaN, "not a number", is a sort of null equivalent: not exactly the same as null, but similar to marking a value as not applicable or not present for the other rows.
It starts at row 0, the first element: A and B have values but C doesn't, so it puts NaN.
Then it goes to row 1: again A and B have values but C doesn't, so NaN again.
For the third row, A, B and C all have values, so it puts 48 under C.
So please notice how this works: NaN stands in wherever no data is provided.
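A minimal sketch of the list-of-dictionaries case, using the values read out above. Note that column C ends up as floating point, because NaN is itself a float value.

```python
import pandas as pd

rows = [
    {"A": 1, "B": 2},
    {"A": 23, "B": 41},
    {"A": 45, "B": 47, "C": 48},   # only this row has a C key
]
table = pd.DataFrame(rows)         # keys become column labels
print(table)
print(table["C"].isna().tolist())  # [True, True, False]
```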
Now suppose you want to give it an index.
Until now, pandas was labelling the rows 0, 1, 2 based on their position in the list: the zeroth row, the first row, the second row.
Suppose we don't want that, and we want to label the rows differently.
We pass an index, which is a list of labels.
To be very explicit, we could say this is row one, this is row two, and this is row three.
Why would you do this? Because it makes your dataset much more readable.
Suppose, for a second, these are test results for three people: Jim, Dwight and Pam.
See how much more readable this is than it would otherwise be.
You could store this as key-value pairs in a dictionary, but this is far more readable.
If you look at the console while going through this, it's arranged like an Excel sheet.
That matters a lot in data analysis, because much of the time you are simply looking at the data.
Python stores data well in dictionaries and lists, but viewing it is a pain unless you arrange it in a particular way.
A DataFrame does that for you: it loads the data in a tabular format and makes your life much easier.
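The same rows with explicit labels might look like this; it's a sketch reusing the dictionary data from before, with one label per row in the same order as the rows.

```python
import pandas as pd

rows = [
    {"A": 1, "B": 2},
    {"A": 23, "B": 41},
    {"A": 45, "B": 47, "C": 48},
]
# One label per row, in the same order as the rows
table = pd.DataFrame(rows, index=["Jim", "Dwight", "Pam"])
print(table)
print(table.loc["Pam", "C"])  # look up a cell by row label and column label
```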
Okay, there is another way to create a DataFrame with pandas: you can convert a dictionary of Series into a DataFrame, and there's a trick to it.
Suppose I create a Series of 40, 45 and 60.
Its index is maths, chemistry, physics.
I create series two with pd.Series as well.
This one is 70, 72 and 74.
Imagine these are a student's marks.
Again, the index is maths, chemistry, physics.
Now we can create the DataFrame object from a dictionary.
We say Jim has series one and Dwight has series two.
Then we print the table.
So now we have a DataFrame of marks for Dwight and Jim.
Dwight and Jim become the columns here.
The index is made up of maths, chemistry and physics.
The values are filled in from the Series: series two belongs to Dwight and series one belongs to Jim.
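A minimal sketch of building the marks table from a dictionary of Series, using the marks read out in the lecture.

```python
import pandas as pd

subjects = ["maths", "chemistry", "physics"]
series_one = pd.Series([40, 45, 60], index=subjects)
series_two = pd.Series([70, 72, 74], index=subjects)

# Dictionary keys become column labels; the Series index becomes the row labels
table = pd.DataFrame({"Jim": series_one, "Dwight": series_two})
print(table)
```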
So let's look at adding and deleting columns in a DataFrame.
A new column can be added to a DataFrame when the data is passed as a Series.
We've already seen something like this, where a new entry gets added.
Suppose I add C++ to Dwight's series here, with a value of 90.
Naturally, for Jim the value will be NaN, not a number.
It doesn't exist for him.
Just like you saw earlier.
Now, suppose you want to add a new column for a third student.
How can you do that? Creating a separate Series variable every time is cumbersome.
So let's add it directly.
Suppose we add the data for Pam, containing the marks 90, 91, 92 and 93, with the index maths, chemistry, physics, C++.
Let's add even one more entry for her, for English.
Let's run this and see the result.
So what happened? English was not a row label when the table was created.
So it is ignored here.
The other entries are kept because those labels already existed at that point.
Had we included English when we created the table, we would have had an English row, but if you add it like this, let's look again at what we get.
There is no English row.
A label like that only appears if it is already part of the table's index.
We'll come to adding rows shortly; what you have learned here is how to add a column.
So Pam's column has been added, but the English row has not.
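Here is a sketch of the column-addition behaviour described above. Putting C++ on Dwight's Series and using 88 as Pam's English mark are assumptions for illustration; the point is that a new column is aligned on the existing row labels, so the English value is silently dropped.

```python
import pandas as pd

subjects = ["maths", "chemistry", "physics"]
jim = pd.Series([40, 45, 60], index=subjects)
# Dwight also has a C++ mark of 90, so C++ becomes a row label
dwight = pd.Series([70, 72, 74, 90], index=subjects + ["C++"])
table = pd.DataFrame({"Jim": jim, "Dwight": dwight})  # Jim gets NaN for C++

# A new column is aligned on the EXISTING row labels;
# "English" is not a row label, so that value is dropped.
table["Pam"] = pd.Series(
    [90, 91, 92, 93, 88],  # 88 is a hypothetical English mark
    index=subjects + ["C++", "English"],
)
print(table)
print("English" in table.index)  # False
```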
You can also delete a particular column from a table.
Suppose I want to keep Pam but delete Jim.
You delete it using the del keyword.
del doesn't return anything.
If you want to get the data back as well as remove it, you have to use something called pop.
Instead of printing the table, let me print the Jim series that pop returns.
This prints the Series for Jim.
It's 40, 45, 60, coming from Jim.
And of course, in the table, Jim no longer has a column, because it has been popped.
So the table now holds only the remaining students.
There's a slight difference between del and pop.
del doesn't return the element being deleted, but pop does.
Popping is like taking something out.
Deleting is just deleting.
If you have a box of candies and you pop one out, you're taking it and removing it from the box.
But if you just delete it, you're destroying it: you're not taking it away or consuming it.
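A short sketch contrasting del and pop. The lecture's exact sequence of deletions isn't fully clear, so here del removes Dwight's column and pop takes out Jim's; the marks come from earlier in the lecture.

```python
import pandas as pd

subjects = ["maths", "chemistry", "physics"]
table = pd.DataFrame(
    {"Jim": [40, 45, 60], "Dwight": [70, 72, 74], "Pam": [90, 91, 92]},
    index=subjects,
)

del table["Dwight"]            # removes the column, returns nothing
jim = table.pop("Jim")         # removes the column AND hands it back as a Series
print(jim.tolist())            # [40, 45, 60]
print(table.columns.tolist())  # ['Pam']
```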
Let's look at adding and deleting rows.
We just saw that we could not add the English row when we added the column later.
So let's see how to add a whole new row.
But first: you can select rows by passing a row label to the loc function.
Let's use that.
Let's print the table and look at the maths marks for everybody.
Sorry, I did not use the loc function.
You need to use loc, and then you get the results for Dwight, Jim and Pam.
Notice how it has automatically given them to you in a readable form.
It hasn't just returned the bare numbers.
It has given you the labels as well.
The labels stay intact, so you can make sense of the result with all the column names.
Now, in case you want a numeric row reference, where you ask for row number two instead of the row's label.
Then you need to use iloc, which stands for integer location.
We can try it for multiple rows as well.
The row name is still given, even with iloc.
So it gives you this as well.
So these are not just plain values.
The result carries its row name, so you can access it.
You can check which row was accessed with iloc.
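A minimal sketch of label-based and position-based row selection on a marks table like the one above.

```python
import pandas as pd

subjects = ["maths", "chemistry", "physics"]
table = pd.DataFrame(
    {"Jim": [40, 45, 60], "Dwight": [70, 72, 74], "Pam": [90, 91, 92]},
    index=subjects,
)

maths = table.loc["maths"]       # select a row by its label
print(maths)                     # a Series labelled Jim, Dwight, Pam
third = table.iloc[2]            # select a row by integer position (0-based)
print(third.name)                # physics
first_two = table.iloc[0:2]      # several rows at once
print(first_two.index.tolist())  # ['maths', 'chemistry']
```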
Now, let's talk about appending a row.
To append a row to an existing table, here is what we need to do.
We define a new DataFrame.
The new DataFrame holds the values for the row.
Suppose the values are 30, 60 and 95.
The columns are Jim, Dwight and Pam.
Let's run this.
We've run into an error.
It says the shape of passed values is (1, 3).
But the indices imply (3, 3).
So you need to visualize the data you have.
There are three columns, and you want to add one new row across them.
So how would you do this? Let's look at the example on the slide.
There, 11 and 13 are being added under the columns 2 and 3.
The columns are 2 and 3, and the values are mapped to them.
That's all we had to do.
We had to wrap the values in an outer list instead of passing a flat list.
If you look at it, it's a list of lists.
Each inner list is one row, because a DataFrame can always contain multiple rows.
It can always contain multiple rows.
I wanted to add one single row.
But by passing a flat list, I was implying three rows of one value each.
I wanted a DataFrame of only one row, containing the English marks.
Once that's fixed, I append this new DataFrame to the table.
The new row comes in with the label 0.
I can change that label later.
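A sketch of appending one row, under one assumption worth stating: DataFrame.append, the method tutorials of this era used for this step, was removed in pandas 2.0, so this uses pd.concat instead. Giving the new row the label "English" up front also avoids relabelling it later. Note that the exact wording of the shape error differs between pandas versions.

```python
import pandas as pd

columns = ["Jim", "Dwight", "Pam"]
table = pd.DataFrame(
    [[40, 70, 90], [45, 72, 91], [60, 74, 92]],
    index=["maths", "chemistry", "physics"],
    columns=columns,
)

# A flat list is read as ONE column with three rows, so three column labels don't fit
shape_error = None
try:
    pd.DataFrame([30, 60, 95], columns=columns)
except ValueError as err:
    shape_error = err
print("ValueError:", shape_error)

# Wrapping the values in an outer list gives one row with three columns
english = pd.DataFrame([[30, 60, 95]], columns=columns, index=["English"])
table = pd.concat([table, english])  # pandas 2.x replacement for DataFrame.append
print(table)
```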
The drop function is used to drop rows by their labels.
So you need to provide the labels.
Suppose I now want to drop the row we just added.
I'll say drop zero.
Then I'll print the table.
Okay, it seems I need to pass the numeric 0, not a string.
Now suppose instead I want to drop the C++ row.
C++ is dropped.
Similarly, I can drop chemistry or maths.
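A sketch of drop on a small marks table (the values are illustrative). One detail worth knowing: by default drop returns a new DataFrame and leaves the original unchanged, so assign the result if you want to keep it.

```python
import pandas as pd

table = pd.DataFrame(
    {"Jim": [40, 45, 60, None], "Dwight": [70, 72, 74, 90]},
    index=["maths", "chemistry", "physics", "C++"],
)

without_cpp = table.drop("C++")             # returns a NEW DataFrame
print(without_cpp.index.tolist())           # ['maths', 'chemistry', 'physics']
fewer = table.drop(["chemistry", "maths"])  # several labels at once
print(fewer.index.tolist())                 # ['physics', 'C++']
print(table.shape)                          # (4, 2): the original is unchanged

# With default integer labels, drop(0) removes the row labelled 0
numbered = pd.DataFrame([[1, 2], [3, 4]])
print(numbered.drop(0).index.tolist())      # [1]
```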
Let's look at importing and exporting data using pandas.
Similar to how we did it with NumPy, you can take a dataset and read the CSV with read_csv by just giving it the path to the file.
So let's try this out.
Let's create a new CSV file for ourselves.
I'll comment out all of the code above.
Okay, it reads 10, 20, 40, 50, the values present in the file, just like when we loaded it using NumPy.
But now it's a pandas DataFrame.
The data type is a pandas DataFrame.
Any table you have on your end can be written back to a CSV file as well, with to_csv.
So let's do that.
You'll notice the dataset has been exported to the file.
The table we had here has been exported.
Again, this makes it easy to keep a prepared dataset.
It often happens that you have a large dataset, process it once on one machine, and hand it to somebody else.
That's very handy: you can import and export in a compatible format, give your data to other people, and let them work with it.
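A minimal round-trip sketch. The file name marks.csv and the temporary folder are assumptions so the example doesn't leave files behind; index_col=0 tells read_csv that the first column holds the row labels that to_csv wrote out.

```python
import os
import tempfile

import pandas as pd

table = pd.DataFrame(
    {"Jim": [40, 45, 60], "Dwight": [70, 72, 74]},
    index=["maths", "chemistry", "physics"],
)

csv_text = table.to_csv()          # with no path, to_csv returns the CSV as a string
print(csv_text)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "marks.csv")  # hypothetical file name
    table.to_csv(path)                        # row labels are written as the first column
    loaded = pd.read_csv(path, index_col=0)   # read them back as the index
print(loaded)
```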
Similarly, you can read Excel sheets as well, with read_excel.
You just provide the path, and you can write to Excel sheets too, with to_excel, if needed.
It's super simple, just like CSV (reading .xlsx files does need an extra package such as openpyxl installed).
If you ever face issues with this, my first recommendation would be to convert the sheet into a CSV.
Any Excel sheet can be saved as a CSV, so that's always an option.
Here is a use case, a problem statement.
Suppose Bob wants to track the demographics of his country by age for every year.
What he wants to see is how the country's population varies by age each year.
How many young people are there each year?
For example, what percentage of people are below 30?
And what percentage are above 30?
He wants to create that sort of breakdown.
He collects data about the country's population from the census, from the internet and from numerous websites, and now he's trying to look at it.
But if he just looked at it in an Excel sheet, or even a pandas DataFrame, how easily could he digest that information at a single glance?
More than that, if he had to share it with somebody, would it be easy for that person to look at it and draw the same conclusions? He could arrange it nicely, but the insight is not immediate.
The solution is Matplotlib, a plotting library for Python, which helps us plot the data.
There's a saying that a picture paints a thousand words.
Even with very complicated data, once you plot it on a graph it becomes dramatically easier to understand what's happening, compared with looking at the raw data.
So, a short introduction to data visualization with Matplotlib.
Matplotlib is a Python library designed specifically for creating graphs, charts and so on, to provide data visualization.
Matplotlib is inspired by MATLAB, the software and programming language, and reproduces many of its features.
This doesn't mean you need to know MATLAB; we will still be writing Python.
But it's good to know that it's based on something very strong.
MATLAB is widely used by engineers, scholars and universities.
It provides a lot of mathematical operations straight out of the box, and many of the things you can do in Python for data science and machine learning can also be done in MATLAB.
It's just that Python tends to be favored because it is easier to write than something like MATLAB.
Installing Matplotlib is similar to the previous installation commands you have been using.
This is how the screen should look.
The command is pip install matplotlib.
All lowercase, exactly as it is written here.
pip install matplotlib.
pip, space, install, space, matplotlib.
Let's look at plotting in Matplotlib.
The first plot we're going to draw is this one.
Let's head over to the IDE.
First, import matplotlib.pyplot as plt.
It's very important to import it under an alias, because otherwise the full name is quite long.
This is a convention you will see in many places.
It's similar to importing NumPy as np and pandas as pd.
A very simple plot: we're going to plot 1, 2, 3, 4 and then show the plot.
The first command just passes the values.
It's not that the plot hasn't been created; you just haven't asked to show it or save it yet.
Until you explicitly tell Python that you want to look at the plot, it won't display anything.
This matters because sometimes you want to generate plots and save them, and other times you want to look at the plot right then and there.
So let's take this for a spin.
You'll notice that something else appears: not here in the console, but as a separate window, whatever machine you're on.
On Windows, it shows up in the taskbar at the bottom, next to the Start menu.
This is what Matplotlib generated for us.
The window has several features, like zoom.
That's really nice, but here is our plot.
It starts at 1 and goes up to 4.
So we were plotting the y-axis here.
If you look at this, these are y values.
The 1, 2, 3, 4.
If I add a value of 10, let's see if the axis goes up to 10.
You see that it now goes up to 10.
The x values didn't come from our list at all.
We haven't given the x-axis, but Matplotlib has figured it out by itself.
It has taken a default.
You don't need to worry about it, because normally we wouldn't plot like this.
Normally we'd provide our own x values. Otherwise, item zero is at x = 0.
The first item is at x = 1, and so on.
So the positions are index based.
The x-axis is marked with the index of each item.
To summarize the slide: plot takes a list of vertical coordinates to be plotted.
We import matplotlib.pyplot as plt, and show displays the plot.
The x values are implicit, running from 0 to n minus 1, where n is the length of the list.
Here n is 5, so x runs from 0 to 4.
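A minimal sketch of the y-only plot. plt.show() is left commented out so the snippet also runs on a machine without a display; uncomment it to get the window described above.

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 2, 3, 4, 10])     # y values only
# plt.show()                   # opens the plot window

line = plt.gca().get_lines()[0]
print(list(line.get_xdata()))  # x was filled in as 0, 1, 2, 3, 4
```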
Now, we can also specify the x values ourselves.
One way is to collect them into a list.
Let's start with the data.
For the sake of demonstration, I'm going to create some mock data for the x-axis.
The second list uses a list comprehension.
Let me print it first, in case you've forgotten what this does.
It takes the y values and squares each one.
1 squared, 2 squared, 3 squared, 4 squared, 10 squared, collected into a new list.
Now let's see what we get.
The x-axis is 2, 4, 6, 8, 10.
And the y-axis is here.
The first list passed to plot is the x-axis, and the second is the y-axis.
You can also use NumPy to generate the values and use them here.
We'll use NumPy's arange and plot the result.
We plot x, and we square it for the other axis.
Let's see what we get.
Let's compare it with what we wanted.
x runs up to 5.
y is x squared, from 0 up to about 25, the largest square.
Let's print the arange result to confirm.
arange goes from 0 up to, but not including, 5 in small fractional steps.
So the values are close to continuous.
plot draws a line through all of the values provided.
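A sketch of passing x and y explicitly. The exact lists and the 0.1 step are assumptions modelled on the values mentioned; np.arange(start, stop, step) stops before the stop value, so t ends at 4.9.

```python
import numpy as np
import matplotlib.pyplot as plt

# Mock data in the spirit of the lecture (exact lists are assumptions)
x = [2, 4, 6, 8, 10]
y = [value ** 2 for value in [1, 2, 3, 4, 10]]  # list comprehension: 1, 4, 9, 16, 100
plt.figure()
plt.plot(x, y)               # first argument is x, second is y

# NumPy version: arange stops BEFORE 5
t = np.arange(0, 5, 0.1)     # the 0.1 step is an assumed value
plt.figure()
plt.plot(t, t ** 2)          # a smooth-looking parabola from 0 to about 25
# plt.show()
```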
Let's go to the next one.
Often we need multiline plots.
That is, multiple lines on the same plot.
This is very commonly used, and there couldn't be an easier way to do it.
You just call plot again and again.
Think of each call to plot as putting one more line on the graph.
So every time you call plot, you put another line on the same graph.
So we'll set x equal to range(5).
It's a simple Python range of five, and we'll plot one line with x squared and one with x to the power of 10.
That way you can clearly spot the difference between them.
Naturally, this line here is the power of 10.
And this one is the square.
Both lines share the same x-axis.
It runs from 0 to 4.
That is, 5 minus 1.
Now, suppose you want to plot multiple lines with the same plot call.
If calling plot repeatedly feels cumbersome, you can put the x, y pairs one after another in a single call, and it runs.
We get a similar graph.
Let's try another thing.
Let's try three lines.
This time we'll use i**100, so that you can really see the difference.
It seems we have gone out of range here.
So a single plot call can draw several lines.
Let's look at the result.
You may have noticed that earlier, even though we used three lines, they weren't clear enough.
That's because they had merged together.
The values had grown so much; with comparable values, we can see the difference.
Let's see whether we can drop the x and still make it work, or whether we need to give x every single time.
It seems that doesn't work.
It gives a different sort of plot.
We need to give the same x values.
That has to be done for every line.
That's when you get the three lines.
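A sketch of the two ways to draw several lines, and of what goes wrong when x is dropped: with two lists and no x, plot treats the first list as x and draws a single line.

```python
import matplotlib.pyplot as plt

x = list(range(5))                 # 0 .. 4, shared by every line
squares = [i ** 2 for i in x]
tenth_powers = [i ** 10 for i in x]

# Option 1: one plot() call per line
plt.figure()
plt.plot(x, squares)
plt.plot(x, tenth_powers)          # dwarfs the squares: 4 ** 10 is 1,048,576

# Option 2: several x, y pairs in a single call (x repeated for each line)
plt.figure()
paired = plt.plot(x, squares, x, tenth_powers)

# Dropping x changes the meaning: the first list is now taken as x
plt.figure()
no_x = plt.plot(squares, tenth_powers)
# plt.show()
```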
If you want to show a grid in the background, which is purely visual, just pass True to plt.grid.
Now we have a grid in the background.
Another thing we can do is limit the axes.
Suppose it's a very large graph and you only want to see a certain portion of it.
Let's do a demonstration.
plt.axis takes four values; the first pair is for the x-axis.
Suppose I pass 5 and 10 for it.
I want to see the portion of the graph where x is between 5 and 10, and for the y-axis something between 15 and 20.
This defines a coordinate window.
x between 5 and 10, and y between 15 and 20.
Let's run it.
We got nothing, probably because the graph isn't even in this window.
No point of any of the lines falls in this window, so let's change it and see if we get something.
Let's use 0 to 4 for x and 0 to 3 for y.
We should definitely get something now.
We get the three lines.
Now suppose we start viewing from this point.
Think of it as cutting a slice of the graph along this axis and this axis, to see just the part where the lines are.
We want to zoom in on the lines themselves.
We can set x from 1 to 3, and the same for y.
So both ranges become 1 to 3.
The view zooms in.
We can narrow it down further.
Go beyond 1.5.
And we lose the lines.
Let's use 1.25.
So x goes from 1.25 to 3.
And y goes from 2 to 3.
You see how we sliced out that portion? Limiting the axes is really about seeing a particular section of the graph.
That's how you should think about using it.
You can also use xlim and ylim as an alternative, in case remembering the four-value order is a problem.
We can do the same thing with xlim(1.25, 3).
And ylim(2, 3).
We get pretty much the same graph.
Of course, it's up to you which one to go for.
I personally prefer xlim and ylim because they're explicit.
You know exactly which axis you're setting.
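A sketch of the grid and the two ways of limiting the axes. The third line (fourth powers) is an assumed stand-in for the lecture's third series; the window values 1.25 to 3 and 2 to 3 are the ones used above.

```python
import matplotlib.pyplot as plt

x = list(range(5))
plt.figure()
plt.plot(x, [i ** 2 for i in x], x, [i ** 3 for i in x], x, [i ** 4 for i in x])
plt.grid(True)                # background grid

plt.axis([1.25, 3, 2, 3])     # [xmin, xmax, ymin, ymax]
print(plt.gca().get_xlim(), plt.gca().get_ylim())

# The same window with one call per axis
plt.xlim(1.25, 3)
plt.ylim(2, 3)
# plt.show()
```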
Let's talk about adding labels to a plot.
It's very simple; first, let's remove the axis limits.
Then call xlabel and ylabel with the text for the x-axis and y-axis, exactly as I wrote it in Python.
If you want to add a title, call title.
Next, let's add a legend to our graph.
In each plot statement, when you have multiple plots, set the label.
And don't forget to call the legend function as well.
You set the label like this, and then you call the legend function.
Let's run this.
See, now it looks much more professional and much neater, the way we typically see graphs.
Let's look at saving plots.
There's very little to do.
We just give savefig the name of the file.
It creates the file by itself.
The file has been created and the plot is saved.
savefig: F-I-G, as in save figure.
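A sketch combining axis labels, a title, a legend and savefig. The label texts, title and file name are illustrative. One practical detail: call savefig before plt.show(), because once the window is closed the figure may be gone and an empty image can be saved.

```python
import os
import tempfile

import matplotlib.pyplot as plt

x = list(range(5))
plt.figure()
plt.plot(x, [i ** 2 for i in x], label="squares")  # label= is what the legend shows
plt.plot(x, [i ** 3 for i in x], label="cubes")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("Powers of x")          # hypothetical title
plt.legend()                      # without this call the labels are never drawn

out_path = os.path.join(tempfile.mkdtemp(), "powers.png")  # hypothetical file name
plt.savefig(out_path)             # save BEFORE plt.show()
# plt.show()
```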
Next, let's talk about plot types.
Matplotlib provides a lot of plot types for visualizing information.
These include scatter plots, histograms, bar graphs and pie charts, and many more besides, but these are the primary ones typically used by, or asked of, developers.
Let's look at them one by one.
First, the histogram. A histogram shows the distribution of a variable over a range of values.
It shows how often the variable's values fall within each range.
Please notice what it is for, and understand that it's not a bar graph.
It displays the distribution of a variable.
Here we're going to use a function called random.
randn, from NumPy. We haven't covered it earlier, so let me show you its documentation.
randn generates an array of shape (d0, d1, ...).
Whatever shape you pass is the shape it generates, filled with random floats sampled from a univariate normal, or Gaussian, distribution of mean 0 and variance 1.
Please note that the input arguments here are not a range or anything of that sort.
They are the dimensions of the array.
Let me show you by example.
Two rows and two columns.
It fills the rows and columns with values following the Gaussian distribution, which is a mathematical formula.
So it's evaluating a mathematical equation; that's all you need to understand if you're not familiar with the Gaussian distribution or don't remember it right now.
Here we only define the rows and the columns.
We're not doing anything else.
Now suppose you print 400 rows and 10 columns.
Python abbreviates that output, but we could do it with 40 and 10 as well.
Or let's try with two.
Yes, this is much cleaner for two rows and two columns.
So let's plot a histogram with some of these values.
We set y equal to this, and then call plt.
hist, short for histogram.
We give it the data.
Then we show the plot.
Let's run this.
We get a histogram.
Don't confuse it with a bar graph: it looks like one, but the idea behind it is different.
If the sample set were larger, which I'll make it now, it would look much more like the graph you saw on the slide.
It's showing the distribution.
This one is a little heavy; it's stuttering.
The histogram groups values into non-overlapping categories called bins.
This is a categorization the histogram can do for you, and to get it we pass a second parameter, called bins.
It's just the number of bins.
So let's take it for a run.
The reason I changed the parameters is that it becomes heavy with 100 by 100.
The plot is pretty big, so it takes a lot of time, which is why the rocket icon was jumping up and down.
Keep in mind it's not stuck; it's just taking a lot more time, and the wait would be shorter on a more powerful laptop or computer with more RAM.
The computer is doing all the calculations needed to plot this.
That isn't trivial in terms of time.
So the output takes a little while, and there's a bit of a wait.
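A minimal sketch of randn and plt.hist. The sample size of 1,000 and the 20 bins are arbitrary choices. As a side note, if you pass a 2-D array such as randn(100, 100) to plt.hist, Matplotlib treats each column as a separate dataset, that is 100 histograms drawn at once, which helps explain why that version was slow.

```python
import numpy as np
import matplotlib.pyplot as plt

grid = np.random.randn(2, 2)      # the arguments are the SHAPE: 2 rows, 2 columns
print(grid)

samples = np.random.randn(1000)   # 1,000 draws, mean 0, variance 1
plt.figure()
counts, edges, patches = plt.hist(samples, bins=20)  # 20 bins is an arbitrary choice
print(len(counts), int(counts.sum()))  # 20 1000
# plt.show()
```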
Next is the bar chart.
To plot a bar chart we need to provide two arrays.
The first array gives the midpoint of the base of every bar.
Here's what that means.
This is 1, 2 and 3: where the midpoint of each bar should be.
The second gives the heights of the successive bars.
So let's try this out.
We'll set some arbitrary positions for the bars and give them heights of 45, 85 and 89.
Let's take this for a run.
The gaps appear because I gave it quite arbitrary positions.
This isn't how we're conditioned at school to draw bar graphs.
I just wanted to prove the point that you can pass any positions and each one will be used as the midpoint of its bar.
Look at the middle of this bar, and at the coordinates shown at the bottom right of the window.
Around 1.5 is exactly the middle.
Yes, this is the exact middle, where my cursor is right now.
The other bars are centred on their own positions in the same way.
This one, for example, is centred at 4.8.
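A sketch of a basic bar chart using the slide's positions 1, 2, 3 and the heights 45, 85, 89. By default each bar is 0.8 wide and centred on its position, which is why the positions act as midpoints.

```python
import matplotlib.pyplot as plt

positions = [1, 2, 3]     # midpoint of the base of each bar
heights = [45, 85, 89]
plt.figure()
bars = plt.bar(positions, heights)  # default: width 0.8, centred on each position

first = bars[0]
centre = first.get_x() + first.get_width() / 2
print(centre)             # the first bar is centred on 1
# plt.show()
```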
You can also plot a dictionary using a bar chart.
I have the code ready for this.
Here, we iterate over a dictionary using enumerate.
Let's first look at what enumerate does.
It gives an iterator.
It yields the index i along with each key.
So 0 goes with the first key, 1 with the second, and so on.
What enumerate does is put them into tuples of (index, key).
We use the index as the bar's position and the key to look up its value in the dictionary.
Then we plot the bar graph.
The 0 here comes from i, then 1, then 2.
You could also do it like this.
i + 1; it doesn't matter. The dictionary's only involvement here is the enumeration.
It's a demonstration that even with a dictionary, you can manipulate the data with plain Python.
None of this is really a Matplotlib concept; it's just programming, and you can do it this way whenever you have a dictionary you want to plot as a bar graph.
Another step is to set xticks.
xticks takes the positions you set here, 0, 1 and 2, and replaces them with the labels you want.
If you look at this, it's 0, 1, 2.
I could have used the dictionary keys, but I wanted to show you a different approach first.
Here I typed the labels by hand.
Let's call them bars instead.
Now, to do the same thing using just the dictionary, I can do it this way.
What is d.
keys()? It's a, c, b.
Earlier I passed the labels manually, typing them by hand.
But you can of course use the values provided by d.
keys() instead.
This is for placing labels on the x-axis.
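A sketch of plotting a dictionary as bars with enumerate and xticks. The values are hypothetical; only the key order a, c, b comes from the lecture.

```python
import matplotlib.pyplot as plt

d = {"a": 10, "c": 30, "b": 20}   # hypothetical values; keys in the order a, c, b

plt.figure()
for i, key in enumerate(d):       # yields (0, 'a'), (1, 'c'), (2, 'b')
    plt.bar(i, d[key])            # i is the bar's position, d[key] its height

# Replace the numeric positions 0, 1, 2 with the dictionary keys
plt.xticks(range(len(d)), list(d.keys()))
# plt.show()
```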
Let's look at the pie chart.
A pie chart is relatively simple to make.
First, we set the size of the figure.
We give its dimensions.
The size of the plot is in inches; let's set it to 3 by 3. Then we pass the values to pie, and it calculates the percentages by itself.
Please note that you don't have to pass percentages: it sums the values, works out the percentages itself, and you pass the labels.
That should have been labels, plural.
Let's run this.
We get a nice little plot.
You can add as many labels as you want, and it will keep dividing the pie.
Suppose I add a very disproportionately sized value and call it machine learning.
We get these values on the pie chart.
Next, let's look at the scatter plot.
Scatter plots display the values of two datasets as a collection of points.
Here's what we need to do.
The argument to randn for x is the number of points.
We use randn again; let's print x and y and see exactly what they are.
These are two points.
This is one point, and this is another.
These are two values of x, and these are two values of y.
When you call scatter, you're saying you want to plot those points.
So two points have been plotted.
The values generated here are again Gaussian-distributed.
So think of them as random values that follow certain rules.
They're random, but drawn according to a rule.
The 2 here is really the number of points you want.
With a larger number, you get an array of, say, a thousand x values, and likewise a thousand y values.
And we make a scatter plot of those.
Let's do the big one.
That's a thousand points on this plot.
We can always use fewer: say 20, and 20 again.
So these are 20 points.
Let's see if we get an error when we reduce only one of the numbers. What happened? x and y must be the same size.
You get a ValueError.
What happened was that there were 10 x coordinates for which it could not find matching y coordinates.
So it doesn't work.
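A sketch of a scatter plot of 20 Gaussian points, followed by the mismatched-length case, which raises a ValueError.

```python
import numpy as np
import matplotlib.pyplot as plt

n = 20                              # number of points
x = np.random.randn(n)              # n Gaussian-distributed x coordinates
y = np.random.randn(n)              # n matching y coordinates
plt.figure()
points = plt.scatter(x, y)
print(points.get_offsets().shape)   # (20, 2): one (x, y) pair per point

# Mismatched lengths: 10 x values but 20 y values
mismatch_error = None
try:
    plt.scatter(np.random.randn(10), np.random.randn(20))
except ValueError as err:
    mismatch_error = err
print("ValueError:", mismatch_error)
# plt.show()
```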
Finally, let's look at styling a plot.
So let's suppose we had a line plot.
We call np.arange(1, 3) and then plot it with the string 'y'.
What the second parameter is doing is defining a color.
Now you may ask, which color is y? It's yellow.
This is based on a color code provided by matplotlib itself.
So we add another line with 'c' and then call plt.show().
So these are the color codes.
b is for blue, c is for cyan, g is for green, k is for black, m is for magenta, r is for red, w is for white, and y is for yellow.
So please keep this in mind.
The codes are not always the first letter of the color name; surprisingly, black is k.
You need to refer to the color list.
We got colored lines according to the colors we wanted to give them.
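A small sketch that draws one line per single-letter color code, assuming matplotlib's format-string shorthand. It reads each line's color back as a hex string, so you can see what each letter resolves to:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no window is needed
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 3)  # array([1, 2])
color_codes = ["b", "c", "g", "k", "m", "r", "w", "y"]

fig, ax = plt.subplots()
hex_colors = {}
for offset, code in enumerate(color_codes):
    # the second argument is a format string holding only a color letter
    (line,) = ax.plot(x + offset, code)
    hex_colors[code] = mcolors.to_hex(line.get_color())

print(hex_colors)
plt.close(fig)
```

Note that b is blue and k is black, which is exactly the trap mentioned above.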
Sometimes you want to style your line in a particular fashion.
There's an option for that as well.
You can do it in a single plot call, as you've seen here, or you could do it in each of these calls as well.
So let's try this out on one of the lines first and see the output.
We made it a dashed line.
Let's see whether we can give it a color at the same time.
m and c are taken, so let's give it a red color.
That attempt didn't work here, because the color was passed as a separate argument.
Color and style can be set together, though: combine them in one format string, such as 'r--', or use the color and linestyle keyword arguments.
For now, let's make a dotted line, a dashed line, and a line made out of colons.
So this is how you do it.
This is the colon (dotted) line, and this is dash and dot.
This is just a dash.
So we have a solid line, a dashed line, a dash-and-dot line, and a dotted line.
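A sketch of the four line styles, plus color and style combined in one format string and the equivalent keyword-argument form. The data is a made-up arange:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no window is needed
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 5)
fig, ax = plt.subplots()
(solid,) = ax.plot(x, "-")           # solid line
(dashed,) = ax.plot(x + 1, "--")     # dashed line
(dash_dot,) = ax.plot(x + 2, "-.")   # dash and dot
(dotted,) = ax.plot(x + 3, ":")      # dotted (colon) line

# color and line style together in one format string: red + dashed
(red_dashed,) = ax.plot(x + 4, "r--")
# the same styling spelled out with keyword arguments
(red_dashed_kw,) = ax.plot(x + 5, color="red", linestyle="--")

styles = [line.get_linestyle() for line in (solid, dashed, dash_dot, dotted)]
print(styles)  # ['-', '--', '-.', ':']
plt.close(fig)
```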
You can also have customized markers, in addition to the line styling you've just seen.
Let's try this out on one of them.
Let's see what o does, and let's remove the other two for the time being.
Let's try it out on this line instead of the other one.
Let's see what happens with D.
D is for diamond, o is for circle.
This is about the points being plotted.
It's not the styling of the line; it's the styling of each particular point.
With the line style, the entire line was styled in a certain way.
Markers, on the other hand, mark the specific data points that you have given.
They don't change the line between the points.
If I use s instead, I get square markers.
Let's try it with the hat, the ^ symbol, and you get triangles.
So again, it's just styling.
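A sketch of the marker codes mentioned above. A format string with only a marker draws the points without a connecting line, and a marker can be combined with a color and line style:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)
fig, ax = plt.subplots()
(circles,) = ax.plot(x, "o")        # circle markers only, no connecting line
(diamonds,) = ax.plot(x + 1, "D")   # diamond markers (uppercase D)
(squares,) = ax.plot(x + 2, "s")    # square markers
(triangles,) = ax.plot(x + 3, "^")  # the "hat": upward-pointing triangles
# marker combined with a color and a line style: green dashed line with circles
(green_dashed_circles,) = ax.plot(x + 4, "g--o")

print([ln.get_marker() for ln in (circles, diamonds, squares, triangles)])
plt.close(fig)
```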
So, introduction to data manipulation.
Using NumPy, Pandas and matplotlib, we will analyze, visualize and manipulate large datasets in order to extract valuable information and insights from them.
Let's look at the basic functionality, starting with ndim.
ndim is an attribute available on Pandas Series and DataFrame objects, and it returns the number of dimensions of the data structure.
So let's try this out.
Here I have a Pandas Series.
Naturally, the dimension is one, because if we print it here, it is a single column of data.
This is the index column and this is the actual data column.
We can also build a dataframe where we pass it an additional column.
If you look at the dimension now, it has changed to two, because a dataframe is a two-dimensional structure of rows and columns.
Note that ndim is not the number of columns: a dataframe with one column or with ten columns still has ndim equal to 2.
ndim tells you how many axes the structure has: 1 for a Series, 2 for a DataFrame.
axes returns a list containing the row axis labels and the column axis labels.
Let's have a look at this. What do we get here? We got a range index with start zero, stop 50 and step one.
The column index holds A and B, and its dtype is object.
Let's change this column to range from 1 up to 103, and leave the other column starting at 51.
Let's see what we get.
Right, so if you look at the range index, it starts at zero.
It starts from zero, stops at 102, and goes in steps of one.
Just to reiterate, it is referring to this index column.
It starts from zero and goes up toward the maximum value.
Don't look at this zero.
This is the stop value.
It is exclusive, like the end of Python's range.
So it equals the total number of rows.
These are 102 items, right.
A range index with a step of one.
It's just like creating range(0, 102), where the last number is not included.
So 102 will not be included, but starting from zero and going in steps of one, we reach 101.
So we stop at the last number below 102, which is 101.
Again, this gives you an idea about the data; part of data analysis is knowing your data in depth, even before you start doing anything with it.
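A minimal sketch of ndim and axes, assuming a toy dataframe whose columns A and B mirror the example above (A from 1 to 102, B from 51 upward); the exact values are illustrative:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df_one = pd.DataFrame({"A": range(1, 103)})
df_two = pd.DataFrame({"A": range(1, 103), "B": range(51, 153)})

print(s.ndim)       # 1
print(df_one.ndim)  # 2: still rows x columns, even with a single column
print(df_two.ndim)  # 2: adding a column does not add a dimension

row_axis, column_axis = df_two.axes
print(row_axis)     # a RangeIndex: start 0, stop 102 (exclusive), step 1
print(list(column_axis))  # ['A', 'B']
```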
Next are values.
As the name implies, it's pretty simple, actually.
So let me remove the earlier print and run df.values.
What do we get? We get an array where each row holds two values, one from each column: 1 and 51, 2 and 52, 3 and 53, and so on.
The result is a two-dimensional NumPy array, so each individual row is itself an array.
It is not a Series: the row and column labels are dropped and only the raw values remain.
If you had just one column instead of two, let's see what we get.
Then each row holds a single value.
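A small sketch of values on a made-up two-column frame. The pandas documentation recommends to_numpy() over .values in newer versions, and here it returns the same array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [51, 52, 53]})  # toy data
arr = df.values

print(type(arr).__name__)   # ndarray
print(arr.shape)            # (3, 2): three rows, two columns
print(arr[0].tolist())      # [1, 51]: the first row, labels dropped

single = df[["A"]].values   # a one-column frame still gives a 2-D array
print(single.shape)         # (3, 1): each row holds a single value
```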
Let's look at the head command.
Very, very important: you will be using this quite a bit in real-world scenarios.
What head does is give you the first five rows by default.
Just that much, because typically your data will have lots and lots of rows, hundreds of thousands or even millions, and you don't want to print all of it because it takes too much time.
It doesn't make sense printing all of it.
You just want to look at the first five rows.
Or maybe you want to look at the first 50 rows.
Or let's suppose you want to look at the first 10 rows.
Then you pass it a value like 10 here.
Without a value, it takes the default of five.
If you pass it a number, such as 10, it returns that many rows.
Similarly, you have something called tail.
So, head and tail.
tail gives you rows from the back: the last five by default.
You can of course specify that you want to see, say, the last 20 rows.
Please notice that these are more for viewing the data yourself.
Yes, you can of course store the result in another variable.
But they are not really meant for slicing and dicing.
So I don't prefer using them that way.
For slicing and dicing, say taking out the last six rows, I would write it differently, using the dataframe slicing we discussed earlier.
These are more for quickly looking at the start or the end of the data and going on from there.
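head and tail on a hypothetical 100-row frame, with iloc shown for the case where you actually want to keep a slice:

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})  # hypothetical 100-row frame

print(len(df.head()))              # 5 rows by default
print(len(df.head(10)))            # 10 rows when you pass 10
print(df.tail().index.tolist())    # [95, 96, 97, 98, 99]
print(len(df.tail(20)))            # the last 20 rows

# for slicing you intend to keep working with, position-based iloc is clearer
last_six = df.iloc[-6:]
print(last_six["value"].tolist())  # [94, 95, 96, 97, 98, 99]
```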
Okay, now let's look at some very, very useful functionality.
We have something called sum, available on a dataframe.
For that, let's set up the dataframe properly.
Let's suppose we have a dataframe with odd values, built with np.arange going from 1 to 100 in steps of two, so that we have all the odd values: it starts with one and then goes three, five, seven.
Then we have even values.
Similarly, we have np.arange from zero to 100 in steps of two.
Let's first print the dataframe that we get.
We've got even values up to 98, that is, less than 100.
We've got odd values up to 99, which is still less than 100.
Let's see what we can do with it.
Something very simple: we can just call .sum() on it and see what it does.
Otherwise you would have to write a loop with if/else conditions to do this yourself, which can get tricky as your data gets more complicated.
sum adds up the data for each column and gives it back to you, column by column.
It automatically works this out for you.
Now, the std function is for standard deviation.
So if you want to observe the standard deviation in your data, you just call df.std() and you get the standard deviation of each column.
Of course, it's going to be the same for these two columns, because both are evenly spaced in steps of two over the same span; the standard deviation depends on how spread out the values are rather than on the values themselves.
That's because standard deviation measures how far the numbers deviate from their mean.
Since both columns are spread out in exactly the same way, you would expect the same standard deviation.
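The odd/even frame can be sketched like this, assuming the column names odd and even:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "odd": np.arange(1, 100, 2),   # 1, 3, ..., 99 (50 values)
    "even": np.arange(0, 100, 2),  # 0, 2, ..., 98 (50 values)
})

totals = df.sum()   # one total per column
print(totals)       # odd 2500, even 2450

spreads = df.std()  # sample standard deviation per column
print(spreads)      # identical: both columns are spaced the same way
```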
Okay, next we are going to cover iterating through a dataframe.
So let's suppose you want to go row by row or column by column; that is what we will cover here.
So let's first create a dataframe as given on the slide.
Let me remove this.
We're going to create a dataframe from np.random.randn(5, 4), so we have four columns here.
We're going to name them col1, col2, col3 and col4.
Now, just like we learned about iterating over dictionaries, we can do the same here.
We can just print the key.
Let's see what we get.
You might remember iterating over dictionaries with items.
Dataframes have a similar method called iteritems; newer pandas versions call it items, and iteritems was removed in pandas 2.0.
So the same idea that works for dictionaries is available on the dataframe object too.
Let's see what it returns for a dataframe and whether it matches our expectations.
Just think about what it did for dictionaries.
For dictionaries, it returned the keys and values.
The key part and the value part of the dictionary.
Let's see what it does for the dataframe.
For a dataframe, it goes column by column.
The column label is the key.
Then it gives you the value, which is the column's data.
In fact, let me separate this out, because it might not be clear which is which.
So this is the key and this is the value, right.
Let's run this.
The key is col1, and this is the value being printed for you.
Let's look at the type of the value so that we have a deeper understanding of what it returns.
So each value is a Pandas Series.
That's what you get back on each step while iterating over it.
As the slide says, each key-value pair consists of the column label as the key and a Series object of that column's values as the value.
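A sketch of column-wise iteration, assuming the column names col1 to col4. It uses items(), the current name for what older pandas versions called iteritems():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=["col1", "col2", "col3", "col4"])

keys = []
for key, value in df.items():  # df.iteritems() in pandas versions before 2.0
    print(key, type(value).__name__, len(value))  # column label, Series, 5 rows
    keys.append(key)
```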
Next, let's look at another method called iterrows.
This one is not available on dictionaries, because dictionaries don't have rows.
It's a dataframe-specific method.
It iterates over the rows, giving you the row label as the key and a Series object of the row's values as the value.
Until now, with iteritems, we were getting all the values for one column at a time.
With iterrows, we can iterate row by row, like you would in a database.
It's like looking at an Excel sheet: what is in the first row, what is in the second row.
For column one the value is this, for column two the value is this, for column three the value is this, and so on.
It's the same progression.
Let's see if we can use value.ndim here.
Right, so let me comment this out.
Let me remove this.
If you look at the value's dimension, it is one, because each row comes back as a Series, labeled with the column names.
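Row-wise iteration on the same kind of random frame (column names col1 to col4 assumed as before); each row arrives as a one-dimensional Series labeled by the column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=["col1", "col2", "col3", "col4"])

for row_label, row in df.iterrows():
    # row_label is the index label; row is a Series of that row's values
    print(row_label, type(row).__name__, row.ndim)
```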
Next, itertuples returns an iterator yielding a named tuple for each row.
This is again something you would use when the use case presents itself, when you want each row in a different format.
It's a different format from getting each row as a Pandas Series.
But it is still iterating over the rows.
It doesn't return two values per step; it returns a single object for each row.
Let's print both the row and its type.
First and foremost, the type is shown as a class called Pandas.
That's the default name pandas gives the named tuple, so it behaves like a tuple with named fields.
The second thing here is that the fields are named.
So you know which is which.
For index four, you have all these values.
Index three you have all these values.
Index two you have all these values.
It is a named tuple that is given to us.
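A sketch of itertuples() on the same kind of frame. Named-tuple fields need to be valid Python identifiers, which is one reason to name the columns col1 rather than col-1; pandas replaces invalid names with positional ones:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=["col1", "col2", "col3", "col4"])

rows = list(df.itertuples())
first = rows[0]
print(type(first).__name__)   # Pandas: the default name of the named tuple
print(first._fields)          # ('Index', 'col1', 'col2', 'col3', 'col4')
print(first.Index, first.col1)  # fields are accessible by name
```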
Now let's look at some more operations on Pandas dataframes.
Now this is very, very important, so please make sure you are paying attention.
This is the groupby operation.
There are a lot of ways you can group values.
Let's try this out.
I think I have the dataset somewhere over here.
Okay, let me quickly set it up.
We don't even have a dataframe yet; first we have a dictionary, which contains some data related to the World Cup.
It contains the name of the team.
So we have West Indies, West Indies again, India, Australia, then Pakistan, then Sri Lanka, then Australia three times in a row.
Then we have India, and then Australia again.
Then we have the rank column, which starts with the values 7, 7, 2, 1, 6, 4, and then finally we have the year.
Maybe some of you have been able to figure it out.
Some of you are already familiar with this dataset, the real dataset, but some of you are maybe trying to figure out what it is.
It's actually the winners of the Cricket World Cup in different years.
That is what this dataset is about.
Who won in which year.
We will simply create a dataframe out of this.
Let's see what we get.
Okay, I missed a comma.
Let's run this.
Okay, so from 1975 to 2015, we have a list of teams.
We have the rank that they initially had and the fact that they won.
So this is the initial rank before the tournament started.
The year and the team which won the World Cup in that particular year.
This team won the World Cup: West Indies won in 1975.
Sri Lanka won in 1996, and these are the ranks.
Now, if you were dealing with this data in practice, you might think, let's save this in a database.
That's fair enough, or you might be getting it from an external source, for example at the end of an API, like a JSON API.
Or you're reading it from a file.
But let's assume that you have this data in your code now.
Now, for those of you who are familiar with SQL:
SQL has grouping too, but it tends to be limited in what it can do.
With the flexibility of Python and the ability to easily write conditions, a dataframe's groupby function, which lets you group by one or more columns,
is immensely powerful.
It makes life much, much easier.
You just get your dataset into Python.
You don't have to worry about the syntax of SQL.
You can write all sorts of conditions.
You can view your data in different ways, which SQL might not allow, or might allow only in a very cumbersome way.
Naturally, you don't want to be messing around with the database, variables and everything while writing your conditions.
You can write the conditions in SQL, but I strongly don't recommend it.
So let's look at what this will do.
We go ahead and call df.groupby, pass it the Team column, and then ask for .groups.
Okay, so it has grouped the different teams for us.
What we get back here is a dictionary, by the way.
The values in it are the index labels where you would find each team.
So West Indies is at zero and one.
India is at two and nine.
Australia is at three, six, seven, eight and 10.
Sri Lanka is at five, which is correct if you look at this.
Counting zero, one, two, three, four, five, Sri Lanka is in row five.
That's how it appears in the dataframe too.
So it's just the grouped data.
Now you can iterate over the groups, pick up each group, look at the number of occurrences it has had, make decisions accordingly and proceed from there.
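The World Cup example as a sketch. The teams and years follow the list above; the rank values are reconstructed from the grouping results described in the lesson (West Indies 7, India 2, Australia 1, Pakistan 6, Sri Lanka 4), so treat them as assumptions:

```python
import pandas as pd

world_cup = {
    "Team": ["West Indies", "West Indies", "India", "Australia", "Pakistan",
             "Sri Lanka", "Australia", "Australia", "Australia", "India", "Australia"],
    "Rank": [7, 7, 2, 1, 6, 4, 1, 1, 1, 2, 1],  # reconstructed, see lead-in
    "Year": [1975, 1979, 1983, 1987, 1992, 1996, 1999, 2003, 2007, 2011, 2015],
}
df = pd.DataFrame(world_cup)

# .groups maps each team to the index labels of its rows
groups = df.groupby("Team").groups
for team, index in groups.items():
    print(team, list(index))
```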
Now let's suppose you want to group by multiple columns.
Say you want to group by team and rank, creating pairs of the team name and the rank that the team had.
When India was ranked two, there are two occurrences.
When Australia was ranked one, there are five occurrences.
Let's look at the others too.
Pakistan with rank six appears at index four.
West Indies with rank seven is at zero and one, and Sri Lanka with rank four is at five.
So let's actually change the data so that we can see an additional group for the same team.
I'm just going to manipulate the rank here.
I'm going to change the rank of Australia to two.
For this last Australia entry, I'm changing the rank to two, to see if we get a new group for Australia with rank two.
Immediately, you can see a group for Australia, rank two.
And the Australia, rank one group is still there.
So this is one group, with Australia being rank one.
This is another group, with Australia being rank two.
Each group is a unique combination of a team and its rank.
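Grouping by two columns, on the same reconstructed World Cup data (rank values assumed as before). The copy keeps the original frame untouched while the last Australia rank is changed to two:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["West Indies", "West Indies", "India", "Australia", "Pakistan",
             "Sri Lanka", "Australia", "Australia", "Australia", "India", "Australia"],
    "Rank": [7, 7, 2, 1, 6, 4, 1, 1, 1, 2, 1],  # reconstructed ranks
    "Year": [1975, 1979, 1983, 1987, 1992, 1996, 1999, 2003, 2007, 2011, 2015],
})

# keys are (team, rank) pairs
pairs = df.groupby(["Team", "Rank"]).groups
for key, index in pairs.items():
    print(key, list(index))

changed = df.copy()
changed.loc[10, "Rank"] = 2  # the last Australia entry, as in the lesson
changed_pairs = changed.groupby(["Team", "Rank"]).groups
print(sorted(changed_pairs))  # now includes ('Australia', 2)
```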
Now, to iterate over the groups, you simply loop over the grouped object.
So we write for name, group in df.groupby('Team'), since we want to iterate by team.
Let's see what we get, group by group.
Okay, we've got the name of the group, and then we have the group itself.
In fact, let me draw a separator so that we can see it clearly.
Okay, the group is Australia.
So you have the group name, Australia, and this is the data related to that group.
The group name is India, and this is the data related to that.
So you can iterate over each of these now.
Let's look at the type of the group, so that we know what object we will be dealing with.
We will be dealing with a Pandas dataframe.
In fact, I can do something like group.sum(), just for the sake of it.
That doesn't give me much insight, but I can also select group['Rank'].
Then only the rank column is printed.
Or I can select the Year column; inside the loop I'm simply dealing with a Pandas dataframe.
I'm not dealing with anything else.
So now the years get displayed.
Now let's suppose you want to get a single group.
You don't want to iterate over them; you just want one group.
So let's suppose we create a new variable, dfgroup, as df.groupby('Team'), and then we want to get the values for India.
The method is get_group, and we say we want the India group.
Naturally, if we pass the wrong key, it throws an error.
It tells you the group doesn't exist.
You can try it with Australia as well.
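Iterating over groups and pulling out a single one with get_group, on the same reconstructed data. "England" is just a hypothetical team that is not in the data, used to show the error:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["West Indies", "West Indies", "India", "Australia", "Pakistan",
             "Sri Lanka", "Australia", "Australia", "Australia", "India", "Australia"],
    "Rank": [7, 7, 2, 1, 6, 4, 1, 1, 1, 2, 1],  # reconstructed ranks
    "Year": [1975, 1979, 1983, 1987, 1992, 1996, 1999, 2003, 2007, 2011, 2015],
})

group_names = []
for name, group in df.groupby("Team"):
    print("-----", name, type(group).__name__)  # each group is a DataFrame
    print(group["Year"].tolist())
    group_names.append(name)

dfgroup = df.groupby("Team")
india = dfgroup.get_group("India")
print(india["Year"].tolist())  # [1983, 2011]

try:
    dfgroup.get_group("England")  # hypothetical team, not in the data
    missing_group_error = None
except KeyError as err:
    missing_group_error = err
```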
Next are aggregations.
An aggregation function returns a single aggregated value for each group.
You might want to take a sum of certain things, or some other kind of aggregation, such as an average or a standard deviation.
All of these are aggregating functions, which reduce a particular column or set of values to a single value.
So let's suppose we need to aggregate all the numbers labeled odd in the dataframe.
This is where we aggregate it.
We have reduced each group to a single value.
This is the result for the group of odd numbers.
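As a sketch of the idea, assuming the reconstructed World Cup dataframe from earlier rather than the exact data on screen, agg reduces each group to a single value per column:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["West Indies", "West Indies", "India", "Australia", "Pakistan",
             "Sri Lanka", "Australia", "Australia", "Australia", "India", "Australia"],
    "Rank": [7, 7, 2, 1, 6, 4, 1, 1, 1, 2, 1],  # reconstructed ranks
    "Year": [1975, 1979, 1983, 1987, 1992, 1996, 1999, 2003, 2007, 2011, 2015],
})

grouped = df.groupby("Team")
titles = grouped["Year"].agg("count")    # number of wins per team
mean_rank = grouped["Rank"].agg("mean")  # average starting rank per team
summary = grouped["Rank"].agg(["min", "max", "count"])  # several reductions at once

print(titles)
print(summary)
```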
Next is concatenation.
Okay, let's suppose we have a list of World Cup winners and then a list of World Cup chokers.
Chokers are basically the teams that were supposed to win, that were doing really well in the tournament, but faltered at the last moment and couldn't finish the job.
They were the favorites to start with, but they faltered right at the end, when they could really have won.
Their points would be 95, 764 and 656.
Let's add the points here as well.
We did not have them earlier.
874 and 753, 855.
So let's create two dataframes.
To concatenate them, we can simply call pd.concat with d1 and d2.
Let's look at concatenation in more detail now.
Concatenation is the process of combining two or more data structures.
One thing to keep in mind when concatenating two dataframes is how their columns line up: a column that exists in only one of them is filled with NaN in the rows that come from the other.
So let's try this out.
Let's suppose we have a dataframe with three columns: Key, A and B.
Key has values K0 to K3.
A has values A0 to A3.
B has values B0 to B3.
Similarly, we have another dataframe with a Key column, column C and column D.
What we're going to do is try to combine the two.
Let's see what happens.
As you see, we have concatenated this dataframe onto this one.
Now, why are the C and D values NaN in the rows with A0 to A3? Because the first dataframe has no C column.
And it has no D column.
Similarly, in the rows where the C and D columns are present, A and B don't exist, so they appear as NaN.
I hope this is clear to all of you.
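A sketch of the concatenation example, assuming the column names Key, A, B, C and D from the slide:

```python
import pandas as pd

left = pd.DataFrame({"Key": [f"K{i}" for i in range(4)],
                     "A": [f"A{i}" for i in range(4)],
                     "B": [f"B{i}" for i in range(4)]})
right = pd.DataFrame({"Key": [f"K{i}" for i in range(4)],
                      "C": [f"C{i}" for i in range(4)],
                      "D": [f"D{i}" for i in range(4)]})

# rows of right are stacked below rows of left; missing columns become NaN
stacked = pd.concat([left, right])
print(stacked)
```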
We can also use something called the append function, which appends one dataframe to another.
When I call this function, please note that you call it on the dataframe object, not through pandas.
Calling through pandas is for pd.concat, when you're concatenating.
Also note that DataFrame.append was deprecated and then removed in pandas 2.0, so in newer versions you write pd.concat([left, right]) instead.
If you want to append one dataframe to another, you say left.append(right).
Let's see the result.
Now let's swap them, calling right.append(left).
Let's see if there is any difference in output.
Yes, there is a slight difference in output.
With left.append(right), it takes the left dataframe first and appends the right one below it.
That output is right appended to left.
And this one is right, with left appended to it.
Here, right comes first and left comes after.
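Because DataFrame.append no longer exists in pandas 2.0 and later, this sketch reproduces both orders with pd.concat, which gives the same stacking result as the old left.append(right) and right.append(left) for these frames (column names assumed from the slide):

```python
import pandas as pd

left = pd.DataFrame({"Key": [f"K{i}" for i in range(4)],
                     "A": [f"A{i}" for i in range(4)],
                     "B": [f"B{i}" for i in range(4)]})
right = pd.DataFrame({"Key": [f"K{i}" for i in range(4)],
                      "C": [f"C{i}" for i in range(4)],
                      "D": [f"D{i}" for i in range(4)]})

left_first = pd.concat([left, right])   # equivalent of left.append(right)
right_first = pd.concat([right, left])  # equivalent of right.append(left)

print(left_first.iloc[0].tolist())   # starts with a row from left
print(right_first.iloc[0].tolist())  # starts with a row from right
```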
Let's see what the axis parameter does.
So we print the result of concat, this time giving it a value for axis.
Okay, this doesn't tell us much.
Let's give it a different value and see what the output is.
There's no axis named 2.
So let's look at the valid axes for this.
If you concatenate on axis 0, you get this sort of output.
The left dataframe comes first, and then the right one below it.
With axis 1, however, the result is different; the axis refers to the dimension you concatenate along.
This is basically a 2D array, right.
In terms of dimensions, left and right are each a 2D matrix.
With axis 0, it concatenates vertically, along the rows.
Axis 0 is also the default value, by the way.
With axis 1, it concatenates across the columns.
So picture a two-dimensional table: axis 0 says go down the rows, stacking one frame below the other, and axis 1 says go across the columns, placing them side by side.
So in the first output, with axis 0,
it is just adding rows one after the other, but when axis is 1, it puts the columns next to each other, row by row.
So K0, A0 and B0 end up in the same row as K0, C0 and D0, because they share the same index label.
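The two axis values side by side, plus the error for an axis a dataframe doesn't have. The frames follow the slide's Key/A/B and Key/C/D layout:

```python
import pandas as pd

left = pd.DataFrame({"Key": [f"K{i}" for i in range(4)],
                     "A": [f"A{i}" for i in range(4)],
                     "B": [f"B{i}" for i in range(4)]})
right = pd.DataFrame({"Key": [f"K{i}" for i in range(4)],
                      "C": [f"C{i}" for i in range(4)],
                      "D": [f"D{i}" for i in range(4)]})

rows = pd.concat([left, right], axis=0)  # default: stack vertically
cols = pd.concat([left, right], axis=1)  # side by side, aligned on the index
print(rows.shape)  # (8, 5)
print(cols.shape)  # (4, 6): both Key columns are kept

try:
    pd.concat([left, right], axis=2)  # a DataFrame only has axes 0 and 1
    bad_axis_error = None
except ValueError as err:
    bad_axis_error = err
```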
Let's talk about merging and joining.
Merging is a Pandas operation that combines two dataframes.
Yes, we just combined dataframes as well, but merge combines two dataframes in a different way:
by matching along a particular column.
Let's suppose we had to merge on world championship year and World Cup played year.
Or, better than that, let's merge on the team, which is common.
When you call merge, the column you merge on appears only once in the final result instead of being duplicated.
So let me show how we can do that.
Let's suppose I'm merging left with right, and I tell it to merge on the Key column.
Let's see what I get.
Okay, my bad, this doesn't need to be an array.
This should be sequential values.
Interesting, we got a result like the one we got earlier with axis equal to one.
Nearly the same, except that concatenating kept both Key columns, because that is a simple concatenation operation, while merging gave you a single column for the key it merged on.
Concatenation just puts them together.
Merge joins on a certain column, where you're saying that for the key K0 there are the values A0, B0, C0 and D0.
You're merging these two together, and the common key column is not duplicated.
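merge on the Key column compared with concat along axis 1, using the same slide-style frames; merge keeps a single Key column while concat keeps both:

```python
import pandas as pd

left = pd.DataFrame({"Key": [f"K{i}" for i in range(4)],
                     "A": [f"A{i}" for i in range(4)],
                     "B": [f"B{i}" for i in range(4)]})
right = pd.DataFrame({"Key": [f"K{i}" for i in range(4)],
                      "C": [f"C{i}" for i in range(4)],
                      "D": [f"D{i}" for i in range(4)]})

merged = pd.merge(left, right, on="Key")          # rows matched by Key value
side_by_side = pd.concat([left, right], axis=1)   # rows matched by index position

print(merged.columns.tolist())        # ['Key', 'A', 'B', 'C', 'D']
print(side_by_side.columns.tolist())  # Key appears twice
```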
Now, there are multiple ways in which you can merge or join two dataframes.
One of them is a left join.
Now, think about it again.
This is what makes it so useful: through Pandas, you can just load a dataset into memory
and get almost database-level querying in the application itself.
It's very powerful and very easy to write. Especially if you are working as an analyst dealing with a database, it reduces the time you spend writing database queries, which tend to be long and confusing compared to simple Python code.
It's because of features like these that you see it used so much in data science, machine learning and so on.
So let's see what the left join does.
I pass how='left' to merge.
Okay, we get this output.
Let's do a comparison.
The slide says it merges using the keys from the left object.
The left object is the first object here.
The leftmost object, I should actually say.
Our dataframes happen to be called left and right,
so don't be confused by those names.
Okay, let me actually rename them.
I'll call this one A and this one B.
First I was merging A with B; let me now merge B with A and see if the result changes.
Okay, the result did change.
So a left join with A first keeps A as it is and merges B into it.
Merging B into A gives this one.
When you interchange them, it gives you the opposite.
B comes first, and A is merged into it.
This can be important if you have unequal values or columns, or something of that sort.
So yes, this affects the way your data is arranged.
This can matter, especially when you're trying to analyze the data.
Let's look at the right join then.
What does the right join do? Let's take this and merge.
I'm pretty sure some of you have already guessed what it does.
Let's put the right join immediately after this one.
Let's do the right join on this one.
Let's see what we get.
So it's pretty clear what the difference between a left join and a right join is.
Let's look at the outer join.
In this case, it uses the full union of the key columns of both objects. Let's see how we can get an outer join.
As the slide says, it merges two objects based on the full union of the key columns of both.
So it does a full union, making sure the keys from both of them are present.
Right, a similar output, but this time it's an outer join.
The outputs look similar because of how the data is: both have the same keys.
If the data was a little different, you would naturally get a different output.
So let's say the data was like this, with an extra fifth row in B.
Let's see what we get on an outer join.
So on an outer join we get both of them.
If we were to do a right join in this case, what would we get? So right and outer are similar.
Let's see if right and left are similar.
Okay, so left is not similar.
What left does is keep all the keys of the first dataframe.
For B, if there is any extra row, which in this case is K4, C4 and D4, it ignores it.
In the right and outer outputs you get the K4, C4, D4 row.
In the left join there is no such row, because it was doing a left join.
It was giving preference to the A dataframe.
When it is a right join, it gives preference to the B dataframe.
When it is an outer join, it doesn't give preference to either.
It joins both of them, irrespective of the differing keys.
Now let's suppose A has a K6 row, with A6 and B6.
Now you might think both of them have five rows, but the real question is the key matching.
Merge is trying to match on the key.
With an outer join, it doesn't matter that the key K6 is not present in B,
or that K4 is not present in A.
It gives you both rows.
However, with a right merge, it only considers the keys from the right dataframe, which is B.
When you do the merge on left, it considers all the keys from the left dataframe and makes sure there's a row for each one of them.
So if you look at this one, this is K0, K1, K2, K3, K6.
The entire A dataframe will definitely be there.
If B doesn't have values for a certain key, they come through as NaN.
So with left, the first one gets preference.
With right, the second one gets preference.
With outer, nobody gets preference; both are included.
I hope this is clear enough to all of you.
Let's look at the final join, which is the inner join.
What does an inner join do? An inner join is basically the simple merge you were doing earlier.
It just merges on the keys that are common to both.
An inner join is the normal join: it's the default value of how when you call merge.
So lines 24 and 25 in the notebook are essentially the same.
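The four join types side by side, assuming the A/B frames described above, with K6 only in A and K4 only in B. The names df_a and df_b are illustrative:

```python
import pandas as pd

df_a = pd.DataFrame({"Key": ["K0", "K1", "K2", "K3", "K6"],
                     "A": ["A0", "A1", "A2", "A3", "A6"],
                     "B": ["B0", "B1", "B2", "B3", "B6"]})
df_b = pd.DataFrame({"Key": ["K0", "K1", "K2", "K3", "K4"],
                     "C": ["C0", "C1", "C2", "C3", "C4"],
                     "D": ["D0", "D1", "D2", "D3", "D4"]})

results = {}
for how in ["left", "right", "outer", "inner"]:
    results[how] = pd.merge(df_a, df_b, on="Key", how=how)
    print(how, results[how]["Key"].tolist())

# left keeps every key of df_a; K6 has no match in df_b, so C and D are NaN
print(results["left"].set_index("Key").loc["K6"].tolist())

# inner is the default when how is not given
default_merge = pd.merge(df_a, df_b, on="Key")
```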
Let's look at a particularuse case of Pandas.
Let's suppose you'reprovided a large dataset of country wise statisticsto extract knowledge from.
Country, landscape, population, GDP, rural population, a lot of other things.
Let's look at this dataset here.
You have country, you have land area, population, GDP, rural.
Rural is, I think, a kindof score that the person is of rural population,whether it has internet or not or what percentage has internet,what is the birth rate, death rate, elderlypopulation, life expectancy, female labor and unemployment.
We have it for about 216countries, which pretty much represents the world as of today.
Let's explore the dataset.
First and foremost I'll import Pandas and then I'll import the dataset.
I need to give it a path of the CSV.
First thing I need to lookat is the number of rows and columns, which isthe shape of the dataset.
So it has 215 rows and 12 columns, okay.
First is the header itself,so it has 215 countries to line 216 and it has 12 columns, one, two, three, four, five, six, seven, eight, nine, ten and twelve, which is correct.
Let's check the kindof data types available in this dataset.
Okay, so the country is anobject in terms of the Pandas dataframe and everythingelse is a float64 datatype, which is also correct, becauseall of these are numbers.
Let's look at the first five rows.
One thing that it does thatwhenever the number of columns are huge, or like they'reof a sort of more than let's suppose these sixvalues, are seven over here, seven columns it can accommodate.
Otherwise, it will go the next line.
But you can basically seethat it is referencing it as row zero.
So this is just a representation.
It's not that it isbroken down or anything, it is just displaying itby sort of breaking it to the next row.
It doesn't have the space,even though if you look at my computer it has the space.
But then, there's justlike a limit in terms of to what line it will print to.
It will just break it upand these columns are, so this is essentially five rows.
So don't think of theseas 10 rows, this is zero Afghanistan's land area,population, GDP, rural, internet birth rate and thenAfghanistan's death rate, elderly population, life expectancy, female labor and unemployment value.
So you can check the first (mumbling).
Let's pick a random number 16, run it and there we have it.
Okay, let us look ata statistical summary.
You might remember thatwe did this earlier.
Very intuitive function if you look at, I'm just calling dataset.
This is like one of thosethings that I definitely like and really admire about Pandas or even Python in general.
It's very descriptive.
You can just sort ofread it and understand it there and then.
Now, for land area, count,mean, standard, minimum, 25%, 50%, 75% max values.
So we can immediately see thatokay, the GDP mean is 14333.
Rural or internet allover the world it's 43%.
Female labor all over theworld is at an average 58%.
Life expectancy has anaverage of 70 years.
Elderly population is thatof 7.
9% all over the world.
Death rate is 8%.
Birth rate is 21% and unemploymentoverall the mean is 9%.
The count tells us how many countries we have a value for.
Then you have the standard deviation, and you have the minimum values.
The minimum value for both is 10%, death rate is 3%, elderly population is 5%, and life expectancy is nine, nine years, which is pretty low.
Female labor is 17%.
So it tells you something about your dataset.
So maybe, after you look at this, you ask: nine? Which countries are driving this value down to nine in terms of life expectancy?
The minimum value, I'm sorry, is 45, yeah, 45.
Sorry, my bad, it was not nine.
Nine was the standard deviation, that's the deviation from the mean.
So one standard deviation either side of the mean of 70 spans roughly 61 to 79.
Doing this generally gives you clues as to where you might want to look as a data scientist: oh, this is interesting.
Which countries have the best internet? Which country has 96% internet? I'd like to know that.
Right, 96% of the land mass is covered by internet.
If that is provided internet, that's great, right.
So that's where a lot of data science usually starts.
You look at your data, you make some sense out of it, and then you decide: okay, I'm going to use this algorithm or this analysis process to distill more insights and opportunities out of it.
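To make the describe() step concrete, here is a minimal sketch on a tiny, made-up DataFrame; the column names and values are hypothetical stand-ins for the countries dataset, not the course data. It shows the eight rows describe() returns for numeric columns, how to read a single statistic back with .loc, and how idxmax() answers the "which country has the highest internet value?" question.

```python
import pandas as pd

# Hypothetical toy data standing in for the countries dataset
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Life expectancy": [45.0, 70.0, 75.0, 80.0],
    "Internet": [10.0, 40.0, 60.0, 96.0],
})

# describe() summarises every numeric column (the text column is skipped):
# count, mean, std, min, 25%, 50%, 75% and max
summary = df.describe()
print(summary)

# Read a single statistic back with .loc[row_label, column_label]
print(summary.loc["mean", "Life expectancy"])
print(summary.loc["min", "Life expectancy"])

# idxmax() gives the row label of the maximum, which we use to look up the country
best_internet = df.loc[df["Internet"].idxmax(), "Country"]
print(best_internet)
```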
Let's look at extracting insights from data.
Can we write a program to find the list of all countries with a size greater than 2,000, or a thousand, square kilometers? Let's first select only the specified columns of the dataset.
To do this, you would make a variable called selected_data.
I hope you guys remember loc.
We are going to go column-wise.
We will reference the columns.
First, you say that you want all the rows, but only the columns for country and land area.
Next, we're going to write a simple for loop to iterate over this.
For each row of selected_data, we check the land area.
Pretty simple, right? We just wanted to find countries with a size bigger than 2,000 square kilometers.
We just write the simple for loop and here we go.
See, so easy: these are the names of the countries whose area is greater than that.
India, China, Canada, Brazil, Australia, Algeria.
These are large land masses.
You can also do it with greater than a thousand.
The number of countries will definitely be higher.
We can do less than 500; let's see if there are any countries of that size.
Pretty small, so a lot of countries, actually, a lot of countries of that size.
Let's try less than a hundred.
I'm pretty sure we might find some.
Okay, a lot of countries are less than that size.
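A sketch of the column selection and size filter, using hypothetical placeholder countries and land-area values rather than the course data. One detail worth knowing: iterating directly over a DataFrame (for i in selected_data) yields column labels, not rows, so the loop version below uses iterrows(). The second version does the same filtering with a boolean mask, which is the more idiomatic Pandas approach.

```python
import pandas as pd

# Hypothetical data; names and values are placeholders, not real countries
data = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Land area": [2973.2, 0.02, 9093.5, 0.32],
    "GDP": [1.0, 2.0, 3.0, 4.0],
})

# All rows (:), but only the two columns we care about
selected_data = data.loc[:, ["Country", "Land area"]]

# Loop version: iterrows() yields (index, row) pairs
big = []
for _, row in selected_data.iterrows():
    if row["Land area"] > 2000:
        big.append(row["Country"])

# Vectorised version: a boolean mask selects matching rows in one step
big_vec = selected_data.loc[selected_data["Land area"] > 2000, "Country"].tolist()

print(big)
print(big_vec)
```

Changing the threshold, or flipping > to <, reproduces the "greater than a thousand" and "less than 500" experiments from the lesson.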
Let's see if there's a correlation between the GDP per capita of countries and their birth rates; that is to say, if the GDP of a country is high or low, how does it relate to the birth rate in that country? What we're going to do is plot our findings with matplotlib so that we can visualize the data.
If you look at this particular problem statement, the reason you would think about using matplotlib is that it might not be easy to see a correlation through just numbers, as compared to seeing it represented on a graph.
So we're going to call plt.figure() to create a figure instance in memory.
We're going to set the size to 50 by 50.
Next, we're going to read the dataset.
We have already done that.
Now we're going to use this one, and we're going to take just GDP and birth rate.
Okay, so we're going to plot a scatter plot.
What we did here was create a NumPy array from each column, just to see this data.
Then we want to create a scatter plot.
Of course, you want to call xlim so that we just see the portion between zero and 2,000.
Let's go ahead and run it, and we have the rocket back with us.
Yeah, so it gives you a certain level of insight.
Coincidentally, I didn't label this particular axis here, but you guys should label the axes.
Here, 2,000 would typically be a value of the GDP.
Birth rate is going to be (mumbles).
So birth rate: not such a strong correlation.
In fact, if you look at the countries with high GDP, a lot of them have values less than 25.
But look at the density over here: countries with lower GDP tend to have higher birth rates; look at these values, for example.
GDP is the lowest, but it's one of the highest birth rates.
Look at this one at the top right: nearly 50% birth rate, a huge, massive birth rate for a country to have, and look at the GDP, it's tiny.
So a correlation sort of exists, but it is a negative one: the highest birth rates belong to the low-GDP countries, not the high-GDP ones.
A lower GDP probably indicates inefficiencies, a developing economy, perhaps one struggling with a population crisis or population boom.
Definitely India, China and third-world countries would fall in that group.
So yeah, it gives you an insight into what is happening.
Really cool, and you can do much more.
You can do similar analysis on so many other things in this dataset and draw insights from it.
Please go ahead and look at the columns, keep changing them one by one, keep re-running the plot, and I'm sure you will get a lot of insights out of the data.
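The scatter plot lets you eyeball the relationship; if you also want a single number for it, the Pearson correlation coefficient is a common choice. This is a swapped-in technique, not what the lesson does, and the GDP and birth-rate pairs below are invented for illustration. A value near -1 means birth rate tends to fall as GDP rises, a value near 0 means no linear relationship. In Pandas, Series.corr() computes the same Pearson coefficient by default.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical GDP-per-capita / birth-rate pairs, not from the course dataset
gdp = [500, 1000, 5000, 20000, 40000]
birth = [45, 38, 25, 12, 10]

r = pearson(gdp, birth)
print(round(r, 2))
```

Because Pearson only measures linear association, it is a complement to the scatter plot, not a replacement: a curved relationship can still show up clearly on the graph while giving a modest r.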
So this graph's a little different.
It may just be that the data I'm using and the data used for this one are different.
It doesn't matter, though; you can use the data that you find in the course material, or you can use your own.
Let's compare the GDPs of the 10 richest countries in the world.
This time we will select only the country and the GDP.
We already have the plot.
Let's sort the data.
So it's sorted the data by GDP.
I'm just going to remove this, and we're going to print sorted_data.iloc with the first 10 rows.
We are also going to plot a pie graph.
We have the graph; this looks prettier.
Let's select the sorted data for the different countries.
These are the top 10 GDP countries in the world.
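A minimal sketch of the "top 10 by GDP" step, with invented GDP values for twelve placeholder countries. sort_values(ascending=False) followed by iloc[:10] mirrors the lesson; nlargest() is a shortcut that gives the same rows here because the values are all distinct. Passing top10["GDP"] as the values and top10["Country"] as the labels to plt.pie would draw the pie chart.

```python
import pandas as pd

# Hypothetical GDP figures for twelve placeholder countries
df = pd.DataFrame({
    "Country": [f"C{i}" for i in range(12)],
    "GDP": [5, 80, 12, 45, 99, 3, 60, 27, 71, 18, 33, 90],
})

# Sort descending by GDP, then take the first 10 rows by position
sorted_data = df.sort_values(by="GDP", ascending=False)
top10 = sorted_data.iloc[:10]

# nlargest() returns the same 10 rows in the same order here
shortcut = df.nlargest(10, "GDP")

print(top10)
```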
(pleasant instrumental music)
Let's look at developing web maps using the Folium and Pandas modules.
What do we mean by developing a web map? Let's look at the problem statement.
Let's suppose there is a person called John who works for a disaster management organization.
He's a researcher, currently researching volcanoes in the USA and population in different countries.
So he wants to map the population and the volcanoes in a given region.
What is the benefit? He would be able to know how many people would need to be shifted to other countries if there's a volcanic eruption in a given region.
So he wants to see that there's a volcano which goes off in a certain region, and whether there's a large or a small population there.
Where could those people immediately go in case there's a natural disaster of that sort? For this, he wants to design a map which gives him an idea about volcanoes and the population of different countries in a single map.
The single map bit is very important here, and we will look at how to get that.
Let's look at the logic to implement this.
The first thing that we need to do is use Folium.
Folium is a Python library that helps us deal with and generate maps.
We are not talking about Google Maps or Apple Maps here.
We're talking about generating our own map using the library called Folium.
Next, we want to create markers to show specific locations on the map.
Markers meaning these icons, these pins.
Next, we are going to import the Pandas library for data manipulation.
We have already used this in the previous classes.
Through it, we're going to read a data file containing the list of volcanoes and then use Folium to mark the volcano locations on the map.
Finally, we are going to use a file called world.json.
This file is going to allow us to mark the population of each country, and together, using the first part, where we created the markers, and this second part, we are going to generate something that looks like this, which shows the population of the entire world.
Let's look at the steps to design a web map.
First and foremost, you need to install Folium.
The command is very simple: pip install folium.
In case you have any issues or doubts with this, please reach out to support.
Step one: Folium makes it easy to visualize data that's been manipulated in Python on an interactive Leaflet map.
A Leaflet map here being a fairly basic map, not a navigation map like Google Maps.
It's like the physical maps that we're used to, just a digital representation of them.
Google Maps, by contrast, is very interactive.
There are a lot more features, a lot more interaction, a lot more detail.
But Folium is not going to have that level of detail.
For example, Google Maps shows you nearby locations and points of interest such as hospitals, train stations or metro stations, but Folium doesn't do that.
It's just a Leaflet map.
In the sense that the level of detail is not as high.
It's not meant for navigation.
The next thing is that Folium results are interactive.
So there is some level of interactivity, but not as much as Google Maps.
But there are certainly features, which you will see soon enough, that make it interactive.
Leaflet is the leading open-source JavaScript library for mobile-friendly interactive maps.
Now, don't be confused by this; it's not that we are going to learn JavaScript.
It's just that Folium makes use of the Leaflet JavaScript library, which can create these maps.
Once we look at this, you will realize that it in turn uses some (mumbles) JavaScript to generate these maps.
For that, the Python library Folium relies on Leaflet, which is a JavaScript library.
Let's look at the commands used.
So over here, first and foremost, you import folium.
Then you have folium.Map, and these are the values that its constructor takes.
It takes a location as an array.
This is the latitude, longitude pair.
Next, it takes the zoom level to start the map at.
So when you launch the map, you need to define the zoom level.
The values can be anywhere between one and 200 or 300.
I will show you the results for the different values.
Then the tiles of the map.
These set the style of the base map that we will be generating.
Now, for the sake of demonstration, let us try this out.
So I've given it a random latitude and longitude.
I've given it a zoom level.
Let's take this for a run.
So now we have Folium.html present for this.
Let's open it in the browser.
We are not seeing anything here, but if I keep zooming out, we do see something.
It's just that the zoom level was set very high.
So let's set it to 200 and see if we get a better result.
The process has finished, so the file will have been refreshed.
Now if you look, at first it's still too deep.
So let's set it to a very low value and see if that works.
Yes, that works.
Zoom level 10 works.
Let's try a deeper zoom level, run this, and reload.
So it seems that the values this can take are limited, but zoom_start essentially gives you the level of depth.
Let's try an even lower number and see what it shows us.
Okay, now a wider map.
Let's try 12 and see if 10 is the upper limit or not.
Is 10 the upper limit? Even with 11, I think we should get a problem.
Okay, 11 apparently works, but 12 just does not.
So it takes values from about one to 11.
The minute I go above 11, it just goes blank.
Nothing renders here.
This is again one of those things.
It's not like Google Maps.
It doesn't have that level of detail, but you can zoom out and view the rest of the map.
Now, what about the latitude and longitude? That's where you want to place the center of the map.
So the map starts centered on the coordinate given over here.
So what if I change this to, let's suppose, 11?
Let's see what the new center of the map would be.
Let's reload it.
The center has moved this much.
Let's change it to five.
This is going to be a drastic shift.
Okay, now if you see, it's in the ocean.
So we moved downward, south of it.
Let's set it to 40 and see what we get.
Okay, we have almost reached Kurdistan.
We are quite high up.
We have crossed India and we are now somewhere in the Chinese region.
So this is how you can create a simple map using Folium.
Now, notice that this doesn't have any markers or anything of that sort on it yet.
Now, the next thing is that you can create something called a feature group.
This is another class that Folium provides.
What happens is that you create different feature groups for different kinds of things.
The feature group that we want to create right now is volcanoes.
What it gives you is an instance to which you can add items, a group of items.
So we will be creating objects.
We have created a FeatureGroup object, and we can add a list or a bunch of items to it, not just one.
So let's first create this feature group.
Now what we can do is take a set of coordinates, like this, and call fg.add_child.
I'm not going to do this exactly, but I'm going to show you something similar.
So I'm going to call fg.add_child, add a marker with a pop-up, and place it slightly away from the center of the map but within range.
For this, let's set it to nine.
Next, you need to add the feature group as a child of the map as well.
This is the map, and this is the feature group.
The feature group has children, and the map also has children.
Don't be confused by this; it works like this.
There's the map, it contains a feature group, and the feature group has children.
So the map is the parent of the feature group, and map.add_child will take the feature group object.
Let's run this.
Let's reload the map.
Let's see where the marker is; here is the marker.
Now if I click on it, it says hello.
Why is the text hello being displayed over here? Because that is what you passed in the popup argument.
So I can say my home, and I can set the color to blue.
If we load the page, we are zoomed in; let's zoom out.
And here, now it's blue and it says what I've told it to.
This way, you can make the map interactive.
This is a sort of interactive map.
And of course, you can add multiple coordinates as well, like it is being done over here.
If I were to create a generalized version of it, this is how I would do it.
Now you should see two markers, right.
What we did was create an array of coordinates and iterate over it.
So it kept adding multiple children to the feature group.
Coming to step two, Pandas is something that you're already familiar with; what we have now is the volcanoes USA dataset.
This will be provided to you in your resources section.
Now we have the volcanoes dataset.
How can we make use of it? We just import Pandas, read the volcanoes dataset, and then we can print certain things from it.
So let's just run this first.
It shows us what's in the file.
There's a volcano number, there's a name, there's a location, and there is a latitude and a longitude.
Then the status, elevation, type and time frame.
So there are different data points about each volcano, and we can see that for every volcano we have the longitude and the latitude as well.
So let's print just the latitude and the longitude instead.
So if you notice over here, what we have done is access the columns LON and LAT in the dataframe; we got to know those names by looking at the dataset.
Let's run this again.
Now, the first array is the first series that you see over here at the top.
This is the longitude, and the second series that you see over here is the latitude.
There's also the elevation of a volcano; it's also an important data point, especially when considering disaster management, which is the objective of this exercise.
Because it impacts the overall scale of the disaster as well.
This is how high the volcano is.
It can be a deciding factor in the strategy that John, or anybody planning for this, would use.
So let's come to using Folium.
The first thing we do over here is just repeat the code from before.
We have got separate individual arrays with longitude, latitude and elevation.
Next, we are going to define a method called color_producer.
What color_producer does is return a string containing a color based on the elevation.
So for every volcano whose elevation is less than a thousand meters, it will return the color green.
For a thousand to 3,000, it's going to be orange.
For everything else, which is basically volcanoes above 3,000 meters, we are going to return the color red.
The reason for this is that we will mark different volcanoes in these different colors so that we know what each volcano's elevation is.
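The elevation-to-color rule described above fits in a small function. This is a sketch: treating exactly 3,000 m as red is an assumption, since the narration doesn't pin down that boundary. The returned string is the kind of value passed as the color of folium.Icon.

```python
def color_producer(elevation):
    """Map a volcano's elevation in meters to a marker color."""
    if elevation < 1000:
        return "green"    # below 1,000 m
    elif elevation < 3000:
        return "orange"   # 1,000 m up to (but not including) 3,000 m
    else:
        return "red"      # 3,000 m and above (boundary is an assumption)

for e in (536, 1500, 3285):
    print(e, color_producer(e))
```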
Next, we are going to start the map and pass in these coordinates.
Now, why these coordinates? It's just that the dataset is of that sort.
The dataset that we have given you here is USA volcanoes, and this latitude-longitude pair is set accordingly.
We give it a zoom level, we set the tiles, and we create a feature group of volcanoes like we did before.
Next, we're going to use the zip function on lat, lon and elevation to create an iterable that we can go over one by one.
zip merges lat, lon and elevation and lets you access them together, like this, in a sequence.
Next, nothing much: again, the location.
So we pass it the latitude and longitude values.
Then, for the pop-up, we want to show the elevation.
This is in meters, so we're going to convert the actual elevation into a string and add meters to it.
Then, for the icon, we pass folium.Icon the result of color_producer over here.
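zip() pairs up the three columns element by element, which is what lets one loop place one marker per volcano. The sketch below uses two hypothetical rows instead of the real CSV, and it builds (location, popup text) pairs rather than calling Folium, so it runs without Folium installed. In the lesson, each pair would instead feed folium.Marker(location=..., popup=..., icon=folium.Icon(color=...)).

```python
# Hypothetical sample values; in the lesson these come from the
# latitude, longitude and elevation columns of the volcano CSV
lat = [48.77, 47.02]
lon = [-121.81, -122.76]
elev = [3285.0, 536.0]

markers = []
for lt, ln, el in zip(lat, lon, elev):
    # Pop-up text must be a string, so the number is converted with str()
    popup = str(el) + " m"
    markers.append(((lt, ln), popup))

for location, popup in markers:
    print(location, popup)
```

zip() stops at the shortest input, so if one column were accidentally shorter, some volcanoes would be silently skipped; checking that the three lists have equal length is a cheap safeguard.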
So let me just show you what will happen over here.
This is the elevation, right.
This is what is going to come up in the pop-up.
So let's take this for a spin; I ran the other one.
So you see how, for different elevations, we got different results over here: each elevation was converted into a string, meters was added, and then we called color_producer on it to get red, orange and so on and so forth.
What we have produced is map two.
Let's open this one.
So we have all of these mapped.
All of these pointers are at the volcanoes' latitudes and longitudes.
If we click on one, we get the elevation.
As I said, this is USA-based data.
That's why we set the latitude and longitude to that pair.
As per this, we have green, orange and red.
That method is being called here.
folium.Icon is basically another class from Folium which defines what the icon should look like.
It can take various parameters.
Let's look at what all it can take.
You can pass it the color, the icon color and the icon type; it can take different icon types as well.
Right now it is using this pointer, but you can pass it different icons as well.
Also the angle that you want it to be at.
This is for you to explore.
It really depends on the requirement for representation.
How are you supposed to represent it? You can tackle it accordingly.
Now we have got this.
As I was saying, the pop-up accepts strings only.
So we are converting the value into a string.
Next, we want to take world.json, because we now need to map a population feature.
For that, we create a new variable, fgp, which is a feature group named Population.
Then we call add_child on fgp.
Now, this is a different style.
Earlier, we created a loop, but another thing you can do when adding a child is pass it one object that contains multiple values.
So we could have created multiple values out of this as well, or passed that kind of object.
Over here, in add_child, you're passing folium.GeoJson.
This function is used to show geographical data and maps from JSON.
The data that we're going to pass it comes through io.
In Python 2, the built-in open has no encoding parameter, so you import io and use io.open; in Python 3, the built-in open accepts an encoding directly.
So you call io.open, which opens this file, world.json.
This is world.json.
It's a very heavy file.
It's just a JSON file containing the world population data.
I open it in read mode, with the encoding utf-8-sig, and I read the data.
The data is read into a variable called data.
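The utf-8-sig encoding matters because some JSON files begin with a UTF-8 byte-order mark (BOM), which json.loads rejects. This sketch writes a tiny stand-in file with a BOM (the file name and content are hypothetical, not the real world.json) and reads it back. It uses Python 3's built-in open(); in Python 3, io.open is the same function.

```python
import json
import os
import tempfile

# Write a tiny stand-in for world.json, including a UTF-8 byte-order mark
path = os.path.join(tempfile.mkdtemp(), "world_sample.json")
with open(path, "w", encoding="utf-8-sig") as f:
    json.dump({"type": "FeatureCollection", "features": []}, f)

# Reading with "utf-8-sig" strips the BOM before the text reaches json
with open(path, "r", encoding="utf-8-sig") as f:
    data = f.read()

geo = json.loads(data)
print(geo["type"])

# Reading with plain "utf-8" keeps the BOM as the first character
with open(path, "r", encoding="utf-8") as f:
    raw = f.read()
print(repr(raw[:1]))
```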
So GeoJson basically marks the population of the different countries, and then we pass it a style_function, written using a lambda, which gives different colors based on the population.
So green if the population is less than this number.
Orange if it is between this number and the next one, and red otherwise.
So if it's at the high end, it gets a red color.
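folium.GeoJson calls style_function once per feature, passing the feature as a dict, and uses the returned dict as the style for that shape. Below is a plain-function version of that lambda. The property name POP2005 and the 10 million / 20 million thresholds are assumptions for illustration; check which properties your world.json actually contains before relying on them.

```python
# Hypothetical thresholds and property name ("POP2005")
def style_function(feature):
    pop = feature["properties"]["POP2005"]
    if pop < 10_000_000:
        color = "green"
    elif pop < 20_000_000:
        color = "orange"
    else:
        color = "red"
    # fillColor is the Leaflet style option for a shape's fill
    return {"fillColor": color}

# A minimal GeoJSON-like feature (geometry omitted for brevity)
sample = {"type": "Feature", "properties": {"POP2005": 15_000_000}}
print(style_function(sample))
```

A named function and a lambda behave identically here; the named version is simply easier to read and test once there are three branches.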
Now let's add this to the map as well and run it.
Let's see what we got as a result.
So if you see, different countries are marked with different colors, and we have the volcanoes as well.
So in a single map, we have been able to map the different countries and the locations of the volcanoes.
(pleasant instrumental music)
Next, we have a very interesting case study called the Titanic data analysis.
As some of you might know, there was a ship called the Titanic, which sailed from Southampton in the UK bound for New York in the US, and it sank on its way there.
It hit an iceberg and sank to the bottom of the ocean.
It was a big disaster at the time because it was the ship's maiden voyage, and it was supposed to be very strongly built, one of the best ships of its time.
So it was a huge disaster, and of course there's a movie about it as well.
Many of you might have watched it.
Now, what we have is data on the passengers, both those who survived this tragedy and those who did not.
It has been compiled over the years and published.
What we want to do is look at this data and analyze which factors contributed most to a person's chances of survival on the ship.
So, whether a person survived or died, does it have to do with anything they had in common? Did the females or the children survive better than the males? Did the rich passengers survive more than the poor passengers? Did the fare, the amount of money you paid to get on the Titanic, play any role? Maybe people who paid more got evacuated first, and there wasn't anything left for the rest.
What about the workers? What were your chances of survival if you were a worker on the ship, not just a passenger?
All of these are very interesting questions, and we will go into them one by one now.
Now, here is what our data looks like.
You have PassengerId, the passenger ID.
It's nothing but a serial, incrementing ID.
There is no inherent meaning to this data; it's just row numbers.
Next is Survived: whether the person survived or died.
Zero is for did not survive and one is for survived.
Next is the name of the passenger, then the gender and age, and then something called SibSp.
That's the number of siblings or spouses aboard the Titanic.
So for this person, if they have any brothers or sisters traveling with them, what is the number?
Or their better halves, if they had their spouses traveling with them.
Next we have Parch, which is the number of parents or children aboard the Titanic.
So if I'm a passenger, are my parents aboard, or are my children aboard the ship? That counts.
Next is the ticket number.
Inherently, again, it's as meaningless as the passenger ID.
Then there's the fare, the amount of money the person paid.
Then there's the cabin number, which is just the particular cabin they were in on the ship, and finally we have Embarked: Cherbourg, Queenstown or Southampton.
Embarkation being where they got on the ship.
So first and foremost, let's load the dataset and see what we get from it.
I have this here.
First and foremost, we import the libraries: Pandas, NumPy, matplotlib and math.
We have the Titanic dataset present over here in the same folder.
We are going to just read it in with pd.read_csv.
Then we want to print the number of passengers in the original data, which is nothing but the length of the index.
Try to recall, just try to jog your memory about how you got the length of a dataframe.
If you have the number of index entries in a dataframe, which is the number of rows, then you have this particular number.
So df.index is an array.
You take its length and convert it into a string so that it can be appended.
Let's run this.
This is going to tell us how many passengers we have information for.
We have 891 passengers, as we can see.
That's a decent size.
We can draw insights from data of this size.
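A tiny sketch of the row-count step, with a three-row hypothetical DataFrame standing in for the Titanic CSV. len(df.index), len(df) and df.shape[0] all give the number of rows, and str() is needed before concatenating the count onto a message.

```python
import pandas as pd

# Tiny hypothetical stand-in for the Titanic CSV
df = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})

# df.index holds one label per row, so its length is the row count
n = len(df.index)
print("Number of passengers in original data: " + str(n))
```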
Next, let's look at the head of the data to see the type of data we have.
So, as we said: PassengerId, Survived, Pclass, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin and Embarked.
So this is just a very cursory analysis.
Let's dive a little deeper.
A lot of times, when you get a dataset from the outside world, because it's an imperfect world, you will find that the data has certain missing values.
Think about it this way: whoever collected this data on Titanic passengers could be missing information about many of them.
The person couldn't be found, or some sketchy details came in, or something was told, something was not told, something was never known.
There are lots of scenarios like that.
So let's print the sum.
Let's have a look at how many of our values are missing.
We call isnull, and then we call the sum method on it.
Let's see the output for this.
Age is missing for a lot of them.
Cabin number is missing for a lot of them.
Age could have been a very important criterion, but apparently we don't have it for a lot of people.
So let's see how we get by without it.
But the cabin number is not as important, so we are good there.
What we are going to do to handle this issue is remove the passengers whose age is null.
So we are going to use the notnull method available in the Pandas library, pass it the Age column from the dataframe, and keep the rows where it is not null.
Now, here we are using the word wrangle.
Wrangling is any sort of manipulation of, or dealing with, the data.
In this case, wrangling basically means that we have removed certain values and we are dealing with a modified version of the original data.
Now the number is 714.
We removed from consideration all the passengers who did not have an age provided.
Next, we are going to look at gender.
So this is the number of passengers in the age-and-embarked wrangled data, where rows with either Age or Embarked missing are removed.
This is the embark-wrangled data.
Now, what we want to do is group the data by gender, so let's first do that.
For doing that, we are going to define a new variable, gender_data.
We're going to take this dataframe, the embark-wrangled one, which doesn't have any missing values for Age or Embarked.
Notice that we did not do this for Cabin, even though a lot of values were missing, because inherently it is useless to us.
Now, of course, this is where your judgment call comes in.
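The missing-value check and the age/embarked filtering can be sketched on a four-row, made-up DataFrame. isnull().sum() counts the missing cells per column; pd.notnull() returns a boolean mask, and indexing with it keeps only the rows where the value is present. df.dropna(subset=["Age", "Embarked"]) is a one-line equivalent of the two filters.

```python
import numpy as np
import pandas as pd

# Hypothetical passengers with some missing values
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
    "Cabin": [np.nan, "C85", np.nan, "C123"],
})

# isnull() marks missing cells as True; sum() counts the Trues per column
missing = df.isnull().sum()
print(missing)

# Keep only rows where Age is present (the "age wrangled" data)
age_wrangled = df[pd.notnull(df["Age"])]

# Then also drop rows with a missing embarkation port
embark_wrangled = age_wrangled[pd.notnull(age_wrangled["Embarked"])]

# dropna with a subset does both filters in one call; Cabin is deliberately ignored
same = df.dropna(subset=["Age", "Embarked"])
print(len(age_wrangled), len(embark_wrangled))
```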
This is where you use your brain to decide: okay, this is not useful and we can do without it.
For this, we simply call the groupby function, grouping on gender, that is, the Sex column, and what we do next is very interesting.
We're going to take the mean.
The mean of the gender data.
So let's see what we get for this.
What we have done first is calculate the total survival rate from the Survived column.
So only about 40% of people survived the disaster.
That is what we calculated, because we took the Survived column, which is zeroes and ones, and took its mean.
If, out of a hundred passengers, there were 40 ones and 60 zeroes, the mean would be 40 over 100, that is, 0.4.
That's why this mean gives us the survival rate; for a column with other values, the mean wouldn't be a rate.
It works only because the values are ones and zeroes.
That's why you have got the mean survival rate here.
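Why the mean of a 0/1 column is a rate, as a short worked example with invented values: four survivors out of ten passengers. The sum counts the ones, and dividing by the number of passengers turns that count into a fraction.

```python
# Hypothetical Survived column: 1 = survived, 0 = did not survive
survived = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0]

# Mean of 0/1 values = (number of 1s) / (number of passengers)
rate = sum(survived) / len(survived)
print(rate)
```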
Next, we're going to get the mean of the data that we have grouped by gender.
So we're going to look at the mean values for all the data points.
For a survival rate of 40%, what was the average age, the average Pclass? What are the values for an average case? So for an average case, for the female gender...
Notice, first and foremost, how this created two rows for the text-based data, because it automatically realized that there is no sense in taking a mean of male and female.
Male and female are completely different categories.
They've been marked as such.
If they were not, if there was no Sex column, then of course there would be only one row.
But we have identified this as a feature, something which is a contributing factor.
We are saying that gender is going to be a contributing factor to the survival rate.
And we can see how, because for female passengers the survival rate is 75%.
For males it's only 20%.
Immediately we know that females were preferred over males.
Women were preferred over men when it came to survival, which is usually the case in disasters.
Then we have other insights, about the average age of a female passenger and the average age of a male passenger.
We have the Pclass, which is fairly consistent in that it's around two or more, but still, a lot of the female passengers from the lower classes were saved.
We also have SibSp, and Parch indicates that a lot of the female passengers were traveling with children or parents.
The average Parch for women is about 0.71, compared with the men.
For men, it's about 0.27.
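A sketch of the groupby-then-mean step on five invented passengers. Each row of the result is one gender and each column is that group's average; for Survived, the average is the group's survival rate. numeric_only=True asks Pandas to skip text columns such as Name, which recent Pandas versions would otherwise refuse to average.

```python
import pandas as pd

# Hypothetical passengers, not the real Titanic data
df = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4", "P5"],
    "Sex": ["female", "male", "female", "male", "male"],
    "Survived": [1, 0, 1, 1, 0],
    "Age": [30.0, 40.0, 20.0, 10.0, 50.0],
    "Fare": [50.0, 10.0, 40.0, 20.0, 15.0],
})

# One row per gender; the text column Name is skipped
gender_data = df.groupby("Sex")
means = gender_data.mean(numeric_only=True)
print(means)
```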
So again, women and children.
If some of the men were traveling with children, that could add to their chances of survival.
The way this would add to the chance of survival is this: if I'm a man traveling with a child, and I don't have a sibling, a spouse or a parent with me, then I get the next preference, because I'm traveling with a child and the child cannot be left alone.
So the preference order starts with children, then women, and then men with children, because the child needs to survive.
Whenever a child was put into a lifeboat, the adult solely responsible for that child might have been added too.
We can build that as a hypothesis.
All of these are theories.
This is basically hypothesizing as a data scientist or as an engineer, but you validate all of these hypotheses through data.
So I may be wrong about what I just said.
Maybe the data will tell us a different story, but that's really the nature of the game: you have to be ready to be wrong about things, and you have to keep digging until you are definitively proven wrong, and then you move on to another theory, and another.
But being proven wrong doesn't mean that you should stop there and then.
It means that you should keep exploring.
It at least cuts down one of the options in front of you.
That's done through data.
We are playing by eliminating choices.
So, a few other observations: the women tended to be younger, to travel with another person, and to have a greater number of siblings or spouses; the SibSp feature and the Parch (parents and children) feature again stood out.
Another interesting thing is that women paid more for the fare, $47.00 on average, compared to the men.
Now, because the survival rate for women is so high, the group includes people from every class, so the average Pclass, which indicates socioeconomic status, reflects that mix.
At such a high survival rate, the Pclass average can come out higher or lower, because a lot of people from different classes get mixed together.
Let's investigate gender further.
So now this tells us the number of female passengers and the number of male passengers present.
Next, we are going to set the columns of the dataframe: we are going to name them Sex and Total instead of Sex and PassengerId.
So let's do that and print total_df again.
Now the column names have changed, but the data points haven't.
We have only renamed the columns.
Earlier the column names came from the data grouped by gender, but now we have changed them to Sex and Total.
Next, we are going to store the gender list in another variable, and we will delete that column from total_df.
Let me print total_df here, and let me print the gender list.
Why we deleted it, I'll come to that in a minute.
So first, we printed total_df with the Sex column deleted.
This is the total_df variable, with the Sex column deleted.
Then we have gender data, which is basically a GroupBy object.
I mean, it's a class object.
We can't do anything with it yet.
Next, we create something called gender_survived_df.
Let's print it and see what value we get here.
So now what we have is the gender list.
Earlier we had the total number of people, 259 and 453; now we have the number who survived, 195 and 93.
So now we have two different sets of numbers.
Let's delete the Sex column from this one as well.
So here we have gender_survived_df.
Next, we're going to combine the two.
We take total_df, from which we deleted the gender column, and gender_survived_df, from which we again deleted the Sex column.
We are going to combine them, but why? What will this give us? It will give you the survived and total counts by gender.
We did not need the Sex column repeated in each of them, which is why we deleted it from both and kept the gender list separately.
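The gender bookkeeping above can be sketched in a few lines of pandas. This is a minimal sketch, not the course's notebook. It assumes Titanic-style column names (PassengerId, Sex, Survived) and uses a tiny made-up dataframe so that it runs on its own.

```python
import pandas as pd

# Tiny made-up stand-in for the cleaned Titanic dataframe (not the real data).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Survived": [1, 0, 1, 0, 1, 0],
})

# Total passengers per gender: count the unique passenger IDs.
total_df = df.groupby("Sex", as_index=False)["PassengerId"].count()
total_df.columns = ["Sex", "Total"]  # rename PassengerId -> Total

# Survivors per gender: Survived is 0/1, so summing counts the 1s.
gender_survived_df = df.groupby("Sex", as_index=False)["Survived"].sum()

# Keep the gender labels once, then drop the duplicate Sex columns and combine.
gender_list = total_df["Sex"].tolist()
combined = pd.concat(
    [gender_survived_df.drop(columns="Sex"), total_df.drop(columns="Sex")],
    axis=1,
)
combined.index = gender_list
print(combined)
```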
Next, we can go ahead and plot this.
We can plot how the survival rate varies on a graph.
The first thing is that we call the plot.bar method, which is available to us on the dataframe itself.
We pass the colors limegreen and dodgerblue, and we give it a title.
We give it an x label and a y label.
We are going to plot gender across the x-axis and the number of people on the y-axis.
We are going to set the x-ticks.
Next, we are going to set the survival gender list.
This is the survival gender list, at positions zero and one.
Zero is female, one is male.
The total gender list, again, is female and male.
Now, we have also defined a small function to create value labels on the plot.
We pass it the survival gender list, which contains the survival data by gender.
We take the plt object here and place each label at x plus x_adjust.
It's just a minor adjustment to position it.
Then y plus y_adjust, and we give it the color and the font weight.
So that's how we have done it so far.
We do this for both the survival gender list and the total gender list, and then we plot it.
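Here is a sketch of the bar chart and the value-label helper described above, using the survived and total counts quoted in this section. The helper name create_value_labels and the adjustment offsets are assumptions for illustration. The course's own helper may differ.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd


def create_value_labels(values, x_adjust=0.0, y_adjust=5, color="black"):
    """Hypothetical helper: write each value just above its bar position (0, 1, ...)."""
    labels = []
    for x, y in enumerate(values):
        labels.append(plt.text(x + x_adjust, y + y_adjust, str(y),
                               color=color, fontweight="bold", ha="center"))
    return labels


# Survived/total counts by gender quoted in this section (female = 0, male = 1).
data = pd.DataFrame({"Survived": [195, 93], "Total": [259, 453]},
                    index=["female", "male"])

ax = data.plot.bar(color=["limegreen", "dodgerblue"],
                   title="Effect of Gender on Survival")
ax.set_xlabel("Gender")
ax.set_ylabel("Number of People")
plt.xticks(rotation=0)

survival_gender_list = data["Survived"].tolist()
total_gender_list = data["Total"].tolist()
# With two columns, each group's bars sit roughly 0.125 left and right of the tick.
survival_labels = create_value_labels(survival_gender_list, x_adjust=-0.125)
total_labels = create_value_labels(total_gender_list, x_adjust=0.125)
plt.show()
```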
This is the effect of gender on survival.
If you notice, the title "Effect of Gender on Survival", the colors, and the x label, Gender, all appear.
So basically, whatever you pass to it gets printed here on the plot.
So as it appears, on average women were more than three times as likely to survive as men. Because the Survived field is binary, we are not considering injured or anything else, just pure survival.
We assume that nothing bad happened to them afterwards that would be considered as caused by the incident or the accident.
We can confidently say that 75.3% of the women in the dataset survived and only 20.5% of the men survived.
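The two percentages follow directly from the survived and total counts quoted earlier (195 of 259 women, 93 of 453 men). Here is a small sketch of that arithmetic:

```python
import pandas as pd

# Survived and total counts by gender, as quoted in this section.
counts = pd.DataFrame({"Survived": [195, 93], "Total": [259, 453]},
                      index=["female", "male"])

# Survival rate = survivors / passengers in that group, as a percentage.
counts["Rate (%)"] = (counts["Survived"] / counts["Total"] * 100).round(1)
print(counts)

ratio = counts.loc["female", "Rate (%)"] / counts.loc["male", "Rate (%)"]
print(f"Women were about {ratio:.1f} times as likely to survive")
```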
Now we can also draw further insights from what we have done so far.
Here are a few additional questions that may help explain why the women were three times more likely to survive than the men.
We can look at the effect of age on survival rate.
We can look at the effect of company: whether somebody was traveling with another person or not.
We can also look at the effect of socioeconomic status, meaning which class they traveled in and how much fare they paid, on the survival rate of a passenger.
Let's first look at the effect of age on survival.
So what we're going to do is take the embark-wrangled dataframe, group it by Survived, and then take the mean.
Then we're going to print it.
So we did a groupby on Survived and then took the mean.
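Here is a minimal sketch of grouping by Survived and averaging every other column. The dataframe name mirrors the narration, but the rows are made up, so the printed averages are not the Titanic figures.

```python
import pandas as pd

# Made-up rows standing in for the embark-wrangled Titanic dataframe.
embark_wrangled_df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1],
    "Pclass":   [1, 3, 2, 3, 1, 1],
    "Age":      [22.0, 35.0, 28.0, 40.0, 54.0, 4.0],
    "Fare":     [71.3, 7.9, 13.0, 8.1, 51.9, 16.7],
})

# One row per outcome (0 = did not survive, 1 = survived); each cell is that group's average.
survival_means = embark_wrangled_df.groupby("Survived").mean()
print(survival_means)
```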
If you look at age, the average age of those who did not survive is 30.
For survivors, the average age is 28.
So here we can see that survivors on average are younger and are presumably from a higher socioeconomic class.
That is, a more luxurious class.
The average Pclass is 1.9 versus 2.5 here.
So a more luxurious ticket class and a higher ticket fare, $51.00 versus $23.
They traveled with fewer siblings and spouses and with more parents and children.
For survivors, you can say that they're younger and that they are traveling with fewer people, at least when it comes to siblings and spouses.
They have paid a higher amount; the class difference is not as large as the fare difference, but there are certain things we can ascertain from this data.
So, continuing further, let's figure out the effect of age.
The first thing we're going to do is split the data into children and adults.
For the children data, we set the criterion as age being less than 18.
We create a filter on the existing dataset using this condition.
This syntax can be confusing to a lot of people.
If it's confusing how this works, this is where pandas comes into it.
I'm filtering this pandas dataframe.
We're using a condition on the dataframe itself.
This is a very common way of using a pandas dataframe, in case you're not already familiar with it.
Next, we're going to get the adult data.
Again, the condition is age being greater than 18.
Let's get a count of the children and a count of the adults.
We take children_data's PassengerId count and adult_data's PassengerId count.
The reason we take PassengerId is that these values are unique.
Passenger IDs are not going to be repeated.
We can expect that.
Now let's check out the survived children count.
We take the Survived column of children_data and do a sum.
Because Survived is either one or zero, summing it adds up the ones, which gives the number of children who survived.
For the survived adult count, we can do something similar.
Next, we are simply going to put these into lists.
We create a children list of the survived children count and the total children count.
So think of this second one as the total children count.
Similarly, for adults, we create a list.
Then finally, the total.
So let's stop here and print these three lists.
The children list, the adult list, and then the total list.
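The filtering and counting steps can be sketched as follows, with hypothetical rows standing in for the real data. One deliberate choice: the adult filter uses >= 18, the complement of < 18, so that passengers aged exactly 18 land in one of the two groups rather than in neither.

```python
import pandas as pd

# Made-up passengers standing in for the embark-wrangled dataframe.
embark_wrangled_df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7],
    "Age":         [4.0, 17.0, 18.0, 30.0, 45.0, 12.0, 62.0],
    "Survived":    [1, 0, 1, 0, 1, 1, 0],
})

# Boolean masks: the condition yields a True/False Series that selects rows.
children_data = embark_wrangled_df[embark_wrangled_df["Age"] < 18]
adult_data = embark_wrangled_df[embark_wrangled_df["Age"] >= 18]  # complement, so no age is skipped

# PassengerId is unique, so counting it counts passengers.
children_count = children_data["PassengerId"].count()
adult_count = adult_data["PassengerId"].count()

# Survived is 0/1, so the sum is the number of survivors.
survived_children_count = children_data["Survived"].sum()
survived_adult_count = adult_data["Survived"].sum()

children_list = [survived_children_count, children_count]
adult_list = [survived_adult_count, adult_count]
total_list = [survived_children_count + survived_adult_count,
              children_count + adult_count]
print(children_list, adult_list, total_list)
```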
So, just to sum it up, what we are trying to do is get the different counts for survived versus total children, and we are just accumulating data.
Where is the intuition coming from; where am I getting the idea from? It's just because I want to compare by age.
That is the initial idea we're running with.
We're saying: let's split it into children and adults, and then let's look at the data.
So let's run this.
The data shows that 70 children survived out of 139.
For adults, 218 survived out of 573.
So the totals are 139 children and 573 adults.
For the total survival, you can just add the corresponding items of the two lists.
Let's create a pandas dataframe out of this.
We will call it the CVSA dataframe.
What are the values in this dataframe? The children list and the adult list.
The first row is for children and the second row for adults, coming from this data.
The columns are Survived and Total.
Let's see what we get here.
So you can see, for children we get 70 survived out of 139.
For adults, you get 218 and 573.
We kept collecting the data, and this is where we end up using it.
The first element is a list, and the second element is also a list; each one makes an entire row.
The index is Children here and Adults here.
We are setting the index.
This corresponds to this row.
And this corresponds to this row, because the index is row-wise.
And then the columns are given as Survived and Total.
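Building the children-versus-adults dataframe from the two lists looks roughly like this. The variable name cvsa_df is an assumption, and the numbers are the ones quoted above.

```python
import pandas as pd

children_list = [70, 139]   # [survived, total] for children, as quoted above
adult_list = [218, 573]     # [survived, total] for adults

# Each inner list becomes a row; index labels the rows, columns label the fields.
cvsa_df = pd.DataFrame([children_list, adult_list],
                       index=["Children", "Adults"],
                       columns=["Survived", "Total"])
print(cvsa_df)
```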
Let's create a simple plot out of this.
We call plot.bar and then, as we have been doing so far, set the title, set the labels, set the xticks, and call the create_value_labels function as well; finally, after all of that is done, we plot it.
Let's run this and see the graph.
So we can see what is happening for adults, and for children you can see a ratio of 70 to 139.
We have just taken the data and drawn it.
Next, we can create another plot where we take the survival rate for children and adults.
So far, this was children against the total number of children.
Now we're going to compare children to adults.
We create a new list, call plot.bar, and change the title and the label.
Let's run this.
This is one plot, and we get another plot.
This is the effect of gender on survival.
We get another one: the survival rate between children and adults.
This is right: children survived 50% of the time and adults survived 38% of the time.
How did this graph come about? Through this, where we take the mean of Survived for children_data and for adult_data.
So we took the mean, and the means are these values.
After that, we just plotted it, nothing else.
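Why does taking the mean give a survival rate? Because Survived holds only zeros and ones, its mean is the number of ones divided by the number of rows. Here is a small check with made-up values:

```python
import pandas as pd

# Made-up 0/1 Survived values for two groups (1 = survived).
children_data = pd.DataFrame({"Survived": [1, 0, 1, 1, 0]})
adult_data = pd.DataFrame({"Survived": [0, 0, 1, 0]})

# The mean of a 0/1 column equals survivors / total, i.e. the survival rate.
children_rate = children_data["Survived"].mean()
adult_rate = adult_data["Survived"].mean()
print(children_rate == children_data["Survived"].sum() / len(children_data))

survival_rates = pd.Series([children_rate, adult_rate], index=["Children", "Adults"])
print(survival_rates)
print(f"Children were {children_rate / adult_rate:.1f} times as likely to survive")
```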
This is the survival rate between children and adults.
So what can we conclude from this? We can conclude that 50.7% of the children in the dataset survived, while 38.1% of the adults survived, so children were 1.3 times more likely to survive than adults. That seems logical, given how human beings tend to respond in a tragedy: they save the children first.
Let's look at the age distribution of all passengers.
Let's see if that gives us any idea.
For this, we're going to draw a simple histogram.
We take the embark-wrangled data.
In case you've forgotten what this is, we used it earlier.
Let's see what the embark-wrangled data is.
This is the embark-wrangled data, where we removed the null values, and we plot a histogram of the age distribution of all passengers: we take the Age column and call hist with bins covering a range of zero to 100.
That is the range we expect, zero to 100, for human age.
We don't usually count age in decimals.
It can be represented in decimals, but here you just want to see how many people are of each age.
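Here is a sketch of the age histogram, using a handful of sample ages in place of the real Age column. Note that bins=range(0, 101) produces 101 edges and therefore 100 one-year bins. range(0, 100) would stop at 99.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Sample ages standing in for the embark-wrangled Age column (nulls already removed).
ages = pd.Series([22, 38, 26, 35, 35, 54, 2, 27, 14, 4, 58, 20, 39, 14, 55])

# One bin per whole year from 0 to 100.
ax = ages.hist(bins=range(0, 101))
ax.set_title("Age Distribution of All Passengers")
ax.set_xlabel("Age")
ax.set_ylabel("Number of Passengers")
plt.show()
```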
Just in case you're not able to recall any of the syntax, such as how this draws a histogram or what all these arguments do, I strongly recommend going back and having a look at the module on matplotlib.
This is a histogram of age.
As you can see, there are a lot of passengers in the range between what seems like 18 and at least 38.
This is 38, or actually 35, and this is roughly 14.
So from 14 up to this number, reading the x-axis that appears here, we can see it in the histogram.
Let's look at the distribution of survivors in particular.
That was not just for the ones who survived.
Let's look at the ones who survived.
We call the same thing for the survivors.
We have the survivor data.
We can put the age label.
Let's see what we get.
This is for the survivors.
Now, as you saw earlier, of course it favors the young far more; for children, it's quite high.
The survival rate is quite poor after 40.
Now let's look at the effect of age on survival.
First, let's look at the Survived column.
We do a describe.
Let's look at what the description tells us.
For Survived, we have the mean, standard deviation, minimum, and the 25%, 50%, and 75% values.
The mean is pretty much as expected.
Now we take the age data, because we want to check the effect of age on survival rate.
We group by age and then take the mean.
We'll create a list out of it.
Next, we're going to use a scatter plot to plot the effect of age on survival rate.
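Here is a minimal sketch of describe, the age groupby, and the scatter plot, on made-up data. It also illustrates a pitfall: a single passenger aged 80 produces a survival rate of 1.0 on their own.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the cleaned Titanic data.
df = pd.DataFrame({
    "Age":      [22, 22, 22, 35, 35, 80, 4, 4],
    "Survived": [1, 0, 0, 1, 0, 1, 1, 1],
})

print(df["Survived"].describe())  # count, mean, std, min, 25%, 50%, 75%, max

# The mean of the 0/1 Survived column per age is the survival rate for that age.
age_survival = df.groupby("Age")["Survived"].mean()

plt.scatter(age_survival.index.tolist(), age_survival.tolist())
plt.title("Effect of Age on Survival Rate")
plt.xlabel("Age")
plt.ylabel("Survival Rate")
plt.show()
```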
This is how the survival rate varies with age.
For some of these ages, you can see the survival rate goes down to zero.
For some others, the survival rate is almost 100%, which would suggest that if a person was of this age, or this age, or this age, they survived.
This is not saying it's a certainty; this is more what the data is telling us.
For example, it says that if you were 80 years old and on the Titanic, you would definitely survive.
But this is not a pattern; these are probably just chance results, because this is not the norm.
Otherwise, this is a fairly realistic picture.
Now, please notice that we have used a slightly more complicated scatter plot here to draw it out.
You can see the kind of effect we got.
So I would recommend that you look at this yourself.
What is alpha, what is c, what is cmap, what are edgecolors, and what is the colorbar that appears on the side? Let me show it to you again.
So this is the color bar, and this is how it comes up.
One thing that is really required in programming, but doesn't get talked about as much, is the ability to read documentation and teach yourself.
Because it's a lifelong process.
Every day will be challenging.
Every day there will be something new to learn, something new to do.
Because of that, you need to develop a sort of independence.
What I would recommend is that you look at this piece of code and go back to it.
If you can't figure it out, I would suggest that you reach out to support, because I could tell you right now what all of this means, but then (voice garbles).
What I would instead recommend is that you change some of the values.
So, what if we set s to 10?
Can we figure out what changed? Maybe, I don't know, the dots have gotten a little tinier.
Let's change s to a really low value of one.
Let's see if we can discern what changed.
If you notice, now the dots have almost disappeared.
s basically determines the size of the dot.
What if you set s to, let's suppose, a hundred, a bigger value than before?
This is purely experimentation.
I encourage you to do it more and more with whatever code you get from us.
This will help you out quite a bit.
Now let's try it with this one as well.
What is vmax? It's now 34.
If you notice, the range of this color bar for the number of passengers has gone up: it was 13 earlier, in case you did not notice, and now it runs up to 34.
If we set it to 35, even the number 35 will appear on the bar.
So notice how there is no 35 right now.
Let me run this.
Now there is a 35.
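The parameters mentioned above can all be annotated in one call to plt.scatter. This sketch uses made-up points. The specific values (s=50, vmax=35, the viridis color map) are illustrative choices, not the course's settings.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove to get a window
import matplotlib.pyplot as plt

# Made-up (age, survival rate, passenger count) points for illustration.
ages = [2, 15, 22, 30, 45, 60]
rates = [0.8, 0.5, 0.35, 0.4, 0.3, 0.2]
passenger_counts = [10, 6, 27, 25, 12, 4]

points = plt.scatter(
    ages, rates,
    s=50,                  # marker size (area in points^2): try 1, 10, 100
    c=passenger_counts,    # color each dot by its passenger count
    cmap="viridis",        # color map that turns counts into colors
    alpha=0.7,             # transparency: 0 = invisible, 1 = opaque
    edgecolors="black",    # outline color of each marker
    vmin=0, vmax=35,       # fix the color scale; vmax sets the top of the color bar
)
plt.colorbar(points, label="Number of Passengers")
plt.xlabel("Age")
plt.ylabel("Survival Rate")
plt.show()
```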
Let's do a little more data analysis before we leave this dataset.
The first five ages have only two or fewer passengers in their age group.
With such a small sample size, their survival rates may not be as representative as those of groups with larger sample sizes.
So what we are saying is: suppose there is an age of 12 years old, but it has only two or fewer passengers.
Either one or two passengers who are 12 years old.
Now, if you compare the survival rate at age 12, or age one, with age 22, where you had 100 passengers or 50 passengers, you don't have enough sample size for the smaller groups.
The denominator in that equation, the total number of people of age 12 versus the total number of people of age 22, is not large enough for you to be certain that this is the pattern.
Because one out of two, or two out of two, is not a big thing to happen.
But 50 out of 50 establishes a trend.
If 50 out of 50 people of age 22 survived, or did not survive, it tells you very clearly that something was happening with that age group.
Maybe the 22-year-olds were considered too young or too old, or something else, or they were helping out in a certain way, so that none of them survived.
To get more insight into our data, let's remove passengers whose ages have five or fewer passengers in the group.
To do this, we first get the count per age, count_age.
This is nothing but the count of passengers at each age.
Then we get count_age_gt5.
gt5 stands for greater than five.
So we take count_age, the age counts from the dataframe, and keep only the ages where the count is greater than five.
Next, we create a list of those ages and keep only the matching data: we take the embark-wrangled dataframe.
We filter it on this list.
This list is going to contain certain ages.
Let's see what ages it contains.
These are the ages it contains.
Now we are simply telling Python: keep the passengers whose ages are one, two, three, four, nine, and so on.
Remove all the other ones.
Let's see what we get.
Now the number of passengers has gone down, because we removed the age categories where we did not have enough representatives.
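The "more than five passengers" filter can be sketched with value_counts and isin. The data is made up so that two ages (80 and 12) fall below the threshold.

```python
import pandas as pd

# Made-up ages: 22 and 30 appear often, 80 and 12 only once or twice.
df = pd.DataFrame({
    "Age": [22] * 7 + [30] * 6 + [80] + [12, 12],
    "Survived": [1, 0, 0, 1, 0, 0, 1] + [0, 1, 0, 1, 0, 1] + [1] + [1, 0],
})

# Number of passengers at each age.
count_age = df["Age"].value_counts()

# Ages with more than five passengers ("gt5" = greater than five).
count_age_gt5 = count_age[count_age > 5]
ages_to_keep = count_age_gt5.index.tolist()

# Keep only passengers whose age is in that list.
filtered_df = df[df["Age"].isin(ages_to_keep)]
print(sorted(ages_to_keep), len(df), "->", len(filtered_df))
```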
This is only useful when you are looking at things from an age perspective.
If you were looking at it from a gender perspective, you would probably have removed people of a gender if that group was not representative.
If there were only two females on board, there would be no point in comparing the survival rates of females and males.
Here, there were sizeable numbers in both samples, but this is something you need to be very careful about.
Sometimes you might get datasets where there is a classification or a label for a certain age group, and you can get thrown off.
So suppose in the previous scatter plot you saw with certainty that anybody aged 80 survived: how do we know that this isn't one person? This is the survival rate, not the survival count, so how do you know it's not one person? One out of one doesn't confirm anything.
What we are saying is that we should only consider the ages where there is enough sample size.
We take the data that we have and plot a histogram of the ages.
These are the ages that have enough sample data.
That is, more than five people.
You immediately see that there are a lot of gaps.
These gaps weren't present earlier, but they are present now because we have removed these groups.
These groups did not have enough passengers.
Any age with five or fewer passengers is removed.
All of these are greater than five.
You could even discount these ones.
You might want to consider only ages with more than 10 passengers.
It really depends on the maximum number of passengers at a certain age.
Now let's look at the effect of age on survival rate, now that we've cleaned the data by removing those passengers.
Total of less than 31.
So we are going to draw a scatter plot again.
We group the data by age.
We get the mean.
We get the number of passengers for ages with more than five passengers.
We do a count and then a scatter plot.
This is the outcome.
See, the top row immediately disappears.
The top row is no longer there.
If you notice now, that top line, the survival rate of one, has completely disappeared.
There is now no age group with a survival rate of one, where all of the passengers of that age survived.
It seems we had a flawed input.
So compare what we saw here, the effect of age on survival rate, with the cleaned-up effect of age on survival rate.
These are two different plots, and this one is more valid, as any data scientist would tell you, because small samples weren't filtered out earlier.
So here are some other plots that you can look at.
The effect of socioeconomic status on survival rate.
We can also look at the ticket class and fare.
They can be indicators of socioeconomic status.
You can focus on investigating ticket class, as it's specifically stated as a proxy for SES (socioeconomic status) by the data dictionary.
That is, we suggest focusing on the ticket class.
Both can be indicators, but you can look at ticket class because even if the fare might have been lower, in those days they were very strict about who traveled in what class.
You had to be of a certain gentry to be able to travel in first class.
That way, we can talk not just about economic status but about socioeconomic status: we can say that more socially prominent people were given preference.
Fare can be expected to rise with ticket class.
That is, a first-class ticket is likely to cost more than a third-class ticket.
You can also consider that most third-class tickets would have been bought.
Nobody would have given them away.
Whereas in the first-class passenger scenario, people might have bought tickets for other passengers, given them away for free, or sold them at a discount, because, hey, there's a prestigious politician traveling on the first voyage of the Titanic.
You know, he might have paid nothing, or just 10 pounds or 10 dollars, to get on that boat.
This is something we leave for you.
We suggest you go ahead and do it; you should see a plot like this for survivors in classes one, two, and three, with the number of passengers in each.
Then you should do the non-survivor ticket class distribution. This one is for survivors, and you see that naturally they are higher in first class, but the disparities aren't large in terms of survival.
But when you look at the non-survivors, class three is the one that suffered the most.
Overall, for the effect of class on survival rate, survival rates are better if you're in first class, at 65%; second class is at 47%, and third class would be 3%.
There's a reason why we haven't shared the code for this: we suggest that you go back and do it by yourself.
There is enough sample code for you to work on any of the graphs we are producing.
(pleasant instrumental music) Let's talk about the first application, web scraping.
Here's a problem statement: Bob is an intern, and he wants to get all the hyperlinks from a webpage.
He wants to collect all the links that are present on a webpage.
Let's suppose it's an information website, and he needs that information because his manager or superior has asked him for all the links, or all the information, on a travel website, let's say.
Now, one way, of course, is that he can go to the website, copy the links, and open them one by one.
It's just dull copy-pasting back and forth.
Yes, you can do that for a hundred websites; you can do it a hundred times.
But there are a couple of problems with that.
First, it will be slow.
It will be prone to error.
And apart from other things, it's just a very boring thing to do, copy-pasting all the time.
So this is where Bob thinks he should be smart and do efficient work rather than clerical work.
That's why he wants to write a program in Python that will save all the hyperlinks from a specified webpage.
Web scraping comes to the rescue.
Web scraping is a commonly known technique for extracting large amounts of data from websites.
You can save the data to a local file on your computer or to a database.
Now, what kind of data? This could be text data, image data, or just a collection of links, as in Bob's case.
But usually it is a bunch of things.
So here is a fun fact: Google, or any of the search engines that you use today, works on web scraping.
Part of it is that the results are not generated at the moment you search.
Google indexes, crawls, and scrapes data from the entire web, website by website, following one link after another.
Think about it like this: it goes to wikipedia.org, copies the text there, and stores it as Wikipedia.
Then it finds all the 10, 20, 100, or 200 links there, follows each one of those, and keeps doing this over and over again.
Just by doing that, it has managed to scrape, or crawl, the web.
So think of it like a spider, as shown here, going from link to link.
It's actually called crawling: crawling and indexing.
There are a couple of libraries available in Python for this, and these libraries are very advanced.
Advanced in the sense that they're really ready to use.
You don't have to do a lot to get going with these libraries.
The first of these is called Beautiful Soup.
It's a Python library for pulling data out of HTML and XML files.
Now, suppose we go to any website, say Edureka.
If I right-click, I can choose View Page Source, and this gives me all the HTML and CSS present on Edureka.
This is the HTML and CSS that makes up this page.
So suppose we search for something like Python Certification Training; we can find it in this text.
Of course, there is a lot of other HTML and CSS here, but you can map it to whatever you see on the page.
The images, the amounts, and everything else.
The amount is present here, along with all of these things.
Now, what happens in scraping is that you programmatically download this entire page.
You get all of this data.
One thing I could do right now is select all of it, press Ctrl+C, go to PyCharm, and paste it here.
It will take a second; there's a lot of text.
See, it's all text at the end of the day.
A website is nothing but text.
That's one thing I could have done.
The other is to download it using Python and then search for data using Python.
Right now, if I searched for data by reading line by line, that would take a lot of time.
I mean, then it would be better to just copy straight from the page.
So in scraping, Beautiful Soup comes in handy: you give it the HTML document, which is just text, very similar to a text file like a Python file, and you ask it to search for certain things in it.
We are going to learn how you ask it to search for those things.
One other thing that's going to help us is the Requests library in Python.
It is used for making HTTP requests.
So for the part where we download the webpage, the Requests library is going to help us out.
So let's look at the Requests library.
What we can do is take a webpage.
This is a sample webpage from the Thomas Cook website.
Let's just open it.
It shows holiday packages in a place called Escaraline, India, and it seems to be loading.
So we take this URL here and pass it to requests.
The get method is available on it.
The get method simply downloads the entire HTML file, the response to this HTTP request, which is a webpage.
It returns an object that we store in r, and r.content is where you can see the contents of this webpage.
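Here is a minimal sketch of the download step with Requests. The function name fetch_page and the 10-second timeout are assumptions. raise_for_status() makes an error page fail loudly instead of silently returning HTML you did not expect.

```python
import requests


def fetch_page(url: str, timeout: float = 10.0) -> bytes:
    """Download a page and return its raw bytes (a sketch, not the course's code)."""
    r = requests.get(url, timeout=timeout)  # send an HTTP GET request
    r.raise_for_status()                    # raise an error for 4xx/5xx responses
    return r.content                        # raw body bytes; r.text is the decoded str
```

Calling it with a page URL, such as the Edureka home page, returns the raw bytes of that page. Those bytes are what get handed to Beautiful Soup next.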
So let's run this file and see what we get.
Okay, there seems to be some issue.
Instead, let's see if we can make it work on Edureka.
Okay, so we got Edureka's website, and there is definitely some data available here.
This is what I did manually a moment ago.
Now we have it in a variable.
Let's move further.
Next, we need to import the Beautiful Soup library.
Now please note that you don't import from a module called BeautifulSoup; you import from a package called bs4.
That's just the convention for Beautiful Soup version four.
Now, we create an object called soup, pass it the content, and specify that we want to use html.parser, that is, parse it as HTML.
There is a method called prettify.
This prints a pretty version of the webpage.
So let's try this out.
Okay, so we now have a pretty version of the webpage.
This helps us analyze how the webpage is structured.
We can do that in the browser as well, but sometimes in Python you might get a response and want to have a look at it in the console.
We can see the structure in a very clean-cut way.
We have already done this, when I showed you how to view the HTML code of a webpage.
Now, let's suppose you wanted to get all the links.
Very simply, you write a loop: for web_link in soup.find_all('a').
That is, we use the find_all function to find all the a tags.
Next, we just print web_link and see what we get.
As you can see, we have got all the a tags here.
All the a tags.
Next, we need to use the get method on each of these.
We just want to get the href attribute.
We don't want to get anything else.
We have all the links that are present in the href attributes.
Now let's create an empty list to store all of these links.
We call links.append to start collecting all of them, and then we can just print the list.
So we collect all of them neatly in a list and use it for further processing.
So we got all the links in the list.
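Here is the link-collection loop as a self-contained sketch. To keep it runnable offline, it parses a small inline HTML string instead of a downloaded page. The URLs in it are placeholders.

```python
from bs4 import BeautifulSoup

# A tiny inline page instead of a downloaded one, so the sketch runs offline.
content = """
<html><body>
  <a href="https://www.example.com/">Home</a>
  <a href="/python-course">Python course</a>
  <a name="no-link-here">Anchor without href</a>
</body></html>
"""

soup = BeautifulSoup(content, "html.parser")
print(soup.prettify())  # indented view of the document structure

links = []
for web_link in soup.find_all("a"):  # every <a> tag in the page
    href = web_link.get("href")      # None when the tag has no href attribute
    if href is not None:
        links.append(href)
print(links)
```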
This is how you can basically scrape a webpage.
Of course, you can get more information from the webpage as well.
Let's suppose you wanted to get all the prices for the courses.
What you do is analyze the webpage.
You see what kind of data is present.
If you look at this one, there's iconinr.
Anything with a class of iconinr seems to always carry the amount.
So let's confirm this.
Let's search for iconinr in the webpage again and again.
Wherever there is an iconinr, some sort of price is being shown.
This could be a way to find the prices of the various courses and store them.
If you have ever seen any of those price comparison websites, this is how they work.
What we need to do is, let's have a look here.
Or rather, we need to select the after-discount class, because these are not the discounted prices.
We want to get the non-discounted price, like a posted price.
Let's see if find_all will help us out there.
We want to find the after-discount elements, and because this is a CSS class, let's see how we can select by class using Beautiful Soup.
The suggestion is to use find_all and pass it the element type and the class.
What we did not pass is the element type.
So first is span, and then you give it the class parameter, like this: class_.
You're going to see that the class is after-discount.
Let's see if we get any result from this.
Okay, so we got something.
Now, how do we get the text out of this, and how do we get the course name?
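One way to answer both questions is to use get_text and navigate from each course's container element. The markup and class names below (course, after-discount) are hypothetical stand-ins, not Edureka's actual HTML, and the prices are made up.

```python
from bs4 import BeautifulSoup

# Inline stand-in markup; the class names are hypothetical, not a real site's.
content = """
<div class="course">
  <h3>Python Certification Training</h3>
  <span class="before-discount">20,000</span>
  <span class="after-discount">16,000</span>
</div>
<div class="course">
  <h3>Data Science Masters</h3>
  <span class="after-discount">40,000</span>
</div>
"""

soup = BeautifulSoup(content, "html.parser")

# class_ (with the underscore) because "class" is a reserved word in Python.
prices = [span.get_text(strip=True)
          for span in soup.find_all("span", class_="after-discount")]

# Walk from each course block to its title and price.
courses = {}
for block in soup.find_all("div", class_="course"):
    name = block.find("h3").get_text(strip=True)
    price = block.find("span", class_="after-discount").get_text(strip=True)
    courses[name] = price
print(prices, courses)
```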
(pleasant instrumental music) Let's go to application number two, data visualization in the browser.
So far, we have used matplotlib to visualize data in a Python window.
Let's look at how we visualize data in the browser.
So Nate is a data analyst who wants to analyze data and create a visualization in the browser for presentation purposes.
Bokeh is an interactive Python visualization library that produces presentations and visualizations for web browsers.
It is widely used because it allows you to build complex statistical plots quickly using very simple commands.
So let's see what it is all about.
First, you need to install Bokeh; the command is pip install bokeh.
Now, let's make a very simple plot using Bokeh.
I have something ready for you here.
From bokeh.plotting, we import output_file, show, and figure.
These are functions available from Bokeh.
You create a dataset like this, and then you create a plot using figure, where you set the width and the height of the plot.
Then you create a circle using the dataset x and y, and a size.
This is the size of the circle.
You set the output file, here a color chart, and then finally you call the show method.
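Here is a minimal Bokeh sketch of this example. It calls p.scatter rather than p.circle, because recent Bokeh releases deprecate passing size to circle. It calls save instead of show so that nothing tries to open a browser; swap in show(p) to get the behavior described above. The y values are an assumption.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

# Sample data: five points.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# width and height set the plot size in pixels.
p = figure(width=400, height=400, title="Scatter chart")

# Draw a marker of the given size at every (x, y) pair.
p.scatter(x, y, size=10)

out_path = os.path.join(tempfile.gettempdir(), "scatter_chart.html")
output_file(out_path)
save(p)  # writes the HTML file; show(p) would also open it in the default browser
```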
Let's use this.
Okay, so this is a scatter chart.
The show method opens it up in the browser, the default browser on your system.
Let's try to change certain things.
So we had the dataset X, which is still one, two, three, four, five, and we have the dataset Y, which still goes from 1 to 25.
We passed X and Y to circle; now let's suppose the size of each circle was 50.
Let's see how that changes the output.
You can see the size of the ball.
This is the size of the circle.
So this is basically telling it: hey, draw a circle at this X and this Y, of this size.
So for as many points as are present in X and Y, it will plot all of them.
This is the plot width and the height.
Now, since we're setting the height, let's make it larger.
We can make it something like this.
Now you see it's a big, giant plot.
All of this is being made in the browser using Python.
Let's suppose we wanted to build a scatter plot instead.
Now, for this, you are going to download a dataset from this particular source; it's an .xlsx file.
It's just some sample data.
You can go back and have a look at it.
We're just going to read it as an Excel file.
You need to install a library called xlrd before you can do this.
You should install it, because if it's missing, this will probably throw an error saying that xlrd was not found.
It's just pip install xlrd, in case you run into that issue. Be aware that xlrd 2.0 and later only reads the older .xls format, so with recent versions of pandas an .xlsx file needs openpyxl instead (pip install openpyxl).
Next, we take this dataset, take the temperature column, and divide it by 10, just to scale it down.
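A small sketch of the scaling step, assuming pandas. The DataFrame here is a made-up stand-in for the downloaded spreadsheet, and the column name "Temperature" is an assumption; with the real file you would load it with pd.read_excel and your own file path.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the real data comes from an .xlsx file.
df = pd.DataFrame({"Temperature": [235, 198, 310, 275]})

# Divide the whole column by 10 in one vectorized step to scale it down.
df["Temperature"] = df["Temperature"] / 10

print(df["Temperature"].tolist())  # [23.5, 19.8, 31.0, 27.5]
```

The division applies to every row at once, so there is no need for a loop even with tens of thousands of rows.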
Next, again, we create a plot figure.
We set properties like the text, the colors, the area, all of these things.
These are just styling properties, okay.
Pretty self-explanatory.
Text color, text font.
Because it's the browser, it gives you a lot of flexibility that way.
Then you create the circles.
You set the size of the circles.
So let's run this.
This is the result that you will get, and while it may seem like you can't make too much sense out of it, just notice that all of these are individual points, and they are being drawn with a size of 0.1.
Now, I could regenerate the graph, but I think it would take a lot of time.
But notice that it's a lot of points.
It's not just, you know, 10 or 100 points.
There are probably 20,000 or 30,000 points, at the very least, being plotted all across this graph.
Just keep that in mind.
(pleasant instrumental music) Let's move on to the next topic.
So for the third application that you're going to learn today, we first need to understand how a computer reads an image.
You and I see this image right here, okay.
But we perceive it in a certain way: we see buildings and sky, and that's how a human being would react.
This is the skyline of a city.
There are lights, there are windows, it's evening time.
All of those things.
But here is roughly how a computer sees it.
The computer sees each and every pixel, each and every point in the image, as a combination of RGB values.
The RGB values determine what that pixel's color is.
So each individual pixel is basically nothing but numbers for the computer.
It's a set of numbers; the whole image is a set of numbers, like a two-dimensional matrix of pixels where each pixel holds its color values.
Now, the size of the image would be B by A, which is the height and the width of the image.
So whenever you hear a resolution like 1024 by 768, it means 1,024 pixel columns and 768 rows.
Then times three. Why times three? Because of R, G, and B.
That is how the size of the image comes out, because all three of these values are present for every pixel.
The R, G, and B values, because that's how a computer represents an image.
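To make the "image is just numbers" idea concrete, here is a minimal NumPy sketch of a blank 1024 by 768 color image. The pixel color is an arbitrary example; the point is the shape (rows, columns, 3) and the total count of numbers.

```python
import numpy as np

# A blank 1024 x 768 color image: 768 rows (height), 1024 columns (width),
# and 3 values per pixel, one per color channel.
height, width = 768, 1024
img = np.zeros((height, width, 3), dtype=np.uint8)

# Set the top-left pixel to an arbitrary color; each channel is 0-255.
img[0, 0] = [255, 237, 108]

print(img.shape)  # (768, 1024, 3)
print(img.size)   # 2359296, i.e. 768 * 1024 * 3 numbers
print(img[0, 0])  # [255 237 108]
```

Note that the shape lists the height first, so "1024 by 768" in resolution terms becomes (768, 1024, 3) as an array.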
You may have noticed how an image looks different on different monitors.
The same image can look different on different devices and screens.
That's because of how the screen or device you're looking at the image on treats the R, G, and B values of each pixel. What is it able to do with those three numbers? These are simple numbers, like 255, 237, and 108, for R, G, and B.
These are just numbers.
How does the device really manifest them? How does it paint that color? That is where sharpness, contrast, quality, and all of those things come in.
Because even at the same resolution, what a laptop shows might be very different from a full-size desktop with a bigger or better screen.
So this is how a computer basically looks at an image.
Coming to OpenCV, it's a library.
But it's not just a Python library.
OpenCV is written in C and C++, and it's available in all the popular programming languages: Python, MATLAB, Java, C#, and .NET.
People wrote libraries on top of it that use the original C/C++ code.
Why? Because the OpenCV library is super powerful.
It has a huge community behind it.
A lot of things are developed with OpenCV: a lot of the image processing you see, and a lot of the features on cameras these days, like face detection, are built on top of OpenCV.
The name stands for Open Source Computer Vision.
It's not just Python; Java, for example, also has a wrapper, a library built on top of OpenCV.
So let's look at a few basic operations using OpenCV.
The first one is reading an image.
First of all, you will need to install OpenCV on your machine.
The command is pip install opencv-python.
Let me just show it to you over here.
Rather, let me open a terminal.
So you would type pip install opencv-python.
Now, I'm not going to run it here because I already have it installed.
Just one word of caution: this is a heavy library, so please make sure that you are on a stable, good internet connection; otherwise, it might fail while you're trying to install it.
It's one of the heavier libraries.
Just make sure that you're on a stable internet connection before you try to install it.
Okay, going back to this.
Now, what you do is import cv2.
Inside Python, the module is not called OpenCV, it's called cv2, and if you want to read an image, you call cv2.imread with the name of the image, followed by the format that you want to read it in.
You can read it either as grayscale, which basically means reading it in shades of gray and ignoring the colors, or you can read it in color.
So let's read the image.
Right, so 1 is for color and 0 is for grayscale.
Grayscale is something that you will hear often.
Grayscale is nothing but converting an image into the black and white that we often see.
Now that we've read the image, let's print the image as seen by the computer.
Let me anchor all of these.
Okay, let's run this file.
Okay, so it's nothing but a sequence of numbers.
As we said, these are the R, G, and B values, although OpenCV actually stores them in B, G, R order.
Let's look at the dimensions.
It has three dimensions.
Let's look at the head.
So we basically get an n-dimensional NumPy array.
In this case it has two dimensions for the height and the width, plus a third one for the color values.
Then each position in the array has a particular value.
So let's see what we have at zero.
What is at the zero position? Let's look at the first one, zero comma zero.
The first two indices select the row and the column, right.
Okay, these are the values right here.
This is one pixel being described.
Similarly, we can look at multiple pixel values.
Different values of R, G, and B.
Now we have something called image.shape, which tells us the dimensions of the image.
This is available through NumPy, not OpenCV.
We have just read the image into a NumPy array over here, and image.shape is something that NumPy gives us.
Now we know that it's 768, 1024, and 3.
The 3 is the number of values that each position holds.
Now we have something called imshow.
What imshow does is open up a window and show the image.
You can give the window whatever title you want, like Penguins image.
Now, why is this important? It's important because often you need to manipulate the image and then see what the outcome of the manipulation was.
Before we do any manipulations, let's just understand how we can show and hide images.
If I don't do this, notice that it will run, but before I can click on the window, it disappears.
That's because we did not call something called cv2.waitKey.
What waitKey does is hold the window open for us; the number we pass it is a delay in milliseconds.
It returns the code of the key that was pressed, so we can compare it against any letter or key on the keyboard and say, hey, when this one is pressed, close the window.
So it waits on our key input.
If we give it the value zero, it will wait indefinitely for any key.
Zero means wait forever for any key press; it doesn't mean the 0 key.
So let's run this and see what we get.
So we got the penguin image, but this time it has been opened through Python.
It's not opened by your regular image viewer; it has been opened through Python.
Next, what we can do is resize this image.
Here's the option we have available; okay, let me remove this first.
We can take the original image over here and call the resize method, cv2.resize, on it.
The second argument needs to be the new size of the image, given as width first and then height.
So what we're going to do: we have the image shape as 768 by 1024, right.
We just want to halve it.
Let's take the height and the width and divide each of them by two here.
Then let's show the image.
Now it should show the resized image.
So as you can see, the image is now resized.
Okay, this is a very neat feature.
I was telling you that the computer views an image as pixel values.
Because I opened it in the Python window and not the regular viewer, notice what happens the more I zoom in.
You would not see this in the regular viewer.
You can actually see what the pixel values are as you zoom in to a region.
So for white, you can see that almost all of the values are the same in the white regions.
As we go to something really, really white over here, you can even see that it shows up here.
So this is really, really white, almost pure white.
Snow white, right, or water.
This may be water. But if you look at black, almost all the values go to zero.
You can zoom in there, yeah.
There is also the write operation, cv2.imwrite, available to us, where we can write the image permanently to a file.
I don't need to show any of these windows for that.
I can just resize the image, call imwrite, and run this.
We have the output here.
This is the resized image now.
It's half the width.
It's not 1024 anymore, it's 512.
768 has also been halved to 384.
Next, let's look at face detection using OpenCV in Python.
So here are the steps required to do face detection using OpenCV.
The first thing we're going to do is use something called a cascade classifier.
Now, the cascade classifier is basically a pre-trained model, built from data by previous researchers and programmers before us, which helps the computer, helps OpenCV, identify a face as a face.
Of course, there are multiple kinds of classifiers depending on the logic, but the cascade classifier is one of the most common and popular ones.
This is going to be step one.
This is essentially what trains the computer to identify a face as a face.
You could also think of this as related to machine learning, because essentially the computer learns, through the cascade classifier, what a human face looks like.
But we are not going to go into the machine learning part of it.
For now, we'll just understand that this is an application of machine learning, and that is where the cascade classifier comes from.
Step number two is that we use OpenCV to read the image into a NumPy array.
Using the classifier, we are going to determine the range of pixel values that form the face.
Then in step three, we draw a rectangle around the face, like cameras or Facebook do when they detect faces and put a box around them.
So the way this is going to happen is with the cascade classifier and the NumPy array.
You're going to use them to detect which pixels in a given image approximately form the outline of a human face.
Let's look at the code.
The first thing I'm going to do is use the cascade classifier.
This comes straight out of cv2 itself, as cv2.CascadeClassifier.
It is available as a class.
It takes one argument, which is the path to an XML file, which is present over here.
So let me just open it for you in a second.
This is a heavy file, so it might not make a lot of sense to you right now, which is okay.
You can think of this more like training data by which you're teaching the machine that, hey, based on these kinds of numbers, you can classify a particular set of pixels as a face or not a face.
The internals of how this works are handled inside the library, OpenCV.
Of course, you can dive deeper into how this exactly works, but you really need to think about whether you want to.
This is how it works in the industry: if something works out of the box and works really well, we don't need to know its internals, and we don't dive deeper into it unless it stops working well.
So tomorrow, if you want to identify not human faces but, let's suppose, faces of monkeys or birds or something else, then you might start digging into how exactly this works.
But today, if we want to quickly build something that identifies human faces, these libraries save us the effort; writing a classifier by yourself, for example, could take anywhere between six and 12 months.
Now imagine having to do that for something that is so common these days.
That is why we don't dig in deep.
It has already been done; we just start using it.
We're saving the instance returned by this class in the face cascade object.
Next, I'm going to read the file, which is this particular photo.
It's the same file as shown on the slide, of this actress.
Let me see if I can zoom out.
As you can see, this is the image.
Next, I'm going to convert this image into a grayscale image, because the classifier works somewhat better with it than with a color image.
Next, we can find the coordinates of the faces by calling the detectMultiScale method on the face cascade.
The first parameter is going to be the grayscale image.
The other parameters are scaleFactor and then minNeighbors.
Then we can just print the faces.
Let's see what value weget by printing the faces.
So as you can see, I've gotten certain coordinates.
These are essentially pixels.
For each face, detectMultiScale gives us four numbers: the x and y of the top-left corner, and the width and height of the face.
Now, the scale factor basically shrinks the image by 5% at each step of the search for faces.
The smaller this value, the greater the accuracy.
So if you look at this image over here, we are reducing its size at each step.
Equivalently, the rectangular search window grows by 5% relative to the image.
We're decreasing the size by 5% each time.
The reason to do this is that it gives us greater accuracy.
Otherwise, let's suppose we were not to do this; let me just note these values down.
We got these coordinates.
Let's try it out without that scale factor.
Okay, I think the scale factor needs to be greater than 1.
Let's give it a scale factor of 1.01.
Now if you look at this, the accuracy is not that great.
It has given us a range of possible values, because it was only a 1% reduction.
So the output is different this time.
The numbers are different.
Now it is not so sure about where the face is.
The scale factor matters a great deal, because it shrinks the image by that percentage at each step while looking for the face.
The 5% comes from the value 1.05.
The smaller this value, the greater the accuracy.
But it will also take a longer time.
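A back-of-the-envelope sketch of why a smaller scale factor is slower: the detector searches the image at a series of sizes, each one smaller by the scale factor, until the image is smaller than the detection window. This is a simplified model rather than OpenCV's exact loop, and the 512-pixel image and 24-pixel window are assumed sizes.

```python
import math


def pyramid_levels(image_size, window_size, scale_factor):
    """Rough count of image sizes searched when each step shrinks by scale_factor."""
    if scale_factor <= 1:
        raise ValueError("scale_factor must be greater than 1")
    # Count steps until image_size / scale_factor**k drops below window_size,
    # plus one for the original, unscaled image.
    return int(math.log(image_size / window_size) / math.log(scale_factor)) + 1


# Assumed sizes: a 512-pixel image and a 24-pixel detection window.
for factor in (1.01, 1.05, 1.20):
    print(factor, pyramid_levels(512, 24, factor))
```

Going from 1.05 to 1.01 multiplies the number of sizes to check roughly fivefold, which is where the extra running time comes from, and it also explains why values of 1 or less are rejected.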
Next, now that we have the faces array, let's try to add the rectangle.
For this, it's very simple.
We just iterate over the faces that were returned, and we use the cv2.rectangle method to draw a rectangle.
The way this works is that you pass in the image, you tell it that (x, y) is where you want to start, and then (x + w, y + h).
These are the opposite corners of the rectangle, and then you pass it the color that you want, and other things.
So, the color of the rectangle's outline, which OpenCV expects in B, G, R order, and the line thickness.
These two corner points define the four corners of the rectangle that you want to draw.
Finally, we will resize the image a little bit and show it.
So let's take it for a run.
If you notice, we have drawn a rectangle, and the coordinates roughly match the ones that we got.
Let's try again with an increased value of the scale factor.
Let's see what we get.
Right, so this time, notice that because it was so high, there's no horizontal bound.
We just got, okay, this is the vertical bound where the face might be.
So the accuracy isn't that great with a larger percentage of reduction.
If we set this to 1.20, let's see what the output will be like.
Okay, we got slightly better results, but now there is no lower bound.
So of course, these are tuning parameters.
You can try out different values to get the kind of results that you want, depending on the image.
Of course, not everybody is going to have the same kind of image set.
The dataset you get is going to vary depending on the problem statement.
You might get very simple datasets, like typical passport-size photographs, which are not too difficult to deal with, or you might get something a little more complicated, like CCTV footage or something posted on social media.
So there are various things to play around with.
So it's not as if you write these five or six lines of code and you're done; it's not that easy.
I know it looks easy right now, but when you start dealing with real-world scenarios in an actual application, you will face certain practical challenges, where the image might not be that great or might have other issues.
It might have noise.
Because this time, if you look at this image, it's a perfect image.
It's a very good image to do this analysis on.
It's a very front-facing image.
But what if the image were turned to the side? Next, let's talk about how video capture works using the computer's webcam.
The way video capture works is that it captures images one after the other, in sequence.
Because the images are put in sequence, one right after the other, they look like video.
You might remember the principle from physics called persistence of vision: when images are shown faster than about 1/16 of a second apart, they appear to blend together.
So if you look at this, these are actually four slides, but if I repeat them fast enough, you will get a sense that this person is moving, even though they're just static images.
This is, for example, slide one, slide two, slide three, slide four.
But if I cycle through them really, really fast, I can make it look like a moving image while everything else stays still.
Keeping that principle in mind, you're going to write a loop that gives us each of the individual frames of the video we're capturing, and using that, you'll be able to do some image processing, where we process the video in some form or another.
The way video capture works in code is that we first start by importing cv2.
Next, we call the VideoCapture method on it.
Now, we can either give it the path to a video file or use numbers.
The numbers specify which webcam we would be using.
So zero stands for the primary webcam on your laptop or desktop.
Typically, you give it a value of zero, or you give it the path of any video file that you have on your laptop.
Now, the next line of code here is video.release().
It's very important to remember to release the camera, because capturing video is a CPU-intensive operation.
It takes up a lot of CPU and a lot of the computer's RAM.
So please make sure that when you're writing these programs, you release the video properly.
Otherwise, if the script is still running on your laptop, your computer, or a server, the camera might never get released, and that can become a problem.
It might crash your server or your laptop.
Now, zero is for the built-in camera, that is, the primary camera.
If an external camera is attached, you can put one to use that, and if you have multiple external cameras for some reason, you can keep increasing the number.
Let's go ahead, execute the code, and see what happens.
I have it written over here.
So I've imported cv2 and I'm doing a video capture.
Let me comment this out and then run it.
Let's see if it can actually start capturing the video.
So nothing happened, because we hadn't actually done anything with the capture.
One thing you would notice if you're doing this yourself is that the camera light turns on while you're capturing video.
Then it turns off after a split second.
To make this easier to notice, let's go ahead and add time.sleep from the time module.
This is just to confirm to yourself that, hey, this Python code is actually able to turn on the camera on your device.
Of course, you can't see it here, but the camera light will now stay on for three seconds instead of quickly turning on and off.
Now, let's do something a little more interesting.
Let's add a window that shows the video.
So for this, we are going to call the video.read() method.
What it returns to us is two things.
One is a Boolean, which is True if Python is able to read from the video capture.
So let's suppose the camera is faulty or there's some problem with the file you're reading; then this will return False.
This is a quick check to see whether you got the frame correctly or not.
The second one is a frame, which is nothing but an image.
It is a NumPy array, and it represents the first image that the video captures.
Let's print the check.
Sleep for three seconds, and then call video.release().
Let's run this.
So you can see that we got a NumPy array.
Now, yeah, it's all zeros, but we're not too bothered by that; we don't care about the exact values.
What matters is that an image was captured.
This is the kind of output that we got, correct.
Next, what we can do is show the frame that is being captured.
How do we do this? For that, we can call the imshow method, give the window a name like Capturing, and pass it the frame.
Why can we pass it the frame, even though it's an image show method? Because the frame is nothing but an image.
It's a NumPy array representing an image.
We're also going to add the waitKey method so that the window doesn't disappear all by itself.
It waits for us to close it instead of shutting off by itself.
We also need to call cv2.destroyAllWindows().
Let's run this.
So the very first frame is going to be black.
That's just how it typically tends to be.
Let's see if we get a different result by placing this over here and then going into the sleep.
Okay, still black; not to worry, this is just the very first frame, as the camera is turning on.
If you notice over here, there's a slight whitishness, right.
This is just from the camera getting turned on, but once we complete this entire project, we will actually be able to see the image being captured.
Now, the next question is how to capture the whole video instead of just the first frame, because the first frame is not very interesting.
It's all just black.
Let's try to capture the entire video.
For this, we are going to use a while loop.
We are going to read each and every frame inside this while loop.
I have already prepared it over here.
In the while loop, we check for the True condition, which makes it run indefinitely.
Why are we making it run indefinitely? Because we don't want to say that the video will be three minutes or five minutes or five hours long, right.
We don't know that.
So we just say, hey, let's run this, and we'll come out of the loop some other way.
We get the frame over here, as we did until now.
Next, we are going to convert it to grayscale.
Then we'll show it over here.
That other part is there because it's the entire project we're building, but let's understand things piece by piece as well.
So we're going to add a waitKey and then check if the key is equal to, let's suppose, q.
So if you press the q key, we want to break out of the loop.
That's how we are going to get out of this loop.
Outside this while loop, you go ahead and call video.release()
and cv2.destroyAllWindows().
So let's try and run this.
Okay, now we are able to capture it, because now this is inside the while loop.
It's running constantly, and if you notice, this is my finger, and now it's not blank anymore.
So that was because it was the first frame, and the first frame is always going to be black, because we're turning the camera on, and in that split second it's just a dark image.
So it turns out black, and slowly the camera starts capturing, gathering all the light.
That's just how the camera works, right.
Okay, let's quit it.
So I pressed the q key, the window disappeared, and the process finished.
We are going to use this to build a motion detector.
Let's look at the problem statement.
Now, the task is that you're given a video stream from a webcam.
We have to detect any motion or movement in front of it, and we have to return a graph.
It should show how long the human or the object was in front of the camera.
Each bar of the graph should represent a time span, and each gap, like this, should represent a period when there was nobody in front of the camera.
Let's look at the logic.
What we are going to do is save the initial image as a frame.
We're going to convert it into a Gaussian blurred image, which smooths out small pixel-level noise, so it's a more convenient form for comparing frames.
Then you take each new frame, convert it to a Gaussian blur as well, and calculate the difference between the first frame and the new frame.
We're going to do this repeatedly for each and every frame that we capture.
So every frame, every image, goes through this.
We're going to do this over and over again.
Now, we are going to define a threshold to remove shadows and other noise.
I'll show you how we do this, so that if there is a shadow or something else, like a flickering light, it isn't counted.
Let's suppose you're capturing a video stream through a CCTV camera, and one of the lights flickers or gets turned on or off.
That shouldn't be counted as motion.
But it could get counted as motion, because we're working with the idea that any change in the frame is motion.
So we are saying, hey, this is an image, and any change in this image should be counted as motion.
That's the simple logic we are working with.
Now, if there is any change due to light, like sunlight or sunset or something of that sort, we need to factor that out.
So that is why we define our threshold.
We're going to define the borders of the object.
We're going to add a rectangular box around the object and calculate the time when the object appears in and exits the frame.
Another thing we're going to do: whenever a new object appears in front of the camera, we identify it as a change in our frame.
We'll identify it, add a rectangular box around it, and calculate the time when the object appears in and exits the frame.
So we have already done the part where we convert to grayscale.
From grayscale, we can convert it to a Gaussian blur.
Let's do that.
I'm going to add this line over here.
Now it's converted to a Gaussian blur.
Next, we also need to check for the first frame.
So we're going to define a first frame variable outside the loop, set to None.
This makes sure we don't trigger a change by mistake.
If you remember, the first frame was black, right.
If it is the first frame, we might get a motion detection just from turning the camera on.
Naturally, we don't want that.
Let's add the condition that if there is no first frame yet, we set it to the current frame and continue.
So we will not count the first frame.
The first frame, from when the camera is turned on, is ignored for motion detection.
This is basically us ignoring the first black frame.
Okay, now let's go through it line by line.
The delta frame: now we're going to calculate the absolute difference between the first frame and the current one.
The delta frame is just going to give us the difference between the two images.
This gives us the difference between the two images.
This is the first frame, and there's the current frame.
gray is the current frame.
Now, the first frame will, of course, keep on changing from time to time.
As you will notice when you go through the code, we're going to change this first frame as well as we loop over the entire thing.
Next, we are going to apply a threshold, as we mentioned.
What this does is convert any difference value that is less than 30 to black.
So we're going to pass it the delta frame.
Any difference in intensity of less than 30 is converted to black.
Differences greater than 30 turn those pixels white.
After that, we call the dilate method on it.
Dilate, as the literal meaning of the word goes, expands.
It expands the white areas that we have found in the threshold frame.
The threshold frame discounts shadows, flickering lights, and all of those kinds of things by minimizing their effect, converting them to black.
So if there is a shadow, it basically becomes black in the threshold frame.
The next step, the dilate, is going to enlarge the regions in that threshold frame, which has been cleaned of the noise and random signals.
Next, we want to find the different contours in our image, and we do it using cv2.findContours.
As you can see, cv2 has a lot of these methods right out of the box.
You may wonder how you would do this if you were not so familiar with the library.
Okay, how is it that I just know all of these methods?
It's not something that happens overnight.
You Google, you search, you read through the library documentation, you figure out, hey, what does this method do, what does that method do, and you find your way around.
Now, I would say it's better that I leave this part up to you, for you to explore on your own.
In case you get stuck, please reach out to support.
They will definitely help you out in terms of understanding these lines individually.
But really, the better way is to read the documentation in depth and try to understand it from there.
Play around with it and see what you can get.
Next, any contours that are smaller than a certain size, we're going to remove.
So let's do that as well.
When we add this condition: we're going to get multiple contours, because it can identify multiple things in a given image, right.
We are going to say, hey, if the contour area is less than 1,000, we just continue on to the next one.
Otherwise, we build a rectangle, where cv2.rectangle draws a rectangle around the object that it detects.
So a contour is an object.
Think of cnts as multiple objects.
Any object whose area is less than 1,000, ignore it.
Otherwise, identify it and place a rectangle around it, as I showed you in practice as well.
Next, we are going to show the frames that we're capturing.
We're going to show the different frames, so you can see what the delta frame looks like, what the threshold frame looks like, and what the original frame looks like.
Let me run this and let's see what we get.
As you can see, a bunch of frames open up.
This is the gray frame.
Notice that it has a Gaussian blur applied, so it looks a little blurry; blurring smooths out noise and makes the data a little easier to process.
This is just a grayscale version of the original frame.
This is the original color frame, which is what we are capturing.
This is the grayscale version of it, and this is the delta frame.
Let's look at the threshold frame.
Yeah, here it is.
Now, this is the most interesting one.
If you look at the threshold frame right here, this is where every pixel is converted to either black or white and then dilated.
So it is just trying to find objects.
Notice that this over here is an object.
Now I move my finger in front of the camera.
This is my finger right now.
Let's look at the original color frame and this one side by side.
So this is my finger, or rather my hand.
Okay, this might be a better way to do it.
Notice that the threshold frame only turns white for certain values.
As we saw over here, it takes the delta relative to a frame captured when no motion is happening.
So when my hand was static it looked white, but when I was moving it, that is when it was being converted to black or white.
It really is focused on the difference.
It is not about painting the object; it's more about looking at the differences, finding the contours, and taking it from there.
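The delta and threshold frames boil down to two small operations, sketched here on plain Python lists:

- A per-pixel absolute difference against a reference frame, which is what cv2.absdiff computes.
- A binary threshold, following the cv2.THRESH_BINARY rule: values above the threshold become 255, everything else becomes 0.

The threshold of 30 is an assumed value, not one stated in this lesson. The helper names are hypothetical.

```python
def delta_frame(reference, current):
    """Per-pixel absolute difference of two grayscale frames (what cv2.absdiff computes)."""
    return [
        [abs(a - b) for a, b in zip(ref_row, cur_row)]
        for ref_row, cur_row in zip(reference, current)
    ]


def threshold(delta, thresh=30, max_value=255):
    """Binary threshold: values above thresh become max_value, everything else 0."""
    return [[max_value if v > thresh else 0 for v in row] for row in delta]


reference = [[100, 100, 100], [100, 100, 100]]  # frame with no motion
current = [[100, 180, 100], [95, 100, 20]]      # something moved in two places
delta = delta_frame(reference, current)
print(delta)             # [[0, 80, 0], [5, 0, 80]]
print(threshold(delta))  # [[0, 255, 0], [0, 0, 255]]
```

The small change of 5 in the second row stays black, which is exactly the noise-suppressing effect the threshold is there for.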
As always, we have the waitKey call, and we can destroy the windows.
Now, let's try to calculate the time for which the object was in front of the camera.
For this, we define a DataFrame with the columns Start and End at the very top of the program, so let's do that.
Let's take the dataframe over here, and next we want to set a status.
The status is zero initially, because we are just starting to record and nothing has been detected yet.
When an object is detected, that is, when a contour's area is greater than 1,000, we set the status to one.
We are also going to use something called a status list.
What does the status list do? Initially, it is just an array of two items: None and None.
Every time the for loop over the contours finishes, we append the status for that frame to the status list.
This is either zero or one for that frame.
Next, we keep only the last two values of the status list, because we are only concerned with the most recent change.
We are not concerned with the earlier motion.
We just need the previous status and the current one.
The motion times themselves are collected separately, so for this check we only need to keep the list at two values, which start out as None and None.
We slice the list to capture just those two values.
Next, we're going to add a condition.
The condition is that the status was previously one and is now zero.
Or that it was previously zero and is now one.
In either case, we take the times list.
I'll define it over here as an empty list first.
We append the current time, marking that at this particular moment a motion either started or stopped.
In other words, we record every status change from the previous frame to the current frame.
Finally, we record all of these changes over here, like this.
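Here is a minimal sketch of the status-list logic. It assumes the per-frame statuses (0 or 1) and their timestamps are already available as lists; in the real program they come from the camera loop. Like the walkthrough, it keeps only the last two statuses and records a time whenever the status flips. The helper record_transitions and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def record_transitions(statuses, timestamps):
    """Return the timestamps where the status flips 0 -> 1 (motion starts) or 1 -> 0 (stops)."""
    status_list = [None, None]
    times = []
    for status, ts in zip(statuses, timestamps):
        status_list.append(status)
        status_list = status_list[-2:]  # keep only the previous and current status
        if status_list[-2] == 0 and status_list[-1] == 1:
            times.append(ts)  # motion started in this frame
        elif status_list[-2] == 1 and status_list[-1] == 0:
            times.append(ts)  # motion stopped in this frame
    return times


start = datetime(2024, 1, 1, 12, 0, 0)
timestamps = [start + timedelta(seconds=i) for i in range(6)]  # one frame per second
statuses = [0, 0, 1, 1, 0, 0]
times = record_transitions(statuses, timestamps)
print([t.strftime("%H:%M:%S") for t in times])  # ['12:00:02', '12:00:04']
```

This sketch has two edge cases you may want to handle separately:

- If motion is still in progress when the loop ends, no end time is recorded.
- If the very first frame already has motion, no start time is recorded, because the previous status is None rather than 0.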
Then we can iterate over the times we have collected and write them to a CSV file.
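Writing the times out can be sketched with Python's built-in csv module, assuming the times list alternates start, end, start, end. The lesson itself uses a pandas DataFrame with Start and End columns. This version swaps that for csv so it runs with the standard library only. The helper names are hypothetical.

```python
import csv
import io
from datetime import datetime


def pair_times(times):
    """Group an alternating [start, end, start, end, ...] list into (start, end) pairs."""
    # A trailing unmatched start (motion still in progress) is dropped.
    return [(times[i], times[i + 1]) for i in range(0, len(times) - 1, 2)]


def times_to_csv(times):
    """Render the (start, end) pairs as CSV text with Start and End columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Start", "End"])
    for start, end in pair_times(times):
        writer.writerow([start.isoformat(sep=" "), end.isoformat(sep=" ")])
    return buffer.getvalue()


times = [datetime(2024, 1, 1, 12, 0, 2), datetime(2024, 1, 1, 12, 0, 4)]
print(times_to_csv(times))
```

To save the result, open the file with newline="" so the csv module controls the line endings:

    with open("times.csv", "w", newline="") as f:
        f.write(times_to_csv(times))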
Let me run this so that we have some data in our CSV file.
Right, so this is the original color frame.
Notice that my finger is recognized as an object.
Okay and yeah.
So this should give us enough data for movement.
Let's look at the times.
It seems to be empty.
Let's run it once again and see what we get.
There might be some bug.
Let me try to move it a little more.
Let's see if we get something now.
Now we have the times over here: the start and end times in times.csv.
We have two instances of movement that we captured quickly.
One is this and the other one is this.
We have actually detected motion.
Now we can actually go ahead and plot this.
The video-capturing part is done.
All we need to do now is plot this using Bokeh.
For this, we use Bokeh, and first and foremost we convert the times to string format.
So that is over here.
Let's first import Bokeh and then convert the times.
This is just to make them pretty, because as you can see over here, the raw times are written like this.
That's not very human-readable, so we convert them into a more readable format.
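Converting the times to readable strings is a strftime call per value. This sketch assumes (start, end) datetime pairs. The "%Y-%m-%d %H:%M:%S" format is an illustrative choice. With a pandas DataFrame, Series.dt.strftime does the same thing for a whole column. The helper readable_rows is hypothetical.

```python
from datetime import datetime


def readable_rows(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Turn (start, end) datetime pairs into human-readable strings for tooltips."""
    return [(start.strftime(fmt), end.strftime(fmt)) for start, end in rows]


raw = [(datetime(2024, 1, 1, 12, 0, 2, 345678), datetime(2024, 1, 1, 12, 0, 4, 999999))]
print(raw[0][0])           # 2024-01-01 12:00:02.345678
print(readable_rows(raw))  # [('2024-01-01 12:00:02', '2024-01-01 12:00:04')]
```

Note that %S truncates rather than rounds, so the microseconds are simply dropped.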
Next, we use ColumnDataSource with the DataFrame to pass it the data.
We just use the same Bokeh methods we have been learning so far to plot this.
We create a figure with a datetime x-axis, a height of 100, and a width of 500.
The title will be motion graph.
We set the tick markings, add the tooltips through the hover tool, plot the coordinates, and set the output file to graph1.
Let's run this and see what we get.
Once I quit the camera, we get a Bokeh plot, as expected.
Keep in mind that you have to quit the camera first.
You could also split this into two different files.
Because the times are saved to times.csv, you could first generate times.csv in one file and then run the plotting code over here.
As you can see, we've got the start and end times, and it shows that we had this motion for about 30 or 35 seconds.
We have our file over here as well, graph1.
Of course, you will not get exactly the same graph; it depends on what kind of motion you make in front of the camera.
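The seconds of motion you read off the graph can also be computed directly from the start and end times. Here is a small sketch, assuming (start, end) datetime pairs like the ones written to times.csv. The example times are made up, and motion_durations is a hypothetical helper.

```python
from datetime import datetime


def motion_durations(rows):
    """Seconds of motion in each (start, end) pair, plus the total across all pairs."""
    durations = [(end - start).total_seconds() for start, end in rows]
    return durations, sum(durations)


rows = [
    (datetime(2024, 1, 1, 12, 0, 0), datetime(2024, 1, 1, 12, 0, 30)),
    (datetime(2024, 1, 1, 12, 1, 0), datetime(2024, 1, 1, 12, 1, 5)),
]
durations, total = motion_durations(rows)
print(durations, total)  # [30.0, 5.0] 35.0
```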
Thank you, thank you so much.
I hope you have enjoyed listening to this video.
Please be kind enough to like it, and comment with any of your doubts and queries; we will reply at the earliest.
Do look out for more videos in our playlist, and subscribe to the edureka! channel to learn more.