We all know how smoking is bad for our health ๐ญ Right? But How bad is it? And can we understand who is a smoker and who is not a smoker by just looking at the person's body informational data?
Smoking Problem
Problems With Smoking
let's take a look at 4 significant problems it can cause to our bodies๐ฌ:
- coronary heart disease(Coronary heart disease is a type of heart disease where the arteries of the heart cannot deliver enough oxygen-rich blood to the heart).
- heart attack.
- stroke.
- peripheral vascular disease (damaged blood vessels)
- cerebrovascular disease (damaged arteries that supply blood to your brain)
So yeah as we can see it affects most of the important parts of our body like ๐๐ง .
It's a big deal if we can help in some way to solve this issue or just Identify who is smoking and not smoking quickly by just seeing their body functions
Now how can we do that?
AI intro ๐จโ๐ป
Why do we need to use computers to solve this problem... and why not humans?
You see computers are faster than humans so why not utilize that.
But the catch is we are teaching the computer
to learn the pattern by itself...
it's really weird when you hear this first time. how can you teach a computer to learn by itself?..right.
But as it turned out you can teach a computer to learn anything in this world ... I MEAN ANYTHING !!
Then the next question will be HOW?
In one word
By using the Data
Now what is data ?... great question
its nothing but a collection of information for eg :
- Online Tracking (GPS collecting data from you)
- Social Media Monitoring
And so on
Well let's say you have data like this:
Here there are two dotted lines (yellow and purple)
What if I tell you to divide this data into two categories ...
Well it's easy as you can see we can just do this
And there you go you just did a Binary classification
But a slight difference is you did it with Your Brain
~ But how can we teach a computer to do this? And What is Binary Classification?
Wow, I think you are being fired with lots of questions in your brain right now...
That's a good thing... Let's Jump Right in...๐ฆ :
Binary Classification 0๏ธโฃ1๏ธโฃ
Do you ever decided one thing over another
That's what Binary Decision would look like and also feels like
Its nothing but 1s and 0s This or That, On or Off, Non-smoker or Smoker
But in the machine learning field its called Binary Classification
Categorizing Two things
What would it look like if I teach the machine to solve this problem
And Another Example:
Model Prediction:
As you can see my model can classify this data easily With a help of Neural Network
Code :
import tensorflow as tf
model_1 = tf.keras.Sequential([tf.keras.layers.Dense(10,activation="relu"),
tf.keras.layers.Dense(10,activation="relu"),
tf.keras.layers.Dense(1,activation="sigmoid")])
model_1.compile(loss="BinaryCrossentropy",
optimizer="adam",
metrics="accuracy")
history_1 = model_1.fit(x,y,epochs=50,validation_split=0.2)
In Image
It may look complicated if you are seeing this for the first time
Trust me it's easier when you understand the fundamentals of our code
Elon Musk Quote
"I tend to approach things from a physics framework"
you boil things down to the most fundamental truths โฆ and then reason up from there
If you want to learn more about the code and what's happening behind the scene ...just let me know๐ฑโ๐
Now let's solve our problem :
We need to find whether a person is smoking or not by using BinaryClassification
Lets Find the Smoker ๐ฑโ๐ค
The Features(The data we are going to take into consideration ) that we going to use is:
Most of the things we already know of... But the other unknown features are:
- HDL: cholesterol type
- LDL: cholesterol type
- Triglycerides: it's the main constituents of body fat in human
- serum creatinine: The amount of creatinine in your blood should be relatively stable. An increased level of creatinine may be a sign of poor kidney function
- AST: glutamic oxaloacetic transaminase type
- ALT: glutamic oxaloacetic transaminase type
- GTP: Guanosine-5'-triphosphate is a purine nucleoside triphosphate. It is one of the building blocks needed for the synthesis of RNA during the transcription process
- oral: Oral Examination status
Now we have somewhat important features in finding whether a person is smoking or not
Let's build a model and name him Robin
Code :
from sklearn.ensemble import RandomForestClassifier
Robin= RandomForestClassifier(n_estimators=2000)
Robin.fit(x_train_pre,y_train)
Robin.score(x_train_pre,y_train)
Output :
1.0
And there we go that's it that's our Robin model
And he has an accuracy of 1.0 (Which 100%
) on our training data
To finalize our model accuracy let's check this on data that it had never seen before Code:
model_2.score(x_test_pre,y_test)
Output:
0.8286201633898914
It identified 10 out of 8 times correctly whether a person is a smoker or non-smoker
Which is really good (not great but good).
And let's see what are the features our model used while its training process:
As you can see it used most of our features to make an assumption.
This is What a BinaryClassification model looks like in a nutshell
.
I used the word nutshell becoz I shortened a lot of processes that go behind this
Let me know if you want to know more about the coding part and what is happening in the background
Bye TakeCare (โ๏พใฎ๏พ)โ
The above code is available on my Github@justclickhere๐
My Linkedin profile:@justclickhere๐
What if we use AI to predict the housing price @justclickhere