Making a computer understand images💻
making a basic image classification model with neural networks
What are images in the digital world anyway?
A bunch of red, green, and blue lights
Each tiny group of these lights is what we call a pixel
So now the questions are :
- how many pixels do I have on my device?
- why only use 3 colors to represent an image?
- why do I even have to know this?
feel free to skip these questions if you know the answers 🐱🏍
let's go through them one by one
How many pixels do I have on my device? 🐓
it's actually not that complicated to find out
If your screen's resolution is HD, i.e. 1280×720, then 1280 is the number of pixels along the longer side of the screen and 720 is the number of pixels along the shorter side
Then multiply the two numbers:
1280×720=921600 pixels
And wow!
First of all, did you see that? Maths is fun when you know why you are doing it in the first place
And another wow for the number of pixels we have on our devices 📱 (in this case, close to 1 million pixels)
The important thing to notice is that this is just a 720p display, which nowadays is found on low-budget devices 😂
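The pixel-count arithmetic above can be sketched in a few lines of Python. The helper name `pixel_count` and the two extra resolutions are my own additions for illustration, not something from the original walkthrough.

```python
def pixel_count(width_px: int, height_px: int) -> int:
    """Return the total number of pixels for a given resolution."""
    return width_px * height_px

# A few common resolutions (longer side x shorter side, in pixels).
resolutions = {
    "HD (720p)": (1280, 720),
    "Full HD (1080p)": (1920, 1080),
    "4K UHD": (3840, 2160),
}

for name, (w, h) in resolutions.items():
    print(f"{name}: {w} x {h} = {pixel_count(w, h):,} pixels")
```

Running it for your own screen only needs the two numbers from your display settings.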
Why Use only three colors to represent an image? 🔴🟢🔵
In short:
Red, green, and blue lights can be mixed to produce most of the colors our eyes can perceive
Our eyes have three types of color-sensing cone cells, so just by combining these three colors at different intensities we can get nearly all the colors we use in our day-to-day life
If you want to dive deeper into this topic, see @howstuffwork
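To make the color mixing concrete, here is a minimal sketch using NumPy. It assumes the common 8-bit convention where each of the three lights has an intensity from 0 to 255; the `mix` helper is a hypothetical name I made up for this example.

```python
import numpy as np

# Each pixel is three intensities (red, green, blue), each from 0 to 255.
red = np.array([255, 0, 0], dtype=np.uint8)
green = np.array([0, 255, 0], dtype=np.uint8)
blue = np.array([0, 0, 255], dtype=np.uint8)

def mix(*lights):
    """Additively mix lights, clipping each channel at 255."""
    total = np.sum(np.stack(lights).astype(np.int32), axis=0)
    return np.clip(total, 0, 255).astype(np.uint8)

yellow = mix(red, green)
magenta = mix(red, blue)
cyan = mix(green, blue)
white = mix(red, green, blue)

print("yellow :", yellow)
print("magenta:", magenta)
print("cyan   :", cyan)
print("white  :", white)
```

Turning all three lights fully on gives white, and turning them all off gives black.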
Why do you even have to know this? 🐱👓🤷♂️
To understand the problem we are going to cover in the upcoming sections
And it's also interesting to know how the technology we use every day works
The thing to note is:
This is how I approach a problem🐣
"tend to approach things from a physics framework"
you boil things down to the most fundamental truths you can imagine🌌 … and then reason up from there🙌
This line alone gives you a basic idea of how to learn,
which is itself a big topic 🤷♂️
The key takeaways from all these questions are:
- Now you know how to calculate the number of pixels on a given screen from just its resolution
- The basic use of RGB colors in our pixels
Now let's get into the most fun part 🐱👤(AI)
Teaching AI to learn patterns from basic images and predict them when new data comes in
We are going to cover a lot of things in this blog, so let's not waste any time and jump right into it 🦘
First thing to understand is what is a multiclass classification problem?
A really great question ...
What is classification in the first place?
In @what an ai can learn from 1 and 0's I wrote about binary classification, which is one type of classification with 2 classes (this or that, on or off, smoker or non-smoker)
Building upon that, the word "multiclass" describes itself: there are multiple classes to predict, i.e. more than 2 classes
Now let's go through the workflow we are going to cover in this problem
The things we are going to cover:
- How to load our images into the environment where we are going to train our model 🔃
- Splitting our data into two parts: training and testing 🪓
- Normalize the Data 0️⃣1️⃣
- Create our Nemo AI 🤖(our ai model name)
- Plotting the confusion matrix📊
- Predicting an image with our Nemo🤖
Don't worry if you don't know most of these things 👀. I am 90% sure that you will understand everything mentioned above after going through this blog
Loading the Data
First and foremost, we need data to teach/train our model (images, for this problem)
We are going to make use of the tensorflow.keras.datasets module
What is TensorFlow? ...
TensorFlow is a free and open-source software library for machine learning and artificial intelligence
Next question: what is Keras? ...
Keras is a high-level neural network library that runs on top of TensorFlow
From that library we are going to use the datasets module, which provides a ready-made dataset for our problem
Code :
#Loading the data from tensorflow.keras
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
What is fashion_mnist? ...
Damn, you're firing all the good questions at this blog 🎯, which is a great thing to do 🙌
Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image with a label from one of 10 classes.
splitting the data 🪓
Our Nemo AI needs two different sets of data:
Training set: used to train Nemo (the model)
Testing set: used to test Nemo (the model) on images it has never seen
Code :
(x_train,x_label),(y_train,y_label) = fashion_mnist.load_data()
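Here x_train, x_label are the training images and their labels, and y_train, y_label are the test images and their labels
(Keras examples usually name these x_train, y_train and x_test, y_test; this blog keeps its own names throughout)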
Classes we are going to train our Nemo (AI model) on
Classes are nothing but clothing and shoe categories, e.g. "Coat", "Sandal", "Bag", etc.
code :
class_names = ["T-shirt_or_top","Trouser","Pullover","Dress","Coat","Sandal","Shirt","Sneaker","Bag","Ankle_boot"]
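The labels in the dataset are plain integers from 0 to 9, and each integer is an index into the class_names list. A minimal sketch of that lookup follows; the `sample_labels` values are made up for illustration and are not taken from the real dataset.

```python
import numpy as np

class_names = ["T-shirt_or_top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle_boot"]

# Hypothetical labels, standing in for a few entries of x_label.
sample_labels = np.array([9, 0, 3, 7])

# Each integer label picks the class name at that position in the list.
sample_names = [class_names[label] for label in sample_labels]
print(sample_names)
```

This is the same lookup the plotting code uses when it sets each image's title.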
Visualizing the images
code :
import numpy as np
import matplotlib.pyplot as plt
index = 6  # number of images to show
plt.figure(figsize=(20,17))
for i in range(index):
    # pick a random image from the first 1000 training images
    random = np.random.randint(0,1000)
    ax = plt.subplot(4,3,i+1)
    ax.imshow(x_train[random])
    plt.title(class_names[x_label[random]])
    plt.axis(False);
With the help of the matplotlib and numpy libraries, we can see what our images look like
Remember, these are only 6 images out of a training set of 60,000
Our Nemo (model) 🤖 is going to learn from 60,000 images just like these
let's normalize the data
what is normalization ?.... 🐓
Assume that you have two kinds of data, x and y
x is either one 1️⃣ or zero 0️⃣, whereas y ranges from 10,000 to 20,000. If we used this data as-is to train our model,
the model might get the biased idea that the higher the number, the higher the priority, which is not good. Our data is a bunch of pixel values ranging from 0 to 255
Images are stored in the form of a matrix of numbers in a computer where these numbers are known as pixel values.
These pixel values represent the intensity of each pixel.
0 represents black and 255 represents white.
So let's change our range from 0 to 255 into 0 to 1
code :
x_train_norm = x_train/255
y_train_norm = y_train/255
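To see what dividing by 255 actually does, here is a small sketch on a made-up 2×3 "image" instead of the real dataset. The array values are hypothetical and only there to show the range before and after.

```python
import numpy as np

# A made-up 2x3 "image" of raw pixel intensities (0 = black, 255 = white).
tiny_image = np.array([[0, 64, 128],
                       [191, 255, 32]], dtype=np.uint8)

# Dividing by 255 converts the integers to floats between 0 and 1.
tiny_image_norm = tiny_image / 255

print("before:", tiny_image.min(), "to", tiny_image.max(), tiny_image.dtype)
print("after :", tiny_image_norm.min(), "to", tiny_image_norm.max(), tiny_image_norm.dtype)
```

The shape of the image stays the same; only the scale of the numbers changes.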
Creating our Nemo(AI) 🤖
Nemo = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)),  # 28x28 image -> 784 values
    tf.keras.layers.Dense(100,activation="relu"),
    tf.keras.layers.Dense(100,activation="relu"),
    tf.keras.layers.Dense(100,activation="relu"),
    tf.keras.layers.Dense(50,activation="relu"),
    tf.keras.layers.Dense(10,activation="softmax")  # one probability per class
])
Nemo.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
             optimizer=tf.keras.optimizers.Adam(),
             metrics=["accuracy"])
# validation_split=0.2 holds out 20% of the training data for validation
history = Nemo.fit(x_train_norm,x_label,epochs=20,validation_split=0.2)
With the help of the tf.keras Sequential API, we can create a dense neural network model
The above code 👨💻 is literally the brain of our AI 🤖,
which is going to learn the patterns across all the images and use them to predict unseen images of the same classes
After training, our Nemo model reached 89% accuracy on the training images
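If you are curious how big Nemo's brain is, the number of trainable values can be worked out by hand. This is a rough sketch, assuming the layer sizes from the model code above (784 flattened inputs, then 100, 100, 100, 50, and 10 units); the helper `dense_params` is a name I made up. The total should match what `Nemo.summary()` reports.

```python
# Layer sizes: flattened 28x28 input, then each Dense layer's unit count.
layer_sizes = [28 * 28, 100, 100, 100, 50, 10]

def dense_params(n_in: int, n_out: int) -> int:
    """A Dense layer has one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

for (n_in, n_out), count in zip(zip(layer_sizes, layer_sizes[1:]), per_layer):
    print(f"Dense {n_in:>3} -> {n_out:>3}: {count:,} parameters")
print(f"Total: {total_params:,} parameters")
```

Most of the parameters sit in the first Dense layer, because every one of the 784 pixels connects to every one of its 100 units.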
Our model confusion matrix
This is what we call the confusion matrix: it shows which classes our model got confused about when predicting
For example, we can see our Nemo is mostly confused between shirts and T-shirts
The y-axis shows the actual classes and the x-axis shows the predicted classes
The highlighted diagonal line shows the cases where the class Nemo predicted matches the actual class,
which is the case for most images in all the classes
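The idea behind a confusion matrix can be sketched with plain NumPy. This is not the code that produced the plot above; it is a minimal illustration on a hypothetical 3-class toy problem, with rows as actual classes and columns as predicted classes.

```python
import numpy as np

def confusion_matrix(actual, predicted, num_classes):
    """Count predictions: rows are actual classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for a, p in zip(actual, predicted):
        matrix[a, p] += 1
    return matrix

# Hypothetical labels for a 3-class toy problem (not real model output).
actual = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(actual, predicted, num_classes=3)
print(cm)

# The diagonal holds the correct predictions.
correct = int(np.trace(cm))
print(f"correct: {correct} of {cm.sum()}")
```

Any count off the diagonal is a mix-up, e.g. an entry in row 0, column 1 means an actual class 0 was predicted as class 1.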
predicting with Nemo model
As you can see, our model is correct with a 100% confidence level on every image except the one in the bottom-left corner ...
Our model's accuracy is 89%, which means that roughly 9 times out of 10 it predicts the correct class for an image,
which is great for our fun project 🐿
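Here is a small sketch of how a prediction and its confidence level are read off. For each image, the softmax layer gives ten probabilities, one per class, and the largest one wins. The `probabilities` values below are made up for illustration and are not real output from Nemo.

```python
import numpy as np

class_names = ["T-shirt_or_top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle_boot"]

# Hypothetical softmax output for one image: ten probabilities that sum to 1.
probabilities = np.array([0.02, 0.00, 0.01, 0.01, 0.03,
                          0.00, 0.90, 0.00, 0.03, 0.00])

# The predicted class is the position of the highest probability.
predicted_index = int(np.argmax(probabilities))
confidence = float(probabilities[predicted_index])

print(f"{class_names[predicted_index]} ({confidence:.0%} confidence)")
```

With the real model, an array like this would come from calling Nemo.predict on normalized images, one row of ten probabilities per image.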
To summarize :
We just created an AI model (Nemo 🤖) that can tell whether an image shows a dress, a shoe, a handbag, and so on...
In other words, it can now classify images into multiple categories
The machine learning field is a wonderful thing 🔥 ...
We taught the AI to look at an image and predict the category it belongs to,
which was not possible 50 years ago 🤯.
Just think about that ... 🐓
We covered so many things in this blog all at once. If you didn't understand everything 😵, don't worry.
I may explain this in more depth in the future
Thanks for reading :)
Bye ☜(⌒▽⌒)☞
- The above code is in my GitHub @sriram403👨💻
- My LinkedIn profile @sriram