Making a computer understand cartoons

A model that can classify Tom and Jerry from the Warner Bros. animation

Who is this for?

Let's say you are bored and want something fun to do in your free time...

Then you are in the right place.

As the title says, we are going to build an AI model that can tell whether it is looking at Tom or Jerry

~ Like This :

Screenshot 2022-09-07 000205.png

Let's jump right in

📝 If you want to follow along with the code, just click me (づ ̄ 3 ̄)づ

Loading Data

To teach a computer who is Tom and who is Jerry, we first need a bunch of pictures of both of them

Now, the easiest way to get a lot of pictures is Google, but collecting them from scratch is going to be a little tedious...

So here comes our angel:

Tom and Jerry Kaggle dataset

Changing the image data into numbers

Let's change our images into numbers so our AI model can learn their patterns

If you don't know what I'm talking about... no problemo, just click me (〃` 3′〃)
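To make "images as numbers" a bit more concrete, here is a tiny sketch in plain Python. The 2x2 "image" is made up for illustration; real frames work the same way, just with many more pixels. The scaling step at the end is a common optional trick and is not something the code in this post does.

```python
# A made-up 2x2 RGB "image": every pixel is [red, green, blue], each from 0 to 255.
image = [
    [[255, 0, 0], [0, 255, 0]],        # top row: a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],    # bottom row: a blue pixel, a white pixel
]

height = len(image)
width = len(image[0])
channels = len(image[0][0])
print("shape:", (height, width, channels))

# Optional: squeeze every value into the 0..1 range.
scaled = [[[value / 255 for value in pixel] for pixel in row] for row in image]
print("top-left pixel after scaling:", scaled[0][0])
```

A 224x224 colour image is the same idea with 224 * 224 * 3 numbers in it.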

Code :

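import tensorflow as tf

# Placeholder path: point this at the extracted Kaggle folder (one sub-folder per class)
data_dir = "path/to/tom_and_jerry"

# 80% of the images for training ...
train_data = tf.keras.utils.image_dataset_from_directory(data_dir,
                                                         color_mode="rgb",
                                                         validation_split=0.2,
                                                         seed=42,
                                                         subset="training",
                                                         image_size=(224,224))

# ... and the remaining 20% for validation (same split and seed so the two sets don't overlap)
valid_data = tf.keras.utils.image_dataset_from_directory(data_dir,
                                                         color_mode="rgb",
                                                         validation_split=0.2,
                                                         seed=42,
                                                         subset="validation",
                                                         image_size=(224,224))

# Class names are taken from the sub-folder names
class_names = train_data.class_names
train_data,valid_data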
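Why the same `validation_split=0.2` and `seed=42` in both calls? Here is a rough sketch of the idea in plain Python. It is not Keras's actual implementation, and `split_files` and the file names are hypothetical, but it shows why matching seeds give two sets that never overlap.

```python
import random

def split_files(files, validation_split=0.2, seed=42):
    """Shuffle with a fixed seed, then keep the last chunk for validation."""
    files = list(files)
    random.Random(seed).shuffle(files)          # same seed -> same order every time
    n_val = int(validation_split * len(files))
    split_at = len(files) - n_val
    return files[:split_at], files[split_at:]

files = [f"frame_{i}.jpg" for i in range(10)]   # hypothetical file names
train_files, valid_files = split_files(files)
print(len(train_files), "training files,", len(valid_files), "validation files")
```

If the two calls used different seeds, the shuffles would differ and some images could land in both sets.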

Visualize Some Images

Screenshot 2022-09-07 002318.png

Screenshot 2022-09-07 002410.png

Screenshot 2022-09-07 002823.png

Let's Create an AI model

Code :

# A small CNN trained from scratch: two conv blocks, then a dense classifier
base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(10,5,activation="relu",input_shape=(224,224,3)),
    tf.keras.layers.Conv2D(10,5,activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2,padding='valid'),
    tf.keras.layers.Conv2D(10,kernel_size=5,activation="relu"),
    tf.keras.layers.Conv2D(10,kernel_size=5,activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100,activation="relu"),
    # One output per class; softmax turns the outputs into probabilities
    tf.keras.layers.Dense(len(class_names),activation="softmax")])

# Labels are integers, so we use the "sparse" version of categorical crossentropy
base_model.compile(loss="sparse_categorical_crossentropy",
                   optimizer=tf.keras.optimizers.Adam(),
                   metrics=["accuracy"])
base_model_history = base_model.fit(train_data,epochs=3,steps_per_epoch=len(train_data),
                                    validation_data=valid_data,validation_steps=len(valid_data))
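If you are curious how a 224x224 image shrinks on its way through the layers above, here is a small back-of-the-envelope sketch. It assumes the Keras defaults the code relies on ("valid" padding and stride 1 for the convolutions, 2x2 pooling); `conv_out` and `pool_out` are helper names invented for this example.

```python
def conv_out(size, kernel=5):
    # "valid" padding with stride 1: the kernel cannot hang over the edge
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling halves the width and height
    return size // pool

size = 224
size = conv_out(size)   # first Conv2D
size = conv_out(size)   # second Conv2D
size = pool_out(size)   # first MaxPool2D
size = conv_out(size)   # third Conv2D
size = conv_out(size)   # fourth Conv2D
size = pool_out(size)   # second MaxPool2D

filters = 10
flattened = size * size * filters
print("feature map:", size, "x", size, "x", filters)
print("values going into the Dense layer:", flattened)
```

So the Flatten layer hands 50 * 50 * 10 = 25000 numbers to the first Dense layer.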

This is a base model that we built from scratch: it has only learned from our images, without any prior knowledge of what a face might look like

To solve this problem we are gonna use a pre-trained model which, as the name suggests, has already been trained on a large collection of images, so it already has a good idea of what things like eyes, ears and noses look like...

How can we use it? With the help of Transfer Learning

Now, if you have no idea what that is, click me (@^0^@)/

Final Model
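The final model in this section passes its input through a `data_aug` pipeline that is not shown in this post. As a minimal sketch of what such a pipeline might look like, assuming a few standard Keras preprocessing layers (the choice of layers and the values are my guesses, not necessarily what the notebook uses):

```python
import tensorflow as tf

# Hypothetical augmentation pipeline: the layers and strengths are assumptions.
data_aug = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),   # mirror some images left/right
    tf.keras.layers.RandomRotation(0.1),        # rotate by a small random angle
    tf.keras.layers.RandomZoom(0.1),            # zoom in or out a little
], name="data_aug")

# Quick check on a dummy batch: augmentation changes pixels, not the shape.
dummy_batch = tf.zeros((2, 224, 224, 3))
augmented = data_aug(dummy_batch, training=True)
print(augmented.shape)
```

These layers only do their random thing during training; at prediction time they pass images through untouched.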

Code :

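# Pre-trained EfficientNetB0 without its original classification head
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable=False  # freeze the pre-trained weights for now

In = tf.keras.layers.Input(shape=(224,224,3))
# data_aug is the data augmentation pipeline (it must be defined before this block)
Data_Aug = data_aug(In)
x = base_model(Data_Aug)
# Collapse each feature map into a single number
pool = tf.keras.layers.GlobalAveragePooling2D()(x)
# Our own classification head: one output per class
output = tf.keras.layers.Dense(len(class_names),activation="softmax")(pool)
model_1 = tf.keras.Model(In,output)
model_1.summary()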

Output :

Screenshot 2022-09-07 005031.png

Now, as you can see, you need to focus on three main things

  • Number of layers

  • And how many layers are trainable

  • And how many layers are non-trainable
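To see where trainable and non-trainable numbers come from, here is a toy sketch that counts parameters by hand. The tiny two-layer model is made up for this example and has nothing to do with EfficientNet; `count_params` is a helper name I invented.

```python
import tensorflow as tf

# A made-up toy model, just big enough to count by hand.
hidden = tf.keras.layers.Dense(3, activation="relu")     # 4*3 weights + 3 biases = 15
head = tf.keras.layers.Dense(2, activation="softmax")    # 3*2 weights + 2 biases = 8
tiny_model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), hidden, head])

# Freeze the first layer, the same way a pre-trained base model gets frozen.
hidden.trainable = False

def count_params(weights):
    # Add up the number of values stored in each weight tensor
    return sum(w.numpy().size for w in weights)

trainable = count_params(tiny_model.trainable_weights)
non_trainable = count_params(tiny_model.non_trainable_weights)
print("trainable:", trainable, "non-trainable:", non_trainable)
```

Freezing a layer does not remove anything, it just moves its parameters from the trainable pile to the non-trainable pile.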

Now let's unfreeze some of the pre-trained layers so they can be trained on our own images

Code :

# Unfreeze the pre-trained base model, then re-freeze everything except its last 10 layers
base_model.trainable = True
for layer in base_model.layers[:-10]:
  layer.trainable = False
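In case the `[:-10]` slice looks confusing, here is a quick sketch with stand-in objects instead of real Keras layers. `FakeLayer` and the count of 25 layers are made up for illustration.

```python
class FakeLayer:
    """Stand-in for a Keras layer: just a name and a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# 25 layers is an arbitrary number chosen for this example.
layers = [FakeLayer(f"layer_{i}") for i in range(25)]

# [:-10] means "everything except the last 10", so those get frozen.
for layer in layers[:-10]:
    layer.trainable = False

still_trainable = [layer.name for layer in layers if layer.trainable]
print(len(still_trainable), "layers left trainable:",
      still_trainable[0], "to", still_trainable[-1])
```

Only the layers closest to the output stay trainable, which is the part of the network that usually benefits most from fine-tuning.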

Now let's train our model:

Code :

# Compile after changing which layers are trainable so the change takes effect
model_1.compile(loss = "sparse_categorical_crossentropy",
                optimizer="adam",
                metrics=["accuracy"])
model_1_history = model_1.fit(train_data,epochs=5,steps_per_epoch=len(train_data),
                              validation_data=valid_data,validation_steps=len(valid_data))
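What is that `sparse_categorical_crossentropy` loss actually measuring? For a single image it boils down to minus the log of the probability the model gave to the correct class. Here is a bare-bones sketch of the formula (Keras adds extras such as averaging over a batch and clipping, which are left out here; the probabilities are made up).

```python
import math

def sparse_cce(probs, true_index):
    # Loss for one example: -log(probability assigned to the correct class)
    return -math.log(probs[true_index])

# Made-up softmax outputs; the correct class is index 1 in both cases.
confident_loss = sparse_cce([0.05, 0.90, 0.03, 0.02], true_index=1)
unsure_loss = sparse_cce([0.25, 0.25, 0.25, 0.25], true_index=1)

print("confident and right:", round(confident_loss, 3))
print("just guessing:", round(unsure_loss, 3))
```

The more probability the model puts on the right answer, the closer the loss gets to zero.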

Prediction (final round)

Let's check our final model predictions

Screenshot 2022-09-07 011054.png

Screenshot 2022-09-07 011233.png

Screenshot 2022-09-07 011715.png

Screenshot 2022-09-07 011306.png

Screenshot 2022-09-07 011441.png
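The labels in the screenshots come from turning the model's softmax output into a class name. A minimal sketch of that last step, assuming hypothetical class names and a made-up probability vector (the real names come from the dataset's folder names):

```python
# Hypothetical class names and softmax output, for illustration only.
class_names = ["jerry", "tom", "both"]
probs = [0.02, 0.91, 0.07]

def predicted_label(probs, class_names):
    # Pick the index with the highest probability (an argmax)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best], probs[best]

label, confidence = predicted_label(probs, class_names)
print(f"{label} ({confidence:.0%})")
```

With a real model you would get `probs` from something like `model_1.predict(...)` on a batch of images and apply the same argmax to each row.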

Conclusion🔥

As we can see, our AI model is really 🤯 good at finding out whether a given photo contains Tom, Jerry, or even both

And all that with just a small amount of work🐱‍🐉 ... just imagine programming this whole thing the traditional way, with hand-written rules ...

It's gonna take a really long long long... time📝😑

But with the help of AI, we don't really have to do the hard work...🥳 it's mostly taken care of by our wonderful computers💻

Well anyways thank you for your time ☜(⌒▽⌒)☞

All the Code: Click me

Linkedin: ◑﹏◐