Coding Challenge #153: Interactive Drawing with Machine Learning Model (SketchRNN)

[WHISTLES] Hello, and welcome to
a coding challenge– interactive drawing
with Sketch RNN. And maybe you’re watching this
and thinking to yourself, huh? What’s Sketch RNN? Then you’re in the
right place, because I will explain to you a bit
about what Sketch RNN is and provide you with
links in the description to a lot of background
material if you want to do a deeper dive
into the machine learning model that is Sketch RNN. But what I’m going to build in
this video that you’re watching is my own version of
this exact project. So this is a project
on the Magenta website. Magenta is a project
from Google that is around creativity and AI. There’s a lot of music examples
with the Magenta project. And what you’re seeing here
is the AI, so to speak– it’s really a machine learning
model making predictions– drawing a cat. And I can hit Clear here and
I can begin drawing the cat. Like I could just
stop there, and it’s going to try to fill in
the rest of the cat for me. Let’s see. If I try to draw one like
this with like a sort of body and like a tail, we can
sort of see what happens. So this is the– I’ve used Sketch RNN
before, because I can generate a drawing from
nothing with Sketch RNN. But what I want to do in this
video is create something where the person using the
computer draws with a mouse– but you can imagine all sorts
of interface interaction ideas that you could do– and then has– the machine
learning model takes over and finishes the drawing. And you can see there are
quite a few other models. Sketch RNN– the Sketch
RNN model isn’t one model. It’s actually a collection
of many models based on these categories,
because the data that was used to train the
machine learning model is from a project
called Quick, Draw! So Quick, Draw! is a game that you can play,
also from Google, where the website prompts
you to draw something, and then it tries to like
guess to see if you’re drawing the correct thing. And as people
played that game, Google collected
all those drawings. And there’s a lot of
interesting questions around the data set itself, but
it is an open source data set. There’s 50 million drawings. I think there are
384 categories. I’m not 100% sure about that. And so that’s the data set
that it was trained on. The kind of machine
learning model architecture is something called a
recurrent neural network. This is a wonderful
article that I read that really taught
me a lot about how recurrent neural networks work. It’s quite technical,
but also pretty friendly, and uses some nice examples to
describe how they’re working. But you can also read
the original paper by David Ha and Douglas
Eck, researchers at Google Brain, that describe
the Sketch RNN model, how it was trained, and all
of the details behind it, including the real lower-level
machine learning math details. You can also read this
excellent blog post on the Magenta blog called
“Draw Together with a Neural Network,” which mentions
other collaborators and gives you more details also
about how Sketch RNN works. Guess what, though? I’m going to start
coding now, because one of the projects that I work
on, which is an open source library for machine
learning called ml5js, is built on top of
TensorFlow.js, which is Google’s open
source JavaScript version of TensorFlow machine
learning open source library. And ml5 also includes the
Sketch RNN model as part of it. So if I go here to Reference,
I can find the Sketch RNN page and read a bit more about Sketch
RNN and get some starter code. First thing that I
need to do if I’m going to use the ml5 library
is make sure to import it into my p5 sketch. So on the Getting
Started page, there’s actually a p5 web editor
template I could click on. But I’m actually just going
to copy this reference. And guess what? New version of ml5
out today, 0.4.0. Lots more to say about that
in other videos to come. But I’m going to go back to
my sketch in the web editor. I’m going to click over
to find index.html, and I’m going to add it
as one of the libraries right up here in index.html,
click Save, and go back to sketch.js, and
I’m ready to code. [BELL RINGS] I’m going to add the
preload function in order to load the Sketch RNN model. So I’ll make a variable. I’ll call it sketchRNN. Then I’ll say sketchRNN equals– and I’ll go back
to the reference– ml5.sketchRNN.model
and callback. The model is a string that
indicates what category. What is the thing
that I want to draw? And I can actually find a
list of all of them here, but I’m going to
start with just cat. And I don’t need a
callback because I’m loading the model in preload. So I can assume that once I get
to setup, the model is loaded. And let me run it. No errors. Good sign. Model is loaded. Sketch RNN initialized. Using Sketch RNN is
actually as easy as just calling a single function,
the generate function. I could say in setup
sketchRNN.generate. And then all I need to
do is give it a callback. In the callback, I’m going
to say got path stroke. I’m not sure what to call this. I’m going to call it
StrokePath, gotStrokePath. And I’m going to write
this function, function gotStrokePath, that receives two
arguments– an error, in case something went wrong, and then
an actual strokePath object. Or I should maybe
call this results. I don’t know. But I’m going to
call it strokePath. And I’m just going to say
console.log strokePath. Let’s run this and see
if anything comes out in the console. I love it when things work. It’s like so rare in coding. This is now the
foundation upon which everything else that I do in
this example will be built. So let me unpack this
for you for a second. A recurrent neural
network is a kind of machine learning model that
deals with sequential data. That could be text, like
a sequence of characters– R, A, I, N, B, O,
W. It could be a sequence of words– choo, followed by chew. It could be music. Maybe a sequence is a melody of
notes and rhythms, a sequence. Each one of these
units in a sequence, you could think of as the state. So in this case the
state is very simple. It’s a single character. Here it’s maybe, with words,
a little more complex. And, certainly, musical
notes, the state might have which note is it,
what’s the amount of time, the duration of that note. Drawing– a drawing
can also be thought of as a sequence– draw, draw,
draw, draw, draw, draw, draw. This is a sequence of
vector paths, or stroke paths as I’m calling them. Strokes– they’re
strokes of a pen. And you’ll notice back
here in the code there’s a dx, a dy, and a pen status. So all of the states, each
element of the sequence involves a vector path, a
change in x, a change in y, and whether the
pen was down or up. Is the pen down? Is the pen up? And there’s actually
a third state– is the drawing completed?– which is end. So all I need
completed, which is end. So if I– all I need
to do is figure out a way to say, like, OK,
you gave me this state. Now I’m going to take that
data and visualize it. Interestingly enough,
I’m going to visualize it in a very literal form by
drawing the path according to that vector in a canvas. But maybe that state could be
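To make that concrete, here is a pure-JavaScript sketch (outside of p5, and not ml5's actual code) of consuming a sequence of dx/dy/pen states; the pen values 'down' and 'up' are stand-ins for whatever the model actually reports:

```javascript
// Apply a sequence of SketchRNN-style stroke states to a pen position.
// Each state has a relative offset (dx, dy) and a pen flag:
// 'down' means the segment from the old to the new position is drawn.
// This is an illustrative sketch, not ml5's internal code.
function applyStrokes(x, y, strokes) {
  const segments = []; // line segments that would be drawn on the canvas
  for (const s of strokes) {
    const nx = x + s.dx;
    const ny = y + s.dy;
    if (s.pen === 'down') {
      segments.push([x, y, nx, ny]);
    }
    // The pen always moves, whether or not it drew a line.
    x = nx;
    y = ny;
  }
  return { x, y, segments };
}
```

Each returned segment is roughly what one line() call in draw would render.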
translated into music or words or some other kind of media. How could you create an
audio version of Sketch RNN? That would be something
to think about. There is a tricky thing
going on here, which is when do I choose to draw? Because if you’ve
used p5 before, you know that
there’s this function draw that you’re asked to
write which is always looping. That’s the animation loop. And, generally, that’s what
you want to do your drawing. But the stroke, the
data for the stroke, has to come back
in this callback. So I could get rid
of the draw loop and do some of my own timing
stuff and draw in here, but I think what
I’m going to do, I’m going to create a
variable called currentStroke. And whenever
I get a new stroke, I’m going to say currentStroke
equals strokePath. I’m going to get
the data coming in and set it to a global variable. Then in draw, I’m going to
ask if currentStroke exists– I’m actually going
to say if it’s– yes, I’m going to
say if currentStroke. If it exists, then I
want to draw a line from some value x, y to x
plus currentStroke.dx, y plus currentStroke.dy. So, in other words, I
need some new data here. I need to have this idea of
where is the current pen. And this is up to me. This is not part of the model. The model is just telling me
relative directions to go. So I’m going to create my
own variable called x and y. In setup, let me initialize
it to just the center of the canvas. Let me just fill
in the background with a white background,
and then draw this line. Let’s say a stroke
0, strokeWeight 4 so it’s a little bit thicker. And let’s run this now. There it is. My cat. Meow. My drawing, of course, has
stopped, because I only asked– the generate function
just gives me what is essentially the
next path, the next vector. So once I have that, I need
to ask for another one. So there’s a bunch of different
ways I could implement this, but for me the logic
is such that setup is going to ask for the first one. Then I’m going to receive
it in the callback, draw it. And then right here,
after I’ve drawn it, let me ask for the next one. So I could just do
exactly this again, but what I want to also do, if
I’m asking for the next one, let me set currentStroke
to nothing again. Let me just sort of
clear that variable. So the draw might
continue to loop, but it won’t continue to draw
that same stroke over and over again until a new one comes in
and fills into that variable. So I think now, if I run
this, we’re going to see– ah, OK. Wah, look at this cat. Meow, meow, meow, meow,
meow, meow, meow, meow, meow. I drew the vector path
but didn’t move the pen to the next spot. So I need to say x plus
equals currentStroke.dx. And the same thing, y plus
equals currentStroke.dy. Let’s try that again. Whoa, there we go. Oh, it’s a cat. OK. So I can’t erase the
background and draw. And there we go. Now, this doesn’t look
exactly right, right? The thing that’s missing here
is I haven’t dealt with the pen. I really should only
be drawing the line. I always want to
move the x and y, but only if
currentStroke.pen is down do I want to actually
draw that line. Let’s try this now. Hmm. Something’s off. I know what the problem is here. I’m having a bit of
a sense of deja vu because I went through this
in the snowflake video. But if this sequence where
every single state has a dx, dy, and a pen state, the
pen state is actually describing what you should
be doing for the next stroke. It’s a little bit weird
but it’s off by 1. I suppose that’s because
the drawing always starts with the pen down. And, also, there’s
a pen state of end. So when you get a
dx, dy, you do that, and then the next thing is end. So this value that
comes back in the pen is actually for the next state. So I need to think about
this in a more clever way. So I’m going to have a
separate variable that keeps track of nextPen. And it’s going to
start with down. Then what we say is if nextPen
is down, which it will be, then draw the line. And then nextPen equals
currentStroke.pen. So I’ll save it
for the next time around, and then always
pick it up again. And I’ll obviously stop
if the nextPen is end. So I can say something
like if nextPen equals end, I could say noLoop, and return. This will just
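Here is a minimal pure-JavaScript sketch of that nextPen bookkeeping (my illustration, not ml5 code; the pen values 'down', 'up', and 'end' follow the transcript):

```javascript
// SketchRNN's pen flag applies to the *next* stroke, so keep it in
// nextPen: draw (or skip) the current segment based on the previous
// state's pen, then store this state's pen for the following stroke.
function replayWithPenDelay(x, y, strokes) {
  let nextPen = 'down';           // a drawing always starts pen-down
  const segments = [];
  for (const s of strokes) {
    if (nextPen === 'end') break; // previous state said the drawing is over
    const nx = x + s.dx;
    const ny = y + s.dy;
    if (nextPen === 'down') segments.push([x, y, nx, ny]);
    x = nx;
    y = ny;
    nextPen = s.pen;              // save it for the next time around
  }
  return segments;
}
```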
kill the p5 sketch. It will stop. It kind of looks
like a cat, right? [BELL RINGS] Now I have this cat, and it is
the correct generated drawing, a duplicate of what I did in the
snowflake Sketch RNN challenge. But here I am ready to add
the next component, which is a person coming here
drawing their own starter path, and then having
Sketch RNN take over. How would I do that? So one of the things
that I have to revisit, I need to revisit the state. So any given moment
of the drawing is a dx, a dy, and a pen state. So I need to collect
a sequence of those from the person who is drawing. One way I can do
that is I want to– the drawing, I’m not going to
be too sophisticated about this. I’m going to have the user
start the drawing when they click the mouse, and stop
when they release the mouse. So I basically want two events
that are tied to the canvas. So I’m going to store
the canvas in a variable. And I’m going to say
canvas.mousePressed startDrawing, and
canvas.mouseReleased. I want to say finished
drawing, but it’s not finished. There’s Sketch RNN. I can say sketchRNNStart. So I no longer want to call
generate right here in setup. I’m not going to
start generating. I’m going to first collect
the data from the user. And function startDrawing–
presumably right here is where I’m going
to start generating the drawing, sketchRNNStart. So I need a– what I’m going to call
this is the seedPath. It’s what I’m seeding the
machine learning model with. So seedPath is an array. And I’m going to have
a variable called personDrawing, which is false. And as soon as I– as soon as startDrawing happens,
personDrawing will go to true. Because in draw I’m going
to say if personDrawing, I want to collect those states. So what are the states? The strokePath is an object
which has a dx, a dy, and a pen state. Well, the pen is always
going to be down. Again, I could do something
more sophisticated where I could have
an interaction, that the person, the user,
could actually draw, stop, pick up, do different things,
and then have Sketch RNN know how to take over. But, by definition, the
way I’m building this is when the mouse is released
Sketch RNN takes over. So the pen is always down. And dx is– I can use the built-in
variables of p5 that store the current
mouse position and the previous
mouse position. So this is actually
really easy to do in p5 because I have these
values already. So the difference
between the current mouse and the previous mouse, dx,
dy, and the pen is down. Then I can say
seedPath.push strokePath. And then when the mouse is
released, sketchRNNStart. personDrawing is false. Let’s give this a try. OK, big problem. I don’t see what I’m drawing. [SOMETHING BUZZES] That would be nice. Oh, it drew a cat as soon
as I released the mouse. So I need to add something in
draw which does the following. Hmm. I guess I just want to draw– I don’t want to do
exactly what I did here. So universally let’s set
stroke 0, strokeWeight 4. And let’s just take this
line function, put it here. And I want to draw x,
y, x plus strokePath. And then I want to
say the same thing. I want to do this. So, again, there might be a way
to consolidate this code and– but there it is. So this now, at least I
should see what I’m drawing. Whoa. OK, that’s weird. Oh, it’s drawing
everything relative to the center
rather than the mouse. That’s not good. Aha. So the first point that I’m
drawing, moo, is actually– aha. So x and y don’t get
initialized in the center. Of course, of course, of course. x and y get initialized
when I start drawing wherever the mouse is. That should fix this problem. Here’s my cat. Now continue drawing my cat. Weird. Wait. I drew the circle. I drew the cat’s face already. So it’s picking up
where I left off but it seems to be
starting the drawing over. Why? Because I never told the
model, the Sketch RNN model, about my seed strokes
in the first place. I still just call
sketchRNN.generate that first time. But guess what? The generate function can
take, as an additional optional argument, an array of states
that are fed into the model. And I have those
already in seedPath. Is that what I called it? So now– drum roll, please– I believe this is
the last detail. There’s plenty more I want
to say and a couple more things I want to do, but this
is the last sort of key detail here. [DRUM ROLL] [FANFARE MUSIC] That was good. But there’s a– I got lucky. So it’s sort of worked,
it sort of didn’t work. There is an issue. There is something
really important that I need to implement. And, actually, it’s
my intention for this to actually become
a feature of ml5 and it’s going to handle
it for you automatically. But that hasn’t been implemented
yet in ml5, so in this video I’m going to try it out. And then maybe in
a future video I’ll do a video about adding
this as a feature to ml5. And it has to do with the RDP
line simplification algorithm. Which, guess what? If you look at the
previous coding challenge, what a coincidence. It is the RDP line
simplification algorithm. So why does this matter? Let’s go back to
this example here. And I’m going to do something. Pay close attention. I’m going to hit Clear. I’m going to zoom way in close. I’m going to draw
very, very slowly a lot of squiggly lines like
this really, really slowly. And watch what happens
when I lift up the mouse. Ready? 1, 2, 3. Do you see how the
drawing changed? Somehow it’s very
subtle, but some of the points that I was
drawing were removed. The fidelity of the
line was lowered. Even though it’s capturing– I’m capturing the mouse
positions in my sketch at presumably 60
frames per second. I’m capturing a lot of points. So I’m giving the machine
learning model, the Sketch RNN model, all of these
states where the dx and dy values are really, really tiny. The model wasn’t actually
trained with drawings that have a super high fidelity
to them, with lots and lots of points close together. I’m actually not sure if that’s
in the original Quick, Draw! data set, or whether that was
a processing step on the data. But I was in touch
with David Ha, who explained that
the RDP algorithm was used to simplify the drawings
when the model was trained. And so when you’re
feeding stuff into it, you want to have those
drawings retain that quality. So here is the code for
the last coding challenge, which was the RDP line
simplification algorithm. And I’m just simplifying a
sort of mathematical curve just to recreate the animation
that’s on the Wikipedia page for the RDP algorithm. But I should be able to
take these functions– I’m going to create
another file called rdp.js. I’m going to reference
it here in index.html. I’m going to grab my
implementation of the RDP algorithm, which includes
all four of these functions. You can watch the other video
to see me write those functions. I’m going to paste
them all here. And what I want to
call is just rdp. So I give the rdp
function an array of a whole bunch of points. Then I– and I also
give it an empty array, that it’s going to fill
with the RDP reduced points. The way I created the example
is with a global variable called epsilon. So I’m just going
to sort of hard code in a value for that at 10. Now, right before I
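My actual rdp.js comes from the previous challenge, but for reference, a compact recursive sketch of the algorithm could look like this (the function names here are illustrative, not necessarily the four functions in my file):

```javascript
// Ramer–Douglas–Peucker line simplification.
// Keep the endpoints, find the point farthest from the line between
// them, and recurse on both halves if that distance exceeds epsilon.
function rdpSimplify(points, epsilon) {
  if (points.length < 3) return points.slice();
  const a = points[0];
  const b = points[points.length - 1];
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpendicularDistance(points[i], a, b);
    if (d > maxDist) { maxDist = d; index = i; }
  }
  if (maxDist <= epsilon) return [a, b]; // everything in between is close enough
  const left = rdpSimplify(points.slice(0, index + 1), epsilon);
  const right = rdpSimplify(points.slice(index), epsilon);
  return left.slice(0, -1).concat(right); // avoid duplicating the split point
}

// Distance from point p to the infinite line through a and b.
function perpendicularDistance(p, a, b) {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  return Math.abs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / len;
}
```

With a big epsilon, long nearly straight runs collapse to their endpoints; with a small one, more of the wiggle survives.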
generate the seedPath, I need to perform RDP
line simplification. Interestingly enough,
the RDP algorithm doesn’t know anything about
Sketch RNN and dx, dy, and pen state. So, actually, what
I want to do is I want to have another
array called seedPoints, and those are what I
actually want to collect. Oh. I may have just kind
of overcomplicated it. It’s OK. I’m going to comment this out. This is going to
become important again. But I’m going to say
seedPoints.push createVector, mouseX, mouseY. And then the line
that I want to draw is actually just mouseX,
mouseY, pmouseX, pmouseY. So let’s try this. So the drawing still works. What I want to do is perform
the RDP line simplification now. So I can go back to my
previous example once again and basically find
exactly this code. So I’m going to grab this code
and I’m going to put it here. And what this is doing is it’s
creating a new empty array and it’s calling
the rdp function on the allPoints array, which
is now called seedPoints, filling it with the simplified
version of the line. And then what I need to
do after that is– now I have these rdpPoints,
which is the simplified version of the line. I want to say for let i equals
0, i is less than rdpPoints– I’m going to
actually start at 1– rdpPoints.length, i++. And I need to create
the states now. Right here. This is exactly
what I want to do. I want to create the
strokePath, which is rdpPoints index i dot x minus
the previous one, i minus 1. Do the same thing for y. And then the pen is down. Then I can– I could redraw– I’m not going to redraw
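That conversion loop, pulled out as a pure function (a sketch under the assumption that the pen stays down for the whole seed path, which is how I'm building this):

```javascript
// Convert an array of absolute points (as produced by RDP) into
// relative SketchRNN-style stroke states: each state is the change
// from the previous point, with the pen down throughout.
function pointsToStrokes(points) {
  const strokes = [];
  for (let i = 1; i < points.length; i++) {
    strokes.push({
      dx: points[i].x - points[i - 1].x,
      dy: points[i].y - points[i - 1].y,
      pen: 'down',
    });
  }
  return strokes;
}
```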
anything just for a second. And then I want to put
that into the strokePath, and then call generate. All right. That was a little manic here,
but I guess all my coding challenges are pretty manic. But what just happened? Again, the idea, ultimately,
is for ml5 to handle this. I think that’s what
I would like to do. I’d like to create a
helper function in ml5 that Sketch RNN takes your
seed path in and like performs the line simplification for you. But this is my challenge
to implement it manually to see if it helps. So what I’m doing here
is I have a set of points that I’ve collected
in seedPoints. I call the rdp algorithm,
which simplifies that path. Previous coding challenge goes
through that in more detail. Then I have to
generate the states based off the simplified path. And then that’s what I put
into the Sketch RNN model. Not feeling super
confident here, but let’s give this a try. [DRUM ROLL] [FANFARE MUSIC] It looks the same. But I think, if I’m correct,
if I did this many, many times, I would have better results. I think, however,
it’s very important that I create that same
visual effect where I redraw the simplified line. And that’s actually a
pretty easy thing for me to do right here. Because what I can do
is, in sketchRNNStart, after I do the line
simplification, I can redraw the background. And then I can say beginShape– and I’ll need an endShape at the end. And I’m going to say
for let v of seedPath– I’m just getting all the
vectors out of the seedPath. vertex v.x, v.y. And I’m going to say noFill. I should have– I’ve already been
drawing, but just to be consistent, let me take
this here, put that here. So now I’m drawing and drawing
and drawing, erasing it, and then drawing
the simplified line. Let’s see how that goes. Mmm. Yes. I didn’t see it. What did I do wrong here? Oh, no, not the seedPath. seedPoints. And I should be
doing this before– technically speaking, it
doesn’t really matter, but this is what
I want to do here. So this is performing the
line simplification, drawing– and line isn’t really right,
but simplified path, path simplification. And then now converting
to Sketch RNN states. All right. Let’s try this one more time. Did you– did that simplify? It’s hard to see. Let’s try like a much
higher epsilon, like 100. Yes. Oh, yes, it did. It’s just not super obvious. Oh no no no no no. Ah. [SOMETHING BUZZES] Ah. Coding, coding,
whoops whoops whoops. rdpPoints. Simplified line is in rdpPoints,
not in the seedPoints. The seedPoints was
everything I collected. The simplified one
is called rdpPoints. I should use better naming. Now I think– let me go back and
change the epsilon back to 10, something more reasonable. And we’re going to try
this one more time. Draw it slowly so that
there’s a lot of extra points, and make a weird curve here. Here we go. 1, 2, 3. There we go. [FANFARE MUSIC] So that looks like it really
simplified it like way too much. So this would be an
interesting thing to tune, because I want this to be in– this will be like sort of
a hard-coded value probably in ml5, although maybe it’s
a parameter the user of ml5 could adjust. But this is what I would
want to sort of play with to figure out what
makes the most sense. But now I can really see the
line simplification happening. What it should actually
be, I have no idea. [SOMETHING BEEPS] So, as always, with any of
these coding challenges, I’m doing just a really
sort of basic version. I think that I have mostly
successfully recreated exactly this. But one of the
things you’ll notice is that there’s a lot more
thoughtfulness to the interface and the design. First of all, it’s drawing over
and over again what Sketch RNN is drawing. It’s a different color. There’s a nice
interface for picking which model you want to load. You can kind of like
randomize stuff and clear it. And there’s a page of
information all about how this works. So maybe you want to
create your own version and think about what is
that interaction design, how are you visualizing what the
person is drawing versus what the model is drawing,
and how are you picking which model to draw. Maybe you can make
this into a game, an art project,
something that is– just tells a story, that
draws based on words– text to speech, or speech
to text, or something like that. There are so many ideas that
I think you could explore. So I hope you make
one of those ideas and share it with me by
going to the website and following the instructions
about how to share your community contribution. But I will, before
I go, just kind of give you a nice compilation. I think one of the
things that I want to return to is all the
different pre-trained models that are available. Now, you could also
train your own model, which would be a really
interesting thing to try. Very high degree of difficulty
there, but possible. But the reason why I’m assuming
there are models in here that are– in addition to
frog, there’s frogsofa– is because the frog
model was trained with only drawings of frogs. The sofa model was
trained with drawings of sofas. The frogsofa model,
drawings of both, without a distinction
between the two. So the AI, the machine
learning model, is just learning about
paths that happen when you’re drawing frogs and sofas. And so we could try
crabrabbitfacepig. But I’m just going to enjoy
myself and try the everything model. Wait a second. I wanted to do it
again and again. So let’s at least add
that here as well. So if the state is end,
instead of saying noLoop, what I want to do is actually
call sketchRNN.reset, which will reset the model,
and then call sketchRNNStart. So if I reset the
model, call start again, it’s going to draw
a new version. No? Here’s the mistake. When I’m resetting
everything, I don’t need to do all this again,
but I might as well. So I could refactor this. (SINGING) I will refactor
this later, you know. At a minimum, what
I need to do is– actually, I don’t need
to do any of this, because seedPath is
a global variable. But I’m just going to do this. I’m just going to
reset seedPath. So, yes, we need
to refactor this, but I don’t want to add all
the seedPoints to the seedPath again. So let’s try this. What will the model draw? Enjoy. No? It’s not working still? Oh, this– so this
return is a problem. So I think I needed nextPen
to not be end any more, and currentStroke to null. Oh. Oh. x and y. x and y. [SOMETHING BUZZES] X and y. x equal rdpPoints, the last one. x and y need to just be
wherever this leaves off. I think this is the last
thing I forgot to reset, which I didn’t even
set for the first time. x, y needs to like pick
off where we left off. And I will say goodbye
to you by letting you watch a compilation of
Sketch RNN drawn cat pigs. [MUSIC PLAYING] This is way too much fun. [MUSIC PLAYING]
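To recap that looping version in one place: before each new generation, the sketch's shared state needs resetting. Here is a pure-JavaScript sketch of the bookkeeping (variable names follow the video; this is my illustration, not the refactor I promised):

```javascript
// State shared between the draw loop and the SketchRNN callbacks.
// Resetting it lets generate() run again and again: the pen resumes
// from the last simplified seed point, and the old strokes are cleared.
function resetForNextGeneration(state, rdpPoints) {
  const last = rdpPoints[rdpPoints.length - 1];
  state.x = last.x;        // pick up where the seed path left off
  state.y = last.y;
  state.nextPen = 'down';  // drawings always start pen-down
  state.currentStroke = null;
  state.seedPath = [];     // don't re-add the seed strokes next time
  return state;
}
```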

30 thoughts on “Coding Challenge #153: Interactive Drawing with Machine Learning Model (SketchRNN)”

  1. Pen Up… Pen Down… Insert Logo's Turtle Flashbacks From The 80's Here…

    The Turtle has evolved. It "thinks", now… Nice!

  2. That was stonic by the way Dan can you post a similar libraries in other languages like C or python .. I love you ❀️❀️
