{"id":432,"date":"2025-02-15T08:15:34","date_gmt":"2025-02-15T13:15:34","guid":{"rendered":"https:\/\/kushaltimsina.com\/blog\/?p=432"},"modified":"2025-02-15T08:15:35","modified_gmt":"2025-02-15T13:15:35","slug":"what-is-gradient-descent-in-machine-learning","status":"publish","type":"post","link":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/","title":{"rendered":"What is Gradient Descent in Machine Learning?"},"content":{"rendered":"\n<p>In this article, I&#8217;ll be explaining what gradient descent is in machine learning in a simple and easy to understand way.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is Gradient Descent in Machine Learning?<\/strong><\/h2>\n\n\n\n<p>Gradient descent is an <a href=\"https:\/\/www.complexica.com\/narrow-ai-glossary\/optimization-algorithms#:~:text=Optimization%20algorithms%3AOptimization%20algorithms%20are,maximizes%20a%20given%20objective%20function.\">optimization algorithm <\/a>used in machine learning to predict data. <\/p>\n\n\n\n<p>This is a probably really confusing explanation. <\/p>\n\n\n\n<p>So, please keep reading so that we can understand the full picture.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>An Example of Gradient Descent<\/strong><\/h2>\n\n\n\n<p>Imagine that we&#8217;re trying to figure out a relationship between the average height of parents and the height of the child.<\/p>\n\n\n\n<p>That is, if we know that the average height is, say, 68 inches, can we figure out how tall the child is?<\/p>\n\n\n\n<p>Of course, the height of the child depends on a lot of factors, like their age. <\/p>\n\n\n\n<p>A 6 year old child will be smaller than a 21 year old adult.<\/p>\n\n\n\n<p>But let&#8217;s assume that they&#8217;re all 12 years old for now.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Resolving Linear Regression for Gradient Descent<\/strong><\/h3>\n\n\n\n<p>We are trying to find the red line in the picture below. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"793\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40-1024x793.png\" alt=\"\" class=\"wp-image-433\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40-1024x793.png 1024w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40-300x232.png 300w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40-768x595.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png 1118w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The red line will basically tell us &#8220;if the average height of the parents is 68 inches, then the child&#8217;s height is probably Y inches&#8221;<\/p>\n\n\n\n<p>Remember linear equations from algebra? <\/p>\n\n\n\n<p>You can represent any line with <span class=\"wp-katex-eq\" data-display=\"false\"> y=mx+b <\/span>, where <span class=\"wp-katex-eq\" data-display=\"false\"> m <\/span> is the slope, <span class=\"wp-katex-eq\" data-display=\"false\"> x <\/span> is the x coordinate and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> is the y-intercept.<\/p>\n\n\n\n<p>In our case, we&#8217;re going to try to find a line, <span class=\"wp-katex-eq\" data-display=\"false\"> y_i = wx_i+b <\/span>, where <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> is what we call the weight. 
The <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> is our slope, and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> is our y-intercept.<\/p>\n\n\n\n<p>And we&#8217;ve subscripted <span class=\"wp-katex-eq\" data-display=\"false\"> y <\/span> with <span class=\"wp-katex-eq\" data-display=\"false\"> y_i <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> x <\/span> with <span class=\"wp-katex-eq\" data-display=\"false\"> x_i <\/span> to indicate that we&#8217;re now looking for the <span class=\"wp-katex-eq\" data-display=\"false\"> y <\/span> that corresponds to the <span class=\"wp-katex-eq\" data-display=\"false\">i<\/span>th point.<\/p>\n\n\n\n<p>So, basically, instead of looking for the entire line, the equation tells us that we&#8217;re just looking for the child&#8217;s height that corresponds to the average height <span class=\"wp-katex-eq\" data-display=\"false\"> x_i <\/span>.<\/p>\n\n\n\n<p>All we have to do is find the <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span>. <\/p>\n\n\n\n<p>Once we&#8217;ve found those two, we can just plug them into our formula, <span class=\"wp-katex-eq\" data-display=\"false\"> y_i = wx_i + b <\/span>, and we&#8217;ll get the child&#8217;s height.<\/p>\n\n\n\n<p>For example, if the weight is 1.1 and the y-intercept is 8, then we can just say: <\/p>\n\n\n\n<p><span class=\"wp-katex-eq\" data-display=\"false\"> y_i = 1.1x_i+8 <\/span>. <\/p>\n\n\n\n<p>And if the average height of the parents of child 1 is <span class=\"wp-katex-eq\" data-display=\"false\"> x_1 = 68 <\/span> (which is 5 foot 8), then we can say:<\/p>\n\n\n\n<p>The height of child 1, <span class=\"wp-katex-eq\" data-display=\"false\"> y_1 <\/span>, is given by:<\/p>\n\n\n\n<p><span class=\"wp-katex-eq\" data-display=\"false\"> y_1 = 1.1 \\cdot 68+8=82.8 <\/span>.<\/p>\n\n\n\n<p>So, given that formula, the height of the child is 82.8 inches, which means that they&#8217;re almost 6 foot 11.<\/p>\n\n\n\n<p>Almost 6 foot 11!?!?<\/p>\n\n\n\n<p>Exactly. The weight <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and y-intercept <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> in this example produced an unrealistic prediction.<\/p>
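\n\n\n\n<p>If you like seeing things in code, here&#8217;s a tiny Python sketch of that same prediction (a rough sketch using the made-up weight and y-intercept from above, so the answer is just as unrealistic):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># A minimal sketch of predicting with the line y = w*x + b.\n# The weight and y-intercept are the made-up values from the example above.\ndef predict(w, b, x):\n    return w * x + b\n\nw = 1.1   # weight (slope)\nb = 8     # bias (y-intercept)\n\nchild_height = predict(w, b, 68)   # average parent height of 68 inches\nprint(child_height)                # 82.8 -- clearly too tall!<\/code><\/pre>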
\n\n\n\n<p>So, how do we find a weight and y-intercept that will actually fit the data?<\/p>\n\n\n\n<p>We use gradient descent.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The Sum of Squared Residuals<\/strong><\/h3>\n\n\n\n<p>To begin with, let me introduce you to the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Residual_sum_of_squares\">sum of squared residuals.<\/a> <\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>The Error in One Point<\/strong><\/h4>\n\n\n\n<p>Consider that the difference between the actual value <span class=\"wp-katex-eq\" data-display=\"false\"> y_i <\/span> and the predicted value <span class=\"wp-katex-eq\" data-display=\"false\"> \\hat{y_i} <\/span> is equal to <span class=\"wp-katex-eq\" data-display=\"false\"> y_i - \\hat{y_i} <\/span>.<\/p>\n\n\n\n<p>What this means is:<\/p>\n\n\n\n<p>If our formula said that the height of child 1 was 82.8 inches and the actual height of child 1 was 70 inches, then:<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> y_1 = 70 <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> \\hat{y_1} = 82.8 <\/span>\n\n\n\n<p>And the difference between the two is: <span class=\"wp-katex-eq\" data-display=\"false\"> y_1 - \\hat{y_1} = 70-82.8=-12.8 <\/span><\/p>\n\n\n\n<p>This means that our formula was off by 12.8 inches for the first parent-child pair (the minus sign just tells us the prediction was 12.8 inches too tall).<\/p>\n\n\n\n<p>Great. So now, we know the error in our formula for the first point.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>The Error in Every Point<\/strong><\/h4>\n\n\n\n<p>But we want the error in every point. <\/p>\n\n\n\n<p>This means that we have to take the sum of all of the errors, right? (We&#8217;ll deal with the &#8220;squared&#8221; part in a moment.)<\/p>\n\n\n\n<p><span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR = \\sum_{i=1}^{N}{(y_i - \\hat{y_i})} <\/span>.<\/p>\n\n\n\n<p>This basically means &#8220;add up all of the errors from the first point <span class=\"wp-katex-eq\" data-display=\"false\"> i = 1 <\/span> to the last point, <span class=\"wp-katex-eq\" data-display=\"false\"> i=N <\/span>.&#8221;<\/p>\n\n\n\n<p>If we had 3 points, then <\/p>\n\n\n\n<p><span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR = \\sum_{i=1}^{N}{(y_i - \\hat{y_i})} <\/span>.<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> = \\sum_{i=1}^{3}{(y_i - \\hat{y_i})} <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> = (y_1 - \\hat{y_1}) + (y_2 - \\hat{y_2}) + (y_3 - \\hat{y_3})  <\/span>\n\n\n\n<p>This is the same thing as saying &#8220;the error for the 3 points is the error in point 1 plus the error in point 2 plus the error in point 3.&#8221; <\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Squaring the Error<\/strong><\/h4>\n\n\n\n<p>Great. Now, to prevent us from getting negative numbers and to make every error more drastic, we&#8217;re going to square everything.<\/p>\n\n\n\n<p>Remember how previously, our error in point 1 was 12.8?<\/p>\n\n\n\n<p>Guess what?<\/p>\n\n\n\n<p>Now, it&#8217;s 12.8^2=163.84 <\/p>\n\n\n\n<p>And what if the error was 0.05?<\/p>\n\n\n\n<p>Then, by squaring it, we&#8217;ll get 0.05^2=0.0025 <\/p>\n\n\n\n<p>Do you see the effect?<\/p>\n\n\n\n<p>Squaring a big error makes it look even bigger, while squaring a tiny error (one smaller than 1) makes it look even smaller.<\/p>
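\n\n\n\n<p>Here&#8217;s the same idea as a quick Python sketch (just the one made-up point from earlier):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># A tiny sketch of the error for one point, using the made-up numbers above.\nactual = 70.0      # the real height of child 1 (y_1)\npredicted = 82.8   # what our formula guessed (y-hat_1)\n\nresidual = actual - predicted    # -12.8: negative because we overshot\nsquared_error = residual ** 2    # 163.84: squaring removes the minus sign\nprint(residual, squared_error)\n\n# Squaring also exaggerates big errors and shrinks tiny ones:\nprint(12.8 ** 2)   # 163.84\nprint(0.05 ** 2)   # 0.0025<\/code><\/pre>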
\n\n\n\n<p>And we&#8217;re using this for error, so if our error is really small and we square it, we&#8217;d be telling ourselves that the error is really small.<\/p>\n\n\n\n<p>And if our error is really big and we square it, it would look like the error is really big.<\/p>\n\n\n\n<p>And thus, we have our formula for the error, which we call the sum of squared residuals.<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR = \\sum_{i=1}^{N}(y_i-\\hat{y_i})^2 <\/span>\n\n\n\n<p>So this whole <span class=\"wp-katex-eq\" data-display=\"false\"> SSR <\/span> monster is going to tell us how much error exists between our predictions and the actual data.<\/p>\n\n\n\n<p>And remember our buddy <span class=\"wp-katex-eq\" data-display=\"false\">\\hat{y_i}<\/span>, our prediction? <\/p>\n\n\n\n<p><span class=\"wp-katex-eq\" data-display=\"false\"> \\hat{y_i}<\/span> is actually equal to <span class=\"wp-katex-eq\" data-display=\"false\">\\hat{y_i}=wx_i+b<\/span>, because <span class=\"wp-katex-eq\" data-display=\"false\"> \\hat{y_i} <\/span> is our prediction. <\/p>\n\n\n\n<p>It&#8217;s the value our line spits out for each <span class=\"wp-katex-eq\" data-display=\"false\"> x_i <\/span>.<\/p>\n\n\n\n<p>Thus, substituting <span class=\"wp-katex-eq\" data-display=\"false\">\\hat{y_i}=wx_i+b<\/span> into the formula,<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR(w, b) = \\sum_{i=1}^{N}(wx_i + b -y_i)^2 <\/span>\n\n\n\n<p>And <span class=\"wp-katex-eq\" data-display=\"false\"> SSR(w, b) <\/span> is written in terms of <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> because <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> are both variables.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Minimizing the Error<\/strong><\/h3>\n\n\n\n<p>Now, we can model our error, <span class=\"wp-katex-eq\" data-display=\"false\"> SSR <\/span>, with a shape like this.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"881\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-53-1024x881.png\" alt=\"\" class=\"wp-image-484\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-53-1024x881.png 1024w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-53-300x258.png 300w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-53-768x661.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-53.png 1202w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR(w, b) = \\sum_{i=1}^{N}(wx_i + b -y_i)^2 <\/span>\n\n\n\n<p>So, that red cloth looking thing is a model of our error.<\/p>\n\n\n\n<p>Great.<\/p>\n\n\n\n<p>It&#8217;s our error. <\/p>\n\n\n\n<p>That means it&#8217;s bad.<\/p>\n\n\n\n<p>So, we want this error to be as small as possible.<\/p>\n\n\n\n<p>In other words, we want to minimize this error.<\/p>
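\n\n\n\n<p>To make that concrete, here&#8217;s a minimal Python sketch of <span class=\"wp-katex-eq\" data-display=\"false\"> SSR <\/span> as a function of <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> (the data points here are made up, purely for illustration):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># A minimal sketch of the sum of squared residuals, SSR(w, b),\n# for a handful of made-up (x, y) points.\nxs = [2.0, 4.0, 5.0, 7.0]   # made-up inputs\nys = [3.0, 6.0, 8.0, 10.0]  # made-up actual outputs\n\ndef ssr(w, b, xs, ys):\n    total = 0.0\n    for x, y in zip(xs, ys):\n        predicted = w * x + b          # y-hat for this point\n        total += (y - predicted) ** 2  # squared residual\n    return total\n\nprint(ssr(1.1, 8, xs, ys))  # the error for one particular choice of w and b<\/code><\/pre>\n\n\n\n<p>Different choices of <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> give different values of this error, and gradient descent is just a systematic way of hunting for the pair that makes it as small as possible.<\/p>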
\n\n\n\n<p>We want to figure out the points at the bottom of that red thing.<\/p>\n\n\n\n<p>The bottom of that red thing (in this example) is that point at (0, 0, 0).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"957\" height=\"1024\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-54-957x1024.png\" alt=\"\" class=\"wp-image-485\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-54-957x1024.png 957w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-54-280x300.png 280w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-54-768x822.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-54.png 1234w\" sizes=\"auto, (max-width: 957px) 100vw, 957px\" \/><\/figure>\n\n\n\n<p>Now, note that the shape of this bowl looking thing varies.<\/p>\n\n\n\n<p>Sometimes, it looks like a bowl.<\/p>\n\n\n\n<p>Other times, it looks like, uhhh, whatever this thing is.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"958\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-55-1024x958.png\" alt=\"\" class=\"wp-image-486\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-55-1024x958.png 1024w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-55-300x281.png 300w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-55-768x719.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-55-1536x1438.png 1536w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-55.png 1562w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>And we&#8217;re trying to minimize this error.<\/p>\n\n\n\n<p>And one way of getting to the bottom of this error is by using gradient descent.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>A Worked Example of Gradient Descent<\/strong><\/h3>\n\n\n\n<p>Let&#8217;s get started with an example of gradient descent.<\/p>\n\n\n\n<p>Let&#8217;s say that our points are these guys.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"970\" height=\"780\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-59.png\" alt=\"\" class=\"wp-image-494\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-59.png 970w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-59-300x241.png 300w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-59-768x618.png 768w\" sizes=\"auto, (max-width: 970px) 100vw, 970px\" \/><\/figure>\n\n\n\n<p>And we&#8217;re trying to find a line <span class=\"wp-katex-eq\" data-display=\"false\"> y=wx+b <\/span> that minimizes the error, <span class=\"wp-katex-eq\" data-display=\"false\"> SSR <\/span>.<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR(w, b) = \\sum_{i=1}^{N}(wx_i + b -y_i)^2 <\/span>\n\n\n\n<p>This is the same error equation from the previous section.<\/p>\n\n\n\n<p>In gradient descent, we start out by picking a random value for <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span>.<\/p>\n\n\n\n<p>Let&#8217;s say that <span class=\"wp-katex-eq\" data-display=\"false\"> w = 3 <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b = 1 
<\/span>.<\/p>\n\n\n\n<p>Then, by substitution, our line becomes:<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> y = 3x+1 <\/span>\n\n\n\n<p>Take a look.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"783\" height=\"1024\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-60-783x1024.png\" alt=\"\" class=\"wp-image-495\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-60-783x1024.png 783w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-60-229x300.png 229w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-60-768x1005.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-60.png 824w\" sizes=\"auto, (max-width: 783px) 100vw, 783px\" \/><\/figure>\n\n\n\n<p>Our line, in blue, is super inaccurate, right?<\/p>\n\n\n\n<p>It doesn&#8217;t go through any of the points at all. In fact, it&#8217;s pretty far away.<\/p>\n\n\n\n<p>This means that our weight <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and bias (y-intercept) <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> are bad guesses.<\/p>\n\n\n\n<p>We want to find a good <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span>, so that our line has the least error.<\/p>\n\n\n\n<p>This is where the gradient descent algorithm in machine learning comes into play.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Plotting the Error Function<\/strong><\/h3>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR(w, b) = \\sum_{i=1}^{N}(wx_i + b -y_i)^2 <\/span>\n\n\n\n<p>Now, we&#8217;re going to plot this error so that we can watch gradient descent minimize it.<\/p>\n\n\n\n<p>I&#8217;ve gone through and plotted every point and graphed it such that you can see the <strong>actual<\/strong> error graph.<\/p>\n\n\n\n<p>This green flat bowl looking thingy is the actual error.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"939\" height=\"1024\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-61-939x1024.png\" alt=\"\" class=\"wp-image-496\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-61-939x1024.png 939w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-61-275x300.png 275w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-61-768x837.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-61.png 1106w\" sizes=\"auto, (max-width: 939px) 100vw, 939px\" \/><\/figure>\n\n\n\n<p>Hey, look!<\/p>\n\n\n\n<p>From the top, it looks like a cucumber.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"966\" height=\"1024\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-62-966x1024.png\" alt=\"\" class=\"wp-image-497\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-62-966x1024.png 966w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-62-283x300.png 283w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-62-768x814.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-62.png 1030w\" sizes=\"auto, (max-width: 966px) 100vw, 966px\" \/><\/figure>
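\n\n\n\n<p>If you want to build a surface like this yourself, here&#8217;s a rough Python sketch that evaluates SSR over a grid of <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> values (the data points are made up, so your surface won&#8217;t match the picture exactly):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># A rough sketch: evaluate SSR(w, b) over a grid of w and b values.\n# Plotting the (w, b, SSR) triples gives a surface like the one pictured.\nxs = [2.0, 4.0, 5.0, 7.0]   # made-up inputs\nys = [3.0, 6.0, 8.0, 10.0]  # made-up actual outputs\n\ndef ssr(w, b):\n    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))\n\nsurface = []\nsteps = 41\nfor i in range(steps):\n    for j in range(steps):\n        w = -2.0 + i * 0.25   # w values from -2 to 8\n        b = -5.0 + j * 0.25   # b values from -5 to 5\n        surface.append((w, b, ssr(w, b)))\n\n# The lowest point on this grid is (roughly) the bottom of the bowl.\nprint(min(surface, key=lambda point: point[2]))<\/code><\/pre>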
\n\n\n\n<p>The x-axis (going left and right in the picture) is our &#8220;w&#8221; axis, representing our weights.<\/p>\n\n\n\n<p>And the y-axis (going up and down in the picture) will be our &#8220;b&#8221; axis, representing the bias (y-intercept of the graph).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"931\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-63-1024x931.png\" alt=\"\" class=\"wp-image-498\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-63-1024x931.png 1024w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-63-300x273.png 300w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-63-768x698.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-63.png 1294w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>And we are trying to find the w and b that minimize this error.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"958\" height=\"740\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-64.png\" alt=\"\" class=\"wp-image-499\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-64.png 958w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-64-300x232.png 300w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-64-768x593.png 768w\" sizes=\"auto, (max-width: 958px) 100vw, 958px\" \/><\/figure>\n\n\n\n<p>In other words, we&#8217;re trying to find the w and b that take us to the bottom of this vegetable looking thing.<\/p>\n\n\n\n<p>And one way to do this is by using the gradient descent algorithm.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Finding the Gradients of the SSR<\/strong><\/h3>\n\n\n\n<p>The gradient descent algorithm starts by choosing a random &#8220;w&#8221; and &#8220;b&#8221;, as we did earlier to get this line:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"547\" height=\"1024\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-65-547x1024.png\" alt=\"\" class=\"wp-image-500\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-65-547x1024.png 547w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-65-160x300.png 160w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-65-768x1438.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-65.png 780w\" sizes=\"auto, (max-width: 547px) 100vw, 547px\" \/><\/figure>\n\n\n\n<p>Here, w = 3 and b = 1, giving us y=3x+1.<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR(w, b) = \\sum_{i=1}^{N}(wx_i + b -y_i)^2 <\/span>\n\n\n\n<p>And on our SSR error function (the vegetable graph), that starting guess is represented by the gray ball, sitting at a height of 205. <\/p>\n\n\n\n<p>This means that <span class=\"wp-katex-eq\" data-display=\"false\"> SSR(3, 1) = 205 <\/span>.<\/p>\n\n\n\n<p>That 205 is our error. <\/p>\n\n\n\n<p>That&#8217;s basically how bad our line is at predicting the points. <\/p>\n\n\n\n<p>It&#8217;s 205 bad. OK!<\/p>
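\n\n\n\n<p>In code, checking how bad the starting guess is looks something like this (again with made-up points, so the number you get won&#8217;t be exactly 205):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># A small sketch: measure the error of the starting guess w = 3, b = 1.\n# The data points are invented for illustration, so the result here\n# will not match the 205 from the graph above.\nxs = [2.0, 4.0, 5.0, 7.0]\nys = [3.0, 6.0, 8.0, 10.0]\n\ndef ssr(w, b):\n    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))\n\nstarting_error = ssr(3, 1)\nprint(starting_error)  # how bad our starting line is<\/code><\/pre>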
\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"606\" height=\"716\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-66.png\" alt=\"\" class=\"wp-image-501\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-66.png 606w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-66-254x300.png 254w\" sizes=\"auto, (max-width: 606px) 100vw, 606px\" \/><\/figure>\n\n\n\n<p>Remember, we want to find the w and b that take us to the bottom of that curve because at the bottom of that curve, our error will be very small.<\/p>\n\n\n\n<p>To do this, imagine that you&#8217;re standing on that point.<\/p>\n\n\n\n<p>There&#8217;s one direction that is steeper than all the others. That direction is the gradient.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"http:\/\/www.vias.org\/calculus\/13_vector_calculus_01_06.html\"><img loading=\"lazy\" decoding=\"async\" width=\"455\" height=\"338\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-67.png\" alt=\"\" class=\"wp-image-513\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-67.png 455w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-67-300x223.png 300w\" sizes=\"auto, (max-width: 455px) 100vw, 455px\" \/><\/a><figcaption class=\"wp-element-caption\">http:\/\/www.vias.org\/calculus\/13_vector_calculus_01_06.html<\/figcaption><\/figure>\n\n\n\n<p>In the picture above, you can see that the gradient, <span class=\"wp-katex-eq\" data-display=\"false\"> grad f <\/span>, points in the steepest uphill direction.<\/p>\n\n\n\n<p>Since the gradient points in the steepest direction, it&#8217;s the direction that would be the hardest to climb.<\/p>\n\n\n\n<p>But, we&#8217;re trying to go down, aren&#8217;t we?<\/p>\n\n\n\n<p>So, instead of trying to go up in the steepest direction, why don&#8217;t we go <strong>down<\/strong> in the steepest direction? That way, we&#8217;ll head toward the bottom as fast as possible.<\/p>\n\n\n\n<p>Thus, we have our gradients.<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> \\nabla SSR =\\begin{bmatrix} \\displaystyle\\frac{\\partial SSR}{\\partial w} \\\\ \\displaystyle\\frac{\\partial SSR}{\\partial b} \\end{bmatrix} <\/span>\n\n\n\n<p>Remember that <span class=\"wp-katex-eq\" data-display=\"false\"> \\nabla SSR <\/span> points us in the steepest uphill direction. <\/p>\n\n\n\n<p>We want to go <strong>down<\/strong> to find the minimum of the SSR, not up.<\/p>\n\n\n\n<p>So, to get to the minimum of the error, why don&#8217;t we just subtract the gradient from <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span>?<\/p>\n\n\n\n<p>If we subtract the gradient, then we&#8217;ll be nudging <span class=\"wp-katex-eq\" data-display=\"false\"> w <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b <\/span> in the steepest downhill direction.<\/p>
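\n\n\n\n<p>As a preview, here&#8217;s roughly what computing that gradient looks like in Python (the two partial derivatives it uses are written out in the formulas a little further down):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># A rough sketch of the gradient of SSR with respect to w and b.\n# These lines follow from differentiating sum((w*x + b - y)**2);\n# the exact formulas appear in the next section.\ndef ssr_gradient(w, b, xs, ys):\n    grad_w = 0.0\n    grad_b = 0.0\n    for x, y in zip(xs, ys):\n        error = w * x + b - y     # prediction minus actual value\n        grad_w += 2 * x * error   # contribution to dSSR/dw from this point\n        grad_b += 2 * error       # contribution to dSSR/db from this point\n    return grad_w, grad_b<\/code><\/pre>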
\n\n\n\n<p>And if we keep on subtracting this gradient over and over again, eventually, we&#8217;ll reach the bottom and we&#8217;ll get the w and b needed to minimize the error!<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> w_{new} = w_{old} - \\frac{\\partial SSR}{\\partial w_{old}}  <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> b_{new} = b_{old} - \\frac{\\partial SSR}{\\partial b_{old}}  <\/span>\n\n\n\n<p>I won&#8217;t be getting too deep into the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Multivariable_calculus\">multivariable calculus<\/a> in this article, but here is the gradient of SSR:<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> \\nabla SSR =\\begin{bmatrix} \\frac{\\partial SSR}{\\partial w} \\\\ \\frac{\\partial SSR}{\\partial b} \\end{bmatrix} <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> = \\displaystyle\\begin{bmatrix}\\displaystyle\\sum_{i=1}^{N} 2 x_i (w x_i + b -y_i) \\\\ \\displaystyle\\sum_{i=1}^{N} 2 (w x_i + b - y_i)\\end{bmatrix} <\/span>\n\n\n\n<p>And, we&#8217;re going to introduce a new variable <span class=\"wp-katex-eq\" data-display=\"false\"> \\alpha <\/span>, which we&#8217;re going to call the <strong>learning rate<\/strong>.<\/p>\n\n\n\n<p>And our <span class=\"wp-katex-eq\" data-display=\"false\"> \\alpha <\/span> will be a small number between <span class=\"wp-katex-eq\" data-display=\"false\"> 0 <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> 1 <\/span>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Using the Gradient Descent Algorithm<\/strong><\/h3>\n\n\n\n<p>So, we have:<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> w_{new} = w_{old} - \\alpha \\frac{\\partial SSR}{\\partial w_{old}}  <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> b_{new} = b_{old} - \\alpha \\frac{\\partial SSR}{\\partial b_{old}}  <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> \\nabla SSR =\\begin{bmatrix} \\frac{\\partial SSR}{\\partial w} \\\\ \\frac{\\partial SSR}{\\partial b} \\end{bmatrix} <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> = \\displaystyle\\begin{bmatrix}\\displaystyle\\sum_{i=1}^{N} 2 x_i (w x_i + b -y_i) \\\\ \\displaystyle\\sum_{i=1}^{N} 2 (w x_i + b - y_i)\\end{bmatrix} <\/span>\n\n\n\n<p>Now, if we were to run these equations one time (we call this <strong>one epoch<\/strong>), with <span class=\"wp-katex-eq\" data-display=\"false\"> \\alpha = 0.005 <\/span> we would get:<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> w_{new} = 3 - 0.005 \\displaystyle\\sum_{i=1}^{N} 2 x_i (w x_i + b -y_i) = 2.12 <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> b_{new} = 1 - 0.005 \\displaystyle\\sum_{i=1}^{N} 2 (w x_i + b - y_i) = 0.17 <\/span>\n\n\n\n<p>And now, we have new values for w and b!<\/p>\n\n\n\n<p>If we plug these values into our SSR:<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR(w, b) = \\sum_{i=1}^{N}(wx_i + b -y_i)^2 <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> SSR(2.12, 0.17) = \\sum_{i=1}^{N}(2.12x_i + 0.17 -y_i)^2 = 58.0419 <\/span>\n\n\n\n<p>Our error, with <span class=\"wp-katex-eq\" data-display=\"false\"> w = 2.12 <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> b = 0.17 <\/span> is now only 58.0419.<\/p>
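\n\n\n\n<p>Putting the update rule into code, a bare-bones gradient descent loop looks something like this (the data, the starting guess, and the learning rate are all just illustrative, so the numbers won&#8217;t match 2.12, 0.17, or 58.0419 exactly):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># A bare-bones sketch of gradient descent for the line y = w*x + b.\n# Data, starting values, and learning rate are illustrative only.\nxs = [2.0, 4.0, 5.0, 7.0]\nys = [3.0, 6.0, 8.0, 10.0]\n\ndef ssr(w, b):\n    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))\n\ndef gradients(w, b):\n    grad_w = sum(2 * x * (w * x + b - y) for x, y in zip(xs, ys))\n    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))\n    return grad_w, grad_b\n\nw, b = 3.0, 1.0        # the starting guess from the example\nlearning_rate = 0.005  # alpha\n\nfor epoch in range(1000):            # each pass is one epoch\n    grad_w, grad_b = gradients(w, b)\n    w = w - learning_rate * grad_w   # step downhill in w\n    b = b - learning_rate * grad_b   # step downhill in b\n\nprint(w, b, ssr(w, b))  # the error should now be much smaller<\/code><\/pre>\n\n\n\n<p>Watching <span class=\"wp-katex-eq\" data-display=\"false\"> SSR(w, b) <\/span> shrink from epoch to epoch is exactly the &#8220;walking downhill&#8221; idea from the pictures above.<\/p>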
\n\n\n\n<p>And in the picture below, it&#8217;s represented by that blue dot.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"774\" height=\"1024\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-68-774x1024.png\" alt=\"\" class=\"wp-image-534\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-68-774x1024.png 774w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-68-227x300.png 227w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-68-768x1015.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-68.png 776w\" sizes=\"auto, (max-width: 774px) 100vw, 774px\" \/><\/figure>\n\n\n\n<p>Look at how much we&#8217;ve improved! <\/p>\n\n\n\n<p>Our error is a lot smaller. And if we look at our line, using the new values for w and b&#8230;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"760\" src=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-69-1024x760.png\" alt=\"\" class=\"wp-image-535\" srcset=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-69-1024x760.png 1024w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-69-300x223.png 300w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-69-768x570.png 768w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-69-1536x1140.png 1536w, https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-69.png 1996w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The blue line is our new line, which is a lot better than the green line!<\/p>\n\n\n\n<p>Now, imagine if we ran the same algorithm again (another <strong>epoch<\/strong>), and again, and again&#8230; <\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> w_{new} = w_{old} - \\alpha \\frac{\\partial SSR}{\\partial w_{old}}  <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> b_{new} = b_{old} - \\alpha \\frac{\\partial SSR}{\\partial b_{old}}  <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> \\nabla SSR =\\begin{bmatrix} \\frac{\\partial SSR}{\\partial w} \\\\ \\frac{\\partial SSR}{\\partial b} \\end{bmatrix} <\/span>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> = \\displaystyle\\begin{bmatrix}\\displaystyle\\sum_{i=1}^{N} 2 x_i (w x_i + b -y_i) \\\\ \\displaystyle\\sum_{i=1}^{N} 2 (w x_i + b - y_i)\\end{bmatrix} <\/span>\n\n\n\n<p>The results get better and better.<\/p>\n\n\n\n<p>Note that in this example, I used a fairly large learning rate to demonstrate the effects of gradient descent. <\/p>\n\n\n\n<p>In practice, we&#8217;d usually use a smaller learning rate and run many more epochs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n\n\n\n<p>And there you have it! 
<\/p>\n\n\n\n<p>To conclude, gradient descent is an algorithm that we use in machine learning to find the &#8220;best fit line&#8221; across a bunch of points.<\/p>\n\n\n\n<p>Here are the graphs I created:<\/p>\n\n\n\n<p>3D graph: https:\/\/www.desmos.com\/3D\/p1lrfiqlii<\/p>\n\n\n\n<p>2D graph: https:\/\/www.desmos.com\/calculator\/vnzyq8t1kn<\/p>\n\n\n\n<p>Thanks for reading!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, I&#8217;ll be explaining what gradient descent is in machine learning in a simple and easy to understand way. What is Gradient Descent in Machine Learning? Gradient descent is an optimization algorithm used in machine learning to predict data. This is a probably really confusing explanation. So, please keep reading so that we [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":433,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[11,28],"class_list":["post-432","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-science","tag-algorithms","tag-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.0 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>What is Gradient Descent in Machine Learning? - Kushal Writes<\/title>\n<meta name=\"description\" content=\"This article explains what gradient descent is in machine learning. Gradient descent is an algorithm used to minimize error in predictions.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is Gradient Descent in Machine Learning? - Kushal Writes\" \/>\n<meta property=\"og:description\" content=\"This article explains what gradient descent is in machine learning. Gradient descent is an algorithm used to minimize error in predictions.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Kushal Writes\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-15T13:15:34+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-02-15T13:15:35+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1118\" \/>\n\t<meta property=\"og:image:height\" content=\"866\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"kushal\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"kushal\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/\"},\"author\":{\"name\":\"kushal\",\"@id\":\"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/9ae64ce30587c804d89b1eef21ba5d2f\"},\"headline\":\"What is Gradient Descent in Machine Learning?\",\"datePublished\":\"2025-02-15T13:15:34+00:00\",\"dateModified\":\"2025-02-15T13:15:35+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/\"},\"wordCount\":2565,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/9ae64ce30587c804d89b1eef21ba5d2f\"},\"image\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png\",\"keywords\":[\"algorithms\",\"machine learning\"],\"articleSection\":[\"Computer Science\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/\",\"url\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/\",\"name\":\"What is Gradient Descent in Machine Learning? - Kushal Writes\",\"isPartOf\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png\",\"datePublished\":\"2025-02-15T13:15:34+00:00\",\"dateModified\":\"2025-02-15T13:15:35+00:00\",\"description\":\"This article explains what gradient descent is in machine learning. 
Gradient descent is an algorithm used to minimize error in predictions.\",\"breadcrumb\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#primaryimage\",\"url\":\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png\",\"contentUrl\":\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png\",\"width\":1118,\"height\":866},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/kushaltimsina.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"What is Gradient Descent in Machine Learning?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/kushaltimsina.com\/blog\/#website\",\"url\":\"https:\/\/kushaltimsina.com\/blog\/\",\"name\":\"Kushal Timsina\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/9ae64ce30587c804d89b1eef21ba5d2f\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/kushaltimsina.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/9ae64ce30587c804d89b1eef21ba5d2f\",\"name\":\"kushal\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2024\/11\/tempImage75F1Sw-edited.jpg\",\"contentUrl\":\"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2024\/11\/tempImage75F1Sw-edited.jpg\",\"width\":1274,\"height\":849,\"caption\":\"kushal\"},\"logo\":{\"@id\":\"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/image\/\"},\"description\":\"Kushal Timsina has been developing Roblox games since 2016, played 40,000,000+ times, teaches Roblox scripting on YouTube to 1,000,000+ views, and is the author of the Beginner's Guide to Roblox Scripting book.\",\"sameAs\":[\"https:\/\/kushaltimsina.com\/blog\"],\"url\":\"https:\/\/kushaltimsina.com\/blog\/author\/kushal\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"What is Gradient Descent in Machine Learning? - Kushal Writes","description":"This article explains what gradient descent is in machine learning. Gradient descent is an algorithm used to minimize error in predictions.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/","og_locale":"en_US","og_type":"article","og_title":"What is Gradient Descent in Machine Learning? - Kushal Writes","og_description":"This article explains what gradient descent is in machine learning. 
Gradient descent is an algorithm used to minimize error in predictions.","og_url":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/","og_site_name":"Kushal Writes","article_published_time":"2025-02-15T13:15:34+00:00","article_modified_time":"2025-02-15T13:15:35+00:00","og_image":[{"width":1118,"height":866,"url":"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png","type":"image\/png"}],"author":"kushal","twitter_card":"summary_large_image","twitter_misc":{"Written by":"kushal","Est. reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#article","isPartOf":{"@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/"},"author":{"name":"kushal","@id":"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/9ae64ce30587c804d89b1eef21ba5d2f"},"headline":"What is Gradient Descent in Machine Learning?","datePublished":"2025-02-15T13:15:34+00:00","dateModified":"2025-02-15T13:15:35+00:00","mainEntityOfPage":{"@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/"},"wordCount":2565,"commentCount":0,"publisher":{"@id":"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/9ae64ce30587c804d89b1eef21ba5d2f"},"image":{"@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png","keywords":["algorithms","machine learning"],"articleSection":["Computer Science"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/","url":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/","name":"What is Gradient Descent in Machine Learning? - Kushal Writes","isPartOf":{"@id":"https:\/\/kushaltimsina.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#primaryimage"},"image":{"@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png","datePublished":"2025-02-15T13:15:34+00:00","dateModified":"2025-02-15T13:15:35+00:00","description":"This article explains what gradient descent is in machine learning. 
Gradient descent is an algorithm used to minimize error in predictions.","breadcrumb":{"@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#primaryimage","url":"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png","contentUrl":"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2025\/02\/image-40.png","width":1118,"height":866},{"@type":"BreadcrumbList","@id":"https:\/\/kushaltimsina.com\/blog\/2025\/02\/15\/what-is-gradient-descent-in-machine-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/kushaltimsina.com\/blog\/"},{"@type":"ListItem","position":2,"name":"What is Gradient Descent in Machine Learning?"}]},{"@type":"WebSite","@id":"https:\/\/kushaltimsina.com\/blog\/#website","url":"https:\/\/kushaltimsina.com\/blog\/","name":"Kushal Timsina","description":"","publisher":{"@id":"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/9ae64ce30587c804d89b1eef21ba5d2f"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/kushaltimsina.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/9ae64ce30587c804d89b1eef21ba5d2f","name":"kushal","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2024\/11\/tempImage75F1Sw-edited.jpg","contentUrl":"https:\/\/kushaltimsina.com\/blog\/wp-content\/uploads\/2024\/11\/tempImage75F1Sw-edited.jpg","width":1274,"height":849,"caption":"kushal"},"logo":{"@id":"https:\/\/kushaltimsina.com\/blog\/#\/schema\/person\/image\/"},"description":"Kushal Timsina has been developing Roblox games since 2016, played 40,000,000+ times, teaches Roblox scripting on YouTube to 1,000,000+ views, and is the author of the Beginner's Guide to Roblox Scripting 
book.","sameAs":["https:\/\/kushaltimsina.com\/blog"],"url":"https:\/\/kushaltimsina.com\/blog\/author\/kushal\/"}]}},"_links":{"self":[{"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/posts\/432","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/comments?post=432"}],"version-history":[{"count":67,"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/posts\/432\/revisions"}],"predecessor-version":[{"id":537,"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/posts\/432\/revisions\/537"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/media\/433"}],"wp:attachment":[{"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/media?parent=432"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/categories?post=432"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kushaltimsina.com\/blog\/wp-json\/wp\/v2\/tags?post=432"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}