<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://sohagkumarsaha.github.io/staging/feed.xml" rel="self" type="application/atom+xml" /><link href="https://sohagkumarsaha.github.io/staging/" rel="alternate" type="text/html" /><updated>2026-05-15T22:57:15-05:00</updated><id>https://sohagkumarsaha.github.io/staging/feed.xml</id><title type="html">Sohag Kumar Saha</title><subtitle>Ph.D. Researcher in Electrical &amp; Computer Engineering at Tennessee Technological University. Research interests: Smart Grid, Energy Management Systems, Renewable Energy, Deep Reinforcement Learning.</subtitle><author><name>Sohag Kumar Saha</name><email>sohag@ieee.org</email></author><entry><title type="html">Gradient Descent in Raw Python Code</title><link href="https://sohagkumarsaha.github.io/staging/machine%20learning/optimization/python/tutorial/gradient-descent-in-raw-python-code/" rel="alternate" type="text/html" title="Gradient Descent in Raw Python Code" /><published>2026-05-11T00:00:00-05:00</published><updated>2026-05-11T00:00:00-05:00</updated><id>https://sohagkumarsaha.github.io/staging/machine%20learning/optimization/python/tutorial/gradient-descent-in-raw-python-code</id><content type="html" xml:base="https://sohagkumarsaha.github.io/staging/machine%20learning/optimization/python/tutorial/gradient-descent-in-raw-python-code/"><![CDATA[<p>Optimization is one of the most important ideas in engineering, data science, machine learning, control systems, and artificial intelligence. Many modern algorithms try to answer one simple question:</p>

<blockquote>
  <p>How can we find the best value of a variable that minimizes or maximizes a function?</p>
</blockquote>

<p>In this post, we will learn <strong>Gradient Descent</strong> from the ground up using a pure Python implementation. The goal is not only to run the algorithm, but also to understand how it works internally. The implementation uses only built-in Python features, so students can focus on the fundamental concept without depending on external libraries such as NumPy or Matplotlib.</p>

<p>The example function used in this tutorial is:</p>

\[f(x) = x^2 + 1\]

<p>This is a simple convex function. Its minimum occurs near:</p>

\[x = 0, \qquad f(x) = 1\]

<p>The Python code will start from an initial guess, repeatedly update the value of <code class="language-plaintext highlighter-rouge">x</code>, and gradually move toward the minimum point.</p>

<hr />

<h2 id="what-is-gradient-descent">What Is Gradient Descent?</h2>

<p><strong>Gradient Descent</strong> is an iterative optimization algorithm used to find the minimum value of a function. It works by moving step-by-step in the direction where the function decreases the fastest.</p>

<p>For a function \(f(x)\), the basic update rule is:</p>

\[x_{new} = x_{old} - \alpha \frac{df(x)}{dx}\]

<p>where:</p>

<ul>
  <li>\(x_{old}\) is the current value of the variable.</li>
  <li>\(x_{new}\) is the updated value after one step.</li>
  <li>\(\alpha\) is the learning rate.</li>
  <li>\(\frac{df(x)}{dx}\) is the gradient or derivative of the function.</li>
</ul>

<p>The derivative tells us the slope of the function at the current point. If the slope is positive, the algorithm moves left. If the slope is negative, the algorithm moves right. Repeating this process brings the value of <code class="language-plaintext highlighter-rouge">x</code> closer to the minimum.</p>

<hr />

<h2 id="why-learn-gradient-descent">Why Learn Gradient Descent?</h2>

<p>Gradient Descent is a foundational algorithm in many technical fields. It is widely used in:</p>

<ol>
  <li><strong>Machine learning</strong> for training models by minimizing loss functions.</li>
  <li><strong>Deep learning</strong> for updating neural network weights.</li>
  <li><strong>Engineering optimization</strong> for finding efficient system parameters.</li>
  <li><strong>Control systems</strong> for tuning controllers and minimizing cost functions.</li>
  <li><strong>Economics and operations research</strong> for cost minimization and resource allocation.</li>
  <li><strong>Energy management systems</strong> for optimizing dispatch, battery usage, and operating cost.</li>
</ol>

<p>Understanding Gradient Descent helps students understand how many advanced algorithms learn from data or improve system performance.</p>

<hr />

<h2 id="working-principle-of-gradient-descent">Working Principle of Gradient Descent</h2>

<p>The working process of Gradient Descent can be explained in five simple steps:</p>

<h3 id="step-1-define-the-objective-function">Step 1: Define the objective function</h3>

<p>The objective function is the function we want to minimize. In this tutorial, we use:</p>

\[f(x) = x^2 + 1\]

<p>The goal is to find the value of <code class="language-plaintext highlighter-rouge">x</code> that gives the smallest value of <code class="language-plaintext highlighter-rouge">f(x)</code>.</p>

<h3 id="step-2-choose-an-initial-value">Step 2: Choose an initial value</h3>

<p>The algorithm needs a starting point. In the code, we start from:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="mf">10.0</span>
</code></pre></div></div>

<p>This means the algorithm begins far away from the true minimum at <code class="language-plaintext highlighter-rouge">x = 0</code>.</p>

<h3 id="step-3-compute-the-gradient">Step 3: Compute the gradient</h3>

<p>The gradient tells us the direction of the steepest increase. Since we want to minimize the function, we move in the opposite direction of the gradient.</p>

<p>In this implementation, the gradient is computed numerically using the <strong>central difference method</strong>:</p>

\[f'(x) \approx \frac{f(x+h)-f(x-h)}{2h}\]

<p>This is useful for learning because students do not need to manually derive the derivative for every function.</p>

<h3 id="step-4-update-the-value-of-x">Step 4: Update the value of <code class="language-plaintext highlighter-rouge">x</code></h3>

<p>The update rule is:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x_new</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">grad</span>
</code></pre></div></div>

<p>The learning rate controls the step size. A small learning rate may converge slowly. A very large learning rate may overshoot the minimum.</p>

<h3 id="step-5-repeat-until-convergence">Step 5: Repeat until convergence</h3>

<p>The algorithm continues updating <code class="language-plaintext highlighter-rouge">x</code> until the changes become very small. When the update difference is below a defined tolerance, the algorithm stops early.</p>

<hr />

<h2 id="complete-python-code-gradient-descent-from-scratch">Complete Python Code: Gradient Descent from Scratch</h2>

<p>The following code implements Gradient Descent using only built-in Python. It includes:</p>

<ul>
  <li>A user-defined objective function.</li>
  <li>Numerical gradient calculation using first principles (central difference formula).</li>
  <li>Automatic theoretical minimum estimation.</li>
  <li>Iteration history recording.</li>
  <li>A text-based ASCII convergence plot.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># =============================================
# Gradient Descent from scratch - Pure Python
# AUTOMATED gradient + FULLY AUTOMATIC theoretical minimum
# + CONVERGENCE PLOT using only built-in Python (no libraries!)
# Benchmark function example: f(x) = x² + 1
# =============================================
</span>

<span class="k">def</span> <span class="nf">objective</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">The ONLY function you need to change.
    Put any mathematical expression you want to minimize here.
  
    In this example, the objective function is:
        f(x) = x^2 + 1
  
    The minimum value of this function occurs at x = 0,
    where f(x) = 1.
    </span><span class="sh">"""</span>
    <span class="nf">return </span><span class="p">(</span><span class="n">x</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>


<span class="k">def</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">func</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="mf">1e-8</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Automatically compute the derivative using central difference.
  
    The central difference formula is:
        f</span><span class="sh">'</span><span class="s">(x) ≈ [f(x + h) - f(x - h)] / (2h)
  
    This allows us to estimate the gradient without manually deriving
    the mathematical derivative of the function.
    </span><span class="sh">"""</span>
    <span class="nf">return </span><span class="p">(</span><span class="nf">func</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">h</span><span class="p">)</span> <span class="o">-</span> <span class="nf">func</span><span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">h</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">h</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">find_theoretical_minimum</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="n">grad_func</span><span class="p">,</span> <span class="n">x_start</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span>
                             <span class="n">max_iter</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span> <span class="n">tol</span><span class="o">=</span><span class="mf">1e-12</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Automatically finds a high-precision theoretical minimum.
  
    This helper function also applies gradient descent, but with a
    smaller learning rate and tighter tolerance. It is used as a
    reference result to compare with the main gradient descent output.
    </span><span class="sh">"""</span>
    <span class="n">x</span> <span class="o">=</span> <span class="n">x_start</span>

    <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">max_iter</span><span class="p">):</span>
        <span class="c1"># Compute the gradient at the current point
</span>        <span class="n">grad</span> <span class="o">=</span> <span class="nf">grad_func</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">func</span><span class="p">)</span>

        <span class="c1"># Move opposite to the gradient direction
</span>        <span class="n">x_new</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">grad</span>

        <span class="c1"># Stop if the update is extremely small
</span>        <span class="k">if</span> <span class="nf">abs</span><span class="p">(</span><span class="n">x_new</span> <span class="o">-</span> <span class="n">x</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">tol</span><span class="p">:</span>
            <span class="k">break</span>

        <span class="c1"># Update x for the next iteration
</span>        <span class="n">x</span> <span class="o">=</span> <span class="n">x_new</span>

    <span class="k">return</span> <span class="n">x</span><span class="p">,</span> <span class="nf">func</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>


<span class="c1"># =============================================
# Run Gradient Descent + Record History for Plot
# =============================================
</span>
<span class="c1"># The learning rate controls how large each update step is.
</span><span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.1</span>

<span class="c1"># Maximum number of iterations allowed.
</span><span class="n">max_iterations</span> <span class="o">=</span> <span class="mi">1000</span>

<span class="c1"># Initial guess. The algorithm starts from x = 10.
</span><span class="n">x</span> <span class="o">=</span> <span class="mf">10.0</span>

<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Starting gradient descent with AUTOMATED gradient...</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Initial x = </span><span class="si">{</span><span class="n">x</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="s">, f(x) = </span><span class="si">{</span><span class="nf">objective</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Lists to store history for plotting convergence.
# x_history stores all x values.
# f_history stores all function values.
# grad_history stores all gradient values.
</span><span class="n">x_history</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="p">]</span>
<span class="n">f_history</span> <span class="o">=</span> <span class="p">[</span><span class="nf">objective</span><span class="p">(</span><span class="n">x</span><span class="p">)]</span>
<span class="n">grad_history</span> <span class="o">=</span> <span class="p">[]</span>

<span class="c1"># Main gradient descent loop
</span><span class="k">for</span> <span class="n">iteration</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">max_iterations</span><span class="p">):</span>
    <span class="c1"># Compute numerical gradient at current x
</span>    <span class="n">grad</span> <span class="o">=</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">objective</span><span class="p">)</span>
    <span class="n">grad_history</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">grad</span><span class="p">)</span>

    <span class="c1"># Apply the gradient descent update rule
</span>    <span class="n">x_new</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">grad</span>

    <span class="c1"># Evaluate the function at the new x value
</span>    <span class="n">f_new</span> <span class="o">=</span> <span class="nf">objective</span><span class="p">(</span><span class="n">x_new</span><span class="p">)</span>

    <span class="c1"># Store updated values for later analysis and plotting
</span>    <span class="n">x_history</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">x_new</span><span class="p">)</span>
    <span class="n">f_history</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">f_new</span><span class="p">)</span>

    <span class="c1"># Print progress every 5 iterations
</span>    <span class="k">if</span> <span class="n">iteration</span> <span class="o">%</span> <span class="mi">5</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span>
            <span class="sa">f</span><span class="sh">"</span><span class="s">Iter </span><span class="si">{</span><span class="n">iteration</span><span class="si">:</span><span class="mi">2</span><span class="n">d</span><span class="si">}</span><span class="s"> | </span><span class="sh">"</span>
            <span class="sa">f</span><span class="sh">"</span><span class="s">x = </span><span class="si">{</span><span class="n">x_new</span><span class="si">:</span><span class="mf">8.6</span><span class="n">f</span><span class="si">}</span><span class="s"> | </span><span class="sh">"</span>
            <span class="sa">f</span><span class="sh">"</span><span class="s">f(x) = </span><span class="si">{</span><span class="n">f_new</span><span class="si">:</span><span class="mf">8.6</span><span class="n">f</span><span class="si">}</span><span class="s"> | </span><span class="sh">"</span>
            <span class="sa">f</span><span class="sh">"</span><span class="s">gradient ≈ </span><span class="si">{</span><span class="n">grad</span><span class="si">:</span><span class="mf">8.6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span>
        <span class="p">)</span>

    <span class="c1"># Early stopping condition:
</span>    <span class="c1"># if x changes very little, the algorithm has converged.
</span>    <span class="k">if</span> <span class="nf">abs</span><span class="p">(</span><span class="n">x_new</span> <span class="o">-</span> <span class="n">x</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">1e-8</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">Converged early!</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">break</span>

    <span class="c1"># Update x for the next iteration
</span>    <span class="n">x</span> <span class="o">=</span> <span class="n">x_new</span>

<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">=== Final Result from Gradient Descent ===</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Optimal x found ≈ </span><span class="si">{</span><span class="n">x</span><span class="si">:</span><span class="p">.</span><span class="mi">10</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Minimum value f(x) found ≈ </span><span class="si">{</span><span class="nf">objective</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">10</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>


<span class="c1"># =============================================
# Automatic theoretical minimum
# =============================================
</span>
<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">Computing theoretical minimum automatically...</span><span class="sh">"</span><span class="p">)</span>
<span class="n">x_theory</span><span class="p">,</span> <span class="n">f_theory</span> <span class="o">=</span> <span class="nf">find_theoretical_minimum</span><span class="p">(</span><span class="n">objective</span><span class="p">,</span> <span class="n">numerical_gradient</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">=== Theoretical Minimum (auto-detected) ===</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Optimal x ≈ </span><span class="si">{</span><span class="n">x_theory</span><span class="si">:</span><span class="p">.</span><span class="mi">10</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Minimum value f(x) ≈ </span><span class="si">{</span><span class="n">f_theory</span><span class="si">:</span><span class="p">.</span><span class="mi">10</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>


<span class="c1"># =============================================
# CREATE CONVERGENCE PLOT using ONLY built-in Python
# Text-based ASCII plot - no matplotlib needed
# =============================================
</span>

<span class="k">def</span> <span class="nf">create_ascii_plot</span><span class="p">(</span><span class="n">x_vals</span><span class="p">,</span> <span class="n">f_vals</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="sh">"</span><span class="s">Convergence Plot</span><span class="sh">"</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="mi">70</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">20</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Creates a simple text-based ASCII plot of convergence.
  
    This function visualizes how f(x) changes over iterations.
    It does not require any external plotting library.
    </span><span class="sh">"""</span>

    <span class="c1"># Check if the input lists are empty
</span>    <span class="k">if</span> <span class="ow">not</span> <span class="n">x_vals</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">f_vals</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">No data to plot.</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span>

    <span class="c1"># Find minimum and maximum values for scaling
</span>    <span class="n">x_min</span><span class="p">,</span> <span class="n">x_max</span> <span class="o">=</span> <span class="nf">min</span><span class="p">(</span><span class="n">x_vals</span><span class="p">),</span> <span class="nf">max</span><span class="p">(</span><span class="n">x_vals</span><span class="p">)</span>
    <span class="n">f_min</span><span class="p">,</span> <span class="n">f_max</span> <span class="o">=</span> <span class="nf">min</span><span class="p">(</span><span class="n">f_vals</span><span class="p">),</span> <span class="nf">max</span><span class="p">(</span><span class="n">f_vals</span><span class="p">)</span>

    <span class="c1"># Avoid division by zero if all f values are the same
</span>    <span class="n">f_range</span> <span class="o">=</span> <span class="n">f_max</span> <span class="o">-</span> <span class="n">f_min</span> <span class="k">if</span> <span class="n">f_max</span> <span class="o">&gt;</span> <span class="n">f_min</span> <span class="k">else</span> <span class="mf">1.0</span>

    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="se">\n</span><span class="s">=== </span><span class="si">{</span><span class="n">title</span><span class="si">}</span><span class="s"> ===</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span>
        <span class="sa">f</span><span class="sh">"</span><span class="s">X range: </span><span class="si">{</span><span class="n">x_min</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> → </span><span class="si">{</span><span class="n">x_max</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> | </span><span class="sh">"</span>
        <span class="sa">f</span><span class="sh">"</span><span class="s">f(x) range: </span><span class="si">{</span><span class="n">f_min</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="s"> → </span><span class="si">{</span><span class="n">f_max</span><span class="si">:</span><span class="p">.</span><span class="mi">6</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span>
    <span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Minimal point should be near the bottom of the plot.</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># Build the plot line by line from high f(x) to low f(x)
</span>    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">height</span><span class="p">):</span>
        <span class="c1"># Current function-value level represented by this row
</span>        <span class="n">f_level</span> <span class="o">=</span> <span class="n">f_max</span> <span class="o">-</span> <span class="p">(</span><span class="n">row</span> <span class="o">/</span> <span class="p">(</span><span class="n">height</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="o">*</span> <span class="n">f_range</span>
        <span class="n">line</span> <span class="o">=</span> <span class="sh">""</span>

        <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">width</span><span class="p">):</span>
            <span class="c1"># Map the current column to an index in the history list
</span>            <span class="n">idx</span> <span class="o">=</span> <span class="nf">int</span><span class="p">(</span><span class="n">col</span> <span class="o">/</span> <span class="p">(</span><span class="n">width</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">x_vals</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>

            <span class="k">if</span> <span class="n">idx</span> <span class="o">&gt;=</span> <span class="nf">len</span><span class="p">(</span><span class="n">f_vals</span><span class="p">):</span>
                <span class="n">idx</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">f_vals</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>

            <span class="n">point_f</span> <span class="o">=</span> <span class="n">f_vals</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span>

            <span class="c1"># Draw '*' if a convergence point is close to this row level
</span>            <span class="k">if</span> <span class="nf">abs</span><span class="p">(</span><span class="n">point_f</span> <span class="o">-</span> <span class="n">f_level</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">f_range</span> <span class="o">/</span> <span class="p">(</span><span class="n">height</span> <span class="o">*</span> <span class="mf">1.5</span><span class="p">):</span>
                <span class="n">line</span> <span class="o">+=</span> <span class="sh">"</span><span class="s">*</span><span class="sh">"</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="c1"># Draw a vertical line near the final minimum point
</span>                <span class="n">final_x_idx</span> <span class="o">=</span> <span class="p">(</span>
                    <span class="nf">int</span><span class="p">((</span><span class="n">x</span> <span class="o">-</span> <span class="n">x_min</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">x_max</span> <span class="o">-</span> <span class="n">x_min</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">width</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>
                    <span class="k">if</span> <span class="n">x_max</span> <span class="o">&gt;</span> <span class="n">x_min</span>
                    <span class="k">else</span> <span class="n">width</span> <span class="o">//</span> <span class="mi">2</span>
                <span class="p">)</span>

                <span class="k">if</span> <span class="n">col</span> <span class="o">==</span> <span class="n">final_x_idx</span> <span class="ow">and</span> <span class="n">row</span> <span class="o">&gt;</span> <span class="n">height</span> <span class="o">*</span> <span class="mf">0.7</span><span class="p">:</span>
                    <span class="n">line</span> <span class="o">+=</span> <span class="sh">"</span><span class="s">|</span><span class="sh">"</span>
                <span class="k">else</span><span class="p">:</span>
                    <span class="n">line</span> <span class="o">+=</span> <span class="sh">"</span><span class="s"> </span><span class="sh">"</span>

        <span class="nf">print</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>

    <span class="c1"># X-axis labels
</span>    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">-</span><span class="sh">"</span> <span class="o">*</span> <span class="n">width</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="sh">'</span><span class="s">Start</span><span class="sh">'</span><span class="si">:</span><span class="o">&lt;</span><span class="mi">10</span><span class="si">}{</span><span class="sh">'</span><span class="s">→ Convergence →</span><span class="sh">'</span><span class="si">:</span><span class="o">^</span><span class="si">{</span><span class="n">width</span><span class="o">-</span><span class="mi">20</span><span class="si">}}{</span><span class="sh">'</span><span class="s">End/Min</span><span class="sh">'</span><span class="si">:</span><span class="o">&gt;</span><span class="mi">10</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span>
        <span class="sa">f</span><span class="sh">"</span><span class="s">Initial x = </span><span class="si">{</span><span class="n">x_vals</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">                  </span><span class="sh">"</span>
        <span class="sa">f</span><span class="sh">"</span><span class="s">Final x ≈ </span><span class="si">{</span><span class="n">x_vals</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> </span><span class="sh">"</span>
        <span class="sa">f</span><span class="sh">"</span><span class="s">(should be near theoretical </span><span class="si">{</span><span class="n">x_theory</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span>
    <span class="p">)</span>


<span class="c1"># Generate the ASCII plot
</span><span class="nf">create_ascii_plot</span><span class="p">(</span>
    <span class="n">x_history</span><span class="p">,</span>
    <span class="n">f_history</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">Gradient Descent Convergence (f(x) vs Iterations)</span><span class="sh">"</span>
<span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">Interpretation:</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">• The </span><span class="sh">'</span><span class="s">*</span><span class="sh">'</span><span class="s"> trace shows how f(x) decreases over iterations.</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">• It should drop quickly at first then flatten near the bottom → good convergence.</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">• The vertical </span><span class="sh">'</span><span class="s">|</span><span class="sh">'</span><span class="s"> marks the final minimal point found.</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">• If the plot flattens at the bottom and matches the theoretical minimum, optimization worked!</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h2 id="explanation-of-the-code">Explanation of the Code</h2>

<h3 id="1-objective-function">1. Objective Function</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">objective</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="nf">return </span><span class="p">(</span><span class="n">x</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
</code></pre></div></div>

<p>This function defines the mathematical expression that the algorithm will minimize. Students can replace this function with another expression to test different optimization problems.</p>

<p>For example:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">return </span><span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="mi">3</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">5</span>
</code></pre></div></div>

<p>This new function would have its minimum near <code class="language-plaintext highlighter-rouge">x = 3</code>.</p>

<hr />

<h3 id="2-numerical-gradient-calculation-first-principles">2. Numerical Gradient Calculation (First Principles)</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">numerical_gradient</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">func</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="mf">1e-8</span><span class="p">):</span>
    <span class="nf">return </span><span class="p">(</span><span class="nf">func</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">h</span><span class="p">)</span> <span class="o">-</span> <span class="nf">func</span><span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">h</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">h</span><span class="p">)</span>
</code></pre></div></div>

<p>This function calculates the derivative automatically using the <strong>central difference method</strong>, which is based on first principles from calculus. This is useful because students do not need to manually calculate the derivative.</p>

<p>For the example function:</p>

\[f(x) = x^2 + 1\]

<p>The exact derivative is:</p>

\[f'(x) = 2x\]

<p>However, the code estimates this derivative numerically using the formula:</p>

\[f'(x) \approx \frac{f(x+h)-f(x-h)}{2h}\]

<p>This makes the implementation more flexible because we can change the objective function without changing the gradient function.</p>

<hr />

<h3 id="3-gradient-descent-update">3. Gradient Descent Update</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x_new</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">grad</span>
</code></pre></div></div>

<p>This is the most important line in the program. It moves the current value of <code class="language-plaintext highlighter-rouge">x</code> in the direction that reduces the objective function.</p>

<p>If the gradient is positive, the update decreases <code class="language-plaintext highlighter-rouge">x</code>. If the gradient is negative, the update increases <code class="language-plaintext highlighter-rouge">x</code>. The goal is to keep moving until the gradient becomes close to zero.</p>

<hr />

<h3 id="4-learning-rate">4. Learning Rate</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.1</span>
</code></pre></div></div>

<p>The learning rate determines the size of each step.</p>

<ul>
  <li>If the learning rate is too small, convergence will be slow.</li>
  <li>If the learning rate is too large, the algorithm may overshoot the minimum.</li>
  <li>A suitable learning rate helps the algorithm converge smoothly.</li>
</ul>

<p>For this simple function, <code class="language-plaintext highlighter-rouge">0.1</code> works well.</p>

<hr />

<h3 id="5-early-stopping">5. Early Stopping</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="nf">abs</span><span class="p">(</span><span class="n">x_new</span> <span class="o">-</span> <span class="n">x</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">1e-8</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">Converged early!</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">break</span>
</code></pre></div></div>

<p>Early stopping prevents unnecessary iterations. If the value of <code class="language-plaintext highlighter-rouge">x</code> changes very little between two consecutive iterations, the algorithm assumes that it has already reached the minimum region.</p>

<p>This makes the program more efficient.</p>

<hr />

<h3 id="6-history-recording">6. History Recording</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x_history</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="p">]</span>
<span class="n">f_history</span> <span class="o">=</span> <span class="p">[</span><span class="nf">objective</span><span class="p">(</span><span class="n">x</span><span class="p">)]</span>
<span class="n">grad_history</span> <span class="o">=</span> <span class="p">[]</span>
</code></pre></div></div>

<p>The code stores the values of <code class="language-plaintext highlighter-rouge">x</code>, <code class="language-plaintext highlighter-rouge">f(x)</code>, and the gradient during training. This is useful for analyzing the optimization process.</p>

<p>The stored data also allows us to create the convergence plot.</p>

<hr />

<h3 id="7-ascii-convergence-plot">7. ASCII Convergence Plot</h3>

<p>The function <code class="language-plaintext highlighter-rouge">create_ascii_plot()</code> generates a text-based plot using only built-in Python. The <code class="language-plaintext highlighter-rouge">*</code> symbols show how the objective function decreases over time.</p>

<p>This is very helpful for beginners because it visually demonstrates convergence without requiring external libraries.</p>

<p>A good convergence pattern usually shows:</p>

<ul>
  <li>A rapid decrease at the beginning.</li>
  <li>A slower decrease near the minimum.</li>
  <li>A flat region near the bottom of the plot.</li>
</ul>

<hr />

<h2 id="expected-behavior-of-the-algorithm">Expected Behavior of the Algorithm</h2>

<p>Since the objective function is:</p>

\[f(x) = x^2 + 1\]

<p>and the initial value is:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="mf">10.0</span>
</code></pre></div></div>

<p>Gradient Descent will gradually move <code class="language-plaintext highlighter-rouge">x</code> toward zero. As <code class="language-plaintext highlighter-rouge">x</code> approaches zero, <code class="language-plaintext highlighter-rouge">f(x)</code> approaches one.</p>

<p>Therefore, the expected result is approximately:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Optimal x found ≈ 0
Minimum value f(x) found ≈ 1
</code></pre></div></div>

<p>The exact printed values may contain very small numerical differences due to floating-point computation and numerical gradient approximation.</p>

<hr />

<h2 id="important-learning-points-for-students">Important Learning Points for Students</h2>

<p>After studying and running this code, students should understand the following ideas:</p>

<ol>
  <li>
    <p><strong>Optimization means finding the best value of a variable.</strong><br />
In this example, the best value is the <code class="language-plaintext highlighter-rouge">x</code> that minimizes <code class="language-plaintext highlighter-rouge">f(x)</code>.</p>
  </li>
  <li>
    <p><strong>The gradient tells the direction of change.</strong><br />
A positive gradient means the function increases as <code class="language-plaintext highlighter-rouge">x</code> increases. A negative gradient means the function decreases as <code class="language-plaintext highlighter-rouge">x</code> increases.</p>
  </li>
  <li>
    <p><strong>Gradient Descent moves opposite to the gradient.</strong><br />
This is why the update rule subtracts the gradient term.</p>
  </li>
  <li>
    <p><strong>The learning rate controls the step size.</strong><br />
Choosing a good learning rate is important for stable convergence.</p>
  </li>
  <li>
    <p><strong>Numerical gradients make the code flexible.</strong><br />
The central difference method allows the program to estimate derivatives automatically without manual differentiation.</p>
  </li>
  <li>
    <p><strong>Convergence can be visualized.</strong><br />
The ASCII plot shows how the objective function decreases over iterations.</p>
  </li>
</ol>

<hr />

<h2 id="applications-of-gradient-descent">Applications of Gradient Descent</h2>

<p>Gradient Descent is used in many real-world problems, including:</p>

<h3 id="machine-learning-model-training">Machine Learning Model Training</h3>

<p>In machine learning, models are trained by minimizing a loss function. Gradient Descent updates model parameters so that prediction error becomes smaller.</p>

<h3 id="neural-networks-and-deep-learning">Neural Networks and Deep Learning</h3>

<p>Deep learning uses advanced forms of Gradient Descent such as stochastic gradient descent, mini-batch gradient descent, Adam, and RMSProp to train large neural networks.</p>

<h3 id="engineering-design-optimization">Engineering Design Optimization</h3>

<p>Engineers use optimization to minimize cost, weight, energy consumption, or error while satisfying design constraints.</p>

<h3 id="energy-management-systems">Energy Management Systems</h3>

<p>In power and energy systems, optimization can help schedule batteries, renewable energy sources, diesel generators, and grid power to reduce operating costs.</p>

<h3 id="control-systems">Control Systems</h3>

<p>Gradient-based methods are used to tune controllers and minimize tracking error or energy consumption.</p>

<hr />

<h2 id="how-students-can-experiment-with-this-code">How Students Can Experiment with This Code</h2>

<p>Students can modify only the <code class="language-plaintext highlighter-rouge">objective()</code> function and observe how the algorithm behaves.</p>

<p>For example, try this function:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">objective</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="nf">return </span><span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="mi">5</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">2</span>
</code></pre></div></div>

<p>The minimum should occur near:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = 5
f(x) = 2
</code></pre></div></div>

<p>Students can also experiment with the learning rate:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.01</span>
<span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.5</span>
</code></pre></div></div>

<p>By comparing the results, students can see how learning rate affects convergence speed and stability.</p>

<hr />

<h2 id="final-words">Final Words</h2>

<p>Gradient Descent is one of the most fundamental algorithms in optimization and machine learning. Although the mathematical idea is simple, it is the foundation of many powerful modern learning systems.</p>

<p>This pure Python implementation is especially useful for beginners because it avoids external libraries and clearly shows every step of the algorithm. Students can see how the objective function is defined, how the gradient is estimated using first principles, how the update rule works, and how convergence can be visualized using a simple ASCII plot.</p>

<p>By understanding this example deeply, students will be better prepared to study advanced optimization techniques, machine learning algorithms, and neural network training methods.</p>

<hr />

<h2 id="key-outcomes">Key Outcomes</h2>

<p>By the end of this tutorial, students should be able to:</p>

<ul>
  <li>Define Gradient Descent in simple terms.</li>
  <li>Explain the role of the gradient in optimization.</li>
  <li>Understand and apply the Gradient Descent update rule.</li>
  <li>Implement Gradient Descent from scratch in Python.</li>
  <li>Use numerical differentiation (central difference method) to estimate gradients from first principles.</li>
  <li>Interpret convergence behavior using stored history and plots.</li>
  <li>Modify the objective function and test new optimization examples.</li>
</ul>

<p>Gradient Descent may look simple in this one-variable example, but the same basic principle extends to large-scale machine learning and engineering optimization problems with thousands or even millions of variables.</p>]]></content><author><name>Sohag Kumar Saha</name><email>sohag@ieee.org</email></author><category term="Machine Learning" /><category term="Optimization" /><category term="Python" /><category term="Tutorial" /><category term="gradient descent" /><category term="optimization" /><category term="python" /><category term="machine learning" /><category term="beginner-friendly" /><category term="numerical methods" /><summary type="html"><![CDATA[Learn the working principle of gradient descent using a fully user-defined Python implementation with numerical gradients, automatic minimum detection, and an ASCII convergence plot.]]></summary></entry></feed>