Nonlinear Optimization I - 553.761 (Fall 2024)
This course considers algorithms for solving various important nonlinear optimization problems and, in parallel, develops the supporting theory. Our primary focus will be on unconstrained optimization. Topics will include: necessary and sufficient optimality conditions; first-order methods, e.g., (sub)gradient descent, for smooth and nonsmooth optimization; second-order methods, e.g., Newton, quasi-Newton, and trust-region methods; stochastic gradient and coordinate descent methods; linear and nonlinear least squares problems and conjugate gradient methods. If time permits, we will cover linear programming, minimax optimization, and composite optimization.
Coordinates #
Time: TTh 4:30PM - 5:45PM
Location: Ames 234
Personnel #
Instructor:
Mateo Díaz (mateodd at jhu dot edu)
OH M 4:00PM - 5:30PM Wyman S429
Teaching Assistants:
Pedro Izquierdo Lehmann (pizquie1 at jhu dot edu)
OH Th 10:00AM - 11:30AM Wyman S425
Daniel Lopez-Castaño (jlopezc1 at jhu dot edu)
OH Tu 10:00AM - 11:30AM Wyman S425
Thabo Samakhoana (tsamakh1 at jhu dot edu)
OH Wed 10:30AM - 11:15AM Wyman S425
Syllabus #
The syllabus can be found here.
Lecture notes #
The lecture notes will be posted here.
- Lecture 1: Introduction
- Lecture 2: Minima and first-order optimality conditions
- Lecture 3: Second-order optimality conditions and basic convexity
- Lecture 4: Geometry of smooth convex functions and subdifferentials
- Lecture 5: Gradient descent, descent lemma, and stepsizes
- Lecture 6: Guarantees for nonconvex smooth optimization
- Lecture 7: Better guarantees for convex, and strongly convex functions
- Lecture 8: Accelerated gradient descent
- Lecture 9: Lower bounds
- Lecture 10: Structured nonsmooth problems and the proximal operator
- Lecture 11: Guarantees for the forward-backward method
- Lecture 12: Acceleration and alternating projections
- Lecture 13: Subgradient descent
- Lecture 14: Stochastic gradient descent
- Lecture 15: SGD guarantees
- Lecture 16: Newton-Raphson method
- Lecture 17: Newton-Raphson local quadratic convergence and its issues
- Lecture 18: Modified Newton methods
- Lecture 19: Quasi-Newton methods
- Lecture 20: BFGS and DFP
- Lecture 21: Convergence guarantees for BFGS
- Lecture 22: Conjugate gradient method
- Lecture 23: CG guarantees and nonlinear least squares
- Lecture 24: Trust-region methods
- Lecture 25: Trust-region methods, continued
- Lecture 26: Weakly convex functions and composite optimization
Homework #
Homework assignments (approximately five) will be posted here and on the course Canvas. Most homework assignments include at least one question that involves writing and testing code. Python is preferred, and I will use it in my demos.
Textbook #
We will not be following any particular textbook. Notes will be posted on this website. Other potentially useful references (not required):
- J. Nocedal and S. Wright, Numerical Optimization, Second Edition, Springer (2006).
- Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, Norwell, MA (2004).
- A. Ruszczyński, Nonlinear Optimization, Princeton University Press, Princeton, NJ (2006).
- D. P. Bertsekas, Nonlinear Programming, Second Edition, Athena Scientific, Belmont, MA (1999).
- D. Drusvyatskiy, Convex Analysis and Nonsmooth Optimization, lecture notes (2020). PDF.
Grading system #
We will use an ingenious grading scheme invented by Ben Grimmer for one of the previous iterations of this class. Course grades will be based on four components: homework, midterm, final, and participation, each described below. I will maximize the course score I give each of you by solving the following optimization problem.
Denote the student’s performance in each of these four components as
\[ \begin{array}{rl} C_H &= \text{Homework score},\\ C_M &= \text{Midterm score},\\ C_F &= \text{Final score, and}\\ C_P &= \text{Participation score}. \end{array}\]

All of the above are scores out of \(100\). The optimization problem will be solved individually for each student to give them the best rubric and highest course score I can justify. A rubric for grading a student is given by selecting weights for the four components as

\[ \begin{array}{rl} H &= \text{Homework weight}, \\ M &= \text{Midterm exam weight}, \\ F &= \text{Final exam weight}, \\ P &= 100 - H - M - F. \end{array}\]

Notice that the participation weight is determined by the other three, since the weights must sum to \(100\). Each student's score is then obtained by maximizing over the set of all reasonable rubrics \((H, M, F)\):

\[\begin{array}{rll} \max\limits_{(H, M, F) \in \mathbb{R}^3} & C_H H + C_M M + C_F F + C_P (100 - H - M - F) & \\ \text{subject to} & H + M + F \leq 100 & \text{(Percentages are at most 100)}\\ & 15 \leq H, \quad 15 \leq M & \text{(Homework and Midterm matter)}\\ & M \leq F & \text{(Final is more important than Midterm)}\\ & 50 \leq M + F \leq 80 & \text{(Exams are most, but not all, of the score)}\\ & 90 \leq H + M + F & \text{(H, M, and F are the majority of the score)}. \end{array}\]
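To make the rubric concrete, here is a minimal sketch of how one could solve this linear program with `scipy.optimize.linprog`. The component scores below are made-up example values, and the snippet is only an illustration, not the official grading script.

```python
# Sketch: compute the best rubric (H, M, F) for one student by solving the
# linear program above with SciPy.  The scores below are hypothetical.
from scipy.optimize import linprog

C_H, C_M, C_F, C_P = 92, 78, 85, 100  # example component scores out of 100

# Variables x = (H, M, F).  The objective to maximize is
#   C_H*H + C_M*M + C_F*F + C_P*(100 - H - M - F);
# linprog minimizes, so we negate the coefficients of (H, M, F).
# The constant term 100*C_P does not change the maximizer.
c = [-(C_H - C_P), -(C_M - C_P), -(C_F - C_P)]

# Inequality constraints written as A_ub @ x <= b_ub
# (lower bounds are flipped into <= form, hence the minus signs).
A_ub = [
    [ 1,  1,  1],   # H + M + F <= 100
    [-1,  0,  0],   # H >= 15
    [ 0, -1,  0],   # M >= 15
    [ 0,  1, -1],   # M <= F
    [ 0,  1,  1],   # M + F <= 80
    [ 0, -1, -1],   # M + F >= 50
    [-1, -1, -1],   # H + M + F >= 90
]
b_ub = [100, -15, -15, 0, 80, -50, -90]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
H, M, F = res.x
P = 100 - H - M - F
score = (C_H * H + C_M * M + C_F * F + C_P * P) / 100
print(f"Best rubric: H={H:.0f}, M={M:.0f}, F={F:.0f}, P={P:.0f}; course score = {score:.1f}")
```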
Midterm and Final #
Additional assessment will be based on one midterm exam and the final exam, both of which will be take-home. The dates of the midterm and final exams will be posted on the course Canvas as they become available; note that the date of the final exam is determined by the official JHU final exam schedule.
Participation #
The grading program described above will assign between 0 and 10 percent of the course score to participation, since the weights \(H + M + F\) must lie between 90 and 100. Because the participation weight can be zero, you can get full marks in the course without any participation. Students will receive full participation points for engaging during lectures, engaging in office hours, and/or asking insightful questions.