Nonlinear Optimization I - 553.761 (Fall 2023)

This course considers algorithms for solving various important nonlinear optimization problems and, in parallel, develops the supporting theory. Our primary focus will be on unconstrained optimization. Topics will include: necessary and sufficient optimality conditions; first-order methods, e.g., (sub)gradient descent, for smooth and nonsmooth optimization; second-order methods, e.g., Newton, quasi-Newton, and trust-region methods; stochastic gradient and coordinate descent methods; linear and nonlinear least squares problems and conjugate gradient methods. If time permits, we will cover: linear programming, minimax optimization, and composite optimization.
Coordinates #
Time: TTh 4:30PM - 5:45PM
Location: Maryland 109
Personnel #
Instructor:
Mateo Díaz (mateodd at jhu dot edu)
OH M 4:00PM - 6:00PM Wyman S429
Teaching Assistants:
Kaleigh Rudge (krudge2 at jhu dot edu)
OH W 10:00AM - 11:20AM Wyman S425
Thabo Samakhoana (tsamakh1 at jhu dot edu)
OH Th 10:00AM - 11:20AM Wyman S425
Roy Siegelmann (rsiege15 at jh dot edu)
OH T 7:00PM - 8:20PM Wyman S425
Syllabus #
The syllabus can be found here.
Lecture notes #
- Lecture 1: Introduction
- Lecture 2: Minima and first-order optimality conditions
- Lecture 3: Second-order optimality conditions and basic convexity
- Lecture 4: More convexity, geometry of smooth convex functions, and subdifferentials
- Lecture 5: Gradient descent, descent lemma, and stepsizes
- Lecture 6: Guarantees for nonconvex smooth optimization
- Lecture 7: Better guarantees for convex, and strongly convex functions
- Lecture 8: Accelerated gradient descent
- Lecture 9: Lower bounds
- Lecture 10: Proximal operator and Forward-backward method
- Lecture 11: Guarantees for the Forward-backward method
- Lecture 12: Acceleration and Alternating projections
- Lecture 13: Subgradient descent
- Lecture 14: Stochastic gradient descent
- Lecture 15: SGD guarantees
- Lecture 16: Newton-Raphson method
- Lecture 17: Newton-Raphson local quadratic convergence and its issues
- Lecture 18: Modified Newton methods
- Lecture 19: Quasi-Newton methods
- Lecture 20: BFGS and DFP
- Lecture 21: Convergence guarantees for BFGS
- Lecture 22: Conjugate gradient method
- Lecture 23: CG guarantees and Nonlinear Least Squares
- Lecture 24: Trust Region Methods
- Lecture 25: Trust Region Methods continued
- Lecture 26: Weakly convex functions and composite optimization
Homework #
Homework assignments (approximately five) will be posted here and on the course Canvas. Most assignments include at least one question that involves writing and testing code. Python is preferred, and I will use it in my demos.
- Homework 0: This is an optional homework, just to help you learn/practice Python. Do not submit it.
- Homework 1.
- Homework 2.
- Homework 3.
- Homework 4.
- Homework 5.
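Purely as a hypothetical illustration of the kind of coding exercise the homework involves (this is not an actual assignment), here is a minimal sketch of gradient descent on a strongly convex quadratic, one of the first methods covered in lecture. All names and numbers below are made up for the example.

```python
# Illustration (not an assignment): fixed-stepsize gradient descent on the
# strongly convex quadratic f(x) = 0.5 * x'Ax - b'x, whose gradient is Ax - b.
import numpy as np

def gradient_descent(grad, x0, step, iters):
    """Run gradient descent with a constant stepsize; return the final iterate."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b               # gradient of the quadratic
x_star = np.linalg.solve(A, b)           # exact minimizer, for comparison

x = gradient_descent(grad, np.zeros(2), step=1 / 4, iters=500)
print(np.allclose(x, x_star))            # → True
```

The stepsize `1/4` is below `2/L`, where `L` is the largest eigenvalue of `A` (about 3.62 here), so the iterates converge linearly to the minimizer, as the descent lemma from lecture guarantees.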
Textbook #
We will not be following any particular textbook. Notes will be posted on Canvas and the website. The following textbooks may be useful as references (but are not required):
- J. Nocedal and S. Wright, Numerical Optimization, Second Edition, Springer, 2006.
- Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, Norwell, MA, 2004.
- A. Ruszczyński, Nonlinear Optimization, Princeton University Press, Princeton, NJ, 2006.
- D. P. Bertsekas, Nonlinear Programming, Second Edition, Athena Scientific, Belmont, MA, 1999.
- D. Drusvyatskiy, Convex Analysis and Nonsmooth Optimization, Lecture notes, 2020. PDF.
Grading system #
We will use an ingenious grading scheme invented by Ben Grimmer for a previous iteration of this class. Course grades will be based on four components: Homework, Midterm, Final, and Participation, each described below. To give each student the best course score I can justify, I will solve the following optimization problem.
Denote the student’s performance in each of these four components as
\[ \begin{array}{rl} C_H &= \text{Homework score},\\ C_M &= \text{Midterm score},\\ C_F &= \text{Final score, and}\\ C_P &= \text{Participation score}. \end{array}\]

All of the above are scores out of \(100\). The optimization problem is solved individually for each student, giving them the best rubric and highest course score I can justify. A rubric for grading a student is given by selecting weights for these four components:

\[ \begin{array}{rl} H &= \text{Homework weight}, \\ M &= \text{Midterm exam weight}, \\ F &= \text{Final exam weight}, \\ P &= 100 - H - M - F. \end{array}\]

Notice that the participation weight is determined by the other three, since the weights must sum to \(100\). Each student's score is then given by maximizing over the set of all reasonable rubrics \((H, M, F)\), i.e., by solving

\[\begin{array}{rl} \text{max} & C_H H + C_M M + C_F F + C_P (100 - H - M - F)&&\\ \text{subject to} & \\ & \begin{array}{rllll} && (H, M, F) &\in \mathbb{R}^3&\\ && H + M + F &\leq 100 &\text{(Percentages are at most 100)}\\ 15 &\leq& H, M & &\text{(Homework and Midterm matter)}\\ M &\leq& F &&\text{(Final is more important than Midterm)}\\ 50 & \leq& M + F & \leq 80 & \text{(Exams are most, but not all, of the score)}\\ 90 & \leq& H + M + F && \text{(H, M, and F are the majority of the score)}. \end{array} \end{array}\]
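Since this is a linear program in \((H, M, F)\), it can be solved in a few lines of Python. Here is a sketch using SciPy's `linprog` (SciPy is an assumption of this sketch, not a course requirement), with hypothetical component scores:

```python
# Sketch: solve the grading rubric LP for one student with SciPy's linprog.
import numpy as np
from scipy.optimize import linprog

def course_score(CH, CM, CF, CP):
    """Best achievable course score (out of 100) over all reasonable rubrics."""
    # Variables x = (H, M, F). We maximize
    #   CH*H + CM*M + CF*F + CP*(100 - H - M - F),
    # so after dropping the constant 100*CP, linprog minimizes the negation:
    c = -np.array([CH - CP, CM - CP, CF - CP])
    # Inequality constraints A_ub @ x <= b_ub (">=" rows are negated):
    A_ub = [
        [1, 1, 1],     # H + M + F <= 100
        [-1, -1, -1],  # H + M + F >= 90
        [0, 1, -1],    # M <= F
        [0, 1, 1],     # M + F <= 80
        [0, -1, -1],   # M + F >= 50
    ]
    b_ub = [100, -90, 0, 80, -50]
    bounds = [(15, None), (15, None), (0, None)]  # H >= 15, M >= 15, F >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    # Undo the negation, restore the constant, and rescale to a 0-100 score.
    return (-res.fun + 100 * CP) / 100

# Hypothetical student: strong homework and final, weaker midterm, full participation.
print(course_score(95, 70, 90, 100))  # → 90.0
```

For these scores the optimal rubric puts as little weight as possible on the midterm (the weakest component): \(H = 40\), \(M = 15\), \(F = 35\), \(P = 10\), yielding a course score of 90.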
Midterm and Final #
Additional assessment will be based on one midterm exam and the final exam, both take-home. The dates of the exams will be posted on the course Canvas as they become available; the date of the final exam is determined by the official JHU final exam schedule. The Midterm will be made available on Canvas on Friday, October 13, 2023 and will be due at the beginning of class on Tuesday, October 17, 2023. The Final Exam will be made available on Canvas on Wednesday, December 13 and due by midnight on Friday, December 15.
Participation #
The grading program described above will assign between 0 and 10 percent of the course grade to participation. As a result, you can earn full marks in the course without any participation. Students will receive full participation points for doing any of the following: scribing lectures, engaging during lectures, engaging in office hours, or asking insightful questions.
Scribing #
Scribing at least one lecture is strongly encouraged. As mentioned above, scribing counts towards participation. Given that we have more lectures than students, there can be at most two scribes per lecture. Please send me your notes within one week of the lecture. I expect the notes to be fairly polished, and I may ask you to improve them before I upload them to the website. Whenever possible, please add figures that clarify concepts and ideas. I recommend using GeoGebra, or Grapher (if you use macOS). Also, we all make mistakes, so run the notes through a grammar checker, e.g., Microsoft Word or Grammarly.