On Modified DFP Update for Unconstrained Optimization
Saad Shakir Mahmood, Samira Hassan Shnywer
Department of Mathematics, College of Education, Almustansiryah University, Baghdad, Iraq
To cite this article:
Saad Shakir Mahmood, Samira Hassan Shnywer. On Modified DFP Update for Unconstrained Optimization. American Journal of Applied Mathematics. Vol. 5, No. 1, 2017, pp. 19-30. doi: 10.11648/j.ajam.20170501.13
Received: December 25, 2016; Accepted: January 9, 2017; Published: February 6, 2017
Abstract: In this paper, we propose a new modification of the DFP update with a new extended quasi-Newton condition for the unconstrained optimization problem. The update is based on a new Zhang-Xu condition. We show, both theoretically and practically, that the update keeps the determinant of the next Hessian approximation equal to the determinant of the current one. Global convergence of the modified method is established, and local and superlinear convergence of the proposed method are obtained. Numerical results are given that compare the performance of the modified method with the standard DFP method on a set of selected test functions.
Keywords: Quasi-Newton Equation, DFP Updating Formula, Global Convergence, Superlinear Convergence
1. Introduction
The quasi-Newton methods are very useful and efficient for solving the unconstrained minimization problem
$\min_{x \in \mathbb{R}^n} f(x)$ (1)
where $f:\mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable. Starting from an initial point $x_0$ and a symmetric positive definite matrix $B_0$, a quasi-Newton method generates sequences $\{x_k\}$ and $\{B_k\}$ by iterations of the form
$x_{k+1} = x_k + \alpha_k d_k$ (2)
where the update $B_{k+1}$ satisfies the following famous quasi-Newton equation (or secant equation):
$B_{k+1} s_k = y_k$ (3)
with
$s_k = x_{k+1} - x_k, \quad y_k = g_{k+1} - g_k$ (4)
where $\alpha_k$ is the step length and $d_k$ is the search direction, obtained by solving the equation:
$B_k d_k = -g_k$ (5)
in which $g_k$ is the gradient of $f$ at $x_k$ and $B_k$ is an approximation to the Hessian matrix $\nabla^2 f(x_k)$. The updating matrix $B_{k+1}$ is required to satisfy the usual quasi-Newton equation (3) with equation (4), so that $B_{k+1}$ is a reasonable approximation to $\nabla^2 f(x_{k+1})$.
The update consists of iterations of the form (2), where $d_k$ is the search direction of the form (5), and the Hessian approximation $B$ is updated by the following formula with the quasi-Newton equation (3):
(6)
where (7)
This formula is a modification of the DFP update; it satisfies equation (3) and is derived in the next section.
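As a point of reference for the iteration (2)-(5), the standard DFP method can be sketched as below. This is a minimal Python sketch, not the authors' MATLAB code: the test quadratic, the backtracking line search, and all tolerances are illustrative assumptions.

```python
import numpy as np

def dfp_update(B, s, y):
    """Standard DFP update of the Hessian approximation:
    B+ = (I - y s^T / y^T s) B (I - s y^T / y^T s) + y y^T / y^T s,
    which satisfies the secant equation B+ s = y."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(y, s)
    return V @ B @ V.T + rho * np.outer(y, y)

def quasi_newton(f, grad, x0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                     # B_0 symmetric positive definite
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(B, -g)         # search direction, eq. (5)
        alpha = 1.0                        # backtracking Armijo line search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        s = alpha * d                      # s_k, eq. (4)
        y = grad(x + s) - g                # y_k, eq. (4)
        x = x + s                          # eq. (2)
        if y @ s > 1e-12:                  # keep B positive definite
            B = dfp_update(B, s, y)
    return x, B

# illustrative convex quadratic: f(x) = 0.5 x^T A x - b^T x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star, B_final = quasi_newton(f, grad, [5.0, -3.0])
```

On a strictly convex quadratic the minimizer is $A^{-1}b$, which gives a quick sanity check of the sketch.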
In the following discussion, we shall use $\|\cdot\|$ and $\|\cdot\|_F$ to denote the norm and the Frobenius norm, respectively. For a symmetric positive definite matrix, we shall also use the following weighted norm
(8)
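The right-hand side of (8) did not survive extraction. In Dennis-Moré-style analyses of DFP/BFGS, the weighted norm is commonly the weighted Frobenius norm $\|A\|_W = \|W^{1/2} A W^{1/2}\|_F$; the sketch below adopts that reading as an assumption, computing the symmetric square root by eigendecomposition.

```python
import numpy as np

def sym_sqrt(W):
    """Symmetric square root of a symmetric positive definite matrix W."""
    vals, vecs = np.linalg.eigh(W)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def weighted_fro_norm(A, W):
    """Assumed reading of (8): ||A||_W = || W^(1/2) A W^(1/2) ||_F."""
    R = sym_sqrt(W)
    return np.linalg.norm(R @ A @ R, 'fro')

W = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative SPD weight matrix
A = np.array([[1.0, 0.0], [0.0, -1.0]])
```

With $W = I$ the weighted norm reduces to the plain Frobenius norm, which gives a quick sanity check.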
2. Update
Let , and from the Zhang-Xu condition we have:
which gives (9)
From quasiNewton equation (3), we get:
which is an extension of the quasi-Newton equation (10)
Now, substituting (9) into (10), we obtain:
(11)
Let us consider the determinant of the Hessian matrix for the DFP update [7], with each replaced by ; we get:
Det (12)
And from equation (9), we obtain:
Det (13)
which gives:
Det (14)
We suppose that Det .
Then equation (14) becomes:
Now, multiplying both sides by , we get:
(15)
Hence
.
Finally:
(16)
In addition, by further simplification we can write as follows:
(17)
which is equivalent to the following formula:
(18)
and formula (17) is also equivalent to the following formula:
(19)
where is defined by (9). It is clear that each of these formulas is symmetric and satisfies the quasi-Newton equation.
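The precise modified vector in (9) is not recoverable from the extraction. As a hedged illustration only, the sketch below uses a Zhang-Xu-style modified secant vector, $\tilde{y}_k = y_k + (\theta_k / s_k^T s_k)\, s_k$ with $\theta_k = 6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s_k$ — one common form in the literature, which may differ from the paper's own choice — and checks numerically that the DFP formula with $y_k$ replaced by this vector is symmetric and satisfies the extended equation $B_{k+1} s_k = \tilde{y}_k$.

```python
import numpy as np

def dfp_with_vector(B, s, v):
    """DFP formula with y replaced by an arbitrary vector v;
    the result satisfies B+ s = v whenever v^T s != 0."""
    rho = 1.0 / (v @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(v, s)
    return V @ B @ V.T + rho * np.outer(v, v)

# data from one illustrative step on f(x) = x1^4 + x2^2
f = lambda x: x[0]**4 + x[1]**2
grad = lambda x: np.array([4*x[0]**3, 2*x[1]])
xk, xk1 = np.array([1.0, 1.0]), np.array([0.6, 0.4])
s = xk1 - xk
y = grad(xk1) - grad(xk)

# Zhang-Xu-style correction term (assumed form, see lead-in)
theta = 6*(f(xk) - f(xk1)) + 3*(grad(xk) + grad(xk1)) @ s
ytilde = y + (theta / (s @ s)) * s

B = np.eye(2)
B_next = dfp_with_vector(B, s, ytilde)
```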
3. Convergence Analysis
Now, we study the global convergence of the update.
First, we need the following assumptions:
Assumption (3.1)
(A): $f:\mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable on a convex set $D$.
(B): $f$ is uniformly convex, i.e., there exist positive constants n and N such that, for all in the level set , which is convex, where is the starting point, we have:
(20)
Assumption (B) implies that is positive definite on L, and that f has a unique minimizer in L.
By the definition of the weighted norm (8) and equation (9), satisfies the extended quasi-Newton equation; that is, , and then
(21)
where .
Now, returning to the property (20) and using the definition of the weighted norm, we get:
(22)
where $\bar{G}_k$ is the average Hessian, which is defined as:
$\bar{G}_k = \int_0^1 \nabla^2 f(x_k + \tau s_k)\, d\tau$ (23)
and
$y_k = \bar{G}_k s_k$ (24)
Since also
(25)
Assumption (B) of (3.1) means that is positive definite, and thus its square root is well defined: there is a symmetric square root satisfying .
If we let , then (26)
(27)
Substitution equation (26) in equation (27), we get
(28)
And from (20), we know . Then, dividing both sides, we get:
That means
(29)
Then from equation (28), we get:
(30)
In addition, from equations (21) and (9), we get:
(31)
Which gives:
(32)
And
(33)
Therefore, from the above discussion, we have:
Lemma (3.2)
Let $f:\mathbb{R}^n \to \mathbb{R}$ satisfy Assumption (3.1). Then
are bounded.
Hence, we have that
are bounded.
Lemma (3.3)
Under exact line search, and are convergent.
Which gives:
is convergent and bounded.
From lemma (3.2), we have
is convergent and bounded.
Since is convergent, we get:
is convergent, then
which implies that is convergent, where is the minimum of $f$.
Lemma (3.4)
For all vector x, the inequality
(34)
holds, where is the minimum of .
Proof:
Since the function
is a convex function, we have:
In particular, setting , we have
which gives
Multiplying both sides by (−1), we get
By the Cauchy-Schwarz inequality, we get
(35)
From (24) and (9), we have
(36)
Hence
Let
(37)
Substituting (37) into (35) establishes (34).
Theorem (3.5)
Suppose that satisfies Assumption (3.1). Then, under exact line search, the sequence { generated by the method converges to the minimizer of f.
Proof: Consider formula of inverse Hessian approximation
(38)
and from formula (19) of Hessian approximation
(39)
Obviously, . By computing the trace of (39), we have
(40)
The middle two terms can be written as:
From equations (4) and (5), we have
From the property of the DFP method [5] , we obtain:
From equations (4) and (5) again, we get
(41)
From the positive definiteness property of , (38) becomes:
Which gives:
By taking the reciprocal of the expression, we obtain
(42)
Using (41) and (42), (40) becomes:
(43)
By recurrence, we obtain:
(44)
Therefore, by lemma (3.2), there exists a positive number N which is independent of k, such that
(45)
In what follows, we will prove that if the theorem does not hold, then the sum of the last two terms in (45) is negative.
Now consider the trace of , from (38), we have
(46)
Since is positive definite, the right-hand side of (46) is positive. By lemma (3.2), there exists which is independent of k such that
(47)
Note that
(48)
and
(49)
By the positive definiteness of and the exact line search, using (49), (48) and (47) in turn, we obtain:
(50)
By using the Cauchy-Schwarz inequality and (50),
(51)
Now suppose that the theorem is not true, that is, there exists such that for all sufficiently large k,
(52)
Also, by lemma (3.3), there exists a constant such that
which gives , and further, . Then, by (51) and (52), we deduce, for any k sufficiently large, that
(53)
The above inequality implies that the sum of the last two terms in (45) is negative.
By (53) and (45), we immediately obtain:
(54)
Note that, for a symmetric positive definite matrix, the reciprocal of the trace is a lower bound for the smallest eigenvalue of the inverse of the matrix. Then, it follows from (54) that
(55)
where is a lower bound for the smallest eigenvalue of . However, from the property of the Rayleigh quotient [9], we have
(56)
which contradicts (55). This contradiction proves that { converges to , and that our theorem holds.
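The trace bound invoked before (55) — for a symmetric positive definite matrix $A$, $\lambda_{\min}(A^{-1}) = 1/\lambda_{\max}(A) \ge 1/\mathrm{tr}(A)$, since $\lambda_{\max}(A) \le \mathrm{tr}(A)$ — is easy to verify numerically; the matrix below is an arbitrary illustrative choice.

```python
import numpy as np

# Build an illustrative symmetric positive definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)

# Smallest eigenvalue of A^{-1} versus the reciprocal-trace lower bound.
lam_min_inv = np.linalg.eigvalsh(np.linalg.inv(A)).min()
bound = 1.0 / np.trace(A)
```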
4. Local Linear Convergence of the Method
Now, we shall prove the local linear convergence of the method with the equivalent formula (18) for α>0 under exact line search.
The iteration we consider is:
(57)
(58)
so that and are replaced by and , respectively.
In the discussion of this subsection, we need the following assumptions:
Assumption (4.1)
(A): The mapping is continuously differentiable in an open convex set .
(B): There is in D such that and is nonsingular. (C): satisfies the Lipschitz condition at ; that is, there exists a constant such that
,
We begin with some general convergence results.
Lemma (4.2) [9]
Let satisfy assumption (A). Then for any we have
(59)
Furthermore, assume that satisfy assumption (C), then
(60)
and
(61)
where . (62)
Lemma (4.3)
Let ,. If
then is positive and
(63)
Conversely, if and (63) holds, then
(64)
Theorem (4.4) [9]
Let satisfy the assumptions and in (4.1), and let U be an update function such that, for all and
, we have (65)
(66)
where is some constant, or that
(67)
where and are some constants, and
(68)
Then there exist constants and such that, for all and , the iteration (57) with (65) is well-defined, and converges to linearly.
To study the local convergence of the method, it is required to estimate
As shown in the following theorem, there is a matrix
in . Since
(69)
It is the secant of the angle between and . In general, and are not parallel, so may be quite big, and it is not suitable to estimate by means of the norm. However, near ,
f is close to a quadratic function, and hence and are approximately parallel, where . This motivates us to use a weighted norm to estimate . Then, we have
(70)
Below, we first develop the linear convergence of .
Theorem (4.5)
Let satisfy Assumption in (4.1). Also let
(71)
in a neighborhood of , where . Then there exist and such that, for and , the iteration (57) with the equivalent formula (18) of the method is well-defined, and the produced sequence { converges to linearly.
Proof:
Based on theorem (4.4), to prove the linear convergence of the method, it is enough to prove
(72)
where and are positive constants independent of and , is defined by (68).
Let and , which are and , respectively; both are symmetric positive definite matrices.
From (57) and the formula (18) of , it follows that
(73)
where
(74)
Thus, from (73), one has
(75)
Note that is defined by (69).
The first term on the right-hand side of (75) can be estimated as:
(76)
Moreover, for the remaining two terms on the right-hand side of (75), by (70) we have:
(77)
and
(78)
where
(79)
which by lemma (4.3) implies the curvature condition
Now, we estimate by using (76), (77) and (78), we have
(80)
Note from lemma (4.2) that
(81)
Since
By lemma (4.3), we have:
Consequently, if and are in a neighborhood of , then
So the two terms in (80) satisfy, respectively:
(82)
and
(83)
Combining (82) and (83) into (80), we have:
which completes the proof by applying theorem (4.4) with and .
5. Superlinear Convergence of the Method
Now, we shall prove the superlinear convergence of the method. The convergence analysis in this section mainly follows Dennis and Moré [2]. The superlinear convergence of the sequence generated by the iteration (57) is generally characterized by the following theorem.
Theorem (5.1) [2]
Let satisfy and in Assumption (4.1). Let { be a sequence of nonsingular matrices. Suppose, for , that the iterates generated by (57) remain in D. Suppose also that { converges to . Then { converges to at a superlinear rate if and only if
(84)
Theorem (5.1) indicates that if converges to along the direction , then the method converges superlinearly. This theorem is very important in the analysis of . Equation (84) is called the Dennis and Moré characterization of superlinear convergence.
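The Dennis-Moré condition (84) can be observed concretely on a strictly convex quadratic: with exact line search, DFP recovers the true Hessian after n steps, after which the ratio $\|(B_k - \nabla^2 f(x^*)) s_k\| / \|s_k\|$ is identically zero. The quadratic and starting point below are illustrative assumptions.

```python
import numpy as np

# f(x) = 0.5 x^T A x - b^T x; DFP with exact line search terminates in
# at most n steps on this quadratic and ends with B_n = A.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)
B = np.eye(2)
ratios = []                             # Dennis-More ratios along the run
for _ in range(2):                      # n = 2 steps suffice here
    g = A @ x - b
    d = np.linalg.solve(B, -g)
    alpha = -(g @ d) / (d @ A @ d)      # exact line search on the quadratic
    s = alpha * d
    y = A @ s
    ratios.append(np.linalg.norm((B - A) @ s) / np.linalg.norm(s))
    rho = 1.0 / (y @ s)
    V = np.eye(2) - rho * np.outer(y, s)
    B = V @ B @ V.T + rho * np.outer(y, y)
    x = x + s
```

After the two steps, `x` is the exact minimizer and `B` coincides with `A`, so every subsequent ratio in (84) would be zero.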
To apply theorem (5.1), we need a refinement estimate which is established with the help of the following lemmas.
Lemma (5.2)
Let be a nonsingular symmetric matrix. If, for ,
the inequality
(85)
holds, then for any nonzero matrix , we have:
(86)
(87)
(88)
where
(89)
Proof:
Note that
(90)
Also, it follows from the Cauchy-Schwarz inequality and (85) that
(91)
From (90) and (91), we get
In the same way, we have
which gives the first result (a).
Now, we will prove (b) by using the property of the Frobenius norm of a rank-one update.
To prove (b), we need the following property:
In particular,
Let
By using (a) and (89), we get:
And therefore
from (89) again, we get:
which shows (b).
Finally, we prove (c) by means of (b). It is enough to prove that
(92)
from (a), we have:
(93)
which proves (c).
We have seen that if satisfies Assumption (4.1), then (72) holds.
Then, under the assumptions of theorem (4.5), the preceding lemma can be applied with the setting
, ,
,
and
Due to the linear convergence of the sequence { given in theorem (4.5), we have , and consequently, by lemma (5.3), there exists a constant such that
(94)
Hence
,
exists.
Lemma (5.3):
Let { and { be sequences of nonnegative numbers satisfying
(95)
and
(96)
then { converges.
Together, these results give rise to a refinement estimate of as follows:
Lemma (5.4)
Under the assumptions of theorem (4.5), there exist positive constants and such that, for , we have
(97)
where is defined by (68) and
(98)
Proof:
First, we write . From (75), we have:
Let
(99)
And
Similar to the proof of theorem (4.5), we know that there exist and such that
If we let , then (75) becomes:
(100)
Since
Then, by use of lemma (5.2), and from (99), we get:
Note that ; thus, by using lemma (5.2) again, we obtain:
where is defined by (98). From lemma (5.2) again and by (89), we get
(101)
where . Substituting (101) into (100) and using (99), we deduce the desired result (97). The proof is complete.
Finally, using the above four lemmas, we can establish the following superlinear convergence theorem of the method.
Theorem (5.5)
Under the assumptions of theorem (4.5), the method defined by (57) and (58) converges superlinearly.
Proof:
Since , (95) can be written as:
Summing the above from k = 1 to infinity gives:
Since, from theorem (4.5), { is linearly convergent, then
Also, since { is bounded, then
By (94), exists.
Hence, if some subsequence of { converges to zero,
then the whole sequence converges to zero.
Therefore,
which proves the superlinear convergence of { by theorem (5.1). Otherwise, there exists a positive constant such that
then
Since it follows that
Furthermore, we have:
where is defined by (98).
Then, by using , we immediately obtain:
Hence, { converges superlinearly, which completes the proof.
6. Numerical Results
This section is devoted to numerical experiments. Our purpose is to check whether the modified algorithm provides improvements over the corresponding standard DFP algorithm. The programs were written in MATLAB. The test problems were selected because they appear as standard problems in most of the literature; these functions arise from applications in technology and industry.
The test functions are chosen as follows:
. [1]
A quadratic function. [10]
Rosenbrock's function. [4]
.
Rosenbrock's cliff function [8]
.
Generalized Edger function. [1]
.
Extended Himmelblau function [1]
Rosenbrock's function [6]
Trigonometric function [1]
.
Extended Rosenbrock function [1]
. c = 100
Watson function [6]
.
. and
Freudenstein and Roth function [3]
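The Feval and Iter. columns of Table 1 can be reproduced in spirit by instrumenting a solver to report the final function value and the iteration count. The sketch below does this for a standard DFP solver in Python; it is not the authors' MATLAB code, and the test quadratic, starting point, and tolerances are illustrative assumptions.

```python
import numpy as np

def dfp_run(f, grad, x0, tol=1e-9, max_iter=500):
    """Run standard DFP with Armijo backtracking; report (f-value, iterations),
    mirroring the bookkeeping of Table 1."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))
    n_iter = 0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(B, -g)
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        s = alpha * d
        y = grad(x + s) - g
        x = x + s
        n_iter += 1
        if y @ s > 1e-14:                # curvature check keeps B SPD
            rho = 1.0 / (y @ s)
            V = np.eye(len(x)) - rho * np.outer(y, s)
            B = V @ B @ V.T + rho * np.outer(y, y)
    return f(x), n_iter

# illustrative convex quadratic stand-in for test problem 2
A = np.diag([1.0, 2.0, 5.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
f_final, iters = dfp_run(f, grad, np.ones(4))
```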
Table 1. Numerical results for DFP and the modified update.

| Fun. | Starting point | Dim. | DFP Feval | DFP Iter. | Modified Feval | Modified Iter. | The Best |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 |  | 2 | 6.0962e-016 | 2 | 1.7909e-017 | 2 | DFP |
| 1 |  | 2 | 2.7900e-020 | 2 | 2.7900e-020 | 2 | Same |
| 1 |  | 2 | 3.2494e-016 | 2 | 2.9082e-017 | 2 | DFP |
| 2 |  | 4 | 1.5076e-007 | 9 | 9.0146e-007 | 7 |  |
| 2 |  | 4 | 1.8142e-013 | 3 | 7.3965e-011 | 3 |  |
| 2 |  | 4 | 7.9747e-008 | 34 | 7.2012e-006 | 9 |  |
| 3 |  | 2 | 1.7525e-018 | 20 | 1.6887e-016 | 14 |  |
| 3 |  | 2 | 1.5615e-016 | 3 | 7.5140e-015 | 3 |  |
| 3 |  | 2 | 3.0198e-017 | 14 | 2.4041e-017 | 14 | Same |
| 4 |  | 4 | 0.2011 | 3 | 0.2011 | 3 | Same |
| 4 |  | 12 | 0.2004 | 3 | 0.2004 | 3 | Same |
| 4 |  | 12 | 0.2007 | 3 | 0.2007 | 3 | Same |
| 5 |  | 2 | 5.5433e-015 | 8 | 8.8715e-014 | 3 |  |
| 5 |  | 18 | 2.9846e-008 | 9 | 3.6384e-008 | 6 |  |
| 5 |  | 18 | 1.4714e-010 | 7 | 5.3611e-009 | 6 |  |
| 6 |  | 2 | 8.1785e-012 | 7 | 3.2157e-011 | 6 |  |
| 6 |  | 2 | 1.3697e-013 | 8 | 3.1675e-009 | 8 |  |
| 6 |  | 2 | 1.1910e-012 | 8 | 2.7507e-012 | 8 |  |
| 7 |  | 8 | 2.0658e-011 | 6 | 1.9815e-010 | 24 |  |
| 7 |  | 8 | 5.1583e-012 | 6 | 3.1487e-011 | 8 |  |
| 7 |  | 4 | 1.7728e-010 | 2 | 1.7728e-010 | 2 | Same |
| 8 |  | 12 | 7.6464e-007 | 13 | 4.6941e-006 | 5 |  |
| 8 |  | 12 | 3.7119e-006 | 13 | 3.7598e-006 | 4 |  |
| 8 |  | 12 | 6.8965e-007 | 21 | 1.8229e-006 | 5 |  |
| 9 |  | 3 | 0 | 1 | 0 | 1 | Same |
| 9 |  | 3 | 0.0220 | 40 | 0.0118 | 9 |  |
| 9 |  | 2 | 3.5522 | 31 | 40.2765 | 8 |  |
| 10 |  | 4 | 6.0707e-017 | 3 | 1.0838e-016 | 3 |  |
| 10 |  | 4 | 2.1626e-013 | 3 | 3.5351e-008 | 3 |  |
| 10 |  | 10 | 1.6053e-010 | 2 | 5.5796e-010 | 2 |  |
| 11 |  | 2 | 5.7785e-010 | 12 | 2.2977e-010 | 8 |  |
| 11 |  | 2 | 2.3185e-013 | 15 | 1.7572e-013 | 9 |  |
| 11 |  | 2 | 1.6761e-012 | 13 | 8.3077e-012 | 10 |  |

7. Conclusion
In this paper, we introduced a new modification of the DFP update. We showed that, under certain circumstances, this update preserves the value of the determinant of the Hessian matrix, based on the Zhang-Xu condition rather than the usual quasi-Newton condition.
Global convergence of the proposed method is established under exact line search. The proposed method possesses local linear convergence and superlinear convergence for the unconstrained optimization problem.
Numerical results comparing the modified method with the standard DFP method on the selected test functions show that the proposed method is efficient for unconstrained optimization, which suggests that a good improvement has been achieved.
References