preprint tasora anitescu comp

Upload: sadasdasdff

Post on 05-Apr-2018

227 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/2/2019 Preprint Tasora Anitescu Comp

    1/17

    Amatrix-free cone complementarity approach for solving large-scale,nonsmooth, rigid bodydynamics

    Preprint ANL/MCS-P1692-1109

    A.Tasora a M.Anitescu baUniversita degli Studi di Parma, Dipartimento di Ingegneria Industriale, 43100 Parma, Italy, [email protected]

    bMathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA,[email protected]

    Abstract

    This paper proposes an iterative method that can simulate mechanical systems featuring a large number of contacts and jointsbetween rigid bodies. The numerical method behaves as a contractive mapping that converges to the solution of a cone comple-mentarity problem by means of iterated fixed-point steps with separable pro jections onto convex manifolds. Since computationalspeed and robustness are important issues when dealing with a large number of frictional contacts, we have performed specialalgorithmic optimizations in order to translate the numerical scheme into a matrix-free algorithm with O(n) space complexity andeasy implementation. A modified version, that can run on parallel computers is discussed. A multithreaded version of the methodhas been used to simulate systems with more than a million contacts with friction.

    Key words: Large-scale multibody, contact, friction, cone complementarity problem

    1. Introduction

    Many engineering problems involve unilateral con-tacts between rigid bodies, for instance, in simulationsof robotic cells and part feeders, in cam followers, inmasonry stability analysis, and in packaging devicessuch as those depicted in Fig. 1.

    The dynamical simulation of such systems is compli-cated by the nonsmooth nature of the frictional con-straints. When the number of contacts between bod-ies increases to thousands or millions, as in the caseof granular flows in silos or in rock-soil dynamics, thecomputational efficiency of traditional methods can

    become an issue even on supercomputers.A straightforward approach to solve this class of

    problems may consist of regularization schemes, thattransform the discontinuities into a stiff force field.This is, for example, the approach often adopted bydiscrete element schemes (DEMs) because it doesnot require major modifications to traditional solversbased on smooth ordinary differential equations(ODEs) [15,35,36,30]. Nevertheless, although success-fully used to simulate granular flows with many con-tacts, the regularization approach requires small timesteps to achieve numerical stability. Moreover, it forcesthe user to introduce artificial stiffness or heuristicparameters: actually, if the deformation of the parts isnegligible, a method that can use large time steps andunconditionally rigid bodies would be more welcome.

    These considerations encouraged our research on a

    fast, robust, and unified numerical scheme that canhandle complex mechanical systems made of rigid bod-ies with an arbitrary number of contacts and joints.Such a scheme aims at simulating mechanical systemsranging from the simplest (articulated linkages withfew bilateral kinematic pairs and motors) up to themost complex (for example, a bulldozer interactingwith millions of particles of sand with the tracks andthe blade).

    In this context, the biggest challenge comes from thediscontinuous nature of the adhesion constraints andnon-interpenetration constraints; in fact, the simula-

    tion of rigid contacts embeds the solution of nonsmoothequations. To this end, the straightforward applicationof numerical methods for ODEs or differential alge-braic equations (DAEs) is inefficient. In fact, a naiveapproach based on piecewise integrals is virtually im-possible because it would require stopping and restart-ing the integrator at each discontinuity to change theactive set of constraints. This could work only if therewere a limited number of unilateral constraints [20,21].Otherwise, the risk of combinatorial explosion couldseverely affect the computational efficiency to the pointwhere the simulation would come to a halt [45].

    The nature of nonsmooth dynamics requires the

    adoption of a deeper mathematical framework, whereconcepts like set-valued functions, inclusions, and com-plementarity conditions are used [33]. In particular,recent time-stepping approaches construct weak solu-

    Preprint submitted to Elsevier 21 January 2010

  • 8/2/2019 Preprint Tasora Anitescu Comp

    2/17

    tions of the differential variational inequality (DVI)that describes the continuous time motion of rigidbodies with collision, contact, and friction. Earlier nu-merical methods based on differential variational in-equalities can be found in [26,25,24], whereas the DVIformulation has been discussed in full generality andclassified by differential index only recently, in [37,29].

    Two main families of solvers spawn from the DVI

    formulation: those that lead to an acceleration-forcecomplementarity problem [9,31,45] and that gener-ate velocity-impulse, complementarity-based time-stepping methods [39,6,7]. The latter case results inschemes convergent to a vector measure differentialinclusion, so named because it operates on vector mea-sures or distributions [40]. It has the advantage thatit can solve a class of problems with Coulomb fric-tion that would be unsolvable in an acceleration-forcecontext, as the Painleve paradox [38].

    In both cases, the introduction of inequalities intime-stepping schemes for DVI, together with a poly-hedral approximation of the friction cone as a faceted

    pyramid, leads to linear complementarity problems(LCPs) [40], which are systems of complementaryinequalities to be satisfied simultaneously [14]. SuchLCPs, which are hard to solve because of their inher-ent nonlinear nature, must be solved at each time stepin order to advance the integrator [24,40].

    Most literature about this topic shows how, for alarge number of contacts and rigid bodies, usual LCPsolution schemes have significant limitations. In fact,classical approaches to the solution of LCP problemsare based on simplex methods, also known as direct orpivoting methods, originating from the algorithms of

    Lemke and Dantzig [13]. These methods may exhibitan exponential worst-case complexity [10]. Our experi-ence shows that, in spite of algorithmic optimizations[43], simplex methods still cannot practically handlemultibody systems with more than one hundred col-liding bodies.

    Moreover, in the three-dimensional case, typicalLCP solvers can be used only at the cost of ap-proximating the Coulomb friction cone with facetedpyramids [40,45,6]. Not only does this expedient intro-duces artificial anisotropy in the friction phenomenon;it also impacts negatively the performance of LCPsolvers, which is already critical in general, becausethe finite approximation of cones results in a muchlarger problem.

    A precise description of the friction cone constraintin three-dimensional space would imply a nonlinearcomplementarity problem. This is a broader class ofproblems in mathematical programming, for which nooff-the-shelf solvers are available. A custom methodmust be developed.

    The above-mentioned limitations of the existingLCP approaches led us to develop a novel solutionmethod based on a fixed-point iteration with projec-tion on a convex set and presented in [8]. That method

    extended the seminal work on iterative LCP solversby Mangasarian [28] to the LCP case with conicalconstraints, that is, a cone complementarity problem.In the same work we presented the convergence theory

    for the iteration; the scheme converges under certainconditions that do not include a small friction assump-tion. Applied to granular flow problems, our methoddemonstrated high performance and was able to solvebenchmarks with up to a million dual variables.

    The time-stepping scheme was proven to convergein a measure differential inclusion sense to the solutionof the original continuous-time DVI [2].

    In the present paper we extend our original formu-lation [8] in several ways.(i) We enhance our approach to the case of both

    frictional contacts and bilateral constraints, ei-ther scleronomic or rheonomic (motors, imposedtrajectories, etc.). This extension cannot be ob-tained with full theoretical guarantees of conver-gence for the algorithm in [8] by simply replacingthe bilateral constraint with two unilateral con-straints. The latter case allows for unbounded in-ternal forces, for which the approach in [8, Corol-lary 1] does not apply (the resulting constraintcone is not pointed).

    (ii) We present practical algorithmic details and op-timizations that can be adopted to implementthe method in a matrix-free, memory-efficient,reliable, fast, and robust way.

    (iii) We demonstrate the performance of our ap-proach for configurations that include both jointand contact-with-friction constraints.

    A significant side-effect of the proposed method isthat it proceeds monotonically toward the solution. Ifimplemented in real-time applications such as virtual-reality and man-in-the-loop vehicle simulations, wherethe requirements on precision are less severe than those

    on the computational times, it can be stopped prema-turely before the tolerance threshold is reached.

    We believe that the CAD community will welcomethe availability of this solver because of its abilityof simulating generic mechanisms regardless of thenumber of parts, joints, and frictional contacts. Tothis end we developed a physics library based on thistime-stepping method, written in C++ and calledChrono::Engine [41], that can be used by third partiesto develop simulation software. Indeed, we used it toimplement a 3D graphical interface for the interac-tive modeling and visualization of multibody systems.Moreover, we have already simulated many types ofmechanical problems, ranging from robots to granularflows with hundreds of thousands of rigid bodies withfriction.

    Among the most complex tests, we simulated thegranular flow of a fourth-generation pebble bed nuclearreactor. Thanks to the numerical scheme, the refuelingmotion of 170,000 uranium spheres was simulated ona single computer, whereas DEM methods required asupercomputer and much more CPU time [42].

    2. The model

    This section presents a formulation for the nons-mooth dynamics of multibody systems in the mostgeneral case of both bilateral constraints and frictional

    2

  • 8/2/2019 Preprint Tasora Anitescu Comp

    3/17

    Fig. 1. Simulation of a palletizing device: a multibody problemwith many frictional contacts.

    contacts. We remark that frictional contacts embedalso the case of unilateral constraints, since those rep-

    resent a special case of contacts without friction.

    2.1. System state

    The position of the system at time t is representedby mq generalized coordinates q(t) Rmq . In the caseof rigid bodies in three-dimensional space, these coor-dinates include the positions x R3 of the centers ofmass of all bodies, as well as the rotations of all bodyframes respect to the absolute frame.

    We represent rotations by means of unimodular

    quaternions (t) S3 H. Since quaternions arefour-dimensional numbers, each rigid body in ourformulation requires 3+4=7 scalar coordinates plusone constraint |||| = 1 enforcing the unit length ofthe quaternion 1 . With this notation, we define the

    position vector to be q = {x1T ,1T ,x2T,2T,...}.Generalized velocities are represented by the vec-

    tor v(t) Rmv , where for each body we consider thespeed of the center of mass x R3 and the angu-lar velocity l, expressed in local body coordinates.Therefore a system with n bodies in three dimensionsis represented by mv = 6n speed coordinates. With

    this notation, we define the speed vector to be v ={x1T ,1Tl , x2T

    ,2T

    l ,...}T.Given the angular velocity l, one can obtain the

    time derivative of the quaternion, if needed, by build-ing the purely imaginary quaternion {0,l} and com-puting the quaternion product = 12{0,l}. On thisbasis, we introduce q = (q,v) as the linear map thatgives the time derivative of the position:

    1 Different methods can be used to represent the rotation. Ifone stores the 3x3 rotation matrix A SO(3,R), each body will

    require 3+9=12 scalar coordinates, which are highly redundant.On the other hand, storing only three angles (such as the threeEuler angles or the three Cardano angles) could give problemsof singularities. Not being affected by these limitations, quater-nions are better suited for computational application.

    (q,v) = q = {x1T, 1T , x2T , 2T,...} == {x1T, 1

    21{0,1l }T, x2

    T

    ,1

    22{0,2l }T,...}T.

    (1)

    For the time integration of the system position,different options exist for the function q(t+t) =(q(t),v, t), the simplest one being the first-order

    explicit Euler formula q(t+t) = q(t) + tq(t). Indeed,

    for body positions a straightforward first-order differ-ential approximation is used: x(t+t) = x(t) + tx(t).However, if the same approach is used for quaternions,as in (t+t) = (t) + t(t), an annoying situationcan happen: quaternions may slowly lose the unimod-ularity and drift away from the S3 manifold, unlesssome stabilization is used to enforce |||| = 1. There-fore, we prefer to integrate the rotations using thefollowing exponential map that preserves the unimod-ularity of the quaternions:

    (t+t) = (t)e{0,1

    2lt}, (2)

    so we get

    (t+t) = (q(t),v, t) =

    x1,(t) + tx1,(t)

    1,(t)e{0,1

    21l t}

    x2,(t) + tx2,(t)

    2,(t)e{0,1

    22l t}

    ...

    .

    (3)We can explicitly compute the second factor of the

    quaternion product thanks to the property e{0,u} =

    {cos ,u sin

    }of quaternion exponentials; we obtain

    e{0,1

    2lt} = e{0,(l/|l|)

    1

    2|l|t} =

    = {cos 12|l|t, l|l| sin

    1

    2|l|t}. (4)

    Since (2) preserves the norm of the quaternions,large t time steps can be used (although, once in awhile, it is safer to normalize all quaternions becausenumerical roundoff can accumulate small errors). Yetwe can demonstrate that, for t 0, the formula (2)still corresponds to (t+t) = (t) + t. In fact,

    = limt0

    (t+t) (t)

    t

    . (5)

    Hence, substituting (2) in (5), we can write

    = limt0

    (t){cos 1

    2|l|t, l|l| sin 12 |l|t} {1, 0}

    t.

    (6)Applying the Hopital theorem and simplifying, we

    obtain = 12{0,l}, as expected.

    2.2. Bilateral constraints

    Most kinematic pairs, such as revolute joints, pris-matic joints, and glyphs, can be expressed by means

    of holonomic constraints over the relative position oftwo bodies. In general, we introduce a set GB of scalarequations

    i(q, t) = 0, i GB. (7)

    3

  • 8/2/2019 Preprint Tasora Anitescu Comp

    4/17

    The size of the set GB is the number of basic scalarbilateral constraints, not necessarily corresponding tocomplex three-dimensional mechanical joints 2 .

    We assume that i(q, t) is smooth, so that it canbe differentiated to obtain the Jacobian qi =

    i/qT

    .Constraints must be respected also at the velocity

    level: the full time-derivative of the ith constraint equa-

    tion is

    di(q, t)

    dt=

    i

    qq +

    i

    t= qiT q +

    i

    t= 0

    di(q, t)

    dt= qiT(q,v) +

    i

    t= 0.

    For simplicity, from now on we will define iT =qiT(q,v).

    Note that the term i

    t is nonzero only for rheonomic(time-dependent) constraints such as motors and im-posed trajectories.

    For each bilateral constraint, there exists a Lagrange

    multiplier iB such that the force acting on the systemby the ith bilateral constraint is iBi [20].

    2.3. Unilateral contact constraints

    Since rigid bodies cannot overlap each other, giventhe set of body shapes = {1, 2, ..., n}, weassumethat there exists a set ofGP distance functions (q, )that must satisfy the unilateral constraint conditions:

    i(q, ) 0, i GP . (8)An example of such a mapping is the signed distance

    function [23].For convenience, we pose the problem in terms of

    contact points, since in most cases we can compute aminimal set of contact normals belonging to one of twoneighboring bodies: distances (q, ) are measuredalong those normals.

    For example, in the case of spherical bodies withpositions xjs and radius rs, for all body pairs j, k thesigned distance function is = ||xjsxks || 2rs. Note,however, that, for bodies with generic shapes, findinga proper set of contact points (and defining their i

    distance functions) is not always trivial [3,4]. In fact,

    there could be multiple contact points, or it could evenhappen that defining a differentiable signed distancefunction is not possible, as in the case of concave shapes[3].

    Nevertheless, since we are interested in enforcingnon-penetration, what truly matters is that a signeddistance function be defined up to some value of thepenetration [4]. We thus assume that (q, ) can bedifferentiably defined at least on a neighborhood of theset (q, ) 0. Such an assumption does hold forsmooth and strictly convex bodies, such as spheres [3].

    2 In the context of this work, bilateral constraints are always

    considered scalar, because complex mechanical joints can bemodeled by using multiple basic scalar constraints. For exam-ple, kinematic pairs such as a ball joint require three scalarequations, a prismatic guide requires five scalar equations, andso on.

    In addition, piecewise smooth bodies can be accommo-dated in a fixed-time-step framework, by decomposingthe distance function in components attached to eachpair of features, such as point and piecewise smoothsurface or curved edge and curved edge, and using allnormals attached to such pairs [19].

    Also, we consider only a subset GA(q, , ) GPof all potential contacts, that is, only those contacts

    whose surfaces are under a distance threshold :GA(q, , ) =

    i i GP , i(q, ) . (9)

    Special attention must be paid in implementing anefficient and robust collision algorithm for the genera-tion of the GA(q, , ) set of contact points.

    A preliminary algorithm, called broad-phase colli-sion detection, discards pairs of shapes that are fartherthan , in order to avoid a combinatorial waste of timeif checking for collision points with all pairs of bodies.We use the sweep and prune (SAP) algorithm to thisend [17].

    The following narrow-phase step operates on thepairs of bodies that passed the broad-phase check: itfinds the contact points and their normals. We adoptthe GJK algorithm for this purpose because it featureshigh efficiency and robustness even in the case of non-smooth surfaces [16].

    Concave shapes, if any, undergo an off-line conver-sion into sets of convex shapes using a convex de-composition algorithm; a middle-phase AABB binary-tree traversal is used to check collisions between thesecompounds of shapes without running into superlineartime complexity.

    A thin envelope is added around all shapes using

    the Minkowski sums in order to allow a small amountof interpenetration. If larger overlapping occurs, theExpanding Polytope (EPA) algorithm is used [11].

    2.4. Frictional constraints

    In the following section we introduce friction bymeans of conic constraints, which are an extension ofcomplementarity models discussed in [6,40].

    2.4.1. The Coulomb friction model

    The original Coulomb model introduces static sand kinetic k friction coefficients as the only param-eters to characterize the frictional phenomena at thesurface. Although simple, this model was proven to berealistic and practical in many situations. Usually thekinetic coefficient is slightly lower than the static coef-ficient, but in this work we consider both to have thesame value .

    If a position q is feasible and the contact is active,that is, (q, ) = 0, then at the contact we have anormal force and a tangential force.

    Let n be the normal at the contact pointing fromthe second body to the first body, and let t1 and t2 be

    the tangents at the contact. Here n, t1, t2 are mutuallyorthogonal vectors of length one in three dimensions.The vectors n, t1, and t2 are a function of the positionq, but we ignore this fact until the end of this section.

    4

  • 8/2/2019 Preprint Tasora Anitescu Comp

    5/17

    The reaction force is impressed on the system bymeans of multipliers n 0, u, and v. The normalcomponent of the force is FN = nn, and the tangen-tial component of the force is FT = ut1 +vt2.

    The Coulomb model consists of the following con-straints:

    n 0, (q) 0, (q)

    n = 0,

    n 2u +2v , ||vT||n 2u +2v = 0,FT, vT = ||FT||||vT|| ,

    (10)where vT is the relative tangential velocity at contact.The effect of the friction over the dynamical system isdefined by the friction coefficient R+, which typi-cally has a value between 0 and 1 for most materials.

    The first part of the constraint can be restated as

    F = FN + FT = nn +ut1 +vt2 K,where K is a cone in three dimensions, whose slope isarctan().

    The constraint FT, vT = ||FT||||vT|| requiresthat the tangential force be opposite to the tangentialvelocity. As a result, the reaction force is dissipative.In fact, an equivalent convenient way of expressing thisconstraint is by using the maximum dissipation prin-ciple [40,38,39],

    (u,v) = argmin2u+2vn (ut1 +vt2)T vT.These constraints are represented by mapping the

    vectors n, t1, t2 from contact coordinates to general-ized coordinates [3].

    We denote the generalized vector version ofn, t1, t2

    by Dn,Du,Dv.In generalized coordinates, the Coulomb model be-

    comes [8]

    FN =nDn, FT =uDu +vDv , (11)n 0, (q) 0, n(q) = 0, (12)where the tangential multipliersu, v are determinedfrom the maximum dissipation principle

    (u,v) = argmin2u+2vn (uDu +vDv)T v.The last relation is obtained from the identitiesDTuv =tT

    1vT

    and DTvv = tT

    2vT

    .

    2.5. The Overall dynamical model

    The other dynamical data needed for the model arethe mass matrix M(q), which is symmetric positivedefinite; the external force fe(t, q, v); and the inertialforce fc(q, v), containing the centrifugal and Coriolisforces.

    We can define the total force

    ft(t, q, v) = fe(t, q,v) + fc(q, v). (13)

    Assume now that we have multiple contact con-

    straints i(q, ) 0, i GA and multiple bilateralconstraints i(q, t) = 0, i GB. Note that the unilat-eral condition n 0, 0, n = 0 can be writtenas a complementarity constraintn 0 0.

    The continuous model is a differential variationalinequality [37]:

    M(ql)dv

    dt=

    iGA

    inDin +iuDiu +ivDiv++iGB

    iBi + ft(t, q,v)q = (q, v)

    i(q, t) = 0 i GBin 0 i(q, ) 0, i GAiu,iv = argminiin(iu)2+(iv)2 i GA

    vTuDiu +vDiv .

    (14)Unfortunately, the introduction of the Coulomb

    friction model may lead to an inconsistent model. Itis known [9] that paradoxical configurations exist forwhich such a model does not have a solution in termsof unknown accelerations and reaction forces. Such

    configurations are called Painleve paradoxes [39]. Nev-ertheless, a weaker formulation of the problem can besolved in terms of vector measures, using a nonsmoothtime-stepping scheme where reaction impulses are theunknowns at each time step [39].

    To this end we define the following stepping scheme,with time step h, known positions q(l), and velocityv(l); the scheme is an equation problem with equi-librium constraints, where the unknowns are q(l+1),v(l+1), and constraint impulses n = hn, u = hu,v = h

    v, B = h

    B:

    M(l)(v(l+1)

    vl) = iGA inDin + iuDiu + ivDiv ++iGB

    iBi

    + hft(t

    (l),q(l), v(l))(15)

    0 =1

    hi(q(l)) + iTv(l+1) +

    i

    t, i GB(16)

    0 1h

    i(q(l)) + iTv(l+1) (17) in 0, i GA

    iu, iv

    = argmin

    iin

    (iu)2+(iv)

    2 i GA

    vT(uD

    iu + vD

    iv)

    (18)

    q(l+1) = (q(l), v(l+1), h). (19)

    To simplify notation, we denoted M(ql) by M(l).In previous work, we have shown that the scheme is

    convergent, as the time step h goes to 0, to the solutionof a measure differential inclusion [2].

    For the special case of zero friction, the subproblemsimplifies to (15-17), that is, a linear complementarityproblem. Such problems can be solved by Lemkes al-gorithm [14,6]. Introducing the Coulomb friction (18),however, turns the problem into a nonlinear comple-mentarity problem that poses more difficulties. If thenonlinear constraint cone (the Coulomb cone) is ap-

    proximated by a piecewise linear cone, the subproblem(15-18) becomes again an LCP solvable by Lemkes al-gorithm [6]. Nevertheless, in [5] we have also demon-strated that, as the number of constraints in the prob-

    5

  • 8/2/2019 Preprint Tasora Anitescu Comp

    6/17

    lem increases, the computational cost of typical LCPsolvers increases far faster than linearly with the size ofthe problem. Moreover, the approximation of frictioncones by means of faceted pyramids would introduceunwanted anisotropy.

    To overcome these difficulties, we modified the time-stepping scheme by relaxing the constraint (17) as

    0

    1

    hi(q(l)) +

    i

    T

    v(l+1)

    i

    (Di,Tu v)2 + (Di,Tv v)

    2 in 0, i GA.(20)

    This results in a cone complementarity problem thatcan be solved with a fixed-point iteration approach, asdemonstrated in our earlier work [8].

    2.6. Cone complementarity formulation

    Developing the optimality conditions for the equi-librium constraint in (18), we obtain that there existsa Lagrange multiplier i such that, for any i

    GA,

    iiu = Di,Tu v, iiv = Di,Tv v,i 0 iin

    (iu)

    2+ (iv)

    2 0.(21)

    The first two equations imply that i

    (iu)2+(iv)

    2=(Di,Tu v)

    2

    +(Di,Tv v)2, while the complementarity con-

    straint implies that

    0 = i

    (iu)2 + (iv)

    2

    iin

    (iu)

    2 + (iv)2

    and, in turn, that

    iin(Di,Tu v)

    2

    +(Di,Tv v)2 = i

    iu

    2+

    iv

    2

    .(22)We now define, for all potential contacts, the vectors

    uiA =

    1

    hi(q(l)) + iTv(l+1),Di,Tu v,Di,Tv v

    T(23)

    iA =

    in, iu,

    iv

    T, i GA. (24)

    We calculate the scalar product using (20),(21):uiA,

    iA

    = in

    1

    hi + iTv

    + iuD

    i,Tu v

    + ivDi,Tv v

    = i

    in(Di,Tu v)2+(Di,Tv v)2

    i

    iu2

    +

    iv2

    = 0 uiA iA. (25)We recall that the dual cone of a convex cone K is

    the set K = {x|y Ky,x 0} and that the polarcone is defined as K = K.

    We now define the friction cone FCi such that theCoulomb friction model is satisfied ifiA FC i:

    FCi =

    x,y,z R3|ix

    y2 + z2

    .

    Then, from (18), (20), and (25), the frictional contactconstraints can be expressed by means of the followingcone complementarity constraints :

    uiA FC i iA FC i, i GA. (26)

    To obtain a unified formalism, we can represent alsothe bilateral constraints (16) in terms of cone comple-mentarity constraints. Of course, multipliers iB, withi GB, are not restrained into some special subset ofR, but even R itself is a convex cone. Thus we can in-troduce the scalar

    uiB =1

    hi(q(l)) + iTv(l+1) +

    t, i GB, (27)

    which allows us to write the bilateral constraints (16)as

    uiB BCi iB BCi, i GB, (28)where BCi = {R} and BCi = {0}, and uib, ib = 0 isalways satisfied for uiB BCi.

    We now define the vector

    k(l)

    = M(l)v(l) + hft(t(l), q(l),v(l)). (29)

    Then, equations (29), (28), and (26), together with(15) and the definition ofuiA,

    iA, u

    iB, and

    iB, result

    in the following problem:

    M(l)v(l+1) = iGA

    inDin + iuDiu + ivDiv++

    iGB

    iBi

    + k

    (l),

    uiA FC i iA FC i, i GA uiB BCi iB BCi, i GB.

    (30)If we want to obtain a cone complementarity

    problem as expressed in the typical form K f(a) a K, a more compact formulation of theproblem (30) is necessary. To this end we denote by

    nA and nB the number of elements in the sets GA andGB, respectively. Then, we define the following vectorsbA R3nA , A R3nA , bB RnB , and B RnB :

    bA =

    1

    hi1 , 0, 0,

    1

    hi2 , 0, 0, . . . ,

    1

    hinA , 0, 0

    TA =

    i1n ,

    i1u ,

    i1v ,

    i2n ,

    i2u ,

    i2v , . . . ,

    inAn ,

    inAu ,

    inAv

    TbB =

    1

    h1 +

    1

    t,

    1

    h2 +

    2

    t, . . . ,

    1

    hnB +

    nB

    t

    TB =

    1B,

    2B, . . . ,

    nBB

    TuA = u

    1T

    A ,u2T

    A , . . . ,unAT

    A T

    ,uB = u1B, u

    2B, . . . , u

    nBB

    T.

    (31)It is useful to merge these vectors, joining data from

    both frictional constraints and bilateral constraints,obtaining vectors with nE = 3nA+ nB scalar elements:

    bE =bTA, b

    TB

    T, E =

    TA,

    TB

    T, uE =

    uTA,u

    TB

    T.

    (32)For each frictional contact i GA we also define the

    following three-column matrix:

    Di =Din|Diu|Div

    . (33)

    As before, it is useful to merge all Jacobians from

    both frictional constraints and bilateral constraints ina single, large matrix having nE columns and mv rows:

    DE =

    Di1 |Di2 | . . . |DinA |1|2| . . . |nB .(34)

    6

  • 8/2/2019 Preprint Tasora Anitescu Comp

    7/17

    From the definitions (23),(27), (31), (32), (33), and(34) one can see that

    uE = DTE v

    (l+1) + bE . (35)

    Also, premultiplying by M(l)1

    equation (15), one gets

    v(l+1) = M(l)1

    DEE + M(l)1k. (36)

    Hence it is possible to substitute (36) into (35) to ob-

    tain

    uE = DTE M

    (l)1DEE + DTE M

    (l)1k + bE . (37)

    To make the expressions more compact, we intro-duce the following:

    N = DTE M(l)1DE (38)

    r = DTE M(l)1k + bE (39)

    In this way, we can write

    uE = NE + r. (40)

    Consider the multidimensional cone obtained byperforming the direct sum of all FC and BC conesand its embedding in the corresponding vector spacedirect sums:

    =

    iGA

    FCi

    iGB

    BCi

    . (41)

    From (41), (40), (31), the complementarity relation-ship in (30), and the property of convex cones [22], =

    i i = i i,, we can write the problem as

    a cone complementarity problem (CCP):

    (NE + r)

    E . (42)The separable structure of the cone will allow us

    to define an algorithm based on block matrices, withrelatively small blocks (dimension no larger than 3 inthe case of the contact problem).

    2.7. Physical effects of the relaxation

    In [2] we demonstrated that, for h 0, the solu-tion of the time-stepping scheme with the relaxed con-straints (20) will approach the solution of the samemeasure differential inclusion as the scheme that uses

    the unrelaxed constraints (17). In addition, iteratesproduced by the modified scheme approach the ones ofthe original scheme even for one time step at fixed h,

    provided that iin

    (Di,Tu v)

    2 + (Di,Tv v)2

  • 8/2/2019 Preprint Tasora Anitescu Comp

    8/17

    both cases the method is similar to a projected blockGauss-Seidel. Another option is to use a null K ma-trix, so that the method is similar to a projected blockGauss-Jacobi.

    Also, : RnE RnE is the orthogonal projection

    operator, whose calculation, as we show in the subse-quent sections, is considerably simplified by the sepa-rable cone structure (41).

    For convergence of the scheme we need the followingassumptions about the algorithm.

    A1 The matrix N of the CCP problem is symmetric andpositive semi-definite.

    A2 There exists a positive number, > 0 such that, atany iteration r, r = 0, 1, 2, . . ., we have that Br I.

    A3 There exists a positive number > 0 such that,at any iteration r, r = 0, 1, 2, . . ., we have that

    (r+1 r)T

    (Br)1

    + Kr N2

    (r+1 r)

    r+1 r

    2

    .

    Note that the first assumption is always assured inmultibody systems because the mass matrix is definitepositive, Br and are under our control for satisfyingthe second assumption, and it is always possible toadjust the free and parameters in order to satisfythe third assumption. Our main convergence result [8,Corollary 1] is given in Theorem 1.

    Theorem 1 Assume that

    0 = E NE = 0(that is, there does not exist a choice of reaction forceswhose net effect is zero; bodies do not get stuck).

    Then the algorithm (44) for the cone complementarityproblem applied to (42) produces a bounded sequence,and any accumulation point results in the same velocitysolution.

    We note that, even if in [8] we have demonstrated theuse of our algorithm for contacts only, Theorem 1 ap-plies to the bilateral case as well. Nevertheless, the keyobservation is that we consider the bilateral constraintsas linearized constraints for which we develop the addi-tional algorithmic machinery in this paper. The directapplication of the formulation in [8], by formulating abilateral constraint as two unilateral constraints will

    violate the main assumption in Theorem 1. Therefore,we had to show that direct treatment of bilateral con-traints will result in the same formal abstraction (42).

    3.1. Efficient computation of the projection onto thefriction cone

    Because of the separable cone structure of the convexsubspace , the metric projection R

    nE RnE is

    =

    FC1(

    1A)

    T, FC2(2A)

    T, . . . , FCnA (nAA )

    T,

    BC1

    (2B)

    T

    , BC2

    (2B)

    T

    , . . . BCnB

    (nBB )

    TT ,where FC i() : R3 R3 with i GA, and BCi() :R R with i GB . The projection operator mustbehave as () = argmin|| || in order to be

    FC

    i

    A

    FC(A)

    nnnn

    v

    u

    n

    r

    tttt1

    tttt2

    ttttg

    tttte

    FC

    ioooo

    A

    f

    f

    r

    r

    n

    n

    r

  • 8/2/2019 Preprint Tasora Anitescu Comp

    9/17

    i GB BCi = iBi GA

    r < in FC i = iA

    r < 1i

    n FC i = {0, 0, 0}T

    r > in, r > 1i

    n FC in =ri + n

    2i + 1

    FC iu = u iFC in

    r

    FC iv = viFC in

    r.

    (46)

    4. Implementation

    The CCP method proposed here can be applied tothe simulation of multibody systems with a large num-ber of parts and contacts because, where an upper limiton the number of iteration is enforced, the iteration(44) can run in O(n) space and O(n) time.

    Previous sections showed that generic multibodyproblems with frictional contacts, expressed with thesystem (15)(19), embed the cone complementarityproblem (42), which can be solved by the iterativemethod (44).

    Given (31), one can consider the final time-steppingscheme as a sequence of three main operations: a CCPproblem that finds unknown reactions E (47a), anaffine scaling (47b) that gives the new speeds v(l+1),and a position update (47c):

    (NE + r) o

    E (47a)v(l+1) = M1

    k + DEE

    (47b)

    q(l+1) = (q(l),v(l+1), h). (47c)

    The biggest computational overhead is caused bythe first problem, that is, the CCP (47a). In fact, (47c)is immediate, and (47b) can be computed quickly be-cause in most cases the matrix M is diagonal and itsinverse M1 can be precomputed easily.

    The convergence theory about the iterative scheme(44) leaves some degrees of freedom in choosing iA,

    iB

    values that build the diagonal blocks of the iterationmatrix B. A trivial choice could be to use the sameiA =

    iB = value for all diagonal blocks, that is,

    B = I, and then use the overrelaxation parameter to control the convergence. However, this may slowconvergence in systems with large mass ratios, evenwith an optimal . A more practical approach, whichcopes better with systems affected by uneven masses, isinspired by the Gauss-Jacobi idea of using the inverseof the diagonal of the system matrix N, so we use iB =

    1i,TM1i

    and iA =1gi

    , where gi is the average ofthe diagonal values of the ith block of the N matrix.

    We note that gi can be computed easily from the traceof the 3 3 matrix Di,TM1Di, as

    gi =Trace(Di,TM1Di)

    3. (48)

    We recall that the matrix N is a product of largematrices; N = DTE M

    1DE , and it is full even if Dand M are sparse. For systems with a large number ofcontacts, the size ofN would be prohibitive and clearlywould not satisfy the goal ofO(n) space complexity. Tothis end, direct multiplication of vectors and matricesin (44) must be avoided; otherwise the effort and thespace requirement would be superlinear in the number

    of constraint.For the reasons above, a scheme that does not needthe explicit building of N, B, and K has been devel-oped, exploiting the sparsity of M and D.

    The K matrix in (44) can be chosen freely, withinthe convergence limits posed by assumptions [A1][A3]. Among the most noticeable options, we have thecase K = 0, which results in a scheme like a projectedGauss-Jacobi, or the case where K is built by usingthe lower blocks of N.

    4.1. Optimized projected Gauss-Jacobi CCP

    Considering the case K = 0, and recalling Eq. (38),we can express the rth step of the iteration (44) as aninner loop with index i = 1 . . . nA on all nA frictioncones FCi:

    i,r+1A =

    i,rA iA

    Di,TM1

    nAz=1

    Dzz,rA +

    +

    nBz=1

    zz,rB + k

    + biA

    (49)

    i,r+1A = FC i

    i,r+1A + (1 )

    i,rA , (50)

    followed by an inner loop with index i = 1 . . . nB onall nB bilateral constraints (which we do not reportbecause it is like (49)-(50) except for B instead ofAsubscripts).

    However, for each iteration, the previous loop i =1 . . . nA would require quadratic time in terms of po-tential contacts nA because of the presence of the sum-mations

    nAz=1. This major source of slow performance

    can be eliminated if one computes the algorithm in in-cremental form. In fact, from (36) it follows that

    vr

    = M1

    nA

    z=1

    Dz

    z,r

    A +

    nB

    z=1

    z

    z,r

    B +k . (51)

    Substituting (51) into (49), we can write

    i,r+1A =

    i,rA iA

    Di,Tvr + biA.

    (52)

    Considering the optimizations above, we can expressthe final CCP algorithm with the pseudocode of Algo-rithm 1.

    In the proposed algorithm, for achieving high per-formance, some auxiliary data can be precomputed be-fore starting the iteration. Specifically, we introducethe mv

    3 matrix EiA = M

    1Di and the vector EiB =

    M1i.The iterations, usually stopped when an approxi-

    mation threshold has been reached, can be also pre-maturely aborted when r exceeds a limit rmax on the

    9

  • 8/2/2019 Preprint Tasora Anitescu Comp

    10/17

    Algorithm 1: Solve complementarity - PGJ CCP(1) // Pre-compute some data for friction constraints(2) for i := 1 to nA(3) Ei

    A= M1Di

    (4) iA

    = 3Trace(Di,TEi

    A)

    (5) // Pre-compute some data for bilateral constraints(6) for i := 1 to nB(7) Ei

    B= M1i

    (8) iB

    = 1i,TEi

    B

    (9)(10) // Initialize impulses(11) if warm start with initial guess

    E

    (12) 0E

    = E

    (13) else(14) 0

    E= 0

    (15)(16) // Initialize speeds(17) v0 =

    nAi=1

    EiAi,0A

    +nB

    i=1EiBi,0B

    + M1k(18)(19) // Main iteration loop(20) for r := 0 to rmax(21) // Loop on frictional constraints(22) for i := 1 to nA(23) i,r+1

    A=

    i,r

    A i

    A Di,Tvr + biA;(24) i,r+1A = i,r+1A + (1 )i,rA ;(25) // Loop on bilateral constraints(26) for i := 1 to nB(27) i,r+1

    B=i,r

    B i

    B

    i,Tvr + biB

    ;

    (28) i,r+1B

    = i,r+1B

    + (1 )i,r

    B;

    (29) // Update speeds(30) vr+1 =

    nAi=1

    EiAi,r+1A

    +nB

    i=1EiBi,r+1B

    +

    M1k

    (31)(32) return E , v

    maximum number of iterations if the simulation must

    meet hard-real-time requirements.With minimal modifications to the () operator,the proposed method can be easily adapted to the caseof friction in 2D or the case of generic unilateral con-straints.

    In our simulations, we chose = 1 and = 1, exceptfor the K = 0 case, where we used = 0.2. We cannotguarantee a priori that this will satisfy condition [A3],but it did for all our simulations. In addition, the ma-trix sequences Kr and Br were constant. We can there-fore claim that Theorem 1 does apply and, since thesequence did not diverge, any accumulation point is asolution of the cone complementarity problem (47a).

    In addition, our proofs of the theoretical results allowfor similar conclusions if varies from iteration to it-eration. Therefore, we could ensure that at some iter-ation the appropriate is chosen after decreasing itsvalue a few times until assumption [A3] holds. It canbe shown that if the value of is halved each time [A3]does not hold and the respective iteration is rejected,then [A3] will eventually be satisfied after a finite num-ber of steps. In our experiments, however, the valueswe have chosen for and have worked for all itera-tions without need of further adjustment.

    4.2. Optimized projected Gauss-Seidel CCP

    Another option is to take Kas the lower block struc-ture of N, which results in a scheme similar to a pro-

    Algorithm 2: Solve complementarity - PGS CCP(1) // Pre-compute some data for friction constraints(2) for i := 1 to nA(3) Ei

    A= M1Di

    (4) iA

    = 3Trace(Di,TEi

    A)

    (5) // Pre-compute some data for bilateral constraints(6) for i := 1 to nB(7) Ei

    B= M1i

    (8) iB

    = 1i,TEi

    B

    (9)(10) // Initialize impulses(11) if warm start with initial guess

    E

    (12) 0E

    = E

    (13) else(14) 0

    E= 0

    (15)(16) // Initialize speeds(17) v =

    nAi=1

    EiAi,0A

    +nB

    i=1EiBi,0B

    + M1k(18)(19) // Main iteration loop(20) for r := 0 to rmax(21) // Loop on frictional constraints(22) for i := 1 to nA(23) i,r+1

    A=

    i,r

    A i

    A Di,Tvr + biA;(24) i,r+1A = i,r+1A + (1 )i,rA ;(25) i,r+1

    A= i,r+1

    A

    i,r

    A;

    (26) v := v+ EiA

    i,r+1A

    .(27) // Loop on bilateral constraints(28) for i := 1 to nB(29) i,r+1

    B=i,r

    B i

    B

    i,Tvr + biB

    ;

    (30) i,r+1B

    = i,r+1B

    + (1 )

    i,r

    B;

    (31) i,r+1B

    = i,r+1B

    i,r

    B;

    (32) v := v+ EiB

    i,r+1B

    .(33)(34) return E , v

    jected Gauss-Seidel. The Kmatrix does not need to beexplicitly built: its effect is that, as soon as computed,a reaction impulse i will be used also for computingthe following i+1 impulse, and so on for all i, withoutneeding to finish a single iteration. In practical termsthis means that the

    nAz=1 D

    zz,rA term in Eqs. (49) and

    (51) is split ini1

    z=1 Dz

    z,r+1A +

    nAz=i D

    zz,rA , and the

    same for bilateral constraints; so the difference fromAlgorithm 1 is that after the update of a single multi-plier we immediately update v(l+1), as shown in Algo-rithm 2. Note that, to update the speeds, we avoid thefull summation (36) and add only the contributions

    caused by the change of the single multiplier after theprojection (50):

    i,r+1A = i,r+1A i,rA ; (53)

    v(l+1),i+1 = v(l+1),i + M1Dii,r+1A . (54)

    Hence, the computational overhead is not differentfrom Algorithm 1.

    Numerical tests show that this scheme convergesfaster than the case of K = 0; moreover, K = 0 ismore prone to slow convergence in case of redundantconstraints. 3

    3 Redundancy in constraints is frequent in simulations of de-generate contact situations, such as flat surface against flat sur-face, where the collision engine may create a large amount ofsuperfluous contact points.

    10

  • 8/2/2019 Preprint Tasora Anitescu Comp

    11/17

    i,T=EEEE i,T=

    b i=B i= i=

    B B

    B

    i,T{A i,T{B

    Pointer to rigid bodyA

    Pointer to rigid body B

    i-th bilateral constraint

    Di,T=

    Ei,T=A

    bbbb i=A i= i=

    A A

    i= Pointer to rigid body APointer to rigid body B

    i-th frictional contact

    Mj= ffffj= vvvvj=xxxxj

    j....

    l

    {{

    j-th rigid body

    Fig. 3. Sparse data structures used by the solver.

    5. Optimizations and improvements to the

    method

    The proposed method can be further developedby introducing some algorithmic and theoretical im-provements, obtaining different flavours of the originalscheme. This section discusses the most significantoptimizations.

    5.1. Transient data structures

    Figure 3 shows how the multibody model is repre-sented by structures that are placed in memory. Thistransient data can be allocated on the heap duringrun-time, creating lists with unlimited numbers of con-straints and rigid bodies.

    Basically, each object that builds up the lists of bi-lateral constraints encapsulates the pointers to the twoconnected bodies, the Lagrange multiplier iB , the con-straint residual biB, the scalar value

    iB, and the Jaco-

    bian i

    .The Jacobian should be a long row with mv ele-ments, but this would mean that the storage require-ment for each constraint depends on the number ofbodies, hence leading to an algorithm with superlin-ear space complexity. Instead, we store only the twosmall portions of i that are not null, that is, thetwo parts iA and iB corresponding to the twoconnected bodies, hence requiring only a fixed num-ber of 12 elements per Jacobian. When the i,Tvrmultiplication must be performed, it is computed asi,TA vrA + i,TB vrB. Similar considerations apply tothe data structure for frictional contacts, except that

    also i coefficients must be stored and that Jacobiansare made of two larger blocks, each with three rows.Thanks to this major optimization, the memory re-quirement per constraint and per contact is constant,

    so we obtain an optimal O(nE ) space complexity. 4

    Looking at Algorithms 1 and 2, one can see that thesolver can operate directly on these structures. Hence,no temporary matrices are necessary. Our method isthus a matrix-free method.

    5.2. Stabilization factor

    The stabilization terms 1hi(q(l)) and 1hi(q(l)) in(16) and (17) are used to avoid constraint drifting dur-ing the time integration [4]. In fact, if these terms weremissing, equations (16) and (17) would simply enforcethe closure of constraints at the speed level, but errorsmight slowly accumulate in constraint positions afterseveral integration steps. 5 .

    If the model were linear (that is, the matrix DE wereconstant in space, and, implicitly, in time), these termswould close constraint gaps, if any, in a single step.We experienced that, for bilateral constraints, this ap-proach works well even if the time integration step h is

    small. However, this is not always the case when deal-ing with unilateral constraints, for the reason that fol-lows. Because of numerical issues and nonlinearities,it is not possible to avoid some amount of penetrationin contact constraints. For a contact with penetration,the term 1h

    i(q(l)) has a negative value. Hence, the in-

    equality (17) requires that iv(l+1) (the speed of de-tachment of the contact) be large enough to bring thetwo surfaces at zero distance in a single step h. Afterthe time integration step, at the next step the two sur-faces might keep the 1h

    i(q(l)) separation speed, hencecausing a bouncy behavior. Apart from being not pre-dictable, this side effect gets worse if small timestepsh are used, because even the smallest interpenetrationmight cause macroscopic effects that can be perceivedas bouncy motions even in contacts that should haveno restitution.

    We investigated different strategies to improve theoriginal stabilization method. One idea was to scalethe terms by Ki factors, with 0 < Ki < 1:

    Ki

    hi(q(l)) ,

    Ki

    hi(q(l)).

    This diminishes, but does not exclude, the risk of ex-ceeding the interpenetration correction in a single step.

    Moreover, it has the drawback of creating artificial soft-ness in stacked objects, since contact stabilization issomewhat delayed in successive frames.

    A second approach, which we successfully used inmany tests, involves using a clamping A as the maxi-mum orthogonal speed for penetration recovery. Thisdoes not eliminate the risk of popping out from an in-tersecting contact, but at least the residual speed ofseparation, if any, is often negligible (comparable with

    4 We experienced that in many cases of granular flow simula-tions, the number of contacts tends to scale linearly with thenumber of rigid bodies n. Hence, under those circumstances

    the algorithm shows approximate linear space complexity O(n)also in the number of bodies.5 This would happen either because of the numerical integra-tion, whose finite precision cannot catch all geometric nonlin-earities, or because of numerical truncation and errors.

    11

  • 8/2/2019 Preprint Tasora Anitescu Comp

    12/17

    the parameter A, which can be adjusted by the user)and independent of the time step h. Thus, the stabi-lization terms would become

    1

    hi(q(l)) , max

    1

    hi(q(l)), A

    .

    Note that in this case, if the contact surfaces are sep-arated, we have i > 0 and the A clamping has no

    effect; thus it behaves as the original scheme with the1h

    i(q(l)) term.Until now we discussed how the stabilization term

    can correct the contact penetration errors when thesurface distance is negative. However, it also has a use-ful side effect also when surfaces are not yet in con-tact. In fact, Eq. (17) can be interpreted as follows: thecontact constraint is enforced only if the two surfacesare approaching fast enough to close the positive gapi(q(l)) in a h time step. This allows us to include inthe multibody system also contact constraints that arenot yet in contact but simply within a warning enve-lope; later, the cone complementarity solver will do

    the rest.The third approach is more expensive in terms of

    CPU effort, because it avoids constraint drifting bysolving an additional complementarity problem overbody positions, at each time step [12]. In this case, onesolves the speed CCP problem using no stabilizationterms on bilateral constraints, except for contacts thatare not yet in contact, where it is useful to have

    max

    1

    hi(q(l)), 0

    ,

    so that contacts with clearance are still allowed to ap-

    proach until contact, because of positive1

    hi

    , whilesurfaces already in contact are forced to have a separa-tion speed greater than or equal to 0, regardless of theamount of penetration. Later, after the time step ad-vancement, one performs the following poststabiliza-tion step, that is a a linear complementarity problemthat corrects the positions q:

    Mq =iGA

    inD

    in

    +iGB

    ibi

    (55)

    0 = i(q(l)) + iTq, i GB (56)0

    i(q(l)) +

    i

    T

    q

    in

    0, i

    GA(57)

    This problem can be solved by using the same solverused for the speed CCP problem. The poststabilizationidea performs better in the case of very large penetra-tions and ill-posed initial conditions.

    In Fig. 4 a benchmark shows that the effort in reduc-ing the maximum penetration error ||q|| with post-stabilization iterations is almost the same as that ofthe basic approach using only the stabilization coeffi-cient in the speed CCP. However some iterations onthe speed CCP still must be performed. For instance,in our complex simulations involving dense granularflow, the number of iterations of the poststabilization

    is comparable to the number of iterations for the speedCCP, so computational efforts are almost doubled.

    A family of methods can be generated by balancingthe total overhead toward the speed CCP iterations

    20 30 40 50 60 70 8020

    30

    40

    50

    60

    70

    80

    0.002

    0.002

    0.004

    0.0040.0060.0080.010.012

    0.001

    post stabilization iterations

    speedi

    terations

    ||q||

    WU

    Fig. 4. Penetration error: tradeoff between speed iterationsand poststabilization iterations, compared to WU overhead(GPU working units).

    or the poststabilization iterations. In the extreme casewhere no iterations for the speed CCP are performedand iterations are performed only for the poststabi-lization, this method becomes similar to the positionbased dynamics algorithm, proposed in [27] as a robustmethod for real-time simulations.

    Note that in this section we did not discuss the caseof collisions with restitution and we assumed the sim-ulation of contacts with fully plastic collisions. Thereader interested in modifications to the method, forsimulating also of the restitution phase, can read [1].

    5.3. Parallelization of the algorithm

    Algorithm 2 is inherently sequential, so Algorithm 1

    can be easier to implement on computational architec-tures exploiting parallel processing because it does notfeature data dependency for most of the inner loop.

    In Algorithm 1, loops at rows 23 and 27 can be easilyexecuted in parallel because they do not need to writeat the same memory address at the same time. In thebest scenario, one could run up to nA+nB independentthreads, each performing an update of its multiplier .

    However, the sums at row 30 are more critical if par-allelized in the same way, that is, on the basis of athread per contact, since there is the risk that multi-ple threads will need to update the same element of

    the speed vector at the same time (this situation hap-pens, for instance, if two contacts that refer to the samebody are processed simultaneously). This problem canbe solved in many ways, for example, by using a vec-tor of Boolean mutexes, one per rigid body, which canprevent one thread to modify the speed of a rigid bodyif some other thread is writing into it. Otherwise, if thenumber of physical threads is very large as in GPU ar-chitectures, it is worth performing the sums in 30 usingparallel-reduction algorithms.

    We implemented this parallel algorithm on both amulticore processor and on an NVIDIA Tesla C870GPU board featuring a massively multithreaded archi-

    tecture, thanks to a stream processor that is capable ofprocessing hundreds of threads in parallel. In the lat-ter case, we experienced a speedup of 15 times respectto the serial implementation [44].

    12

  • 8/2/2019 Preprint Tasora Anitescu Comp

    13/17

    6. Examples

    We present two benchmarks that we used to test theperformance of our algorithm and an application tothe simulation of the refueling of a nuclear reactor.

    6.1. Forklift truck simulator

    As a complex mechanical system, a forklift truckrepresents a significative benchmark for our algorithmbecause it entails most aspects that we discussed inthis article: rigid bodies, bilateral constraints, appliedforces, and unilateral contact constraints with friction.In detail, our truck model is made of seven rigid bodies(the frame, three wheels, the steering strut, the mast,the carriage with the forks) connected by six revoluteand prismatic joints. The tilting of the mast, the liftingof the carriage, and the steering are obtained by intro-ducing rheonomic bilateral constraints. This is a sim-

    plified approach, but more detailed models of the ac-tuators, with hydraulic components and feedback con-trollers, are also possible within this framework. Mo-tors and brakes provide torques to the wheels; motors,brakes, and actuators can be controlled in real time bythe user, using a joystick or a keyboard, or by usingautomatic procedures.

    In order to also test frictional constraints, a seventhrigid body is added: a wooden pallet of EUR/ISO1type: the truck can pick and move such a pallet withthe forks. Specific collision shapes have been defined todetect the contact points between the pallets, the en-vironment, and the truck. Collision shapes have been

    used also for frame of the forklift and its overheadguard, so that we can simulate the roll-over of the ve-hicle and other hazardous events.

    On our test system, a dual-core Centrino T2600 2.17GHz with 2 GB of RAM, the proposed algorithm is ableto simulate the truck in faster than real time, using afixed timestep h = 0.005 s.

    To assess the efficiency and capability of our matrix-less approach, we simulated an increasing number offorklift trucks, up to 1,600 vehicles with 1,600 pal-lets, see Fig.5. In the largest simulation scenario, with12,800 rigid bodies, the algorithm must handle 54,400

    bilateral constraints and, on average, 19,000 frictionalcontacts: this means more than 110,000 primal anddual variables.

    We used Algorithm 1 with a limit of 20 iterations.Table 1, averaged over 100 steps, shows that the CPUoverheadgrowsalmost linearly with the number of dualvariables 3nA + nB, that is, somehow proportional tothe number of bodies.

    Note that the frame rate in the case of hundreds oftrucks, although not real time as for few trucks, is stillfast and interactive.

    Increasing the number of iterations results in an im-provement of the precision: going from 20 iterations to

    80 iterations, the largest errors in constraint positionand in constraint velocity decrease, respectively, from0.0310 mm and 0.024 m/s to 0.006 mm and 0.002 m/s.These results about precision are not dependent on

    Table 1Time-step performance.

    No. of Trucks CCP Solve [ms] Collision [ms] Bodies 3nA + nB

    1 0.22 0.1 8 70

    400 135 39 3200 28000

    800 268 79 6400 56000

    1200 396 130 9600 84000

    1600 550 200 12800 112000

    the number of trucks, because the convergence of thesolver is not affected by the increasing complexity ofthis type of benchmark, where the dynamics of eachvehicle is uncoupled. On the other hand, our results in-dicate the efficiency of our algorithms, which is due toour customization to rigid-body dynamics structure.For example, the most efficient off-the-shelf algorithmsin [32], which are either of either the interior-point orthe projected gradient type, still need around 20 s pertime step for about 10,000 dual variables. Those algo-

    rithms solve the same problem as here. If linear scalingwould hold for those methods a big assumption intheir favor our algorithm will still be more than 100times faster.

    How precision and convergence can be affected bysystems with more stringent topology is investigatedin the following example.

    6.2. Dense granular packing benchmark

    Dense stacking of granular material is one of thehardest problems involving nonsmooth rigid body dy-

    namics. Indeed, the benchmark described in this sec-tion involves the simulation of the progressive stackingof many convex shapes in an empty box, with differentsettings.

    The rigid bodies have a mass m = 10 kg, the mo-ments of inertia are Ixx = Iyy = Izz = 10.24 kgm2,and the friction coefficient is = 0.4. The horizontalsection of the box measures 20 m20 m. We performedtests with different types of colliding shapes, but forthe graphs presented here we used spheres with a ra-dius r = 1.6 m. At the beginning of the simulation thebox is filled by 220 spheres, with randomized positions

    and with an initial volume fraction of 0.4, on average.After the spheres have settled, we obtained the plots.Figures 6 and 7 show a typical convergence pattern

    of the PGS CCP algorithm: v is the violation of theconstraints at the speed level. For increasing , the iter-ation shows a faster convergence. However, large val-ues may lead to nonmonotonic behavior (that does notnecessarily lead to divergence). We found that a goodtradeoff between convergence speed and a monotonicnondivergent iteration, for scenarios involving equallysized spheres, is = 1.

    Figure 8 shows that the parameter acts like asmooting factor over the iteration. The higher the pa-

    rameter the slower is the convergence, but the loweris the risk of nonmonotonic or divergent patterns. Theoptimal value depends on the type of simulation; weexperienced that, on average, good default values can

    13

  • 8/2/2019 Preprint Tasora Anitescu Comp

    14/17

    Fig. 5. Benchmark: simulation of thousands of forklifts.

    0 10 20 30 40 50 60 70 80

    101

    iterations

    ||

    v||

    =0.2

    =1.4

    1.0

    Fig. 6. Convergence of the residual for varying , for a sampletime step in the 220-sphere benchmark.

    0 10 20 30 40 50 60 70 80

    101

    iterations

    ||||

    =0.2

    =1.4

    1.0

    Fig. 7. Convergence of for varying , for a sample timestep in the 220-sphere benchmark.

    0 10 20 30 40 50 60 70 80

    101.6

    101.4

    101.2

    =0.2

    =1.0

    iterations

    ||v||

    Fig. 8. Effect of the smoothing parameter , for fixed = 1.

    0 5 10 15 200

    0.002

    0.004

    0.006

    0.008

    0.01

    0.012

    stack height (n. of spheres)

    ||v

    ||

    rmax = 20

    rmax = 40

    rmax = 60

    rmax = 80

    rmax = 100

    Fig. 9. Effect of the vertical size of the stack on the conver-gence, for varying number of iterations.

    0 10 20 30 40 50 60 70 80

    101

    iterations

    ||

    v||

    PGS, =0.2

    PGS, =0.4

    PGS, =0.6

    PGJ, =0.2

    PGJ, =0.4

    PGJ, =0.6

    Fig. 10. Comparison of the convergence of the PGS and PGJalgorithms during the granular stacking benchmark, and ex-ample of divergence.

    1 2 3 4 5 6 7 8 9 10

    x 103

    20

    30

    40

    50

    60

    70

    80

    0.0

    002

    0.0002

    0.0

    002

    0.0

    004

    0.0004

    0.0004

    0.0006

    0.00

    06

    0.0008

    0.00

    08

    0.00

    1

    0.00

    1

    0.00

    12

    0.00

    14

    0.00

    16

    0.00

    18

    h timestep [s]

    iterations

    1e

    005

    1e

    005

    1e

    005

    5e

    005

    5e005

    5e

    005

    0.0

    001

    0.0001

    0.0

    001

    ||q||

    Fig. 11. Penetration error: combined effect of time step andnumber of iterations during the benchmark.

    be chosen in the range = 0.8 1.0.The convergencecan be largely affected by the topol-ogy of the mechanical system; the best-case scenariobeing many single objects on a flat plane, and the worst

    case being the objects stacked in a vertical row. This isshown in Fig. 9, where we performed simulations withthe same number of spheres but with different sizes ofboxes. The precisionof the iteration, for a fixed number

    14

  • 8/2/2019 Preprint Tasora Anitescu Comp

    15/17

    Fig. 12. Simulation of bidisperse granular flow in a PBR nu-clear reactor (170,000 bodies). The inner column of graphitepebbles shows almost no dispersion in the surrounding fissilematerial.

    of iterations, deteriorates proportionally to the heightof the stack.

    Figure 10 shows that, for low , the convergence ofthe PGJ CCP algorithm is comparable to the conver-gence of the PGS CCP algorithm. However the PGJCCP method is more likely to run into divergence, es-

    pecially in the case of redundant constraints. There-fore, it should be used with low , for example = 0.2in this case.

    In Fig. 11 we show the results of various simula-tions with different time steps and iteration numbers,while monitoring the average penetration error in con-straints, p. The figure shows that a good precision insatisfying the constraints at the position level can beachieved both by increasing the number of iterationsand by decreasing the time step h.

    These plots are obtained for a sample time step,after 5 seconds from the beginning of the granularstacking simulation. Hence, previous time steps and

    different scenarios can lead to slightly different graphs.Although this is only a numerical benchmark, it isa worst-case scenario, and its results are indicativeabout the behavior of the solution method when deal-ing with practical engineering problems that share thesame theoretical difficulties (masonry stability, soilcompaction, etc.).

    6.3. Refueling cycle in a pebble bed nuclear reactor

    A significant application, which may benefit fromthe robustness and the speed of the method, is the sim-

    ulation of the granular flow in the pebble bed nuclearreactor PBR [18].

    The PBR reactor features a fourth-generation designbased on a slow recirculation of fuel pebbles in a large

    silo: actinides are coated and packed with graphitemoderator in the spherical pebbles, each with a typi-cal diameter of 60 mm, while the helium coolant flowsbetween the pebbles. To increase the efficiency, a cen-tral column of spheres could contain only graphite, toflatten the neutron flux. Spheres are slowly extractedfrom the bottom, reprocessed, and reinserted at thetop. The simulation of the downward granular flow

    can be useful in estimating statistical parameters suchas the void fraction or the dispersion of the verticalcolumn, hence guiding more efficient designs that canmaximize the burnup of the actinides. A past attemptat simulating a PBR reactor required one week on a64-processor supercomputer at Sandia National Lab-oratories, using the discrete element method (DEM)[34]. Unlike DEM methods, our approach does not in-troduce stiff force fields, and larger time steps are al-lowed, so we could perform the simulation on a singlelaptop computer in few hours. On average, this prob-lem involved 170,000 rigid bodies, more than 500,000frictional contacts, leading to more than two millions

    of primal and dual variables (Fig. 12).In [42] we presented results of these simulations and

    validation against experimental data. Because of thehigh stack of spheres, this example falls in the class ofproblems with slow convergence already discussed inthe previous benchmark: this advocates for future re-search efforts that could improve the performance ofthe solver by leveraging on multiscale and domain de-composition implementations. Nevertheless, we pointout that even under this circumstance we have shownin [42] that we correctly compute macroscopic param-eters of the granular flow in the pebble bed reactor,

    such as porosity, while needing only a few hours on alaptop.

    7. Conclusions

    We presented a formulation for multibody systemswith large amounts of bilateral constraints and fric-tional contacts, and we developed an iterative methodfor this purpose. Our approach poses the problem as aconvex optimization that can be solved as a cone com-plementarity problem. The proposed fixed-point iter-

    ation has been tailored to feature high performanceeven in large simulation scenarios: special care hasbeen devoted in optimizing algorithms and formulasand avoiding matrix storage, hence obtaining an O(n)space complexity.

    The method converges under standard assumptionson the configuration of the system, resulting in a robustalgorithm that can simulate systems with millions ofmultipliers.

    Improvements have been presented for the stabi-lization of contact constraints, resulting in matrix-freeschemes that can correct large interpenetrations ofrigid bodies without running into numerical problems.

    We implemented a multibody system based on themethod presented in this paper: the Chrono::Engine li-brary [41]. Aiming at high performance, we expandedsubstantial efforts optimizing the C++ source code.

    15

  • 8/2/2019 Preprint Tasora Anitescu Comp

    16/17

    This software has been already used to simulate com-plex systems that were hardly tractable with other ap-plications: granular flows in silos, interaction of wheelswith sand and pebbles, size segregation devices, andother problems with a large number of bodies.

    The method has been recently ported also on parallelstream-kernel GPU hardware, obtaining a remarkablecomputational efficiency [44].

    Acknowledgments

    We thank Erwin Coumans for hints about collisiondetection algorithms, and Dan Negrut for commentson an earlier version of the paper. Mihai Anitescu wassupported by Contract DE-AC02-06CH11357 of theU.S. Department of Energy.

    References

    [1] M. Anitescu. A fixed time-step approach for multibodydynamics with contact and friction. In 2003 IEEE/RSJInternational Conference on Intelligent Robots andSystems, 2003.(IROS 2003). Proceedings, volume 4, 2003.

    [2] M. Anitescu. Optimization-based simulation of nonsmoothrigid multibody dynamics. Math. Program., 105(1):113143, 2006.

    [3] M. Anitescu, J. F. Cremer, and F. A. Potra. Formulating3d contact dynamics problems. Mechanics of Structuresand Machines, 24(4):405437, 1996.

    [4] M. Anitescu and G. D. Hart. A constraint-stabilizedtime-stepping approach for rigid multibody dynamicswith joints, contact and friction. International Journalfor Numerical Methods in Engineering, 60(14):23352371,2004.

    [5] M. Anitescu and G. D. Hart. A fixed-point iterationapproach for multibody dynamics with contact and friction.Mathematical Programming, Series B, 101(1)(ANL/MCS-P985-0802):332, 2004.

    [6] M. Anitescu and F. A. Potra. Formulating dynamic multi-rigid-body contact problems with friction as solvable linearcomplementarity problems. Nonlinear Dynamics, 14:231247, 1997.

    [7] M. Anitescu, F. A. Potra, and D. Stewart. Time-steppingfor three-dimensional rigid-body dynamics. ComputerMethods in Applied Mechanics and Engineering, 177:183197, 1999.

    [8] M. Anitescu and A. Tasora. An iterative approach forcone complementarity problems for nonsmooth dynamics.Computational Optimization and Applications, pages

    Accepted for printing, DOI 10.1007/s1058900892234,2008.

    [9] D. Baraff. Issues in computing contact forces for non-penetrating rigid bodies. Algorithmica, 10:292352, 1993.

    [10] D. Baraff. Fast contact force computation fornonpenetrating rigid bodies. In Computer Graphics(Proceedings of SIGGRAPH), pages 2334, 1994.

    [11] G. V. D. Bergen and G. J. A. Bergen. Collision Detectionin Interactive 3D Environments. Morgan Kaufmann, 2004.

    [12] M. Cline and D. Pai. Post-stabilization for rigid bodysimulation with contact and constraints. In IEEEInternational Conference on Robotics and Automation,volume 3, pages 37443751, 2003.

    [13] R. Cottle and G. Dantzig. Complementary pivot theoryof mathematical programming. Linear Algebra and Its

    Applications, 1:103125, 1968.[14] R. W. Cottle, J.-S. Pang, and R. E. Stone. The Linear

    Complementarity Problem. Academic Press, Boston, 1992.

    [15] B. R. Donald and D. K. Pai. On the motion of compliantlyconnected rigid bodies in contact: a system for analyzing

    designs for assembly. In Proceedings of the Conf. onRobotics and Automation, pages 17561762. IEEE, 1990.

    [16] S. K. E.G. Gilbert, D.W. Johnson. A fast procedure forcomputing the distance between complex objects in three-dimensional space. Robotics and Automation, 4(2):193203,1988.

    [17] C. Ericson. Real-Time Collision Detection. Elsevier, 2005.

    [18] H. D. Gougar. Advanced core design and fuel managementfor pebble-bed reactors. Ph.D thesis, Penn State University,Department of Nuclear Engineering, 2004.

    [19] G. D. Hart. A Constraint-stabilized Time-steppingApproach for Piecewise Smooth Multibody Dynamics.Ph.D thesis, University of Pittsburgh, Department ofMathematics, April 2007.

    [20] E. J. Haug. Computer Aided Kinematics and Dynamics ofMechanical Systems. Allyn and Bacon, Boston, 1989.

    [21] E. J. Haug, S. Wu, and S. Yang. Dynamic mechanicalsystems with coulomb friction, stiction, impact andconstraint addition-deletion. Mechanisms and MachineTheory, 21(5):407416, 1986.

    [22] J.-B. Hiriart-Urruty and C. Lemarechal. Convex Analysisand Minimization Algorithms. Springer Verlag, Berlin,1993.

    [23] Y. J. Kim, M. C. Lin, and D. Manocha. Deep: Dual-space expansion for estimating penetration depth betweenconvex polytopes. In Proceedings of the 2002 InternationalConference on Robotics and Automation, volume 1,pages 921926. Institute for Electrical and ElectronicsEngineering, 2002.

    [24] P. Lotstedt. Mechanical systems of rigid bodies subjectto unilateral constraints. SIAM Journal of AppliedMathematics, 42(2):281296, 1982.

    [25] M. D. P. Marques. Differential Inclusions in NonsmoothMechanical Problems: Shocks and Dry Friction, volume 9of Progress in Nonlinear Differential Equations and TheirApplications. Birkhauser Verlag, Basel, 1993.

    [26] J. J. Moreau. Standard inelastic shocks and the dynamicsof unilateral constraints. In G. D. Piero, F. Macieri,and S. Verlag, editors, Unilateral Problems in Structural

    Analysis, pages 173221, New York, 1983. CISM Coursesand Lectures no. 288.

    [27] M. Muller, B. H. M. Hennix, and J. Ratcliff. Position baseddynamics. In Proceedings of Virtual Reality Interactionsand Physical Simulations, pages 7180, 2006.

    [28] K. G. Murty. Linear Complementarity, Linear andNonlinear Programming. Helderman Verlag, Berlin, 1988.

    [29] J. Pang and D. Stewart. Solution dependence oninitial conditions in differential variational inequalities.Mathematical Programming, 116(1):429460, 2009.

    [30] J.-S. Pang, V. Kumar, and P. Song. Convergence of time-stepping method for initial and boundary-value frictionalcompliant contact problems. SIAM J. Numer. Anal.,43(5):22002226, 2005.

    [31] J.-S. Pang and J. C. Trinkle. Complementarity

    formulations and existence of solutions of dynamic multi-rigid-body contact problems with coulomb friction. Math.Program., 73(2):199226, 1996.

    [32] C. Petra, B. Gavrea, M. Anitescu, and F. Potra. Acomputational study of the use of an optimization-based method for simulating large multibody systems.Optimization Methods and Software, 24(6):871894, 2009.

    [33] F. Pfeiffer and C. Glocker. Multibody Dynamics withUnilateral Contacts. John Wiley, New York City, 1996.

    [34] C. Rycroft, G. Grest, J. Landry, and M. Bazant. Analysisof granular flow in a pebble-bed nuclear reactor. PhysicalReview E, 74, 021306, 2006.

    [35] P. Song, P. Kraus, V. Kumar, and P. Dupont. Analysisof rigid-body dynamic models for simulation of systemswith frictional contacts. Journal of Applied Mechanics,

    68(1):118128, 2001.[36] P. Song, J.-S. Pang, and V. Kumar. A semi-implicit

    time-stepping model for frictional compliant contactproblems. International Journal of Numerical Methods inEngineering, 60(13):267279, 2004.

    16

  • 8/2/2019 Preprint Tasora Anitescu Comp

    17/17

    [37] D. Stewart and J.-S. Pang. Differential variationalinequalities. Mathematical Programming, 113(2):345424,2008.

    [38] D. E. Stewart. Convergence of a time-stepping scheme forrigid body dynamics and resolution of Painleves problems.Archive Rational Mechanics and Analysis, 145(3):215260,1998.

    [39] D. E. Stewart. Rigid-body dynamics with friction andimpact. SIAM Review, 42(1):339, 2000.

    [40] D. E. Stewart and J. C. Trinkle. An implicit time-stepping

    scheme for rigid-body dynamics with inelastic collisionsand Coulomb friction. International Journal for NumericalMethods in Engineering, 39:26732691, 1996.

    [41] A. Tasora. Chrono::Engine project, web page.www.deltaknowledge.com/chronoengine , 2006.

    [42] A. Tasora and M. Anitescu. A convex complementarityapproach for simulating large granular flow. Journal ofComputational Nonlinear Dynamics, 2009. To appear.

    [43] A. Tasora, E. Manconi, and M. Silvestri. Un nuovometodo del simplesso per il problema di complementaritlineare mista in sistemi multibody con vincoli unilateri. InProceedings of AIMETA 05, Firenze, Italy, 2005.

    [44] A. Tasora, D. Negrut, and M. Anitescu. Large-scaleparallel multi-body dynamics with frictional contact on thegraphical processing unit. Proceedings of the Institution

    of Mechanical Engineers, Part K: Journal of Multi-bodyDynamics, 222(4):315326, 2008.

    [45] J. Trinkle, J.-S. Pang, S. Sudarsky, and G. Lo. On dynamicmulti-rigid-body contact problems with Coulomb friction.Zeithschrift fur Angewandte Mathematik und Mechanik,77:267279, 1997.

    (To be removed b efore publication) The submitted manuscripthas been created by the University of Chicago as Operator ofArgonne National Laboratory (Argonne) under Contract No.DE-AC02-06CH11357 with the U.S. Department of Energy. TheU.S. Government retains for itself, and others acting on its behalf, apaid-up, nonexclusive, irrevocable worldwide license in said articleto reproduce, prepare derivative works, distribute copies to thepublic, and perform publicly and display publicly, by or on b ehalf

    of the Government.