The source file exact_solution.C with comments:
#include <math.h>
Mesh library includes
#include "libmesh/libmesh_common.h"
Bring in everything from the libMesh namespace
using namespace libMesh;
/**
* This is the exact solution that
* we are trying to obtain. We will solve
*
* - (u_xx + u_yy) = f
*
* and take a finite difference approximation using this
* function to get f. This is the well-known "method of
* manufactured solutions".
*/
Real exact_solution (const Real x,
const Real y,
const Real z = 0.)
{
static const Real pi = acos(-1.);
return cos(.5*pi*x)*sin(.5*pi*y)*cos(.5*pi*z);
}
The source file introduction_ex3.C with comments:
Introduction Example 3 - Solving a Poisson Problem
This is the third example program. It builds on the second example program by showing how to solve a simple Poisson system. This example also introduces the notion of customized matrix assembly functions, working with an exact solution, and using element iterators. We will not comment on things that were already explained in the second example.
C++ include files that we need
#include <iostream>
#include <algorithm>
#include <math.h>
Basic include files needed for the mesh functionality.
#include "libmesh/libmesh.h"
#include "libmesh/mesh.h"
#include "libmesh/mesh_generation.h"
#include "libmesh/vtk_io.h"
#include "libmesh/linear_implicit_system.h"
#include "libmesh/equation_systems.h"
Define the Finite Element object.
#include "libmesh/fe.h"
Define Gauss quadrature rules.
#include "libmesh/quadrature_gauss.h"
Define useful datatypes for finite element
matrix and vector components.
#include "libmesh/sparse_matrix.h"
#include "libmesh/numeric_vector.h"
#include "libmesh/dense_matrix.h"
#include "libmesh/dense_vector.h"
#include "libmesh/elem.h"
Define the DofMap, which handles degree of freedom
indexing.
#include "libmesh/dof_map.h"
Bring in everything from the libMesh namespace
using namespace libMesh;
Function prototype. This is the function that will assemble
the linear system for our Poisson problem. Note that the
function will take the EquationSystems object and the
name of the system we are assembling as input. From the
EquationSystems object we have access to the Mesh and
other objects we might need.
void assemble_poisson(EquationSystems& es,
const std::string& system_name);
Function prototype for the exact solution.
Real exact_solution (const Real x,
const Real y,
const Real z = 0.);
int main (int argc, char** argv)
{
Initialize libraries, like in example 2.
LibMeshInit init (argc, argv);
Brief message to the user regarding the program name
and command line arguments.
std::cout << "Running " << argv[0];
for (int i=1; i<argc; i++)
std::cout << " " << argv[i];
std::cout << std::endl << std::endl;
Skip this 2D example if libMesh was compiled as 1D-only.
libmesh_example_assert(2 <= LIBMESH_DIM, "2D support");
Mesh mesh;
Use the MeshTools::Generation mesh generator to create a uniform
2D grid on the square [-1,1]^2. We instruct the mesh generator
to build a mesh of 15x15 QUAD9 elements. Building QUAD9
elements instead of the default QUAD4's we used in example 2
allow us to use higher-order approximation.
MeshTools::Generation::build_square (mesh,
15, 15,
-1., 1.,
-1., 1.,
QUAD9);
Print information about the mesh to the screen.
Note that 5x5 QUAD9 elements actually has 11x11 nodes,
so this mesh is significantly larger than the one in example 2.
mesh.print_info();
Create an equation systems object.
EquationSystems equation_systems (mesh);
Declare the Poisson system and its variables.
The Poisson system is another example of a steady system.
equation_systems.add_system<LinearImplicitSystem> ("Poisson");
Adds the variable "u" to "Poisson". "u"
will be approximated using second-order approximation.
equation_systems.get_system("Poisson").add_variable("u", SECOND);
Give the system a pointer to the matrix assembly
function. This will be called when needed by the
library.
equation_systems.get_system("Poisson").attach_assemble_function (assemble_poisson);
Initialize the data structures for the equation system.
equation_systems.init();
Prints information about the system to the screen.
equation_systems.print_info();
Solve the system "Poisson". Note that calling this
member will assemble the linear system and invoke
the default numerical solver. With PETSc the solver can be
controlled from the command line. For example,
you can invoke conjugate gradient with:
./introduction_ex3 -ksp_type cg
You can also get a nice X-window that monitors the solver convergence with:
./introduction-ex3 -ksp_xmonitor
if you linked against the appropriate X libraries when you built PETSc.
./introduction_ex3 -ksp_type cg
You can also get a nice X-window that monitors the solver convergence with:
./introduction-ex3 -ksp_xmonitor
if you linked against the appropriate X libraries when you built PETSc.
equation_systems.get_system("Poisson").solve();
#if defined(LIBMESH_HAVE_VTK) && !defined(LIBMESH_ENABLE_PARMESH)
After solving the system write the solution
to a VTK-formatted plot file.
VTKIO (mesh).write_equation_systems ("out.pvtu", equation_systems);
#endif // #ifdef LIBMESH_HAVE_VTK
All done.
return 0;
}
We now define the matrix assembly function for the
Poisson system. We need to first compute element
matrices and right-hand sides, and then take into
account the boundary conditions, which will be handled
via a penalty method.
void assemble_poisson(EquationSystems& es,
const std::string& system_name)
{
It is a good idea to make sure we are assembling
the proper system.
libmesh_assert_equal_to (system_name, "Poisson");
Get a constant reference to the mesh object.
const MeshBase& mesh = es.get_mesh();
The dimension that we are running
const unsigned int dim = mesh.mesh_dimension();
Get a reference to the LinearImplicitSystem we are solving
LinearImplicitSystem& system = es.get_system<LinearImplicitSystem> ("Poisson");
A reference to the DofMap object for this system. The DofMap
object handles the index translation from node and element numbers
to degree of freedom numbers. We will talk more about the DofMap
in future examples.
const DofMap& dof_map = system.get_dof_map();
Get a constant reference to the Finite Element type
for the first (and only) variable in the system.
FEType fe_type = dof_map.variable_type(0);
Build a Finite Element object of the specified type. Since the
FEBase::build() member dynamically creates memory we will
store the object as an AutoPtr. This can be thought
of as a pointer that will clean up after itself. Introduction Example 4
describes some advantages of AutoPtr's in the context of
quadrature rules.
AutoPtr<FEBase> fe (FEBase::build(dim, fe_type));
A 5th order Gauss quadrature rule for numerical integration.
QGauss qrule (dim, FIFTH);
Tell the finite element object to use our quadrature rule.
fe->attach_quadrature_rule (&qrule);
Declare a special finite element object for
boundary integration.
AutoPtr<FEBase> fe_face (FEBase::build(dim, fe_type));
Boundary integration requires one quadraure rule,
with dimensionality one less than the dimensionality
of the element.
QGauss qface(dim-1, FIFTH);
Tell the finite element object to use our
quadrature rule.
fe_face->attach_quadrature_rule (&qface);
Here we define some references to cell-specific data that
will be used to assemble the linear system.
The element Jacobian * quadrature weight at each integration point.
The element Jacobian * quadrature weight at each integration point.
const std::vector<Real>& JxW = fe->get_JxW();
The physical XY locations of the quadrature points on the element.
These might be useful for evaluating spatially varying material
properties at the quadrature points.
const std::vector<Point>& q_point = fe->get_xyz();
The element shape functions evaluated at the quadrature points.
const std::vector<std::vector<Real> >& phi = fe->get_phi();
The element shape function gradients evaluated at the quadrature
points.
const std::vector<std::vector<RealGradient> >& dphi = fe->get_dphi();
Define data structures to contain the element matrix
and right-hand-side vector contribution. Following
basic finite element terminology we will denote these
"Ke" and "Fe". These datatypes are templated on
Number, which allows the same code to work for real
or complex numbers.
DenseMatrix<Number> Ke;
DenseVector<Number> Fe;
This vector will hold the degree of freedom indices for
the element. These define where in the global system
the element degrees of freedom get mapped.
std::vector<dof_id_type> dof_indices;
Now we will loop over all the elements in the mesh.
We will compute the element matrix and right-hand-side
contribution.
Element iterators are a nice way to iterate through all the elements, or all the elements that have some property. The iterator el will iterate from the first to the last element on the local processor. The iterator end_el tells us when to stop. It is smart to make this one const so that we don't accidentally mess it up! In case users later modify this program to include refinement, we will be safe and will only consider the active elements; hence we use a variant of the \p active_elem_iterator.
Element iterators are a nice way to iterate through all the elements, or all the elements that have some property. The iterator el will iterate from the first to the last element on the local processor. The iterator end_el tells us when to stop. It is smart to make this one const so that we don't accidentally mess it up! In case users later modify this program to include refinement, we will be safe and will only consider the active elements; hence we use a variant of the \p active_elem_iterator.
MeshBase::const_element_iterator el = mesh.active_local_elements_begin();
const MeshBase::const_element_iterator end_el = mesh.active_local_elements_end();
Loop over the elements. Note that ++el is preferred to
el++ since the latter requires an unnecessary temporary
object.
for ( ; el != end_el ; ++el)
{
Store a pointer to the element we are currently
working on. This allows for nicer syntax later.
const Elem* elem = *el;
Get the degree of freedom indices for the
current element. These define where in the global
matrix and right-hand-side this element will
contribute to.
dof_map.dof_indices (elem, dof_indices);
Compute the element-specific data for the current
element. This involves computing the location of the
quadrature points (q_point) and the shape functions
(phi, dphi) for the current element.
fe->reinit (elem);
Zero the element matrix and right-hand side before
summing them. We use the resize member here because
the number of degrees of freedom might have changed from
the last element. Note that this will be the case if the
element type is different (i.e. the last element was a
triangle, now we are on a quadrilateral).
The DenseMatrix::resize() and the DenseVector::resize() members will automatically zero out the matrix and vector.
The DenseMatrix::resize() and the DenseVector::resize() members will automatically zero out the matrix and vector.
Ke.resize (dof_indices.size(),
dof_indices.size());
Fe.resize (dof_indices.size());
Now loop over the quadrature points. This handles
the numeric integration.
for (unsigned int qp=0; qp<qrule.n_points(); qp++)
{
Now we will build the element matrix. This involves
a double loop to integrate the test funcions (i) against
the trial functions (j).
for (unsigned int i=0; i<phi.size(); i++)
for (unsigned int j=0; j<phi.size(); j++)
{
Ke(i,j) += JxW[qp]*(dphi[i][qp]*dphi[j][qp]);
}
This is the end of the matrix summation loop
Now we build the element right-hand-side contribution.
This involves a single loop in which we integrate the
"forcing function" in the PDE against the test functions.
{
const Real x = q_point[qp](0);
const Real y = q_point[qp](1);
const Real eps = 1.e-3;
"fxy" is the forcing function for the Poisson equation.
In this case we set fxy to be a finite difference
Laplacian approximation to the (known) exact solution.
We will use the second-order accurate FD Laplacian approximation, which in 2D is
u_xx + u_yy = (u(i,j-1) + u(i,j+1) + u(i-1,j) + u(i+1,j) + -4*u(i,j))/h^2
Since the value of the forcing function depends only on the location of the quadrature point (q_point[qp]) we will compute it here, outside of the i-loop
We will use the second-order accurate FD Laplacian approximation, which in 2D is
u_xx + u_yy = (u(i,j-1) + u(i,j+1) + u(i-1,j) + u(i+1,j) + -4*u(i,j))/h^2
Since the value of the forcing function depends only on the location of the quadrature point (q_point[qp]) we will compute it here, outside of the i-loop
const Real fxy = -(exact_solution(x,y-eps) +
exact_solution(x,y+eps) +
exact_solution(x-eps,y) +
exact_solution(x+eps,y) -
4.*exact_solution(x,y))/eps/eps;
for (unsigned int i=0; i<phi.size(); i++)
Fe(i) += JxW[qp]*fxy*phi[i][qp];
}
}
We have now reached the end of the RHS summation,
and the end of quadrature point loop, so
the interior element integration has
been completed. However, we have not yet addressed
boundary conditions. For this example we will only
consider simple Dirichlet boundary conditions.
There are several ways Dirichlet boundary conditions can be imposed. A simple approach, which works for interpolary bases like the standard Lagrange polynomials, is to assign function values to the degrees of freedom living on the domain boundary. This works well for interpolary bases, but is more difficult when non-interpolary (e.g Legendre or Hierarchic) bases are used.
Dirichlet boundary conditions can also be imposed with a "penalty" method. In this case essentially the L2 projection of the boundary values are added to the matrix. The projection is multiplied by some large factor so that, in floating point arithmetic, the existing (smaller) entries in the matrix and right-hand-side are effectively ignored.
This amounts to adding a term of the form (in latex notation)
\frac{1}{\epsilon} \int_{\delta \Omega} \phi_i \phi_j = \frac{1}{\epsilon} \int_{\delta \Omega} u \phi_i
where
\frac{1}{\epsilon} is the penalty parameter, defined such that \epsilon << 1
There are several ways Dirichlet boundary conditions can be imposed. A simple approach, which works for interpolary bases like the standard Lagrange polynomials, is to assign function values to the degrees of freedom living on the domain boundary. This works well for interpolary bases, but is more difficult when non-interpolary (e.g Legendre or Hierarchic) bases are used.
Dirichlet boundary conditions can also be imposed with a "penalty" method. In this case essentially the L2 projection of the boundary values are added to the matrix. The projection is multiplied by some large factor so that, in floating point arithmetic, the existing (smaller) entries in the matrix and right-hand-side are effectively ignored.
This amounts to adding a term of the form (in latex notation)
\frac{1}{\epsilon} \int_{\delta \Omega} \phi_i \phi_j = \frac{1}{\epsilon} \int_{\delta \Omega} u \phi_i
where
\frac{1}{\epsilon} is the penalty parameter, defined such that \epsilon << 1
{
The following loop is over the sides of the element.
If the element has no neighbor on a side then that
side MUST live on a boundary of the domain.
for (unsigned int side=0; side<elem->n_sides(); side++)
if (elem->neighbor(side) == NULL)
{
The value of the shape functions at the quadrature
points.
const std::vector<std::vector<Real> >& phi_face = fe_face->get_phi();
The Jacobian * Quadrature Weight at the quadrature
points on the face.
const std::vector<Real>& JxW_face = fe_face->get_JxW();
The XYZ locations (in physical space) of the
quadrature points on the face. This is where
we will interpolate the boundary value function.
const std::vector<Point >& qface_point = fe_face->get_xyz();
Compute the shape function values on the element
face.
fe_face->reinit(elem, side);
Loop over the face quadrature points for integration.
for (unsigned int qp=0; qp<qface.n_points(); qp++)
{
The location on the boundary of the current
face quadrature point.
const Real xf = qface_point[qp](0);
const Real yf = qface_point[qp](1);
The penalty value. \frac{1}{\epsilon}
in the discussion above.
const Real penalty = 1.e10;
The boundary value.
const Real value = exact_solution(xf, yf);
Matrix contribution of the L2 projection.
for (unsigned int i=0; i<phi_face.size(); i++)
for (unsigned int j=0; j<phi_face.size(); j++)
Ke(i,j) += JxW_face[qp]*penalty*phi_face[i][qp]*phi_face[j][qp];
Right-hand-side contribution of the L2
projection.
for (unsigned int i=0; i<phi_face.size(); i++)
Fe(i) += JxW_face[qp]*penalty*value*phi_face[i][qp];
}
}
}
We have now finished the quadrature point loop,
and have therefore applied all the boundary conditions.
If this assembly program were to be used on an adaptive mesh, we would have to apply any hanging node constraint equations
If this assembly program were to be used on an adaptive mesh, we would have to apply any hanging node constraint equations
dof_map.constrain_element_matrix_and_vector (Ke, Fe, dof_indices);
The element matrix and right-hand-side are now built
for this element. Add them to the global matrix and
right-hand-side vector. The SparseMatrix::add_matrix()
and NumericVector::add_vector() members do this for us.
system.matrix->add_matrix (Ke, dof_indices);
system.rhs->add_vector (Fe, dof_indices);
}
All done!
}
The source file exact_solution.C without comments:
#include <math.h>
#include "libmesh/libmesh_common.h"
using namespace libMesh;
/**
* This is the exact solution that
* we are trying to obtain. We will solve
*
* - (u_xx + u_yy) = f
*
* and take a finite difference approximation using this
* function to get f. This is the well-known "method of
* manufactured solutions".
*/
Real exact_solution (const Real x,
const Real y,
const Real z = 0.)
{
static const Real pi = acos(-1.);
return cos(.5*pi*x)*sin(.5*pi*y)*cos(.5*pi*z);
}
The source file introduction_ex3.C without comments:
#include <iostream>
#include <algorithm>
#include <math.h>
#include "libmesh/libmesh.h"
#include "libmesh/mesh.h"
#include "libmesh/mesh_generation.h"
#include "libmesh/vtk_io.h"
#include "libmesh/linear_implicit_system.h"
#include "libmesh/equation_systems.h"
#include "libmesh/fe.h"
#include "libmesh/quadrature_gauss.h"
#include "libmesh/sparse_matrix.h"
#include "libmesh/numeric_vector.h"
#include "libmesh/dense_matrix.h"
#include "libmesh/dense_vector.h"
#include "libmesh/elem.h"
#include "libmesh/dof_map.h"
using namespace libMesh;
void assemble_poisson(EquationSystems& es,
const std::string& system_name);
Real exact_solution (const Real x,
const Real y,
const Real z = 0.);
int main (int argc, char** argv)
{
LibMeshInit init (argc, argv);
std::cout << "Running " << argv[0];
for (int i=1; i<argc; i++)
std::cout << " " << argv[i];
std::cout << std::endl << std::endl;
libmesh_example_assert(2 <= LIBMESH_DIM, "2D support");
Mesh mesh;
MeshTools::Generation::build_square (mesh,
15, 15,
-1., 1.,
-1., 1.,
QUAD9);
mesh.print_info();
EquationSystems equation_systems (mesh);
equation_systems.add_system<LinearImplicitSystem> ("Poisson");
equation_systems.get_system("Poisson").add_variable("u", SECOND);
equation_systems.get_system("Poisson").attach_assemble_function (assemble_poisson);
equation_systems.init();
equation_systems.print_info();
equation_systems.get_system("Poisson").solve();
#if defined(LIBMESH_HAVE_VTK) && !defined(LIBMESH_ENABLE_PARMESH)
VTKIO (mesh).write_equation_systems ("out.pvtu", equation_systems);
#endif // #ifdef LIBMESH_HAVE_VTK
return 0;
}
void assemble_poisson(EquationSystems& es,
const std::string& system_name)
{
libmesh_assert_equal_to (system_name, "Poisson");
const MeshBase& mesh = es.get_mesh();
const unsigned int dim = mesh.mesh_dimension();
LinearImplicitSystem& system = es.get_system<LinearImplicitSystem> ("Poisson");
const DofMap& dof_map = system.get_dof_map();
FEType fe_type = dof_map.variable_type(0);
AutoPtr<FEBase> fe (FEBase::build(dim, fe_type));
QGauss qrule (dim, FIFTH);
fe->attach_quadrature_rule (&qrule);
AutoPtr<FEBase> fe_face (FEBase::build(dim, fe_type));
QGauss qface(dim-1, FIFTH);
fe_face->attach_quadrature_rule (&qface);
const std::vector<Real>& JxW = fe->get_JxW();
const std::vector<Point>& q_point = fe->get_xyz();
const std::vector<std::vector<Real> >& phi = fe->get_phi();
const std::vector<std::vector<RealGradient> >& dphi = fe->get_dphi();
DenseMatrix<Number> Ke;
DenseVector<Number> Fe;
std::vector<dof_id_type> dof_indices;
MeshBase::const_element_iterator el = mesh.active_local_elements_begin();
const MeshBase::const_element_iterator end_el = mesh.active_local_elements_end();
for ( ; el != end_el ; ++el)
{
const Elem* elem = *el;
dof_map.dof_indices (elem, dof_indices);
fe->reinit (elem);
Ke.resize (dof_indices.size(),
dof_indices.size());
Fe.resize (dof_indices.size());
for (unsigned int qp=0; qp<qrule.n_points(); qp++)
{
for (unsigned int i=0; i<phi.size(); i++)
for (unsigned int j=0; j<phi.size(); j++)
{
Ke(i,j) += JxW[qp]*(dphi[i][qp]*dphi[j][qp]);
}
{
const Real x = q_point[qp](0);
const Real y = q_point[qp](1);
const Real eps = 1.e-3;
const Real fxy = -(exact_solution(x,y-eps) +
exact_solution(x,y+eps) +
exact_solution(x-eps,y) +
exact_solution(x+eps,y) -
4.*exact_solution(x,y))/eps/eps;
for (unsigned int i=0; i<phi.size(); i++)
Fe(i) += JxW[qp]*fxy*phi[i][qp];
}
}
{
for (unsigned int side=0; side<elem->n_sides(); side++)
if (elem->neighbor(side) == NULL)
{
const std::vector<std::vector<Real> >& phi_face = fe_face->get_phi();
const std::vector<Real>& JxW_face = fe_face->get_JxW();
const std::vector<Point >& qface_point = fe_face->get_xyz();
fe_face->reinit(elem, side);
for (unsigned int qp=0; qp<qface.n_points(); qp++)
{
const Real xf = qface_point[qp](0);
const Real yf = qface_point[qp](1);
const Real penalty = 1.e10;
const Real value = exact_solution(xf, yf);
for (unsigned int i=0; i<phi_face.size(); i++)
for (unsigned int j=0; j<phi_face.size(); j++)
Ke(i,j) += JxW_face[qp]*penalty*phi_face[i][qp]*phi_face[j][qp];
for (unsigned int i=0; i<phi_face.size(); i++)
Fe(i) += JxW_face[qp]*penalty*value*phi_face[i][qp];
}
}
}
dof_map.constrain_element_matrix_and_vector (Ke, Fe, dof_indices);
system.matrix->add_matrix (Ke, dof_indices);
system.rhs->add_vector (Fe, dof_indices);
}
}
The console output of the program:
***************************************************************
* Running Example introduction_ex3:
* mpirun -np 2 example-devel -pc_type bjacobi -sub_pc_type ilu -sub_pc_factor_levels 4 -sub_pc_factor_zeropivot 0 -ksp_right_pc -log_summary
***************************************************************
Running /workspace/libmesh/examples/introduction/introduction_ex3/.libs/lt-example-devel -pc_type bjacobi -sub_pc_type ilu -sub_pc_factor_levels 4 -sub_pc_factor_zeropivot 0 -ksp_right_pc -log_summary
Mesh Information:
mesh_dimension()=2
spatial_dimension()=3
n_nodes()=961
n_local_nodes()=495
n_elem()=225
n_local_elem()=112
n_active_elem()=225
n_subdomains()=1
n_partitions()=2
n_processors()=2
n_threads()=1
processor_id()=0
EquationSystems
n_systems()=1
System #0, "Poisson"
Type "LinearImplicit"
Variables="u"
Finite Element Types="LAGRANGE", "JACOBI_20_00"
Infinite Element Mapping="CARTESIAN"
Approximation Orders="SECOND", "THIRD"
n_dofs()=961
n_local_dofs()=495
n_constrained_dofs()=0
n_local_constrained_dofs()=0
n_vectors()=1
n_matrices()=1
DofMap Sparsity
Average On-Processor Bandwidth <= 14.8491
Average Off-Processor Bandwidth <= 0.520291
Maximum On-Processor Bandwidth <= 26
Maximum Off-Processor Bandwidth <= 14
DofMap Constraints
Number of DoF Constraints = 0
Number of Node Constraints = 0
*** Warning, This code is untested, experimental, or likely to see future API changes: ../../../include/libmesh/vtk_io.h, line 178, compiled Feb 5 2013 at 13:27:07 ***
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/workspace/libmesh/examples/introduction/introduction_ex3/.libs/lt-example-devel on a intel-12. named hbar.ices.utexas.edu with 2 processors, by benkirk Tue Feb 5 13:38:02 2013
Using Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012
Max Max/Min Avg Total
Time (sec): 1.048e-01 1.00001 1.048e-01
Objects: 6.200e+01 1.00000 6.200e+01
Flops: 4.669e+06 1.09073 4.475e+06 8.949e+06
Flops/sec: 4.453e+07 1.09072 4.268e+07 8.535e+07
MPI Messages: 4.150e+01 1.00000 4.150e+01 8.300e+01
MPI Message Lengths: 1.841e+04 1.00000 4.435e+02 3.681e+04
MPI Reductions: 1.120e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.0482e-01 100.0% 8.9491e+06 100.0% 8.300e+01 100.0% 4.435e+02 100.0% 1.110e+02 99.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 32 1.0 2.9898e-04 1.5 4.63e+05 1.1 0.0e+00 0.0e+00 3.2e+01 0 10 0 0 29 0 10 0 0 29 3005
VecNorm 35 1.0 1.0729e-04 1.0 3.46e+04 1.1 0.0e+00 0.0e+00 3.5e+01 0 1 0 0 31 0 1 0 0 32 627
VecScale 34 1.0 2.7895e-05 1.1 1.68e+04 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1171
VecCopy 3 1.0 4.0531e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 42 1.0 1.9789e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 4 1.0 6.1200e-03 1.0 3.96e+03 1.1 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 1
VecMAXPY 34 1.0 1.4687e-04 1.1 4.95e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 6543
VecAssemblyBegin 3 1.0 4.9114e-05 1.0 0.00e+00 0.0 2.0e+00 2.9e+02 9.0e+00 0 0 2 2 8 0 0 2 2 8 0
VecAssemblyEnd 3 1.0 1.8835e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 35 1.0 6.9857e-05 1.1 0.00e+00 0.0 7.0e+01 4.0e+02 0.0e+00 0 0 84 77 0 0 0 84 77 0 0
VecScatterEnd 35 1.0 2.1458e-04 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 34 1.0 1.7595e-04 1.0 5.05e+04 1.1 0.0e+00 0.0e+00 3.4e+01 0 1 0 0 30 0 1 0 0 31 557
MatMult 34 1.0 5.5981e-04 1.5 5.01e+05 1.1 6.8e+01 4.0e+02 0.0e+00 0 11 82 73 0 0 11 82 73 0 1720
MatSolve 35 1.0 7.7701e-04 1.1 2.15e+06 1.1 0.0e+00 0.0e+00 0.0e+00 1 46 0 0 0 1 46 0 0 0 5301
MatLUFactorNum 1 1.0 7.1788e-04 1.1 1.00e+06 1.1 0.0e+00 0.0e+00 0.0e+00 1 21 0 0 0 1 21 0 0 0 2647
MatILUFactorSym 1 1.0 1.9939e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 2 0 0 0 3 2 0 0 0 3 0
MatAssemblyBegin 2 1.0 2.4700e-04 1.9 0.00e+00 0.0 3.0e+00 2.3e+03 4.0e+00 0 0 4 19 4 0 0 4 19 4 0
MatAssemblyEnd 2 1.0 2.9898e-04 1.0 0.00e+00 0.0 4.0e+00 1.0e+02 8.0e+00 0 0 5 1 7 0 0 5 1 7 0
MatGetRowIJ 1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 5.7936e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 2 0 0 0 0 2 0
MatZeroEntries 3 1.0 3.6001e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 32 1.0 4.6372e-04 1.3 9.26e+05 1.1 0.0e+00 0.0e+00 3.2e+01 0 20 0 0 29 0 20 0 0 29 3877
KSPSetUp 2 1.0 7.1049e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 1.1578e-02 1.0 4.67e+06 1.1 6.8e+01 4.0e+02 7.4e+01 11100 82 73 66 11100 82 73 67 773
PCSetUp 2 1.0 3.1178e-03 1.1 1.00e+06 1.1 0.0e+00 0.0e+00 7.0e+00 3 21 0 0 6 3 21 0 0 6 609
PCSetUpOnBlocks 1 1.0 2.8598e-03 1.1 1.00e+06 1.1 0.0e+00 0.0e+00 5.0e+00 3 21 0 0 4 3 21 0 0 5 664
PCApply 35 1.0 1.0245e-03 1.1 2.15e+06 1.1 0.0e+00 0.0e+00 0.0e+00 1 46 0 0 0 1 46 0 0 0 4021
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 43 43 215608 0
Vector Scatter 2 2 2072 0
Index Set 7 7 7700 0
IS L to G Mapping 1 1 564 0
Matrix 4 4 490648 0
Krylov Solver 2 2 19360 0
Preconditioner 2 2 1784 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 7.62939e-07
Average time for zero size MPI_Send(): 1.20401e-05
#PETSc Option Table entries:
-ksp_right_pc
-log_summary
-pc_type bjacobi
-sub_pc_factor_levels 4
-sub_pc_factor_zeropivot 0
-sub_pc_type ilu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Thu Nov 8 11:21:02 2012
Configure options: --with-debugging=false --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --with-clanguage=C++ --with-shared-libraries=1 --with-mpi-dir=/opt/apps/ossw/libraries/mpich2/mpich2-1.4.1p1/sl6/intel-12.1 --with-mumps=true --download-mumps=1 --with-metis=true --download-metis=1 --with-parmetis=true --download-parmetis=1 --with-superlu=true --download-superlu=1 --with-superludir=true --download-superlu_dist=1 --with-blacs=true --download-blacs=1 --with-scalapack=true --download-scalapack=1 --with-hypre=true --download-hypre=1 --with-blas-lib="[/opt/apps/sysnet/intel/12.1/mkl/10.3.12.361/lib/intel64/libmkl_intel_lp64.so,/opt/apps/sysnet/intel/12.1/mkl/10.3.12.361/lib/intel64/libmkl_sequential.so,/opt/apps/sysnet/intel/12.1/mkl/10.3.12.361/lib/intel64/libmkl_core.so]" --with-lapack-lib="[/opt/apps/sysnet/intel/12.1/mkl/10.3.12.361/lib/intel64/libmkl_lapack95_lp64.a]"
-----------------------------------------
Libraries compiled on Thu Nov 8 11:21:02 2012 on daedalus.ices.utexas.edu
Machine characteristics: Linux-2.6.32-279.1.1.el6.x86_64-x86_64-with-redhat-6.3-Carbon
Using PETSc directory: /opt/apps/ossw/libraries/petsc/petsc-3.3-p2
Using PETSc arch: intel-12.1-mkl-intel-10.3.12.361-mpich2-1.4.1p1-cxx-opt
-----------------------------------------
Using C compiler: /opt/apps/ossw/libraries/mpich2/mpich2-1.4.1p1/sl6/intel-12.1/bin/mpicxx -wd1572 -O3 -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/apps/ossw/libraries/mpich2/mpich2-1.4.1p1/sl6/intel-12.1/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/opt/apps/ossw/libraries/petsc/petsc-3.3-p2/intel-12.1-mkl-intel-10.3.12.361-mpich2-1.4.1p1-cxx-opt/include -I/opt/apps/ossw/libraries/petsc/petsc-3.3-p2/include -I/opt/apps/ossw/libraries/petsc/petsc-3.3-p2/include -I/opt/apps/ossw/libraries/petsc/petsc-3.3-p2/intel-12.1-mkl-intel-10.3.12.361-mpich2-1.4.1p1-cxx-opt/include -I/opt/apps/ossw/libraries/mpich2/mpich2-1.4.1p1/sl6/intel-12.1/include
-----------------------------------------
Using C linker: /opt/apps/ossw/libraries/mpich2/mpich2-1.4.1p1/sl6/intel-12.1/bin/mpicxx
Using Fortran linker: /opt/apps/ossw/libraries/mpich2/mpich2-1.4.1p1/sl6/intel-12.1/bin/mpif90
Using libraries: -Wl,-rpath,/opt/apps/ossw/libraries/petsc/petsc-3.3-p2/intel-12.1-mkl-intel-10.3.12.361-mpich2-1.4.1p1-cxx-opt/lib -L/opt/apps/ossw/libraries/petsc/petsc-3.3-p2/intel-12.1-mkl-intel-10.3.12.361-mpich2-1.4.1p1-cxx-opt/lib -lpetsc -lX11 -Wl,-rpath,/opt/apps/ossw/libraries/petsc/petsc-3.3-p2/intel-12.1-mkl-intel-10.3.12.361-mpich2-1.4.1p1-cxx-opt/lib -L/opt/apps/ossw/libraries/petsc/petsc-3.3-p2/intel-12.1-mkl-intel-10.3.12.361-mpich2-1.4.1p1-cxx-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lHYPRE -lpthread -lsuperlu_dist_3.0 -lparmetis -lmetis -lscalapack -lblacs -lsuperlu_4.3 -Wl,-rpath,/opt/apps/sysnet/intel/12.1/mkl/10.3.12.361/lib/intel64 -L/opt/apps/sysnet/intel/12.1/mkl/10.3.12.361/lib/intel64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,-rpath,/opt/apps/ossw/libraries/mpich2/mpich2-1.4.1p1/sl6/intel-12.1/lib -L/opt/apps/ossw/libraries/mpich2/mpich2-1.4.1p1/sl6/intel-12.1/lib -Wl,-rpath,/opt/apps/sysnet/intel/12.1/composer_xe_2011_sp1.7.256/compiler/lib/intel64 -L/opt/apps/sysnet/intel/12.1/composer_xe_2011_sp1.7.256/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpichf90 -lifport -lifcore -lm -lm -lmpichcxx -ldl -lmpich -lopa -lmpl -lrt -lpthread -limf -lsvml -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s -ldl
-----------------------------------------
----------------------------------------------------------------------------------------------------------------------
| Processor id: 0 |
| Num Processors: 2 |
| Time: Tue Feb 5 13:38:02 2013 |
| OS: Linux |
| HostName: hbar.ices.utexas.edu |
| OS Release: 2.6.32-279.1.1.el6.x86_64 |
| OS Version: #1 SMP Tue Jul 10 11:24:23 CDT 2012 |
| Machine: x86_64 |
| Username: benkirk |
| Configuration: ./configure '--enable-everything' |
| '--prefix=/workspace/libmesh/install' |
| 'CXX=mpicxx' |
| 'CC=mpicc' |
| 'F77=mpif77' |
| 'FC=mpif90' |
| 'PETSC_DIR=/opt/apps/ossw/libraries/petsc/petsc-3.3-p2' |
| 'PETSC_ARCH=intel-12.1-mkl-intel-10.3.12.361-mpich2-1.4.1p1-cxx-opt' |
| 'SLEPC_DIR=/opt/apps/ossw/libraries/slepc/slepc-3.3-p2-petsc-3.3-p2-cxx-opt' |
| 'TRILINOS_DIR=/opt/apps/ossw/libraries/trilinos/trilinos-10.12.2/sl6/intel-12.1/mpich2-1.4.1p1/mkl-intel-10.3.12.361'|
| 'VTK_DIR=/opt/apps/ossw/libraries/vtk/vtk-5.10.0/sl6/intel-12.1' |
----------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=0.11478, Active time=0.095599 |
----------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|----------------------------------------------------------------------------------------------------------------|
| |
| |
| DofMap |
| add_neighbors_to_send_list() 1 0.0056 0.005622 0.0075 0.007488 5.88 7.83 |
| build_sparsity() 1 0.0040 0.003999 0.0112 0.011215 4.18 11.73 |
| create_dof_constraints() 1 0.0003 0.000280 0.0003 0.000280 0.29 0.29 |
| distribute_dofs() 1 0.0067 0.006748 0.0162 0.016216 7.06 16.96 |
| dof_indices() 368 0.0214 0.000058 0.0214 0.000058 22.38 22.38 |
| prepare_send_list() 1 0.0000 0.000050 0.0000 0.000050 0.05 0.05 |
| reinit() 1 0.0092 0.009222 0.0092 0.009222 9.65 9.65 |
| |
| EquationSystems |
| build_solution_vector() 1 0.0006 0.000649 0.0072 0.007239 0.68 7.57 |
| |
| FE |
| compute_shape_functions() 142 0.0030 0.000021 0.0030 0.000021 3.17 3.17 |
| init_shape_functions() 31 0.0002 0.000007 0.0002 0.000007 0.22 0.22 |
| inverse_map() 90 0.0008 0.000009 0.0008 0.000009 0.81 0.81 |
| |
| FEMap |
| compute_affine_map() 142 0.0013 0.000009 0.0013 0.000009 1.36 1.36 |
| compute_face_map() 30 0.0005 0.000018 0.0013 0.000044 0.56 1.37 |
| init_face_shape_functions() 1 0.0000 0.000012 0.0000 0.000012 0.01 0.01 |
| init_reference_to_physical_map() 31 0.0007 0.000023 0.0007 0.000023 0.73 0.73 |
| |
| Mesh |
| find_neighbors() 1 0.0029 0.002912 0.0030 0.002995 3.05 3.13 |
| renumber_nodes_and_elem() 2 0.0003 0.000150 0.0003 0.000150 0.31 0.31 |
| |
| MeshCommunication |
| compute_hilbert_indices() 2 0.0020 0.001022 0.0020 0.001022 2.14 2.14 |
| find_global_indices() 2 0.0010 0.000486 0.0040 0.002021 1.02 4.23 |
| parallel_sort() 2 0.0006 0.000319 0.0007 0.000365 0.67 0.76 |
| |
| MeshOutput |
| write_equation_systems() 1 0.0064 0.006368 0.0137 0.013684 6.66 14.31 |
| |
| MeshTools::Generation |
| build_cube() 1 0.0016 0.001612 0.0016 0.001612 1.69 1.69 |
| |
| MetisPartitioner |
| partition() 1 0.0031 0.003053 0.0047 0.004739 3.19 4.96 |
| |
| Parallel |
| allgather() 9 0.0002 0.000021 0.0002 0.000023 0.20 0.22 |
| max(bool) 1 0.0000 0.000003 0.0000 0.000003 0.00 0.00 |
| max(scalar) 105 0.0003 0.000003 0.0003 0.000003 0.28 0.28 |
| max(vector) 24 0.0001 0.000006 0.0003 0.000014 0.15 0.36 |
| min(bool) 121 0.0003 0.000002 0.0003 0.000002 0.31 0.31 |
| min(scalar) 99 0.0011 0.000011 0.0011 0.000011 1.10 1.10 |
| min(vector) 24 0.0002 0.000008 0.0004 0.000016 0.20 0.41 |
| probe() 12 0.0002 0.000014 0.0002 0.000014 0.18 0.18 |
| receive() 12 0.0001 0.000008 0.0003 0.000022 0.10 0.28 |
| send() 12 0.0000 0.000004 0.0000 0.000004 0.05 0.05 |
| send_receive() 16 0.0002 0.000013 0.0006 0.000036 0.22 0.60 |
| sum() 20 0.0002 0.000008 0.0002 0.000011 0.16 0.24 |
| |
| Parallel::Request |
| wait() 12 0.0000 0.000003 0.0000 0.000003 0.04 0.04 |
| |
| Partitioner |
| set_node_processor_ids() 1 0.0007 0.000667 0.0007 0.000744 0.70 0.78 |
| set_parent_processor_ids() 1 0.0002 0.000202 0.0002 0.000202 0.21 0.21 |
| |
| PetscLinearSolver |
| solve() 1 0.0126 0.012600 0.0126 0.012600 13.18 13.18 |
| |
| System |
| assemble() 1 0.0068 0.006838 0.0202 0.020185 7.15 21.11 |
----------------------------------------------------------------------------------------------------------------
| Totals: 1325 0.0956 100.00 |
----------------------------------------------------------------------------------------------------------------
***************************************************************
* Done Running Example introduction_ex3:
* mpirun -np 2 example-devel -pc_type bjacobi -sub_pc_type ilu -sub_pc_factor_levels 4 -sub_pc_factor_zeropivot 0 -ksp_right_pc -log_summary
***************************************************************
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
C++ Includes