Since the Laplacian is symmetric, the `Tvmult()` operation (needed by the
multigrid smoother interfaces) is simply forwarded to the `vmult()` case.
@code
template <int dim, int fe_degree, typename number>
void LaplaceOperator<dim, fe_degree, number>::Tvmult(
  LinearAlgebra::distributed::Vector<number>       &dst,
  const LinearAlgebra::distributed::Vector<number> &src) const
{
  vmult(dst, src);
}
@endcode
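For orientation, recall how the three worker functions discussed next are
consumed: the `vmult()` function defined earlier in the program hands them to
MatrixFree::loop(), together with the MatrixFree::DataAccessOnFaces::gradients
setting that enables the reduced MPI data exchange discussed in the
introduction. A condensed sketch of that call (argument names as used in this
program):

@code
data->loop(&LaplaceOperator::apply_cell,
           &LaplaceOperator::apply_face,
           &LaplaceOperator::apply_boundary,
           this,
           dst,
           src,
           /*zero_dst_vector =*/true,
           MatrixFree<dim, number>::DataAccessOnFaces::gradients,
           MatrixFree<dim, number>::DataAccessOnFaces::gradients);
@endcode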
The cell operation is very similar to @ref step_37 "step-37". We do not use a
coefficient here, though. The second difference is that we replaced the two
steps of FEEvaluation::read_dof_values() followed by FEEvaluation::evaluate()
by a single function call, FEEvaluation::gather_evaluate(), which internally
calls the sequence of the two individual methods. Likewise,
FEEvaluation::integrate_scatter() implements the sequence of
FEEvaluation::integrate() followed by
FEEvaluation::distribute_local_to_global(). In this case, these new functions
merely save two lines of code. However, we use them for the analogy with
FEFaceEvaluation, where they are more important, as explained below.
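To make the shorthand concrete, the following minimal sketch spells out what
the two combined calls used below expand to (same boolean flags for values and
gradients as in `apply_cell()`):

@code
// phi.gather_evaluate(src, false, true) is equivalent to:
phi.read_dof_values(src);
phi.evaluate(false, true); // no values, only gradients

// phi.integrate_scatter(false, true, dst) is equivalent to:
phi.integrate(false, true);
phi.distribute_local_to_global(dst);
@endcode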
@code
template <int dim, int fe_degree, typename number>
void LaplaceOperator<dim, fe_degree, number>::apply_cell(
  const MatrixFree<dim, number>                    &data,
  LinearAlgebra::distributed::Vector<number>       &dst,
  const LinearAlgebra::distributed::Vector<number> &src,
  const std::pair<unsigned int, unsigned int>      &cell_range) const
{
  FEEvaluation<dim, fe_degree, fe_degree + 1, 1, number> phi(data);
  for (unsigned int cell = cell_range.first; cell < cell_range.second; ++cell)
    {
      phi.reinit(cell);
      phi.gather_evaluate(src, false, true);
      for (unsigned int q = 0; q < phi.n_q_points; ++q)
        phi.submit_gradient(phi.get_gradient(q), q);
      phi.integrate_scatter(false, true, dst);
    }
}
@endcode
The face operation implements the terms of the interior penalty method in
analogy to @ref step_39 "step-39", as explained in the introduction. We need
two evaluator objects for this task, one for handling the solution that comes
from the cell on one of the two sides of an interior face, and one for
handling the solution from the other side. The evaluators for face integrals
are called FEFaceEvaluation and take a boolean argument in the second slot of
the constructor to indicate which of the two sides the evaluator should belong
to. In FEFaceEvaluation and MatrixFree, we call one of the two sides the
`interior` one and the other the `exterior` one. The name `exterior` refers to
the fact that the evaluator from both sides will return the same normal
vector. For the `interior` side, the normal vector points outwards, whereas it
points inwards on the other side, and is opposed to the outer normal vector of
that cell. Apart from the new class name, we again get a range of items to
work with in analogy to what was discussed in @ref step_37 "step-37", but for
the interior faces in this case. Note that the data structure of MatrixFree
forms batches of faces that are analogous to the batches of cells for the cell
integrals. All faces within a batch involve different cell numbers but have
the same face number within the reference cell, have the same refinement
configuration (no refinement or the same subface), and the same orientation,
to keep SIMD operations simple and efficient.
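A minimal sketch of how one could inspect this batching, assuming an
initialized MatrixFree object `data` as in this program
(`n_active_entries_per_face_batch()` reports how many SIMD lanes of a batch
are actually filled):

@code
for (unsigned int face = 0; face < data.n_inner_face_batches(); ++face)
  std::cout << "face batch " << face << ": "
            << data.n_active_entries_per_face_batch(face)
            << " lanes filled, interior face number "
            << static_cast<unsigned int>(
                 data.get_face_info(face).interior_face_no)
            << std::endl;
@endcode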
Note that there is no implied meaning in interior versus exterior except the
logic decision of the orientation of the normal, which is pretty random
internally. One can in no way rely on a certain pattern of assigning interior
versus exterior flags, as the decision is made for the sake of access
regularity and uniformity in the MatrixFree setup routines. Since most sane DG
methods are conservative, i.e., fluxes look the same from both sides of an
interface, the mathematics are unaltered if the interior/exterior flags are
switched and the normal vectors get the opposite sign.
@code
template <int dim, int fe_degree, typename number>
void LaplaceOperator<dim, fe_degree, number>::apply_face(
  const MatrixFree<dim, number>                    &data,
  LinearAlgebra::distributed::Vector<number>       &dst,
  const LinearAlgebra::distributed::Vector<number> &src,
  const std::pair<unsigned int, unsigned int>      &face_range) const
{
  FEFaceEvaluation<dim, fe_degree, fe_degree + 1, 1, number> phi_inner(data,
                                                                       true);
  FEFaceEvaluation<dim, fe_degree, fe_degree + 1, 1, number> phi_outer(data,
                                                                       false);
  for (unsigned int face = face_range.first; face < face_range.second; ++face)
    {
@endcode
On a given batch of faces, we first update the pointers to the current face
and then access the vector. As mentioned above, we combine the vector access
with the evaluation. In the case of face integrals, the data access into the
vector can be reduced for the special case of an FE_DGQHermite basis as
explained for the data exchange above: Since only @f$2(k+1)^{d-1}@f$ out of
the @f$(k+1)^d@f$ cell degrees of freedom get multiplied by a non-zero value
or derivative of a shape function, this structure can be utilized for the
evaluation, significantly reducing the data access. The reduction of the data
access is not only beneficial because it reduces the data in flight and thus
helps caching, but also because the data access to faces is often more
irregular than for cell integrals when gathering values from cells that are
farther apart in the index list of cells.
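To put a number on this reduction: for the degree @f$k=8@f$, @f$d=3@f$ runs
shown in the results section, a face integral touches only
@f$2(8+1)^{3-1}=162@f$ of the @f$(8+1)^3=729@f$ degrees of freedom of a cell,
i.e., about @f$4.5\times@f$ less vector data per adjacent cell.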
@code
      phi_inner.reinit(face);
      phi_inner.gather_evaluate(src, true, true);
      phi_outer.reinit(face);
      phi_outer.gather_evaluate(src, true, true);
@endcode
The next two statements compute the penalty parameter for the interior penalty
method. As explained in the introduction, we would like to have a scaling like
@f$\frac{1}{h_\text{i}}@f$ of the length @f$h_\text{i}@f$ normal to the
face. For a general non-Cartesian mesh, this length must be computed by the
product of the inverse Jacobian times the normal vector in real
coordinates. From this vector of `dim` components, we must finally pick the
component that is oriented normal to the reference cell. In the geometry data
stored in MatrixFree, a permutation of the components in the Jacobian is
applied such that this latter direction is always the last component `dim-1`
(this is beneficial because reference-cell derivative sorting can be made
agnostic of the direction of the face). This means that we can simply access
the last entry `dim-1` and must not look up the local face number in
`data.get_face_info(face).interior_face_no` and
`data.get_face_info(face).exterior_face_no`. Finally, we must also take the
absolute value of these factors as the normal could point into either positive
or negative direction.
@code
      const VectorizedArray<number> inverse_length_normal_to_face =
        0.5 * (std::abs((phi_inner.get_normal_vector(0) *
                         phi_inner.inverse_jacobian(0))[dim - 1]) +
               std::abs((phi_outer.get_normal_vector(0) *
                         phi_outer.inverse_jacobian(0))[dim - 1]));
      const VectorizedArray<number> sigma =
        inverse_length_normal_to_face * get_penalty_factor();
@endcode
In the loop over the quadrature points, we eventually compute all
contributions to the interior penalty scheme. According to the formulas in the
introduction, the value of the test function gets multiplied by the difference
of the jump in the solution times the penalty parameter and the average of the
normal derivative in real space. Since the two evaluators for interior and
exterior sides get different signs due to the jump, we pass the result with a
different sign here. The normal derivative of the test function gets
multiplied by the negative jump in the solution between the interior and
exterior side. This term, coined the adjoint consistency term, must also
include the factor of @f$\frac{1}{2}@f$ in the code in accordance with its
relation to the primal consistency term that gets the factor of one half due
to the average in the test function slot.
@code
      for (unsigned int q = 0; q < phi_inner.n_q_points; ++q)
        {
          const VectorizedArray<number> solution_jump =
            (phi_inner.get_value(q) - phi_outer.get_value(q));
          const VectorizedArray<number> average_normal_derivative =
            (phi_inner.get_normal_derivative(q) +
             phi_outer.get_normal_derivative(q)) *
            number(0.5);
          const VectorizedArray<number> test_by_value =
            solution_jump * sigma - average_normal_derivative;

          phi_inner.submit_value(test_by_value, q);
          phi_outer.submit_value(-test_by_value, q);

          phi_inner.submit_normal_derivative(-solution_jump * number(0.5), q);
          phi_outer.submit_normal_derivative(-solution_jump * number(0.5), q);
        }
@endcode
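In jump-and-average notation, the face contribution just assembled reads (a
schematic restatement of the formulas from the introduction, with
@f$[\![u]\!] = u^- - u^+@f$ and
@f$\{\!\{\partial_n u\}\!\} = \frac{1}{2}(\partial_n u^- + \partial_n u^+)@f$):

@f[
\int_F [\![v]\!] \left(\sigma [\![u]\!] - \{\!\{\partial_n u\}\!\}\right)
- \{\!\{\partial_n v\}\!\}\, [\![u]\!] \,\mathrm{d}s.
@f]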
Once we are done with the loop over quadrature points, we can do the sum
factorization operations for the integration loops on faces and sum the
results into the result vector, using the `integrate_scatter` function. The
name `scatter` reflects the distribution of the vector data into scattered
positions in the vector using the same pattern as in `gather_evaluate`. Like
before, the combined integrate + write operation allows us to reduce the data
access.
@code
      phi_inner.integrate_scatter(true, true, dst);
      phi_outer.integrate_scatter(true, true, dst);
    }
}
@endcode
The boundary face function follows by and large the interior face
function. The only difference is the fact that we do not have a separate
FEFaceEvaluation object that provides us with exterior values @f$u^+@f$, but
we must define them from the boundary conditions and interior values
@f$u^-@f$. As explained in the introduction, we use @f$u^+ = -u^- + 2
g_\text{D}@f$ and @f$\mathbf{n}^-\cdot \nabla u^+ = \mathbf{n}^-\cdot \nabla
u^-@f$ on Dirichlet boundaries and @f$u^+=u^-@f$ and @f$\mathbf{n}^-\cdot
\nabla u^+ = -\mathbf{n}^-\cdot \nabla u^- + 2 g_\text{N}@f$ on Neumann
boundaries. Since this operation implements the homogeneous part, i.e., the
matrix-vector product, we must skip the boundary functions @f$g_\text{D}@f$
and @f$g_\text{N}@f$ here, and instead add them to the right hand side in
`LaplaceProblem::compute_rhs()`. Note that due to extension of the solution
@f$u^-@f$ to the exterior via @f$u^+@f$, we can keep all factors @f$0.5@f$ the
same as in the inner face function, see also the discussion in @ref step_39
"step-39".
There is one catch at this point: The implementation below uses a boolean
variable `is_dirichlet` to switch between the Dirichlet and the Neumann
cases. However, we solve a problem where we also want to impose periodic
boundary conditions on some boundaries, namely along those in the @f$x@f$
direction. One might wonder how those conditions should be handled here. The
answer is that MatrixFree automatically treats periodic boundaries as what
they are technically, namely an inner face where the solution values of two
adjacent cells meet and must be treated by proper numerical fluxes. Thus, all
the faces on the periodic boundaries will appear in the `apply_face()`
function and not in this one.
@code
template <int dim, int fe_degree, typename number>
void LaplaceOperator<dim, fe_degree, number>::apply_boundary(
  const MatrixFree<dim, number>                    &data,
  LinearAlgebra::distributed::Vector<number>       &dst,
  const LinearAlgebra::distributed::Vector<number> &src,
  const std::pair<unsigned int, unsigned int>      &face_range) const
{
  FEFaceEvaluation<dim, fe_degree, fe_degree + 1, 1, number> phi_inner(data,
                                                                       true);
  for (unsigned int face = face_range.first; face < face_range.second; ++face)
    {
@endcode
@code
      phi_inner.reinit(face);
      phi_inner.gather_evaluate(src, true, true);

      const VectorizedArray<number> inverse_length_normal_to_face =
        std::abs((phi_inner.get_normal_vector(0) *
                  phi_inner.inverse_jacobian(0))[dim - 1]);
      const VectorizedArray<number> sigma =
        inverse_length_normal_to_face * get_penalty_factor();

      const bool is_dirichlet = (data.get_boundary_id(face) == 0);

      for (unsigned int q = 0; q < phi_inner.n_q_points; ++q)
        {
          const VectorizedArray<number> u_inner = phi_inner.get_value(q);
          const VectorizedArray<number> u_outer =
            is_dirichlet ? -u_inner : u_inner;
          const VectorizedArray<number> normal_derivative_inner =
            phi_inner.get_normal_derivative(q);
          const VectorizedArray<number> normal_derivative_outer =
            is_dirichlet ? normal_derivative_inner : -normal_derivative_inner;
          const VectorizedArray<number> solution_jump = u_inner - u_outer;
          const VectorizedArray<number> average_normal_derivative =
            (normal_derivative_inner + normal_derivative_outer) * number(0.5);
          const VectorizedArray<number> test_by_value =
            solution_jump * sigma - average_normal_derivative;
          phi_inner.submit_normal_derivative(-solution_jump * number(0.5), q);
          phi_inner.submit_value(test_by_value, q);
        }
      phi_inner.integrate_scatter(true, true, dst);
    }
}
@endcode
Next we turn to the preconditioner initialization. As explained in the
introduction, we want to construct an (approximate) inverse of the cell
matrices from a product of 1D mass and Laplace matrices. Our first task is to
compute the 1D matrices, which we do by first creating a 1D finite
element. Instead of anticipating FE_DGQHermite<1> here, we get the finite
element's name from DoFHandler, replace the @p dim argument (2 or 3) by 1 to
create a 1D name, and construct the 1D element by using FETools.

@code
template <int dim, int fe_degree, typename number>
void PreconditionBlockJacobi<dim, fe_degree, number>::initialize(
  const LaplaceOperator<dim, fe_degree, number> &op)
{
  data = op.get_matrix_free();

  std::string name = data->get_dof_handler().get_fe().get_name();
  name.replace(name.find('<') + 1, 1, "1");
  std::unique_ptr<FiniteElement<1>> fe_1d = FETools::get_fe_by_name<1>(name);
@endcode

As for computing the 1D matrices on the unit element, we simply write down
what a typical assembly procedure over rows and columns of the matrix as well
as the quadrature points would do. We select the same Laplace matrices once
and for all using the coefficients 0.5 for interior faces (but possibly scaled
differently in different directions as a result of the mesh). Thus, we make a
slight mistake at the Dirichlet boundary (where the correct factor would be 1
for the derivative terms and 2 for the penalty term, see @ref step_39
"step-39") or at the Neumann boundary where the factor should be zero. Since
we only use this class as a smoother inside a multigrid scheme, this error is
not going to have any significant effect and merely affects smoothing quality.

@code
  const unsigned int N = fe_degree + 1;
  FullMatrix<double> laplace_unscaled(N, N);
  std::array<Table<2, VectorizedArray<number>>, dim> mass_matrices;
  std::array<Table<2, VectorizedArray<number>>, dim> laplace_matrices;
  for (unsigned int d = 0; d < dim; ++d)
    {
      mass_matrices[d].reinit(N, N);
      laplace_matrices[d].reinit(N, N);
    }

  QGauss<1> quadrature(N);
  for (unsigned int i = 0; i < N; ++i)
    for (unsigned int j = 0; j < N; ++j)
      {
        double sum_mass = 0, sum_laplace = 0;
        for (unsigned int q = 0; q < quadrature.size(); ++q)
          {
            sum_mass += (fe_1d->shape_value(i, quadrature.point(q)) *
                         fe_1d->shape_value(j, quadrature.point(q))) *
                        quadrature.weight(q);
            sum_laplace += (fe_1d->shape_grad(i, quadrature.point(q))[0] *
                            fe_1d->shape_grad(j, quadrature.point(q))[0]) *
                           quadrature.weight(q);
          }
        for (unsigned int d = 0; d < dim; ++d)
          mass_matrices[d](i, j) = sum_mass;
@endcode

The left and right boundary terms assembled by the next two statements appear
to have somewhat arbitrary signs, but those are correct as can be verified by
looking at @ref step_39 "step-39" and inserting the values -1 and 1 for the
normal vector in the 1D case.

@code
        sum_laplace +=
          (1. * fe_1d->shape_value(i, Point<1>()) *
             fe_1d->shape_value(j, Point<1>()) * op.get_penalty_factor() +
           0.5 * fe_1d->shape_grad(i, Point<1>())[0] *
             fe_1d->shape_value(j, Point<1>()) +
           0.5 * fe_1d->shape_grad(j, Point<1>())[0] *
             fe_1d->shape_value(i, Point<1>()));

        sum_laplace +=
          (1. * fe_1d->shape_value(i, Point<1>(1.0)) *
             fe_1d->shape_value(j, Point<1>(1.0)) * op.get_penalty_factor() -
           0.5 * fe_1d->shape_grad(i, Point<1>(1.0))[0] *
             fe_1d->shape_value(j, Point<1>(1.0)) -
           0.5 * fe_1d->shape_grad(j, Point<1>(1.0))[0] *
             fe_1d->shape_value(i, Point<1>(1.0)));

        laplace_unscaled(i, j) = sum_laplace;
      }
@endcode

Next, we go through the cells and pass the scaled matrices to
TensorProductMatrixSymmetricSum to actually compute the generalized eigenvalue
problem for representing the inverse: Since the matrix approximation is
constructed as @f$A\otimes M + M\otimes A@f$ and the weights are constant for
each element, we can apply all weights on the Laplace matrix and simply keep
the mass matrices unscaled.
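To recall from the introduction why this representation is invertible cheaply
(restated here schematically for the 2D case): if @f$S@f$ collects the
generalized eigenvectors of @f$A s = \lambda M s@f$, normalized such that
@f$S^{\mathrm T} M S = I@f$ and @f$S^{\mathrm T} A S = \Lambda@f$, then

@f[
\left(A\otimes M + M\otimes A\right)^{-1} =
S\otimes S \left(\Lambda\otimes I + I\otimes \Lambda\right)^{-1}
S^{\mathrm T}\otimes S^{\mathrm T},
@f]

which is what TensorProductMatrixSymmetricSum computes and applies at the cost
of a few small tensor products rather than a full matrix inverse.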
In the loop over cells, we want to make use of the geometry compression
provided by the MatrixFree class and check if the current geometry is the same
as on the last cell batch, in which case there is nothing to do. This
compression can be accessed by FEEvaluation::get_mapping_data_index_offset()
once `reinit()` has been called.

Once we have accessed the inverse Jacobian through the FEEvaluation access
function (we take the one for the zeroth quadrature point as they should be
the same on all quadrature points for a Cartesian cell), we check that it is
diagonal and then extract the determinant of the original Jacobian, i.e., the
inverse of the determinant of the inverse Jacobian, and set the weight as
@f$\text{det}(J) / h_d^2@f$ according to the 1D Laplacian times @f$d-1@f$
copies of the mass matrix.

@code
  cell_matrices.clear();
  FEEvaluation<dim, fe_degree, fe_degree + 1, 1, number> phi(*data);
  unsigned int old_mapping_data_index = numbers::invalid_unsigned_int;
  for (unsigned int cell = 0; cell < data->n_cell_batches(); ++cell)
    {
      phi.reinit(cell);

      if (phi.get_mapping_data_index_offset() == old_mapping_data_index)
        continue;

      Tensor<2, dim, VectorizedArray<number>> inverse_jacobian =
        phi.inverse_jacobian(0);

      for (unsigned int d = 0; d < dim; ++d)
        for (unsigned int e = 0; e < dim; ++e)
          if (d != e)
            for (unsigned int v = 0; v < VectorizedArray<number>::size(); ++v)
              AssertThrow(inverse_jacobian[d][e][v] == 0.,
                          ExcNotImplemented());

      VectorizedArray<number> jacobian_determinant = inverse_jacobian[0][0];
      for (unsigned int e = 1; e < dim; ++e)
        jacobian_determinant *= inverse_jacobian[e][e];
      jacobian_determinant = 1. / jacobian_determinant;

      for (unsigned int d = 0; d < dim; ++d)
        {
          const VectorizedArray<number> scaling_factor =
            inverse_jacobian[d][d] * inverse_jacobian[d][d] *
            jacobian_determinant;
@endcode

Once we know the factor by which we should scale the Laplace matrix, we apply
this weight to the unscaled DG Laplace matrix and send the array to the class
TensorProductMatrixSymmetricSum for computing the generalized eigenvalue
problem mentioned in the introduction.

@code
          for (unsigned int i = 0; i < N; ++i)
            for (unsigned int j = 0; j < N; ++j)
              laplace_matrices[d](i, j) =
                scaling_factor * laplace_unscaled(i, j);
        }
      if (cell_matrices.size() <= phi.get_mapping_data_index_offset())
        cell_matrices.resize(phi.get_mapping_data_index_offset() + 1);
      cell_matrices[phi.get_mapping_data_index_offset()].reinit(
        mass_matrices, laplace_matrices);
    }
}
@endcode

The vmult function for the approximate block-Jacobi preconditioner is very
simple in the DG context: We simply need to read the values of the current
cell batch, apply the inverse for the given entry in the array of tensor
product matrices, and write the result back. In this loop, we overwrite the
content in `dst` rather than first setting the entries to zero. This is
legitimate for a DG method because every cell has independent degrees of
freedom. Furthermore, we manually write out the loop over all cell batches,
rather than going through MatrixFree::cell_loop(). We do this because we know
that we are not going to need data exchange over the MPI network here, as all
computations are done on the cells held locally on each processor.
@code
template <int dim, int fe_degree, typename number>
void PreconditionBlockJacobi<dim, fe_degree, number>::vmult(
  LinearAlgebra::distributed::Vector<number>       &dst,
  const LinearAlgebra::distributed::Vector<number> &src) const
{
  adjust_ghost_range_if_necessary(*data, dst);
  adjust_ghost_range_if_necessary(*data, src);

  FEEvaluation<dim, fe_degree, fe_degree + 1, 1, number> phi(*data);
  for (unsigned int cell = 0; cell < data->n_cell_batches(); ++cell)
    {
      phi.reinit(cell);
      phi.read_dof_values(src);
      cell_matrices[phi.get_mapping_data_index_offset()].apply_inverse(
        ArrayView<VectorizedArray<number>>(phi.begin_dof_values(),
                                           phi.dofs_per_cell),
        ArrayView<const VectorizedArray<number>>(phi.begin_dof_values(),
                                                 phi.dofs_per_cell));
      phi.set_dof_values(dst);
    }
}
@endcode

The definition of the LaplaceProblem class is very similar to @ref step_37
"step-37". One difference is the fact that we add the element degree as a
template argument to the class, which would allow us to more easily include
more than one degree in the same program by creating different instances in
the `main()` function. The second difference is the selection of the element,
FE_DGQHermite, which is specialized for this kind of equation.

@code
template <int dim, int fe_degree>
class LaplaceProblem
{
public:
  LaplaceProblem();
  void run();

private:
  void setup_system();
  void compute_rhs();
  void solve();
  void analyze_results() const;

#ifdef DEAL_II_WITH_P4EST
  parallel::distributed::Triangulation<dim> triangulation;
#else
  Triangulation<dim> triangulation;
#endif

  FE_DGQHermite<dim> fe;
  DoFHandler<dim>    dof_handler;

  using SystemMatrixType = LaplaceOperator<dim, fe_degree, double>;
  SystemMatrixType system_matrix;

  using LevelMatrixType = LaplaceOperator<dim, fe_degree, float>;
  MGLevelObject<LevelMatrixType> mg_matrices;

  LinearAlgebra::distributed::Vector<double> solution;
  LinearAlgebra::distributed::Vector<double> system_rhs;

  double             setup_time;
  ConditionalOStream pcout;
  ConditionalOStream time_details;
};



template <int dim, int fe_degree>
LaplaceProblem<dim, fe_degree>::LaplaceProblem()
#ifdef DEAL_II_WITH_P4EST
  : triangulation(MPI_COMM_WORLD,
                  Triangulation<dim>::limit_level_difference_at_vertices,
                  parallel::distributed::Triangulation<
                    dim>::construct_multigrid_hierarchy)
#else
  : triangulation(Triangulation<dim>::limit_level_difference_at_vertices)
#endif
  , fe(fe_degree)
  , dof_handler(triangulation)
  , setup_time(0.)
  , pcout(std::cout, Utilities::MPI::this_mpi_process(MPI_COMM_WORLD) == 0)
  , time_details(std::cout,
                 false &&
                   Utilities::MPI::this_mpi_process(MPI_COMM_WORLD) == 0)
{}
@endcode

The setup function differs in two aspects from @ref step_37 "step-37". The
first is that we do not need to interpolate any constraints for the
discontinuous ansatz space, and simply pass a dummy AffineConstraints object
into MatrixFree::reinit(). The second change arises because we need to tell
MatrixFree to also initialize the data structures for faces. We do this by
setting update flags for the inner and boundary faces, respectively. On the
boundary faces, we need both the function values, their gradients, JxW values
(for integration), the normal vectors, and quadrature points (for the
evaluation of the boundary conditions), whereas we only need shape function
values, gradients, JxW values, and normal vectors for interior faces.
The face data structures in MatrixFree are always built as soon as one of
`mapping_update_flags_inner_faces` or `mapping_update_flags_boundary_faces` is
different from the default value `update_default` of UpdateFlags.

@code
template <int dim, int fe_degree>
void LaplaceProblem<dim, fe_degree>::setup_system()
{
  Timer time;
  setup_time = 0;

  system_matrix.clear();
  mg_matrices.clear_elements();

  dof_handler.distribute_dofs(fe);
  dof_handler.distribute_mg_dofs();

  pcout << "Number of degrees of freedom: " << dof_handler.n_dofs()
        << std::endl;

  setup_time += time.wall_time();
  time_details << "Distribute DoFs " << time.wall_time() << " s" << std::endl;
  time.restart();

  AffineConstraints<double> dummy;
  dummy.close();

  {
    typename MatrixFree<dim, double>::AdditionalData additional_data;
    additional_data.tasks_parallel_scheme =
      MatrixFree<dim, double>::AdditionalData::none;
    additional_data.mapping_update_flags =
      (update_gradients | update_JxW_values | update_quadrature_points);
    additional_data.mapping_update_flags_inner_faces =
      (update_gradients | update_JxW_values | update_normal_vectors);
    additional_data.mapping_update_flags_boundary_faces =
      (update_gradients | update_JxW_values | update_normal_vectors |
       update_quadrature_points);
    const auto system_mf_storage =
      std::make_shared<MatrixFree<dim, double>>();
    system_mf_storage->reinit(dof_handler,
                              dummy,
                              QGauss<1>(fe.degree + 1),
                              additional_data);
    system_matrix.initialize(system_mf_storage);
  }

  system_matrix.initialize_dof_vector(solution);
  system_matrix.initialize_dof_vector(system_rhs);

  setup_time += time.wall_time();
  time_details << "Setup matrix-free system " << time.wall_time() << " s"
               << std::endl;
  time.restart();

  const unsigned int nlevels = triangulation.n_global_levels();
  mg_matrices.resize(0, nlevels - 1);

  for (unsigned int level = 0; level < nlevels; ++level)
    {
      typename MatrixFree<dim, float>::AdditionalData additional_data;
      additional_data.tasks_parallel_scheme =
        MatrixFree<dim, float>::AdditionalData::none;
      additional_data.mapping_update_flags =
        (update_gradients | update_JxW_values);
      additional_data.mapping_update_flags_inner_faces =
        (update_gradients | update_JxW_values);
      additional_data.mapping_update_flags_boundary_faces =
        (update_gradients | update_JxW_values);
      additional_data.mg_level = level;
      const auto mg_mf_storage_level =
        std::make_shared<MatrixFree<dim, float>>();
      mg_mf_storage_level->reinit(dof_handler,
                                  dummy,
                                  QGauss<1>(fe.degree + 1),
                                  additional_data);

      mg_matrices[level].initialize(mg_mf_storage_level);
    }
  setup_time += time.wall_time();
  time_details << "Setup matrix-free levels " << time.wall_time() << " s"
               << std::endl;
}
@endcode

The computation of the right hand side is a bit more complicated than in @ref
step_37 "step-37". The cell term now consists of the negative Laplacian of the
analytical solution, `RightHandSide`, for which we need to first split up the
Point of VectorizedArray fields, i.e., a batch of points, into a single point
by evaluating all lanes in the VectorizedArray separately. Remember that the
number of lanes depends on the hardware; it could be 1 for systems that do not
offer vectorization (or where deal.II does not have intrinsics), but it could
also be 8 or 16 on AVX-512 of recent Intel architectures.
@code
template <int dim, int fe_degree>
void LaplaceProblem<dim, fe_degree>::compute_rhs()
{
  Timer time;
  system_rhs = 0;
  const MatrixFree<dim, double> &data = *system_matrix.get_matrix_free();
  FEEvaluation<dim, fe_degree>   phi(data);
  RightHandSide<dim>             rhs_func;
  Solution<dim>                  exact_solution;
  for (unsigned int cell = 0; cell < data.n_cell_batches(); ++cell)
    {
      phi.reinit(cell);
      for (unsigned int q = 0; q < phi.n_q_points; ++q)
        {
          VectorizedArray<double> rhs_val = VectorizedArray<double>();
          Point<dim, VectorizedArray<double>> point_batch =
            phi.quadrature_point(q);
          for (unsigned int v = 0; v < VectorizedArray<double>::size(); ++v)
            {
              Point<dim> single_point;
              for (unsigned int d = 0; d < dim; ++d)
                single_point[d] = point_batch[d][v];
              rhs_val[v] = rhs_func.value(single_point);
            }
          phi.submit_value(rhs_val, q);
        }
      phi.integrate_scatter(true, false, system_rhs);
    }
@endcode

Secondly, we also need to apply the Dirichlet and Neumann boundary
conditions. This function is the missing counterpart to the
`LaplaceOperator::apply_boundary()` function above: the exterior solution
values @f$u^+ = -u^- + 2 g_\text{D}@f$ and @f$\mathbf{n}^-\cdot \nabla u^+ =
\mathbf{n}^-\cdot \nabla u^-@f$ on Dirichlet boundaries and @f$u^+=u^-@f$ and
@f$\mathbf{n}^-\cdot \nabla u^+ = -\mathbf{n}^-\cdot \nabla u^- + 2
g_\text{N}@f$ on Neumann boundaries are inserted and expanded in terms of the
boundary functions @f$g_\text{D}@f$ and @f$g_\text{N}@f$, whose contributions
are added here. One thing to remember is that we move the boundary conditions
to the right hand side, so the sign is the opposite from what we imposed on
the solution part.

We could have issued both the cell and the boundary part through a
MatrixFree::loop part, but we choose to manually write the full loop over all
faces to learn how the index layout of face indices is set up in MatrixFree:
Both the inner faces and the boundary faces share the index range, and all
batches of inner faces have lower numbers than the batches of boundary
faces. A single index for both variants allows us to easily use the same data
structure FEFaceEvaluation for both cases that attaches to the same data
field, just at different positions. The number of inner face batches (where a
batch is due to the combination of several faces into one for vectorization)
is given by MatrixFree::n_inner_face_batches(), whereas the number of boundary
face batches is given by MatrixFree::n_boundary_face_batches().
@code
  FEFaceEvaluation<dim, fe_degree> phi_face(data, true);
  for (unsigned int face = data.n_inner_face_batches();
       face < data.n_inner_face_batches() + data.n_boundary_face_batches();
       ++face)
    {
      phi_face.reinit(face);

      const VectorizedArray<double> inverse_length_normal_to_face =
        std::abs((phi_face.get_normal_vector(0) *
                  phi_face.inverse_jacobian(0))[dim - 1]);
      const VectorizedArray<double> sigma =
        inverse_length_normal_to_face * system_matrix.get_penalty_factor();

      for (unsigned int q = 0; q < phi_face.n_q_points; ++q)
        {
          VectorizedArray<double> test_value = VectorizedArray<double>(),
                                  test_normal_derivative =
                                    VectorizedArray<double>();
          Point<dim, VectorizedArray<double>> point_batch =
            phi_face.quadrature_point(q);

          for (unsigned int v = 0; v < VectorizedArray<double>::size(); ++v)
            {
              Point<dim> single_point;
              for (unsigned int d = 0; d < dim; ++d)
                single_point[d] = point_batch[d][v];
@endcode

The MatrixFree class lets us query the boundary_id of the current face
batch. Remember that MatrixFree sets up the batches for vectorization such
that all faces within a batch have the same properties, which includes their
`boundary_id`. Thus, we can query that id here for the current face index
`face` and either impose the Dirichlet case (where we add something to the
function value) or the Neumann case (where we add something to the normal
derivative).

@code
              if (data.get_boundary_id(face) == 0)
                test_value[v] = 2.0 * exact_solution.value(single_point);
              else
                {
                  Tensor<1, dim> normal;
                  for (unsigned int d = 0; d < dim; ++d)
                    normal[d] = phi_face.get_normal_vector(q)[d][v];
                  test_normal_derivative[v] =
                    -normal * exact_solution.gradient(single_point);
                }
            }
          phi_face.submit_value(test_value * sigma - test_normal_derivative,
                                q);
          phi_face.submit_normal_derivative(-0.5 * test_value, q);
        }
      phi_face.integrate_scatter(true, true, system_rhs);
    }
@endcode

Since we have manually run the loop over cells rather than using
MatrixFree::loop(), we must not forget to perform the data exchange with MPI -
or actually, we would not need that for DG elements here because each cell
carries its own degrees of freedom and cell and boundary integrals only
evaluate quantities on the locally owned cells. The coupling to neighboring
subdomains only comes in by the inner face integrals, which we have not done
here. That said, it does not hurt to call this function here, so we do it as a
reminder of what happens inside MatrixFree::loop().

@code
  system_rhs.compress(VectorOperation::add);
  setup_time += time.wall_time();
  time_details << "Compute right hand side " << time.wall_time() << " s"
               << std::endl;
}
@endcode

The `solve()` function is copied almost verbatim from @ref step_37
"step-37". We set up the same multigrid ingredients, namely the level
transfer, a smoother, and a coarse grid solver. The only difference is the
fact that we do not use the diagonal of the Laplacian for the preconditioner
of the Chebyshev iteration used for smoothing, but instead our newly developed
class `%PreconditionBlockJacobi`. The mechanisms are the same, though.
@code
template <int dim, int fe_degree>
void LaplaceProblem<dim, fe_degree>::solve()
{
  Timer                            time;
  MGTransferMatrixFree<dim, float> mg_transfer;
  mg_transfer.build(dof_handler);
  setup_time += time.wall_time();
  time_details << "MG build transfer time " << time.wall_time() << " s"
               << std::endl;
  time.restart();

  using SmootherType =
    PreconditionChebyshev<LevelMatrixType,
                          LinearAlgebra::distributed::Vector<float>,
                          PreconditionBlockJacobi<dim, fe_degree, float>>;
  mg::SmootherRelaxation<SmootherType,
                         LinearAlgebra::distributed::Vector<float>>
    mg_smoother;
  MGLevelObject<typename SmootherType::AdditionalData> smoother_data;
  smoother_data.resize(0, triangulation.n_global_levels() - 1);
  for (unsigned int level = 0; level < triangulation.n_global_levels();
       ++level)
    {
      if (level > 0)
        {
          smoother_data[level].smoothing_range     = 15.;
          smoother_data[level].degree              = 3;
          smoother_data[level].eig_cg_n_iterations = 10;
        }
      else
        {
          smoother_data[0].smoothing_range = 2e-2;
          smoother_data[0].degree          = numbers::invalid_unsigned_int;
          smoother_data[0].eig_cg_n_iterations = mg_matrices[0].m();
        }
      smoother_data[level].preconditioner =
        std::make_shared<PreconditionBlockJacobi<dim, fe_degree, float>>();
      smoother_data[level].preconditioner->initialize(mg_matrices[level]);
    }
  mg_smoother.initialize(mg_matrices, smoother_data);

  MGCoarseGridApplySmoother<LinearAlgebra::distributed::Vector<float>>
    mg_coarse;
  mg_coarse.initialize(mg_smoother);

  mg::Matrix<LinearAlgebra::distributed::Vector<float>> mg_matrix(
    mg_matrices);

  Multigrid<LinearAlgebra::distributed::Vector<float>> mg(
    mg_matrix, mg_coarse, mg_transfer, mg_smoother, mg_smoother);

  PreconditionMG<dim,
                 LinearAlgebra::distributed::Vector<float>,
                 MGTransferMatrixFree<dim, float>>
    preconditioner(dof_handler, mg, mg_transfer);

  SolverControl solver_control(10000, 1e-12 * system_rhs.l2_norm());
  SolverCG<LinearAlgebra::distributed::Vector<double>> cg(solver_control);
  setup_time += time.wall_time();
  time_details << "MG build smoother time " << time.wall_time() << " s"
               << std::endl;
  pcout << "Total setup time " << setup_time << " s\n";

  time.reset();
  time.start();
  cg.solve(system_matrix, solution, system_rhs, preconditioner);

  pcout << "Time solve (" << solver_control.last_step() << " iterations) "
        << time.wall_time() << " s" << std::endl;
}
@endcode

Since we have solved a problem with analytic solution, we want to verify the
correctness of our implementation by computing the L2 error of the numerical
result against the analytic solution.

@code
template <int dim, int fe_degree>
void LaplaceProblem<dim, fe_degree>::analyze_results() const
{
  Vector<float> error_per_cell(triangulation.n_active_cells());
  VectorTools::integrate_difference(dof_handler,
                                    solution,
                                    Solution<dim>(),
                                    error_per_cell,
                                    QGauss<dim>(fe.degree + 2),
                                    VectorTools::L2_norm);
  pcout << "Verification via L2 error: "
        << std::sqrt(
             Utilities::MPI::sum(error_per_cell.norm_sqr(), MPI_COMM_WORLD))
        << std::endl;
}
@endcode

The `run()` function sets up the initial grid and then runs the multigrid
program in the usual way. As a domain, we choose a rectangle with periodic
boundary conditions in the @f$x@f$-direction, a Dirichlet condition on the
front face in @f$y@f$ direction (i.e., the face with index number 2, with
boundary id equal to 0), and Neumann conditions on the back face as well as
the two faces in @f$z@f$ direction for the 3D case (with boundary id equal to
1).
The extent of the domain is a bit different in the @f$x@f$ direction (where we
want to achieve a periodic solution given the definition of `Solution`) as
compared to the @f$y@f$ and @f$z@f$ directions.

@code
template <int dim, int fe_degree>
void LaplaceProblem<dim, fe_degree>::run()
{
  const unsigned int n_ranks =
    Utilities::MPI::n_mpi_processes(MPI_COMM_WORLD);
  pcout << "Running with " << n_ranks << " MPI process"
        << (n_ranks > 1 ? "es" : "") << ", element " << fe.get_name()
        << std::endl
        << std::endl;
  for (unsigned int cycle = 0; cycle < 9 - dim; ++cycle)
    {
      pcout << "Cycle " << cycle << std::endl;

      if (cycle == 0)
        {
          Point<dim> upper_right;
          upper_right[0] = 2.5;
          for (unsigned int d = 1; d < dim; ++d)
            upper_right[d] = 2.8;
          GridGenerator::hyper_rectangle(triangulation,
                                         Point<dim>(),
                                         upper_right);
          triangulation.begin_active()->face(0)->set_boundary_id(10);
          triangulation.begin_active()->face(1)->set_boundary_id(11);
          triangulation.begin_active()->face(2)->set_boundary_id(0);
          for (unsigned int f = 3; f < GeometryInfo<dim>::faces_per_cell; ++f)
            triangulation.begin_active()->face(f)->set_boundary_id(1);

          std::vector<GridTools::PeriodicFacePair<
            typename Triangulation<dim>::cell_iterator>>
            periodic_faces;
          GridTools::collect_periodic_faces(
            triangulation, 10, 11, 0, periodic_faces);
          triangulation.add_periodicity(periodic_faces);

          triangulation.refine_global(6 - 2 * dim);
        }
      else
        triangulation.refine_global(1);

      setup_system();
      compute_rhs();
      solve();
      analyze_results();
      pcout << std::endl;
    }
}
} // namespace Step59
@endcode

There is nothing unexpected in the `main()` function. We call `MPI_Init()`
through the `MPI_InitFinalize` class, pass on the two parameters on the
dimension and the degree set at the top of the file, and run the Laplace
problem.

@code
int main(int argc, char *argv[])
{
  try
    {
      using namespace Step59;

      Utilities::MPI::MPI_InitFinalize mpi_init(argc, argv, 1);

      LaplaceProblem<dimension, degree_finite_element> laplace_problem;
      laplace_problem.run();
    }
  catch (std::exception &exc)
    {
      std::cerr << std::endl
                << std::endl
                << "----------------------------------------------------"
                << std::endl;
      std::cerr << "Exception on processing: " << std::endl
                << exc.what() << std::endl
                << "Aborting!" << std::endl
                << "----------------------------------------------------"
                << std::endl;
      return 1;
    }
  catch (...)
    {
      std::cerr << std::endl
                << std::endl
                << "----------------------------------------------------"
                << std::endl;
      std::cerr << "Unknown exception!" << std::endl
                << "Aborting!" << std::endl
                << "----------------------------------------------------"
                << std::endl;
      return 1;
    }

  return 0;
}
@endcode

<a name="Results"></a><h1>Results</h1>

<a name="Programoutput"></a><h3>Program output</h3>

Like in @ref step_37 "step-37", we evaluate the multigrid solver in terms of
run time.
In two space dimensions with elements of degree 8, a possible output could
look as follows:

@code
Running with 12 MPI processes, element FE_DGQHermite<2>(8)

Cycle 0
Number of degrees of freedom: 5184
Total setup time              0.0282445 s
Time solve (14 iterations)    0.0110712 s
Verification via L2 error:    1.66232e-07

Cycle 1
Number of degrees of freedom: 20736
Total setup time              0.0126282 s
Time solve (14 iterations)    0.0157021 s
Verification via L2 error:    2.91505e-10

Cycle 2
Number of degrees of freedom: 82944
Total setup time              0.0227573 s
Time solve (14 iterations)    0.026568 s
Verification via L2 error:    6.64514e-13

Cycle 3
Number of degrees of freedom: 331776
Total setup time              0.0604685 s
Time solve (14 iterations)    0.0628356 s
Verification via L2 error:    5.57513e-13

Cycle 4
Number of degrees of freedom: 1327104
Total setup time              0.154359 s
Time solve (13 iterations)    0.219555 s
Verification via L2 error:    3.08139e-12

Cycle 5
Number of degrees of freedom: 5308416
Total setup time              0.467764 s
Time solve (13 iterations)    1.1821 s
Verification via L2 error:    3.90334e-12

Cycle 6
Number of degrees of freedom: 21233664
Total setup time              1.73263 s
Time solve (13 iterations)    5.21054 s
Verification via L2 error:    4.94543e-12
@endcode

Like in @ref step_37 "step-37", the number of CG iterations remains constant
with increasing problem size. The iteration counts are a bit higher, which is
because we use a lower degree of the Chebyshev polynomial (2 vs 5 in @ref
step_37 "step-37") and because the interior penalty discretization has a
somewhat larger spread in eigenvalues. Nonetheless, 13 iterations to reduce
the residual by 12 orders of magnitude, or almost a factor of 9 per iteration,
indicates an overall very efficient method. In particular, we can solve a
system with 21 million degrees of freedom in 5 seconds when using 12 cores,
which is a very good efficiency. Of course, in 2D we are well inside the
regime of roundoff for a polynomial degree of 8 – as a matter of fact, around
83k DoFs or 0.025s would have been enough to fully converge this (simple)
analytic solution.

Not much changes if we run the program in three spatial dimensions, except for
the fact that we now do something more useful with the higher polynomial
degree and increasing mesh sizes, as the roundoff errors are only obtained at
the finest mesh. Still, it is remarkable that we can solve a 3D Laplace
problem with a wave of three periods to roundoff accuracy on a twelve-core
machine pretty easily - using about 3.5 GB of memory in total for the second
to largest case with 24m DoFs, taking not more than eight seconds. The largest
case uses 30GB of memory with 191m DoFs.
@code
Running with 12 MPI processes, element FE_DGQHermite<3>(8)

Cycle 0
Number of degrees of freedom: 5832
Total setup time              0.0210681 s
Time solve (15 iterations)    0.0956945 s
Verification via L2 error:    0.0297194

Cycle 1
Number of degrees of freedom: 46656
Total setup time              0.0452428 s
Time solve (15 iterations)    0.113827 s
Verification via L2 error:    9.55733e-05

Cycle 2
Number of degrees of freedom: 373248
Total setup time              0.190423 s
Time solve (15 iterations)    0.218309 s
Verification via L2 error:    2.6868e-07

Cycle 3
Number of degrees of freedom: 2985984
Total setup time              0.627914 s
Time solve (15 iterations)    1.0595 s
Verification via L2 error:    4.6918e-10

Cycle 4
Number of degrees of freedom: 23887872
Total setup time              2.85215 s
Time solve (15 iterations)    8.30576 s
Verification via L2 error:    9.38583e-13

Cycle 5
Number of degrees of freedom: 191102976
Total setup time              16.1324 s
Time solve (15 iterations)    65.57 s
Verification via L2 error:    3.17875e-13
@endcode

<a name="Comparisonofefficiencyatdifferentpolynomialdegrees"></a><h3>Comparison of efficiency at different polynomial degrees</h3>

In the introduction and in-code comments, it was mentioned several times that
high orders are treated very efficiently with the FEEvaluation and
FEFaceEvaluation evaluators. Now, we want to substantiate these claims by
looking at the throughput of the 3D multigrid solver for various polynomial
degrees. We collect the times as follows: We first run a solver at a problem
size close to ten million, indicated in the first four table rows, and record
the timings. Then, we normalize the throughput by recording the number of
million degrees of freedom solved per second (MDoFs/s) to be able to compare
the efficiency of the different degrees; this number is computed by dividing
the number of degrees of freedom by the solver time.

<table align="center" class="doxtable">
<tr>
<th>Number of DoFs</th>
<th>Number of iterations</th>
<th>Solver time [s]</th>
</tr>
</table>

We clearly see how the efficiency per DoF initially improves until it reaches
a maximum for the polynomial degree @f$k=4@f$. This effect is surprising, not
only because higher polynomial degrees often yield a vastly better solution,
but especially also when having matrix-based schemes in mind, where the denser
coupling at higher degree leads to a monotonically decreasing throughput (and
a drastic one in 3D, with @f$k=4@f$ being more than ten times slower than
@f$k=1@f$!). For higher degrees, the throughput decreases a bit, which is both
due to an increase in the number of iterations (going from 12 at @f$k=2,3,4@f$
to 19 at @f$k=10@f$) and due to the @f$\mathcal O(k)@f$ complexity of operator
evaluation. Nonetheless, the efficiency as measured by the time to solution
would still be better for higher polynomial degrees because they have better
convergence rates (at least for problems as simple as this one): For
@f$k=12@f$, we reach roundoff accuracy already with 1 million DoFs (solver
time less than a second), whereas for @f$k=8@f$ we need 24 million DoFs and 8
seconds. For @f$k=5@f$, the error is around @f$10^{-9}@f$ with 57m DoFs and
thus still far away from roundoff, despite taking 16 seconds.

Note that the above numbers are a bit pessimistic because they include the
time it takes the Chebyshev smoother to compute an eigenvalue estimate, which
is around 10 percent of the solver time.
If the system is solved several times (as is common, e.g., in fluid dynamics),
this eigenvalue cost is only paid once and faster times become available.

<a name="Evaluationofefficiencyofingredients"></a><h3>Evaluation of efficiency of ingredients</h3>

Finally, we take a look at some of the special ingredients presented in this
tutorial program, namely the FE_DGQHermite basis in particular and the
specification of MatrixFree::DataAccessOnFaces. In the following table, the
third row shows the optimized solver above, the fourth row shows the timings
with only the MatrixFree::DataAccessOnFaces set to `unspecified` rather than
the optimal `gradients`, and the last one with replacing FE_DGQHermite by the
basic FE_DGQ elements, where both the MPI exchange and the operations done by
FEFaceEvaluation::gather_evaluate() and FEFaceEvaluation::integrate_scatter()
are more expensive.

<table align="center" class="doxtable">
<tr>
<th>Number of DoFs</th>
<th>Solver time optimized as in tutorial [s]</th>
<th>Solver time MatrixFree::DataAccessOnFaces::unspecified [s]</th>
<th>Solver time FE_DGQ [s]</th>
</tr>
</table>

The data in the table shows that not using MatrixFree::DataAccessOnFaces
increases costs by around 10% for higher polynomial degrees. For lower
degrees, the difference is obviously less pronounced because the
volume-to-surface ratio is more beneficial and less data needs to be
exchanged. The difference is larger when looking at the matrix-vector product
only, rather than the full multigrid solver shown here, with around 20% worse
timings just because of the MPI communication.

For @f$k=1@f$ and @f$k=2@f$, the Hermite-like basis functions do obviously not
really pay off (indeed, for @f$k=1@f$ the polynomials are exactly the same as
for FE_DGQ) and the results are similar as with the FE_DGQ basis. However, for
degrees starting at three, we see an increasing advantage for FE_DGQHermite,
showing the effectiveness of these basis functions.

<a name="Possibilitiesforextension"></a><h3>Possibilities for extension</h3>

As mentioned in the introduction, the fast diagonalization method is tied to a
Cartesian mesh with constant coefficients. If we wanted to solve
variable-coefficient problems, we would need to invest a bit more time in the
design of the smoother parameters by selecting proper generalizations (e.g.,
approximating the inverse on the nearest box-shaped element).

Another way of extending the program would be to include support for adaptive
meshes, for which interface operations at edges of different refinement level
become necessary, as discussed in @ref step_39 "step-39".

<a name="PlainProg"></a>
<h1> The plain program</h1>
@include "step-59.cc"