
C++, Templates, Compilers and Optimization.

You've got to be kidding me.

I've been working on a small computer vision program that works with the Intel RealSense depth camera and does some plane detection with RANSAC. Recently, I noticed that the program suddenly ran at only about 1/10th of the framerate I had achieved previously (~4 FPS as opposed to 30 FPS).

After some digging, I traced the issue down to this line:


if (fabs(plane.n.dot(*((Eigen::Vector3f*)(&point))) - plane.d) < distance*0.01)
	continue;

This calculates the distance from a 3D point to the plane and skips further processing if the distance is below a certain threshold. So far, so good. The calculation is little more than a single dot product between two 3D vectors: 3 float multiplications and 2 float additions, plus one subtraction, an absolute value and the threshold comparison.
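To make the operation count concrete, here is a minimal sketch of the same check written by hand, without Eigen. The names `Vec3`, `Plane`, `dot`, `point_plane_distance` and `near_plane` are made up for illustration and are not from the actual program; it also assumes the plane normal is unit length and uses a float constant `0.01f` where the original line uses a double.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the types in the post; the real program uses Eigen.
struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };  // n is assumed to be unit length

// Dot product of two 3-vectors: 3 multiplications, 2 additions.
inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Unsigned distance from p to the plane n.p = d: one subtraction and one fabs on top.
inline float point_plane_distance(const Plane& plane, const Vec3& p) {
    return std::fabs(dot(plane.n, p) - plane.d);
}

// Mirrors the check in the post: true when the point is close enough to be skipped.
inline bool near_plane(const Plane& plane, const Vec3& p, float distance) {
    assert(distance >= 0.0f);
    return point_plane_distance(plane, p) < distance * 0.01f;
}
```

With the plane z = 2, `near_plane(plane, Vec3{5.0f, 7.0f, 2.005f}, 1.0f)` is true and the point would be skipped, while a point at z = 3 is a full unit away and would not be.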

How can this tiny bit of code slow down my program like this? I compiled everything with profiling information, ran gprof on the result and voilà:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
  5.71      0.51     0.51 25944017     0.00     0.00  rs_deproject_pixel_to_point(float*, rs_intrinsics const*, float const*, float)
  5.09      0.96     0.45      205     0.00     0.03  deproject_all(unsigned short*, unsigned char*)
  4.41      1.35     0.39 114585359     0.00     0.00  Eigen::internal::binary_evaluator, Eigen::Matrix const, Eigen::Matrix const>, Eigen::internal::IndexBased, Eigen::internal::IndexBased, float, float>::coeff(long, long) const
  4.29      1.73     0.38 229171022     0.00     0.00  Eigen::internal::evaluator > >::coeff(long, long) const
  3.05      2.00     0.27 196776158     0.00     0.00  Eigen::DenseStorage::rows()
  2.54      2.22     0.23 25944017     0.00     0.00  rs::intrinsics::deproject(rs::float2 const&, float) const
  2.37      2.43     0.21 38195120     0.00     0.00  float Eigen::DenseBase, Eigen::Matrix const, Eigen::Matrix const> >::redux >(Eigen::internal::scalar_sum_op const&) const
  2.15      2.62     0.19 114585360     0.00     0.00  float Eigen::internal::pmul(float const&, float const&)
  2.09      2.81     0.19 114585361     0.00     0.00  Eigen::internal::scalar_conj_product_op::operator()(float const&, float const&) const
  2.03      2.99     0.18 229171051     0.00     0.00  Eigen::internal::variable_if_dynamic::value()
  2.03      3.17     0.18 194341549     0.00     0.00  Eigen::DenseStorage::cols()
  2.03      3.35     0.18 114585363     0.00     0.00  Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >::coeffByOuterInner(long, long) const
  1.98      3.52     0.18 38195114     0.00     0.00  Eigen::internal::redux_novec_unroller, Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >, 1, 2>::run(Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> > const&, Eigen::internal::scalar_sum_op const&)
  1.86      3.69     0.17 76855171     0.00     0.00  Eigen::internal::variable_if_dynamic::variable_if_dynamic(long)
  1.81      3.85     0.16 114585363     0.00     0.00  Eigen::internal::conj_helper::pmul(float const&, float const&) const
  1.81      4.01     0.16 38195117     0.00     0.00  Eigen::CwiseBinaryOp, Eigen::Matrix const, Eigen::Matrix const>::CwiseBinaryOp(Eigen::Matrix const&, Eigen::Matrix const&, Eigen::internal::scalar_conj_product_op const&)
  1.70      4.16     0.15 76855108     0.00     0.00  Eigen::internal::evaluator_base >::~evaluator_base()
  1.70      4.31     0.15 196776157     0.00     0.00  Eigen::PlainObjectBase >::rows() const
  1.70      4.46     0.15 194341552     0.00     0.00  Eigen::PlainObjectBase >::cols() const
  1.70      4.61     0.15 78941923     0.00     0.00  Eigen::PlainObjectBase >::data() const
  1.58      4.75     0.14 76390238     0.00     0.00  Eigen::internal::scalar_conj_product_op::scalar_conj_product_op(Eigen::internal::scalar_conj_product_op const&)
  1.53      4.88     0.14 234852543     0.00     0.00  Eigen::EigenBase >::derived() const
  1.53      5.02     0.14 38195120     0.00     0.00  Eigen::internal::dot_nocheck, Eigen::Matrix, false>::run(Eigen::MatrixBase > const&, Eigen::MatrixBase > const&)
  1.47      5.15     0.13 38195120     0.00     0.00  Eigen::internal::binary_evaluator, Eigen::Matrix const, Eigen::Matrix const>, Eigen::internal::IndexBased, Eigen::internal::IndexBased, float, float>::binary_evaluator(Eigen::CwiseBinaryOp, Eigen::Matrix const, Eigen::Matrix const> const&)
  1.30      5.26     0.12 76390438     0.00     0.00  Eigen::internal::scalar_sum_op::operator()(float const&, float const&) const
  1.24      5.37     0.11 38195121     0.00     0.00  Eigen::internal::binary_evaluator, Eigen::Matrix const, Eigen::Matrix const>, Eigen::internal::IndexBased, Eigen::internal::IndexBased, float, float>::~binary_evaluator()
  1.24      5.48     0.11 76855110     0.00     0.00  Eigen::internal::evaluator > >::evaluator(Eigen::PlainObjectBase > const&)
  1.19      5.59     0.11 38195120     0.00     0.00  Eigen::CwiseBinaryOp, Eigen::Matrix const, Eigen::Matrix const> const Eigen::MatrixBase >::binaryExpr, Eigen::Matrix >(Eigen::MatrixBase > const&, Eigen::internal::scalar_conj_product_op const&) const
  1.13      5.69     0.10 38195120     0.00     0.00  Eigen::internal::redux_novec_unroller, Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >, 0, 3>::run(Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> > const&, Eigen::internal::scalar_sum_op const&)
  1.13      5.79     0.10 38195119     0.00     0.00  Eigen::internal::scalar_product_traits >::Scalar>::ReturnType Eigen::MatrixBase >::dot >(Eigen::MatrixBase > const&) const
  1.07      5.88     0.10 76506773     0.00     0.00  Eigen::internal::evaluator const>::evaluator(Eigen::Matrix const&)
  1.02      5.97     0.09 76855103     0.00     0.00  Eigen::internal::evaluator > >::~evaluator()
  1.02      6.06     0.09 76506774     0.00     0.00  Eigen::internal::evaluator const>::~evaluator()
  1.02      6.15     0.09 38195120     0.00     0.00  Eigen::internal::redux_novec_unroller, Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >, 2, 1>::run(Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> > const&, Eigen::internal::scalar_sum_op const&)
  1.02      6.24     0.09 38195119     0.00     0.00  Eigen::internal::evaluator_base, Eigen::Matrix const, Eigen::Matrix const> >::~evaluator_base()
  0.90      6.32     0.08 78477082     0.00     0.00  Eigen::EigenBase >::rows() const
  0.90      6.40     0.08 38195118     0.00     0.00  Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >::redux_evaluator(Eigen::CwiseBinaryOp, Eigen::Matrix const, Eigen::Matrix const> const&)
  0.90      6.48     0.08 38195118     0.00     0.00  Eigen::internal::special_scalar_op_base, Eigen::Matrix const, Eigen::Matrix const>, float, float, Eigen::DenseCoeffsBase, Eigen::Matrix const, Eigen::Matrix const>, 0>, false>::special_scalar_op_base()
  0.85      6.56     0.08 119803981     0.00     0.00  Eigen::internal::noncopyable::noncopyable()

WHAT. THE. FRACK.

(This actually goes on for a couple hundred more pages.) I use Eigen for the linear algebra, which is a very useful and handy template library, and that's obviously the issue here: Eigen makes such heavy use of C++ templates, stacking them I don't know how many levels deep (3 or 4 at the very least), that the compiler ends up generating insane numbers of temporary objects, operator calls, constructors and whatever else. For a single dot product.
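The layering can be sketched with a toy version. This is emphatically not Eigen's real implementation: `Storage`, `Vector3`, `ProductExpr`, `SumUnroller` and `layered_dot` are invented names that only loosely echo entries in the profile such as `derived()`, `coeff()` and the unroller. The point is that every layer is a tiny function that just forwards to the next one, and a compiler that does no inlining emits a real call for each of them.

```cpp
#include <cstddef>

// Toy stand-ins, not Eigen's classes: each layer only forwards to the next.
struct Storage {
    float data[3];
    const float* ptr() const { return data; }
};

struct Vector3 {
    Storage storage;
    const Storage& derived() const { return storage; }
    float coeff(std::size_t i) const { return derived().ptr()[i]; }
};

// Lazy "a[i] * b[i]" expression: holds references and computes on demand.
struct ProductExpr {
    const Vector3& lhs;
    const Vector3& rhs;
    float coeff(std::size_t i) const { return lhs.coeff(i) * rhs.coeff(i); }
};

// Sum over the expression's coefficients, unrolled at compile time by recursion.
template <std::size_t I, std::size_t N>
struct SumUnroller {
    static float run(const ProductExpr& e) {
        return e.coeff(I) + SumUnroller<I + 1, N>::run(e);
    }
};

// Recursion end: nothing left to add.
template <std::size_t N>
struct SumUnroller<N, N> {
    static float run(const ProductExpr&) { return 0.0f; }
};

// One dot product, routed through every layer above.
inline float layered_dot(const Vector3& a, const Vector3& b) {
    return SumUnroller<0, 3>::run(ProductExpr{a, b});
}
```

Even in this toy, one `layered_dot` call fans out into four `run` calls, three `ProductExpr::coeff` calls, six `Vector3::coeff` calls and six each of `derived()` and `ptr()` if nothing gets inlined. That is the same shape as the call counts in the profile, where the per-coefficient functions are hit three to six times per dot product.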

After seeing this, I figured that a lot of that can probably be inlined by the compiler, and oh look, the Makefile I was using didn't enable any compiler optimizations at all. So I added -Ofast to my CFLAGS entry, and the entire issue was fixed. By one single compiler flag. Moral: never use template libraries without compiler optimizations.
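One cheap way to catch this earlier is to let the program itself report how it was built. The sketch below relies on the `__OPTIMIZE__` macro, which GCC and Clang predefine whenever optimization is enabled. Other compilers may not define it, so a `false` result only means "unoptimized or unknown". The helper names `built_with_optimizations` and `build_notice` are hypothetical.

```cpp
#include <string>

// GCC and Clang define __OPTIMIZE__ in optimizing builds (-O1 and above).
// Other compilers may not, so "false" means "unoptimized or unknown".
constexpr bool built_with_optimizations() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: a one-line notice to log at program startup.
inline std::string build_notice() {
    return built_with_optimizations()
        ? "optimized build"
        : "WARNING: unoptimized build, template-heavy code will be slow";
}
```

Printing `build_notice()` once at startup would have pointed straight at the Makefile. As a side note, in GCC -Ofast also turns on -ffast-math, which relaxes floating-point rules. The inlining that fixes this particular problem should already be there at -O2.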