I've been working on a small computer vision program that uses the Intel RealSense depth camera and does some plane detection with RANSAC. Recently, I noticed that the program suddenly ran at only about a tenth of the framerate I had achieved previously (~4 FPS as opposed to 30 FPS).
After some digging, I traced the issue down to this line:
if (fabs(plane.n.dot(*((Eigen::Vector3f*)(&point))) - plane.d) < distance*0.01)
    continue;
This calculates the distance from a 3D point to the plane and skips further processing if the distance is below a certain threshold. So far, so good. The calculation is essentially a single dot product between two 3-vectors: 3 float multiplications and 2 float additions, plus one subtraction, an absolute value and one comparison.
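To make the cost of that line concrete, here is a minimal sketch of the same test written out by hand, without Eigen. The names `Vec3`, `Plane`, `dot`, `distance_to_plane` and `near_plane` are hypothetical stand-ins, not the types from my program, and the sketch assumes the plane normal has unit length.

```cpp
#include <cmath>

// Hypothetical stand-in types for illustration; not the ones used in the post.
struct Vec3 {
    float x, y, z;
};

struct Plane {
    Vec3 n;   // plane normal, assumed to be unit length
    float d;  // offset, so that dot(n, p) == d for every point p on the plane
};

// Dot product of two 3-vectors: three multiplications and two additions.
inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Unsigned distance from point p to the plane (valid only for a unit normal).
inline float distance_to_plane(const Plane& plane, const Vec3& p) {
    return std::fabs(dot(plane.n, p) - plane.d);
}

// True if p lies closer to the plane than `threshold`.
inline bool near_plane(const Plane& plane, const Vec3& p, float threshold) {
    return distance_to_plane(plane, p) < threshold;
}
```

For example, with the plane z = 2 (normal (0, 0, 1), d = 2) and the point (5, -3, 2.5), `distance_to_plane` gives 0.5, so `near_plane` is true for a threshold of 0.6 and false for 0.4.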
How can this tiny bit of code slow down my program like this? I compiled everything with profiling information, ran gprof on the result, and voilà:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
5.71 0.51 0.51 25944017 0.00 0.00 rs_deproject_pixel_to_point(float*, rs_intrinsics const*, float const*, float)
5.09 0.96 0.45 205 0.00 0.03 deproject_all(unsigned short*, unsigned char*)
4.41 1.35 0.39 114585359 0.00 0.00 Eigen::internal::binary_evaluator, Eigen::Matrix const, Eigen::Matrix const>, Eigen::internal::IndexBased, Eigen::internal::IndexBased, float, float>::coeff(long, long) const
4.29 1.73 0.38 229171022 0.00 0.00 Eigen::internal::evaluator > >::coeff(long, long) const
3.05 2.00 0.27 196776158 0.00 0.00 Eigen::DenseStorage::rows()
2.54 2.22 0.23 25944017 0.00 0.00 rs::intrinsics::deproject(rs::float2 const&, float) const
2.37 2.43 0.21 38195120 0.00 0.00 float Eigen::DenseBase, Eigen::Matrix const, Eigen::Matrix const> >::redux >(Eigen::internal::scalar_sum_op const&) const
2.15 2.62 0.19 114585360 0.00 0.00 float Eigen::internal::pmul(float const&, float const&)
2.09 2.81 0.19 114585361 0.00 0.00 Eigen::internal::scalar_conj_product_op::operator()(float const&, float const&) const
2.03 2.99 0.18 229171051 0.00 0.00 Eigen::internal::variable_if_dynamic::value()
2.03 3.17 0.18 194341549 0.00 0.00 Eigen::DenseStorage::cols()
2.03 3.35 0.18 114585363 0.00 0.00 Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >::coeffByOuterInner(long, long) const
1.98 3.52 0.18 38195114 0.00 0.00 Eigen::internal::redux_novec_unroller, Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >, 1, 2>::run(Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> > const&, Eigen::internal::scalar_sum_op const&)
1.86 3.69 0.17 76855171 0.00 0.00 Eigen::internal::variable_if_dynamic::variable_if_dynamic(long)
1.81 3.85 0.16 114585363 0.00 0.00 Eigen::internal::conj_helper::pmul(float const&, float const&) const
1.81 4.01 0.16 38195117 0.00 0.00 Eigen::CwiseBinaryOp, Eigen::Matrix const, Eigen::Matrix const>::CwiseBinaryOp(Eigen::Matrix const&, Eigen::Matrix const&, Eigen::internal::scalar_conj_product_op const&)
1.70 4.16 0.15 76855108 0.00 0.00 Eigen::internal::evaluator_base >::~evaluator_base()
1.70 4.31 0.15 196776157 0.00 0.00 Eigen::PlainObjectBase >::rows() const
1.70 4.46 0.15 194341552 0.00 0.00 Eigen::PlainObjectBase >::cols() const
1.70 4.61 0.15 78941923 0.00 0.00 Eigen::PlainObjectBase >::data() const
1.58 4.75 0.14 76390238 0.00 0.00 Eigen::internal::scalar_conj_product_op::scalar_conj_product_op(Eigen::internal::scalar_conj_product_op const&)
1.53 4.88 0.14 234852543 0.00 0.00 Eigen::EigenBase >::derived() const
1.53 5.02 0.14 38195120 0.00 0.00 Eigen::internal::dot_nocheck, Eigen::Matrix, false>::run(Eigen::MatrixBase > const&, Eigen::MatrixBase > const&)
1.47 5.15 0.13 38195120 0.00 0.00 Eigen::internal::binary_evaluator, Eigen::Matrix const, Eigen::Matrix const>, Eigen::internal::IndexBased, Eigen::internal::IndexBased, float, float>::binary_evaluator(Eigen::CwiseBinaryOp, Eigen::Matrix const, Eigen::Matrix const> const&)
1.30 5.26 0.12 76390438 0.00 0.00 Eigen::internal::scalar_sum_op::operator()(float const&, float const&) const
1.24 5.37 0.11 38195121 0.00 0.00 Eigen::internal::binary_evaluator, Eigen::Matrix const, Eigen::Matrix const>, Eigen::internal::IndexBased, Eigen::internal::IndexBased, float, float>::~binary_evaluator()
1.24 5.48 0.11 76855110 0.00 0.00 Eigen::internal::evaluator > >::evaluator(Eigen::PlainObjectBase > const&)
1.19 5.59 0.11 38195120 0.00 0.00 Eigen::CwiseBinaryOp, Eigen::Matrix const, Eigen::Matrix const> const Eigen::MatrixBase >::binaryExpr, Eigen::Matrix >(Eigen::MatrixBase > const&, Eigen::internal::scalar_conj_product_op const&) const
1.13 5.69 0.10 38195120 0.00 0.00 Eigen::internal::redux_novec_unroller, Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >, 0, 3>::run(Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> > const&, Eigen::internal::scalar_sum_op const&)
1.13 5.79 0.10 38195119 0.00 0.00 Eigen::internal::scalar_product_traits >::Scalar>::ReturnType Eigen::MatrixBase >::dot >(Eigen::MatrixBase > const&) const
1.07 5.88 0.10 76506773 0.00 0.00 Eigen::internal::evaluator const>::evaluator(Eigen::Matrix const&)
1.02 5.97 0.09 76855103 0.00 0.00 Eigen::internal::evaluator > >::~evaluator()
1.02 6.06 0.09 76506774 0.00 0.00 Eigen::internal::evaluator const>::~evaluator()
1.02 6.15 0.09 38195120 0.00 0.00 Eigen::internal::redux_novec_unroller, Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >, 2, 1>::run(Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> > const&, Eigen::internal::scalar_sum_op const&)
1.02 6.24 0.09 38195119 0.00 0.00 Eigen::internal::evaluator_base, Eigen::Matrix const, Eigen::Matrix const> >::~evaluator_base()
0.90 6.32 0.08 78477082 0.00 0.00 Eigen::EigenBase >::rows() const
0.90 6.40 0.08 38195118 0.00 0.00 Eigen::internal::redux_evaluator, Eigen::Matrix const, Eigen::Matrix const> >::redux_evaluator(Eigen::CwiseBinaryOp, Eigen::Matrix const, Eigen::Matrix const> const&)
0.90 6.48 0.08 38195118 0.00 0.00 Eigen::internal::special_scalar_op_base, Eigen::Matrix const, Eigen::Matrix const>, float, float, Eigen::DenseCoeffsBase, Eigen::Matrix const, Eigen::Matrix const>, 0>, false>::special_scalar_op_base()
0.85 6.56 0.08 119803981 0.00 0.00 Eigen::internal::noncopyable::noncopyable()
WHAT. THE. FRACK.
(This actually goes on for a couple hundred more pages.) I used Eigen for the linear algebra, which is a very useful and handy template library, and that's obviously the issue here: Eigen makes such heavy use of C++ templates, stacked I don't know how many levels deep (3 or 4 at the very least), that the compiler ends up generating insane amounts of temporary objects, operator calls, constructors and whatever else. For a single dot product.
After seeing this, I figured that a lot of that could probably be inlined by the compiler, and oh look, the Makefile I was using didn't enable any compiler optimizations at all.
So I added -Ofast to my CFLAGS entry, and the entire issue was fixed. By one single compiler flag. Moral: never use template libraries without optimizations.
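One way to avoid getting bitten by this again is to have the program say so when it was built without optimizations. The following is a rough sketch, not something from my actual code: it relies on the `__OPTIMIZE__` macro, which GCC and Clang predefine whenever an optimization level above -O0 is active (other compilers, such as MSVC, do not define it, so there the check always reports "unoptimized"). The function names are made up for illustration.

```cpp
#include <cstdio>

// Reports whether this translation unit was compiled with optimizations on.
// Relies on __OPTIMIZE__, a macro predefined by GCC and Clang for -O1 and up.
constexpr bool built_with_optimizations() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Prints a warning to stderr for unoptimized builds.
// Returns true if the warning was printed, false otherwise.
inline bool warn_if_unoptimized() {
    if (!built_with_optimizations()) {
        std::fputs("warning: built without compiler optimizations; "
                   "template-heavy code such as Eigen may run very slowly\n",
                   stderr);
        return true;
    }
    return false;
}
```

Calling `warn_if_unoptimized()` once at the start of `main` is enough. As a side note, in GCC -Ofast also turns on -ffast-math, which relaxes strict floating-point semantics; if that matters for your results, -O2 or -O3 should already let the compiler inline Eigen's small helper functions.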