Re: Use of a Nvidia K40 in OLB
Most likely you have two E5-2620 v4 CPUs, each with 8 cores/16 threads. In total you will see 32 virtual cores, but there are only 16 physical cores, so the best performance will be reached with 16 MPI processes. With anything above 16, several processes have to share a physical core, which adds context switches that take time and reduces the cache available to each process.
For additional information about that, I recommend reading about simultaneous multithreading (SMT) or Hyper-Threading (HT).
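
To make the core-count arithmetic concrete, here is a minimal Python sketch. The function name `core_counts` and its parameters are made up for illustration, and the numbers assume the dual-socket 8-core/16-thread setup guessed at above, so please check them against your actual hardware.

```python
def core_counts(sockets, cores_per_socket, threads_per_core):
    """Return (physical_cores, logical_cores) for a node with identical CPUs."""
    physical = sockets * cores_per_socket
    logical = physical * threads_per_core
    return physical, logical


# Assumed setup: 2 sockets, 8 cores per socket, 2 hardware threads per core.
physical, logical = core_counts(sockets=2, cores_per_socket=8, threads_per_core=2)
print(f"physical cores: {physical}, logical cores: {logical}")
print(f"suggested number of MPI processes: {physical}")
```

With these assumed values the physical count is what you would pass as the number of MPI processes, for example via `mpirun -np 16` followed by your own executable.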