The Java Specialists' Newsletter
Issue 135 | 2006-11-06 | Category: Performance | Java version: JDK 5

Are you really Multi-Core?

by Dr. Heinz M. Kabutz
Abstract:
With Java 5, we can measure CPU time per thread. Here is a small program that runs several CPU-intensive tasks in separate threads and then compares the elapsed time to the total CPU time of the threads. The resulting factor should give you some indication of the CPU-based acceleration that the multiple cores are giving you.

Welcome to the 135th edition of The Java(tm) Specialists' Newsletter, now sent from a beautiful little island in Greece. We arrived safely two weeks ago and have been running around organising the basics, such as purchasing a vehicle, opening a bank account and getting cell phone contracts. Things happen really quickly in Greece. We can get my wife's Greek birth certificate in one week. In South Africa, this took me about 4 months to do. In about a week's time, I should be ready to apply for permanent residence here in Greece, so now I am the "First Java Champion in Greece" :))

The Java Performance Tuning course almost didn't happen, due to the hotel being washed into the sea by the storms. Fortunately my friend George Niavradakis (who sells real estate in Crete) jumped in and organised a new venue for us. And the dinner at Irene's was unforgettable, as always!

Are you really Multi-Core?

A few weeks ago, I presented a Java 5 and a Design Patterns Course in Cape Town to a bunch of developers. They were mostly developing in Linux, and one of the chaps was impressing us all with his multi-core machine. A Dell Latitude notebook, with tons of RAM, a great graphics card, etc. It looked really fast, especially the 3D effects of his desktop.

One of the exercises that we do in the Java 5 course is to measure the CPU time that a thread has used, as opposed to elapsed time. If you have one CPU in your machine, then these should be roughly the same. However, when you have several CPUs in your machine, the total CPU time of all the threads should be a multiple of the elapsed time. This factor should never be more than the number of actual CPUs, and may be less when you either have other processes running, or too many threads per CPU. Also, as all good computer scientists know, you can never scale completely linearly on one machine, so as you approach a large number of CPUs, the factor will grow more slowly.

Here is a short piece of code that starts 5 threads. Each thread runs through a loop from 0 to 999999999. For each thread we measure the thread CPU time with the new ThreadMXBean. These times are added up and then we divide the total by the elapsed time (also called "wall clock time"). To avoid introducing contention, I'm using the AtomicLong and the CountDownLatch.

import java.lang.management.*;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class MultiCoreTester {
  private static final int THREADS = 5;
  private static CountDownLatch ct = new CountDownLatch(THREADS);
  private static AtomicLong total = new AtomicLong();

  public static void main(String[] args)
      throws InterruptedException {
    long elapsedTime = System.nanoTime();
    for (int i = 0; i < THREADS; i++) {
      Thread thread = new Thread() {
        public void run() {
          total.addAndGet(measureThreadCpuTime());
          ct.countDown();
        }
      };
      thread.start();
    }
    ct.await();
    elapsedTime = System.nanoTime() - elapsedTime;
    System.out.println("Total elapsed time " + elapsedTime);
    System.out.println("Total thread CPU time " + total.get());
    double factor = total.get();
    factor /= elapsedTime;
    System.out.printf("Factor: %.2f%n", factor);
  }

  private static long measureThreadCpuTime() {
    ThreadMXBean tm = ManagementFactory.getThreadMXBean();
    long cpuTime = tm.getCurrentThreadCpuTime();
    // local accumulator, named sum so that it does not shadow
    // the static AtomicLong total
    long sum = 0;
    for (int i = 0; i < 1000 * 1000 * 1000; i++) {
      // keep ourselves busy for a while ...
      // note: we had to add some "work" into the loop or Java 6
      // optimizes it away.  Thanks to Daniel Einspanjer for
      // pointing that out.
      sum += i;
      sum *= 10;
    }
    cpuTime = tm.getCurrentThreadCpuTime() - cpuTime;
    System.out.println(sum + " ... " + Thread.currentThread() +
        ": cpuTime = " + cpuTime);
    return cpuTime;
  }
}
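
Before trusting the numbers, it may be worth checking that your JVM can measure thread CPU time at all. According to the ThreadMXBean documentation, getCurrentThreadCpuTime() returns -1 when CPU time measurement is disabled and throws UnsupportedOperationException when it is not supported. The following is only a minimal sketch; the class name CpuTimeSupportCheck and the method describe() are made up for this illustration and are not part of MultiCoreTester.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the original program.
class CpuTimeSupportCheck {
  // Reports whether thread CPU time is supported and enabled in this JVM.
  static String describe() {
    ThreadMXBean tm = ManagementFactory.getThreadMXBean();
    boolean supported = tm.isCurrentThreadCpuTimeSupported();
    // isThreadCpuTimeEnabled() may throw UnsupportedOperationException
    // on a JVM without CPU time support, so only ask when supported.
    boolean enabled = supported && tm.isThreadCpuTimeEnabled();
    return "cpuTimeSupported=" + supported +
        ", cpuTimeEnabled=" + enabled;
  }

  public static void main(String[] args) {
    System.out.println(describe());
  }
}
```

If either value is false, the factor printed by MultiCoreTester would not mean much on that JVM.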
  

When I run this on my little D800 Latitude, I get:

    Thread[Thread-3,5,main]: cpuTime = 1920000000
    Thread[Thread-2,5,main]: cpuTime = 1920000000
    Thread[Thread-1,5,main]: cpuTime = 1930000000
    Thread[Thread-4,5,main]: cpuTime = 1920000000
    Thread[Thread-0,5,main]: cpuTime = 1940000000
    Total elapsed time 9759677000
    Total thread CPU time 9630000000
    Factor: 0.99
  

As always with performance testing, we have to be careful to run it on a quiet machine. If I copy a large file at the same time while running the test, I get:

    Thread[Thread-0,5,main]: cpuTime = 1920000000
    Thread[Thread-4,5,main]: cpuTime = 1990000000
    Thread[Thread-2,5,main]: cpuTime = 1960000000
    Thread[Thread-1,5,main]: cpuTime = 1980000000
    Thread[Thread-3,5,main]: cpuTime = 1960000000
    Total elapsed time 10979895000
    Total thread CPU time 9810000000
    Factor: 0.89
  

When I run the program twice in parallel on a quiet system, the Factor should be close to 0.5, hopefully:

    Thread[Thread-3,5,main]: cpuTime = 4090000000
    Thread[Thread-4,5,main]: cpuTime = 4070000000
    Thread[Thread-0,5,main]: cpuTime = 2660000000
    Thread[Thread-2,5,main]: cpuTime = 4020000000
    Thread[Thread-1,5,main]: cpuTime = 2970000000
    Total elapsed time 33988220000
    Total thread CPU time 17810000000
    Factor: 0.52
  

And the second run, started slightly later:

    Thread[Thread-1,5,main]: cpuTime = 3320000000
    Thread[Thread-3,5,main]: cpuTime = 3120000000
    Thread[Thread-4,5,main]: cpuTime = 3190000000
    Thread[Thread-0,5,main]: cpuTime = 2590000000
    Thread[Thread-2,5,main]: cpuTime = 3070000000
    Total elapsed time 32353817000
    Total thread CPU time 15290000000
    Factor: 0.47
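
To make the arithmetic behind these factors explicit, here is a small sketch that recomputes them from the totals printed in the four runs above. The class FactorCalculator and its methods are hypothetical names used only for this illustration; the numbers are copied from the output shown.

```java
import java.util.Locale;

// Hypothetical helper that redoes the division from MultiCoreTester.
class FactorCalculator {
  // Summed thread CPU time divided by wall clock time (both in nanoseconds).
  static double factor(long totalCpuNanos, long elapsedNanos) {
    return (double) totalCpuNanos / elapsedNanos;
  }

  // Same two-decimal formatting as the "Factor:" line in the output.
  static String format(long totalCpuNanos, long elapsedNanos) {
    return String.format(Locale.ROOT, "%.2f",
        factor(totalCpuNanos, elapsedNanos));
  }

  public static void main(String[] args) {
    System.out.println(format(9630000000L, 9759677000L));   // quiet machine: 0.99
    System.out.println(format(9810000000L, 10979895000L));  // copying a file: 0.89
    System.out.println(format(17810000000L, 33988220000L)); // first parallel run: 0.52
    System.out.println(format(15290000000L, 32353817000L)); // second parallel run: 0.47
  }
}
```

Note that the two parallel runs add up to roughly 1, which is what we would expect when two processes share a single CPU.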
  

When we ran this program on the student's supa-dupa multi-core system, we were puzzled in that the factor was just below 1. We rebooted the machine into Windows, and the factor went up to just below 2. Fortunately we had a system administrator in the group, and he pointed out that the kernel on that Linux machine was incorrect. By simply putting the correct kernel on, the dream machine laptop was able to run at double the CPU cycles.

Your exercise for today is to find a multi-core or multi-CPU machine and see what factor you get. You need at least JDK 5. Let me know how you fare ... :)

Just a hint: the number of threads should probably be a multiple of the number of CPUs or cores that you have available.
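
One possible way to follow this hint, sketched here under the assumption that Runtime.availableProcessors() reflects the CPUs or cores the operating system makes available to the JVM, is to round the thread count up to the next multiple of that number. The class ThreadCountHint and the method roundUpToMultiple() are invented for this example.

```java
// Hypothetical helper for choosing the THREADS constant.
class ThreadCountHint {
  // Smallest multiple of cpus that is greater than or equal to wanted.
  static int roundUpToMultiple(int wanted, int cpus) {
    if (wanted < 1 || cpus < 1) {
      throw new IllegalArgumentException("both arguments must be positive");
    }
    return ((wanted + cpus - 1) / cpus) * cpus;
  }

  public static void main(String[] args) {
    // Number of processors that this JVM reports as available.
    int cpus = Runtime.getRuntime().availableProcessors();
    int threads = roundUpToMultiple(5, cpus);
    System.out.println("CPUs seen by the JVM: " + cpus);
    System.out.println("Suggested number of threads: " + threads);
  }
}
```

With the 5 threads of MultiCoreTester, a two-core machine would get 6 threads and a four-core machine 8. If the first line prints 1 on a machine that you believe is multi-core, that alone might be a sign that something is wrong with the setup.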

Kind regards from Greece

Heinz
