CPU Accounting Controller
-------------------------

The CPU accounting controller is used to group tasks using cgroups and
account the CPU usage of these groups of tasks.

The CPU accounting controller supports multi-hierarchy groups. An accounting
group accumulates the CPU usage of all of its child groups and the tasks
directly present in its group.

Accounting groups can be created by first mounting the cgroup filesystem.

# mkdir /cgroups
# mount -t cgroup -ocpuacct none /cgroups

With the above step, the initial or the parent accounting group
becomes visible at /cgroups. At bootup, this group includes all the
tasks in the system. /cgroups/tasks lists the tasks in this cgroup.
/cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by
this group which is essentially the CPU time obtained by all the tasks
in the system.

New accounting groups can be created under the parent group /cgroups.

# cd /cgroups
# mkdir g1
# echo $$ > g1

The above steps create a new group g1 and move the current shell
process (bash) into it. CPU time consumed by this bash and its children
can be obtained from g1/cpuacct.usage and the same is accumulated in
/cgroups/cpuacct.usage also.

cpuacct.stat file lists a few statistics which further divide the
CPU time obtained by the cgroup into user and system times. Currently
the following statistics are supported:

user: Time spent by tasks of the cgroup in user mode.
system: Time spent by tasks of the cgroup in kernel mode.

user and system are in USER_HZ unit.

cpuacct.cpufreq file gives CPU time (in nanoseconds) spent at each CPU
frequency. Platform hooks must be implemented inorder to properly track
time at each CPU frequency.

cpuacct.power file gives CPU power consumed (in milliWatt seconds). Platform
must provide and implement power callback functions.

cpuacct controller uses percpu_counter interface to collect user and
system times. This has two side effects:

- It is theoretically possible to see wrong values for user and system times.
  This is because percpu_counter_read() on 32bit systems isn't safe
  against concurrent writes.
- It is possible to see slightly outdated values for user and system times
  due to the batch processing nature of percpu_counter.