Runaway Processes

Occasionally a process will stop responding to the system and run wild. These processes ignore their scheduling priority and insist on taking up 100% of the CPU. Because other processes can only get limited access to the CPU, the machine begins to run very slowly.

The 'w' command at the terminal will print out a list of current users of a machine, and it will tell you the machine's “load average.” The load average is the average number of processes that were running or waiting to run (on Linux, this also counts processes waiting on disk I/O) over the last 1, 5, and 15 minutes. On a single-CPU machine, a load average of 1 means the CPU is fully occupied; on a multi-core machine, full load corresponds to a load average equal to the number of cores. A load average well above that figure means that the machine is getting behind on its processing. If your machine has a load average near or over that figure, and you are not running anything really resource intensive on the machine, then you probably have a runaway process sapping your machine's processing power.
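
As a rough illustration of comparing load against core count, the sketch below divides a load average by the number of cores. The function name load_per_core is made up for this example, and the commented usage lines assume a Linux machine that provides /proc/loadavg and nproc.

```shell
# Print a load average divided by the number of CPU cores, to two decimals.
# Arguments: $1 = load average, $2 = number of cores (must be non-zero).
# load_per_core is a hypothetical helper name, not a standard utility.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on Linux (assumes /proc/loadavg and nproc are available):
#   load=$(cut -d ' ' -f 1 /proc/loadavg)   # 1-minute load average
#   load_per_core "$load" "$(nproc)"
```

A result near or above 1.00 suggests that every core is busy; well below 1.00 suggests spare capacity.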

The 'top' command lists the processes that are taking up the most system resources. In the left column of top's output is the PID number, or the process ID. This number is necessary to identify the process if you want to kill it. Some of the other information that top yields is: the user that owns the process, the priority and nice value with which the process is running, the amount of CPU and memory that are being consumed, how much CPU time the process has consumed, and the command that was executed to generate the process. We use this information to determine if a process is truly a runaway, or if it is a resource intensive program that we should allow to continue executing.

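The ps aux command gives you much of the same information about processes that top provides. (On Linux, the form without a leading dash is preferred; ps -aux may print a warning or be interpreted differently.) When used in conjunction with grep it can be a very useful utility. For example, ps aux | grep vim will list the details of all of the vim processes running on the system, usually along with the grep command itself.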
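
One way to avoid seeing grep in its own results is to match on the command-name column instead of the whole line. The sketch below is a minimal example of that idea; filter_by_command is a hypothetical helper name, and it assumes the command name is the last column of the input, as it is with the ps format shown in the usage comment.

```shell
# Keep only the lines whose last column is exactly the given command name.
# Reads ps-style output on standard input. Argument: $1 = command name.
# filter_by_command is a hypothetical helper name.
filter_by_command() {
    awk -v name="$1" '$NF == name'
}

# Usage (the header line is dropped because its last column is COMMAND):
#   ps -e -o pid,user,pcpu,time,comm | filter_by_command vim
#
# A common alternative keeps grep but stops it from matching itself:
#   ps aux | grep '[v]im'
```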

See man w, man top, man ps, and man grep for more information about any of these commands.

When it comes to running away, Vi and Vim are the worst offenders. Any time Vi or Vim is closed without using the :q command, it may ignore the standard kill signal and keep running indefinitely. This happens whenever a terminal window is closed without Vi being exited first, or whenever an ssh session is interrupted while using Vi.

kill is the standard Unix utility for terminating nasty processes. You only have the right to kill your own processes. By default, kill sends SIGTERM, which asks a process to exit. Some processes do not respond to it. These processes might need a more forceful signal such as SIGKILL (kill -9), which cannot be caught or ignored. The command killall can help you to kill multiple processes at once by name. man kill and man killall will give you the details about these commands.
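
The usual practice is to try the polite signal first and escalate only if the process is still there. The sketch below shows one way to do that; terminate_gracefully is a hypothetical helper name, and the default grace period of 5 seconds is an arbitrary choice for illustration.

```shell
# Ask a process to exit with SIGTERM, then fall back to SIGKILL.
# Arguments: $1 = PID, $2 = seconds to wait before escalating (default 5).
# terminate_gracefully is a hypothetical helper name.
terminate_gracefully() {
    pid=$1
    grace=${2:-5}

    # Send SIGTERM; give up if the signal cannot be delivered at all
    # (no such process, or the process belongs to someone else).
    kill -TERM "$pid" 2>/dev/null || return 1

    # Poll once per second. kill -0 sends no signal; it only tests
    # whether the PID still exists.
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Grace period is over: send SIGKILL, which cannot be caught or ignored.
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Usage: terminate_gracefully 12345 10
```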

Tyler Larson wrote a utility, userkill, that allows runaway processes to be killed without root privileges. Before userkill will terminate a process, it examines the specified process to see if it matches criteria which would signify that it is in fact a runaway. These criteria are:

  The process must not be owned by the system (uid<100).
  The owner of the process must not be logged in. If the owner is logged in, you should message the owner using the write command. If that fails, you should inform the System Administrators of the problem.
  The process must not be niced greater than 9.
  The CPU must not be more than 10% idle.
  All runaway processes combined must be using more than 70% of the CPU.
  The process must have already consumed an outrageous amount of CPU time.
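
To make the numeric thresholds in this list concrete, the sketch below checks three of them against values supplied by the caller. It is only an illustration of the stated rules and is not userkill's actual implementation; the function name meets_numeric_criteria and its argument order are assumptions made for this example.

```shell
# Return success (0) only if the three numeric thresholds are all met.
# Arguments: $1 = owner UID, $2 = nice value, $3 = CPU idle percentage (integer).
# Illustration only: this is not how userkill itself is implemented.
meets_numeric_criteria() {
    uid=$1
    nice_value=$2
    idle_pct=$3

    [ "$uid" -ge 100 ] || return 1        # owner is not a system account (uid < 100)
    [ "$nice_value" -le 9 ] || return 1   # process is not niced greater than 9
    [ "$idle_pct" -le 10 ] || return 1    # CPU is not more than 10% idle
    return 0
}

# Usage: meets_numeric_criteria 1000 0 3 && echo "thresholds met"
```

The remaining criteria (owner not logged in, combined CPU use, accumulated CPU time) are left out because the list gives no exact figures or method for them.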

userkill functions in a manner very similar to kill. The basic syntax is userkill <killsignal> <processid>. The killsignal field is optional and defaults to the standard SIGTERM signal. To send a stronger kill signal (such as -9) you must specify it. The userkill binary (program) is located in /usr/network/bin.

The System Administrators

If you cannot kill a process using kill or userkill, and you think that it is taking an inappropriate share of system resources, please report it to the System Administrators in room 1140 TMCB or send an email to system@cs.byu.edu.

How not to have your process killed

Use nice and renice

The commands nice and renice control the priority of your processes. The higher the nice value, the lower the priority. nice is used to start a new process with the specified priority. renice is used to adjust the priority of a currently running process. A non-root user can lower the priority (increase the niceness) of their own processes, but cannot raise the priority of any process. This includes processes that the user originally niced, i.e. nicing a process cannot be undone without root access.

You can tell the priority of a process using top. The PR column (third; labeled PRI in some versions of top) is the priority of each process, and the NI column (fourth) is the nice value of each process. By default nice adds 10 to the nice value of the command it starts. On Linux, 19 is the highest possible nice value (lowest priority). For more information, see the man pages.
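
As a small worked example, the sketch below lowers the priority of a running job with renice and reads the nice value back with ps. The helper name nice_of and the program name ./long_simulation are hypothetical, and sleep stands in for a real long-running job.

```shell
# Print the nice value of a process. Argument: $1 = PID.
# nice_of is a hypothetical helper name.
nice_of() {
    ps -o nice= -p "$1" | tr -d ' '
}

# To start a program at lower priority from the beginning, prefix it with
# nice (./long_simulation is a hypothetical program name):
#   nice -n 10 ./long_simulation

# Lower the priority of a job that is already running.
sleep 30 &
job=$!
renice 15 -p "$job" >/dev/null
echo "PID $job now has nice value $(nice_of "$job")"

# Stop the placeholder job.
kill "$job"
```

Trying to renice the same job back to a lower value as a non-root user would normally be refused, which is why it is worth choosing the nice value carefully.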

Stay logged in

userkill cannot kill your process while you are logged in. You are also better able to monitor your use of system resources if you stay logged in. Be advised, however, that the System Administrators are authorized to kill your process and log you out if they feel that you are inappropriately monopolizing system resources. This usually involves being logged in and idle for an hour, maintaining a high load on the machine when the labs are being heavily used, or otherwise preventing others from using the system for extended periods of time.

Get prior authorization

If you feel that you need to run a process that will be exceptionally resource intensive or that needs to run for an extended period of time, you need to get prior approval to run that process so that it does not get killed. Such approval must be granted through the department CSRs at the request of a sponsoring professor. In such a situation, it is probably just as easy to get permission to use the Fulton Supercomputer. The Supercomputer is much more appropriate for most resource intensive research and projects than the open lab machines.

  • Last modified: 2017/02/21 12:55
  • by adam92