aiida_vasp.calcs.monitors

VASP Calculation Monitoring Functions.

This module provides monitoring functions for detecting and handling problems that may occur during VASP calculations running on remote machines. These monitors are designed to identify issues early and take preventive actions to avoid system crashes or resource waste.

The monitoring functions can detect:

  • Stdout file overflow that could crash the AiiDA daemon

  • Electronic loop timing issues that indicate inefficient calculations

  • Stalled calculations that are no longer making progress

These monitors are typically used by AiiDA’s calculation monitoring system to provide real-time feedback about running VASP calculations and automatically handle problematic situations.

Note

These monitors operate on the remote machine where the VASP calculation is running and require a transport connection to access files and execute commands.
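As a rough illustration of how monitors like these might be attached to a calculation, the sketch below builds monitor definitions as plain dictionaries. The `entry_point` and `kwargs` keys follow the layout described in AiiDA's monitor documentation, but the entry point names `vasp.stdout` and `vasp.loop_time` and the helper `make_monitor_config` are hypothetical placeholders, not names confirmed by this module.

```python
def make_monitor_config(entry_point, **kwargs):
    """Build one monitor definition as a plain dictionary.

    Hypothetical helper: it only assembles the data structure and
    does not talk to AiiDA.
    """
    config = {"entry_point": entry_point}
    if kwargs:
        # Keyword arguments are forwarded to the monitor function.
        config["kwargs"] = dict(kwargs)
    return config


# NOTE: both entry point names are assumptions made for illustration.
monitors = {
    "stdout": make_monitor_config("vasp.stdout", size_threshold_mb=10),
    "loop_time": make_monitor_config(
        "vasp.loop_time",
        minimum_electronic_loops=10,
        patience_num_loops=5,
        patience_minimum_time=1800,
    ),
}
```

In a real AiiDA session each dictionary would presumably be wrapped in `aiida.orm.Dict` and assigned to the `monitors` input of the calculation builder. Check the installed plugin for the actual entry point names.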

Module Contents

Functions

monitor_stdout

Monitor stdout file size to prevent overflow crashes.

monitor_loop_time

Monitor electronic loop timing to detect inefficient or stalled calculations.

Data

API

aiida_vasp.calcs.monitors._FILE_NOT_FOUND_ERRORS: tuple = 'tuple(...)'
aiida_vasp.calcs.monitors.monitor_stdout(node: aiida.orm.CalcJobNode, transport: aiida.transports.Transport, size_threshold_mb: float = 5) -> str | None

Monitor stdout file size to prevent overflow crashes.

This function monitors the size of the VASP stdout file during calculation execution. If the file becomes too large, it indicates a potential problem (such as excessive output from convergence issues) that could crash the AiiDA daemon when attempting to retrieve and parse the file.

When an oversized stdout file is detected, the function automatically truncates it to prevent system crashes, though this means the calculation is considered lost.

Parameters:
  • node (CalcJobNode) – The CalcJobNode representing the running VASP calculation

  • transport (Transport) – Transport connection to the remote machine where VASP is running

  • size_threshold_mb (float) – Maximum allowed stdout file size in megabytes before truncation occurs

Returns:

None if no overflow is detected, otherwise an error message describing the overflow condition

Return type:

str or None

Warning

When stdout overflow is detected, the calculation is automatically terminated by truncating the output file. This prevents system crashes but results in loss of the calculation.

Note

The default threshold of 5 MB is typically sufficient for normal VASP calculations. Larger thresholds may be needed for very large systems or calculations with verbose output.
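The check-and-truncate behaviour described above can be sketched against a local file. This is a stand-in under stated assumptions, not the plugin's implementation: `check_stdout_size` is a hypothetical name, the real monitor works through the transport rather than `os`, one megabyte is taken as 1024 × 1024 bytes, and a missing file is treated as "nothing to report".

```python
import os
import tempfile


def check_stdout_size(path, size_threshold_mb=5):
    """Return None if the file is within the limit, else truncate it
    and return an error message. Local-file sketch only."""
    try:
        size_bytes = os.path.getsize(path)
    except FileNotFoundError:
        # Assumption: the file may not exist yet early in the run.
        return None

    threshold_bytes = size_threshold_mb * 1024 * 1024
    if size_bytes <= threshold_bytes:
        return None

    # Opening in write mode empties the file.
    with open(path, "w"):
        pass
    return (
        f"stdout exceeded {size_threshold_mb} MB "
        f"({size_bytes} bytes) and was truncated"
    )


with tempfile.TemporaryDirectory() as workdir:
    # "stdout.txt" is a placeholder file name for this demo.
    stdout_path = os.path.join(workdir, "stdout.txt")
    with open(stdout_path, "w") as handle:
        handle.write("x" * 2048)

    small_result = check_stdout_size(stdout_path, size_threshold_mb=5)
    size_before = os.path.getsize(stdout_path)
    big_result = check_stdout_size(stdout_path, size_threshold_mb=0.001)
    size_after = os.path.getsize(stdout_path)
    missing_result = check_stdout_size(os.path.join(workdir, "absent"))
```

The first call leaves the 2048-byte file untouched because it is far below 5 MB. The second call uses an artificially small threshold to trigger truncation.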

aiida_vasp.calcs.monitors.monitor_loop_time(node: aiida.orm.CalcJobNode, transport: aiida.transports.Transport, minimum_electronic_loops: int = 10, patience_num_loops: int = 5, patience_minimum_time: float = 1800) -> str | None

Monitor electronic loop timing to detect inefficient or stalled calculations.

This function analyzes the timing of electronic self-consistency loops in VASP calculations to identify potential problems:

  1. Slow convergence: If electronic loops take too long relative to the walltime limit, the calculation may not complete within the allocated time.

  2. Stalled calculations: If the stdout file hasn’t been updated for an extended period, the calculation may have crashed or become stuck.

The function examines the OUTCAR file to extract loop timing information and compares it against the walltime limits and recent file modification times.

Parameters:
  • node (CalcJobNode) – The CalcJobNode representing the running VASP calculation

  • transport (Transport) – Transport connection to the remote machine where VASP is running

  • minimum_electronic_loops (int) – Minimum number of electronic loops that should be completable within the walltime limit

  • patience_num_loops (int) – Number of loop times to wait before considering a calculation stalled

  • patience_minimum_time (float) – Minimum time in seconds to wait before checking for stalled calculations

Returns:

None if timing is acceptable, otherwise an error message describing the detected problem (slow loops or stalled calculation)

Return type:

str or None
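One plausible reading of how the three tuning parameters interact is sketched below as a pure function. The exact formulas are assumptions inferred from the parameter descriptions and are not taken from the plugin source: `check_loop_time` is a hypothetical name, loop times are averaged, and the stall limit is taken as the larger of `patience_num_loops` average loop times and `patience_minimum_time`.

```python
def check_loop_time(
    loop_times,
    walltime_seconds,
    seconds_since_update,
    minimum_electronic_loops=10,
    patience_num_loops=5,
    patience_minimum_time=1800,
):
    """Return None if timing looks acceptable, else a message.

    loop_times: durations in seconds of completed electronic loops.
    seconds_since_update: time since the output file last changed.
    """
    if not loop_times:
        # No completed loop yet, so there is nothing to judge.
        return None

    average = sum(loop_times) / len(loop_times)

    # Slow loops: the minimum number of loops would not fit in the walltime.
    if average * minimum_electronic_loops > walltime_seconds:
        return (
            f"slow loops: {minimum_electronic_loops} loops at "
            f"{average:.0f} s each exceed the walltime of {walltime_seconds} s"
        )

    # Stalled: no output for longer than the patience window (assumed formula).
    stall_limit = max(patience_num_loops * average, patience_minimum_time)
    if seconds_since_update > stall_limit:
        return (
            f"stalled: no update for {seconds_since_update:.0f} s "
            f"(limit {stall_limit:.0f} s)"
        )

    return None


healthy = check_loop_time([100.0, 100.0], 3600, 60)
slow = check_loop_time([600.0], 3600, 60)
stalled = check_loop_time([100.0], 3600, 2000)
```

Under these assumptions, `healthy` is `None`, `slow` reports that ten 600-second loops cannot fit in a one-hour walltime, and `stalled` reports that 2000 seconds without an update exceeds the 1800-second patience window.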