...

  • If you submit a job asking for more than 288 cores, it will never run.
  • If you already have jobs running, a queued job cannot start unless your total core usage would remain within the 288-core limit (e.g., a 90-core job cannot start while you are already using 200 cores, since 200 + 90 = 290).
  • If you submit a job asking for more resources than are available (e.g., memory), it will never get the opportunity to run.
  • Asking for a relatively large allocation (e.g., many CPUs instead of just a few, or all CPUs on a single node) means the scheduler must wait for current jobs to complete and must schedule future jobs so as to leave a "hole" for yours. This can cause a wait even when resources appear to be free.
    For example, asking for 240 cores on 10 nodes requires the scheduler to wait for all jobs on those 10 nodes (approx. 1/5th of the total capacity) to finish before your job can run, even though there may already be 240 cores free, scattered across the cluster.
  • Freeing this much contiguous capacity can take time, because the nodes may be running a mixture of long- and short-running jobs that were scheduled earlier.
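The "hole" effect can be illustrated with a small Python model. It is a sketch only: it assumes 24 cores per node (inferred from "240 cores on 10 nodes") and uses made-up per-node free-core counts. A request for whole nodes needs that many completely idle nodes, so idle cores spread over partly busy nodes do not help.

```python
# Minimal model of why a whole-node request can wait even when enough cores
# are free in total. Assumptions (hypothetical, for illustration only):
# 24 cores per node and invented per-node free-core counts.

CORES_PER_NODE = 24  # assumed; inferred from "240 cores on 10 nodes"


def total_free_cores(free_per_node):
    """Free cores summed across the whole cluster."""
    return sum(free_per_node)


def fully_free_nodes(free_per_node, cores_per_node=CORES_PER_NODE):
    """Number of nodes on which every core is idle."""
    return sum(1 for free in free_per_node if free == cores_per_node)


def can_start_whole_node_job(free_per_node, nodes_requested,
                             cores_per_node=CORES_PER_NODE):
    """A job asking for whole nodes needs that many completely idle nodes;
    free cores on partly busy nodes do not count towards it."""
    return fully_free_nodes(free_per_node, cores_per_node) >= nodes_requested


if __name__ == "__main__":
    # 50 hypothetical nodes, each with 5 idle cores: 250 cores free in total,
    # yet not a single node is completely free.
    cluster = [5] * 50
    print(total_free_cores(cluster))              # 250
    print(can_start_whole_node_job(cluster, 10))  # False
```

In this model the 10-node job only becomes runnable once the scheduler has drained 10 nodes entirely, which is why it may deliberately hold back other jobs from those nodes.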

...

  • Be aware of the core and memory limits of each node; asking for more than a node has available may mean your job will never run.
  • Be aware of system overheads when requesting memory: 128GB nodes have closer to 125GB available to jobs, while 512GB nodes have just over 500GB available.
  • There are per-user limits on the number of concurrent jobs.
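The memory overhead matters when choosing a request size. The Python sketch below checks a per-node memory request against usable rather than nominal memory. The usable figures (125GB and 500GB) are rounded approximations of the values quoted above, not exact system numbers, and the function names are hypothetical.

```python
# Sketch: compare a per-node memory request with what jobs can actually use.
# Usable values are approximations ("closer to 125GB", "just over 500GB");
# the exact figures depend on the system.

USABLE_GB = {
    128: 125,  # nominal 128GB node: roughly 125GB usable by jobs
    512: 500,  # nominal 512GB node: just over 500GB usable (rounded down)
}


def fits_on_node(requested_gb, node_size_gb):
    """True if a per-node memory request fits within usable memory."""
    return requested_gb <= USABLE_GB[node_size_gb]


def smallest_fitting_node(requested_gb):
    """Smallest node type that can hold the request, or None if none can."""
    for size in sorted(USABLE_GB):
        if fits_on_node(requested_gb, size):
            return size
    return None
```

For example, a request for the full nominal 128GB does not fit on a 128GB node, so under this model it could only be placed on a 512GB node, while a 600GB request fits nowhere and would never run.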

Batching jobs

  • If you "batch up" jobs that can each run individually but collectively consume significant resources, the system will run them in a suitable manner (i.e., keeping you within the 288-core limit) and may also lower the priority of your later jobs due to fair share. This means that whilst your jobs are running, some of them may end up with long wait times.
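The effect of the 288-core limit on a batch can be sketched with a simplified Python model in which each queued job is tried in submission order and starts only if your total stays within the limit. This is an illustration under that assumption, not the scheduler's actual algorithm, which also applies fair-share priority and other policies; the function names are hypothetical.

```python
# Simplified model of the per-user core cap. Each queued job is tried in
# submission order and starts only if the user's total stays within 288
# cores. Fair-share priority and other scheduler policies are not modelled.

CORE_LIMIT = 288


def can_start(cores_in_use, cores_requested, limit=CORE_LIMIT):
    """A job may start only if the user's total stays within the limit."""
    return cores_in_use + cores_requested <= limit


def start_batch(job_cores, cores_in_use=0, limit=CORE_LIMIT):
    """Split a batch into jobs that start now and jobs left waiting."""
    started, waiting = [], []
    for cores in job_cores:
        if can_start(cores_in_use, cores, limit):
            started.append(cores)
            cores_in_use += cores
        else:
            waiting.append(cores)
    return started, waiting
```

With five 96-core jobs, three start immediately (3 × 96 = 288) and two wait until cores are released. Likewise, `can_start(200, 90)` is `False`, matching the 90-core example earlier on this page.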

...