Artemis's job scheduler determines when and where jobs will be run. The scheduler is a live system and regularly re-prioritises work based on the following considerations:

Job Size - CPUs and memory requested

  • If you submit jobs asking for more than 288 cores, your job will never get to run.
  • If you have currently running jobs, your queued jobs cannot start unless the resulting total number of cores used would still remain below the 288 core limit (e.g. your 90 core job can't start if you are already using 200 cores).
  • If you submit a job asking for more of a resource (e.g., memory) than is available, it will never run.
  • Asking for a relatively large resource allocation (e.g., many CPUs rather than a few, or all CPUs on a single node) means the scheduler must wait for current jobs to complete, and schedule future jobs so as to leave a "hole" in which your job can run. This may result in a wait even though resources appear to be free.
    For example, asking for 240 cores on 10 nodes requires the scheduler to wait until all jobs on 10 nodes (approx. 1/5th of the total capacity) have finished before your job can run, even though there may already be 240 cores free across the entire cluster.
  • Freeing this much contiguous resource can take time, as the jobs already scheduled and running may be a mixture of long-running and short-running jobs.
  • If you "batch up" jobs that can each run individually but collectively consume significant resources, the system will run them in a suitable manner (i.e., keeping you below the 288-core limit) and may also lower the priority of your later jobs due to fair share. This means that while some of your jobs are running, others may end up with long wait times.
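The core-limit rules and the "hole" effect described above can be sketched in a few lines of Python. This is only an illustration, not how the scheduler is implemented: the function names are hypothetical, the per-node free-core figures are invented for the example, and treating a running total of exactly 288 cores as allowed is an assumption.

```python
CORE_LIMIT = 288  # per-user core limit quoted in the text above


def can_ever_run(requested_cores, limit=CORE_LIMIT):
    """A job asking for more cores than the limit will never run."""
    return requested_cores <= limit


def can_start_now(requested_cores, cores_in_use, limit=CORE_LIMIT):
    """A queued job can start only if your running total stays within the limit.

    Assumption: a total of exactly `limit` cores is still allowed.
    """
    return cores_in_use + requested_cores <= limit


def can_place(free_cores_per_node, nodes_needed, cores_per_node):
    """Check whether enough distinct nodes each have the required cores free.

    Simplification: the request is spread over distinct nodes, with
    `cores_per_node` cores needed on each of them.
    """
    suitable = sum(1 for free in free_cores_per_node if free >= cores_per_node)
    return suitable >= nodes_needed


# The example from the text: a 90-core job cannot start while 200 cores are in use.
print(can_start_now(90, 200))

# Hypothetical cluster of 50 nodes: 40 nodes have 6 cores free, 10 have none.
# 240 cores are free in total, yet no node can supply 24 cores on its own,
# so a request for 240 cores on 10 nodes (24 per node) cannot be placed.
free = [6] * 40 + [0] * 10
print(sum(free), can_place(free, nodes_needed=10, cores_per_node=24))
```

The last check is the reason a job can wait even though the cluster as a whole appears to have enough cores free: what matters is whether the free cores sit together on enough nodes.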

Time (or walltime) requested

  • Jobs that request longer walltimes (that is, run times measured in real elapsed time, as on a clock hanging on the wall) will always take longer to start than jobs that request less walltime.
  • In particular, jobs that request more than 1 week of walltime will run in the "large" queue and take a long time to start. Jobs that request less than 24 hours of walltime will typically start very quickly, unless Artemis is very busy.
  • Some jobs finish sooner than their requested walltime. This means that your estimated start time may move earlier (if other jobs finish early, are cancelled, fail, etc.) or later (if newly scheduled jobs are fairly placed ahead of you in the fair share queue).
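As a rough aid when choosing a walltime, the two thresholds mentioned above can be written as a small Python helper. This is a sketch only: the function name is hypothetical, and the text does not say what to expect for requests between 24 hours and 1 week, so the helper reports that range as unspecified rather than guessing.

```python
from datetime import timedelta

ONE_DAY = timedelta(hours=24)
ONE_WEEK = timedelta(weeks=1)


def walltime_outlook(walltime):
    """Summarise what the text above says about a requested walltime."""
    if walltime > ONE_WEEK:
        return 'runs in "large"; expect a long time to start'
    if walltime < ONE_DAY:
        return "typically starts very quickly, unless the system is very busy"
    # The text gives no guidance for requests from 24 hours up to 1 week.
    return "not specified in the text above"


for request in (timedelta(hours=12), timedelta(days=3), timedelta(days=10)):
    print(request, "->", walltime_outlook(request))
```

Requesting only as much walltime as a job realistically needs keeps it in the quicker-starting range where that is possible.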

...