When will my job start?

“Fair Share” assigns priority to jobs based on each project’s recent usage of the system. If a project has recently used a lot of CPU time, then the priority of its future jobs, relative to other projects, will be reduced. Once a job starts running, it is allowed to complete and is unaffected by fair share.

Fair share only has an impact when there is contention for resources. It is calculated at the project level, so if one member of a project uses a lot of CPU time, future jobs submitted by any member of that project will have lower priority.

Different queues (see the Job Queues section for a description of each queue) have different fair share weightings. The small, normal and large queues have a fair share weight of 10, which is considered to be the “standard” fair share weighting. The high memory and GPU queues, however, have a fair share weight of 50. If you request excessive resources (for example, too much memory), your job may be placed in a queue with a higher fair share weighting.

Accumulated fair share usage also decays with a “half-life” of 2 weeks. If you stop or reduce your use of Artemis, your fair share usage will decrease and the priority of your future jobs will increase.
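The effect of the 2-week half-life can be illustrated with a minimal sketch. It assumes plain exponential decay of a usage figure; the function name `decayed_usage` and the example numbers are hypothetical, and the scheduler's real accounting may differ in detail.

```python
HALF_LIFE_DAYS = 14.0  # the 2-week half-life described above


def decayed_usage(usage: float, days_elapsed: float,
                  half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Return how much of a past usage figure still counts after days_elapsed.

    Assumes simple exponential decay: the figure halves every half_life_days.
    """
    if days_elapsed < 0:
        raise ValueError("days_elapsed must be non-negative")
    return usage * 0.5 ** (days_elapsed / half_life_days)


# Hypothetical project that used 10,000 CPU hours and then went idle.
for days in (0, 14, 28, 56):
    print(f"after {days:2d} days: {decayed_usage(10_000, days):8.1f} CPU hours still counted")
```

Under this assumption, half of the recorded usage no longer counts after two weeks and three quarters no longer counts after four weeks.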

Fair Share likely won't affect your job priority unless you're submitting more than 30,000 CPU hours of work every month. Generally, the sooner you submit your jobs, the sooner they will run.
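To judge whether a planned workload approaches the 30,000 CPU hour figure, a rough estimate can be made by multiplying cores by wall time by the number of jobs. The sketch below uses requested resources as an upper bound; the helper name `cpu_hours` and the example workload are hypothetical, and the scheduler may account for usage differently.

```python
FAIR_SHARE_THRESHOLD = 30_000  # CPU hours per month, the figure quoted above


def cpu_hours(cores: int, walltime_hours: float, n_jobs: int = 1) -> float:
    """Estimate CPU hours as cores x wall time x number of jobs."""
    return cores * walltime_hours * n_jobs


# Hypothetical month of work: 100 jobs, each using 24 cores for 12 hours.
monthly = cpu_hours(cores=24, walltime_hours=12, n_jobs=100)
print(f"estimated usage: {monthly:.0f} CPU hours")
print("above the quoted figure" if monthly > FAIR_SHARE_THRESHOLD
      else "below the quoted figure")
```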

Factors that impact job start times

It is the responsibility of the job scheduler to determine when and where jobs will be run. The scheduler is a live system and regularly re-prioritises work based on the following considerations:

  • Fair share operates on a project level, not for individual users.
  • If you are part of a busy project you may, as an individual, get less CPU time than someone in a project which is not using the system heavily.
  • If you run small jobs you are more likely to "fill up the gaps". However, several people all wanting to use large resource allocations may compete with each other.

Size

  • If you submit jobs asking for more than 288 cores, your job will never get to run.
  • If you have currently running jobs, your queued jobs cannot start unless the resulting total number of cores used would still remain below the 288 core limit (e.g. your 90 core job can't start if you are already using 200 cores).
  • If you submit jobs asking for more resources than are available (e.g., memory), your job will never run.
  • Asking for a relatively large resource allocation (e.g., lots of CPUs instead of just a few, or all CPUs on a single node) means the scheduler must wait for current jobs to complete and schedule future jobs in such a way as to leave a "hole" for your job to run. This may result in a wait time even though resources appear to be free.
    For example, asking for 240 cores on 10 nodes will require the scheduler to wait for any and all jobs to finish on 10 nodes (approx. 1/5th of the total capacity) before your job can run, even though there may already be 240 cores free across the entire cluster.
  • Freeing this level of contiguous resource can take time, as there may be a mixture of long running and short running jobs previously scheduled and running.
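The size rules above can be sketched as two small checks: one for the 288 core limit, and one showing why free cores scattered across the cluster do not help a job that needs whole nodes. This is an illustration only. It assumes a total of exactly 288 cores is permitted, and the function names and the 50-node, 24-core cluster layout are hypothetical.

```python
from __future__ import annotations

CORE_LIMIT = 288  # core limit described above


def can_start(cores_in_use: int, cores_requested: int,
              limit: int = CORE_LIMIT) -> bool:
    """True if starting the job keeps the running total within the limit.

    Assumption: a total of exactly `limit` cores is allowed.
    """
    return cores_in_use + cores_requested <= limit


def nodes_able_to_host(free_cores_per_node: list[int],
                       cores_needed_per_node: int) -> int:
    """Count nodes that have enough free cores for one chunk of the job."""
    return sum(1 for free in free_cores_per_node
               if free >= cores_needed_per_node)


# The example from the text: a 90 core job while 200 cores are already in use.
print(can_start(cores_in_use=200, cores_requested=90))  # False: 290 > 288

# Hypothetical cluster: 50 nodes, each with only 5 cores currently free.
free = [5] * 50
print(sum(free))                     # 250 cores free in total
print(nodes_able_to_host(free, 24))  # 0 nodes can host a 24 core chunk
```

In the second case there are more than 240 cores free overall, yet a request for 240 cores on 10 nodes (24 per node) cannot start until jobs on 10 nodes have finished.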

Capacity limits

  • Be aware of the core and memory limits for each node. Asking for more than is available may mean your job will never get to run.
  • Be aware of system overheads when requesting memory: 128 GB nodes have closer to 123 GB available.
  • There are per-user limits on the number of concurrent jobs.
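The memory overhead point can be turned into a quick pre-submission check. The sketch below takes the approximate 123 GB figure quoted above as the usable memory of a 128 GB node; the function name `fits_on_node` is hypothetical, and the exact usable amount on any given node may differ.

```python
USABLE_GB_ON_128GB_NODE = 123.0  # approximate figure quoted above


def fits_on_node(requested_gb: float,
                 usable_gb: float = USABLE_GB_ON_128GB_NODE) -> bool:
    """True if a per-node memory request fits within the usable memory."""
    return requested_gb <= usable_gb


for request in (120, 123, 128):
    verdict = "fits" if fits_on_node(request) else "does not fit"
    print(f"{request} GB request {verdict} on a 128 GB node")
```

A request for the full nominal 128 GB fails this check, which corresponds to a job that would never get to run on such a node.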

Batching jobs

  • If you "batch up" jobs that can each run individually but collectively consume significant resources, the system will run them in a suitable manner (e.g., keeping you below the 288 core limit) and may also lower the priority of later jobs due to fair share. This means that while some of your jobs are running, others may end up with long wait times.

Time

  • Some jobs finish sooner than their requested wall time. Your estimated start time may therefore move earlier (if other jobs finish early, are cancelled, or fail) or later (if newly scheduled jobs with a higher fair share priority are placed ahead of yours).