Each group's fairshare target is based on the percentage of the total Service Units (SUs) that the group has received for the current allocation period. The minimum fairshare target is 1 processor. (The minimum percentage varies from system to system because systems have different numbers of processors.) The fairshare target is the percentage of the core resources that a group has been allocated for the 6-month allocation period. For example, if a group's fairshare target is 5, then the group can use 5% of the resources over the 6-month allocation period. If, as often happens, the group uses more than 5% of a machine over the course of several days, then it has exceeded its fairshare target, and the priority of its jobs waiting to run on that machine will be decreased.
Furthermore, if a group uses its Service Units faster than its uniform rate (the total allocation for that group divided by 26, the number of weeks in an allocation period), the group's fairshare target will decrease. The updated target is based on the number of Service Units remaining for the group and the total Service Units of all groups. Fairshare targets are updated daily.
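As a rough illustration of the uniform-rate check described above, the following Python sketch compares a group's usage so far with its allocation divided by 26 weeks. The function names and sample numbers are hypothetical, and the ratio used for the updated target is only one plausible reading of "based on the Service Units remaining and the total Service Units of all groups"; the Institute's actual calculation may differ.

```python
WEEKS_PER_PERIOD = 26  # weeks in one allocation period, as stated in the text


def uniform_weekly_rate(total_allocation_su):
    """SUs per week if the allocation were spent evenly over the period."""
    return total_allocation_su / WEEKS_PER_PERIOD


def is_ahead_of_uniform_rate(total_allocation_su, used_su, weeks_elapsed):
    """True if the group has used more SUs than the uniform rate allows so far."""
    return used_su > uniform_weekly_rate(total_allocation_su) * weeks_elapsed


def updated_target_percent(group_remaining_su, total_su_all_groups):
    """ASSUMPTION: the updated target is the group's remaining SUs as a
    percentage of the total SUs of all groups. The text does not give the
    exact formula."""
    return 100.0 * group_remaining_su / total_su_all_groups


if __name__ == "__main__":
    # Hypothetical group: 26,000 SUs allocated, 5,000 used after 4 weeks.
    print(uniform_weekly_rate(26000))
    print(is_ahead_of_uniform_rate(26000, 5000, 4))
```

In this hypothetical case the uniform rate is 1,000 SUs per week, so a group that has used 5,000 SUs after 4 weeks is ahead of its uniform rate and could expect its target to drop.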
To help groups determine their usage rate, the command acctinfo reports the percentage of the allocation remaining and the percentage of the allocation period remaining. This command also shows the fairshare target and the fairshare usage on each of the core resources. The fairshare target may vary from day to day because new groups are added during the year and existing groups request additional allocations.
The Institute uses a weighted average of the last 7 days' usage in the fairshare calculation. The weight decreases with each day further back. The current weights are:
Days ago   0       1       2       3       4       5       6
Weight     1.0000  0.8000  0.6400  0.5120  0.4096  0.3277  0.2621
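The weights listed above shrink by a factor of 0.8 per day. A minimal Python sketch of a weighted average built from them is shown below. The function names are hypothetical, and dividing by the sum of the weights is an assumption about how the average is normalized; the scheduler's internal arithmetic is not given here.

```python
# Weights for 0..6 days ago, copied from the list above.
WEIGHTS = [1.0000, 0.8000, 0.6400, 0.5120, 0.4096, 0.3277, 0.2621]


def weighted_usage(daily_usage_percent):
    """Weighted average of daily usage, most recent day first.

    daily_usage_percent[0] is today's usage (percent of the machine),
    daily_usage_percent[6] is the usage 6 days ago.
    ASSUMPTION: the average is normalized by the sum of the weights.
    """
    if len(daily_usage_percent) != len(WEIGHTS):
        raise ValueError("expected one usage value per weight (7 days)")
    weighted_sum = sum(w * u for w, u in zip(WEIGHTS, daily_usage_percent))
    return weighted_sum / sum(WEIGHTS)


def exceeds_target(daily_usage_percent, target_percent):
    """True if the weighted usage is above the group's fairshare target."""
    return weighted_usage(daily_usage_percent) > target_percent


if __name__ == "__main__":
    # Hypothetical group with a target of 5 that used 12% of the machine
    # today and yesterday, and 2% on each of the five days before that.
    usage = [12, 12, 2, 2, 2, 2, 2]
    print(weighted_usage(usage), exceeds_target(usage, 5))
```

Because recent days carry more weight, a burst of heavy usage today counts against the target more than the same burst a week ago, and its effect fades as the days pass.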
When scheduling jobs and calculating priority for waiting jobs, there are many factors to consider; fairshare is one of them. The Institute also considers the time a job has spent waiting in the queue (all jobs should run). The Institute also tries to schedule jobs requesting a large number of processors first, and then schedules smaller jobs around the larger jobs. Jobs requesting a large number of processors need to reserve processors in order to run; they cannot start until enough processors are free. It is inefficient to leave processors unused, so we use smaller jobs to fill the gaps created by the reservations of the large jobs. This is called backfill. It is far more efficient to schedule smaller jobs around larger jobs. Accurate estimates of wall clock time for your programs, especially small jobs, will help the scheduler run your jobs promptly.
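To show why accurate wall clock estimates matter for backfill, here is a deliberately simplified Python sketch of the kind of test a backfill scheduler might apply. The function and parameter names are hypothetical, and a real scheduler considers more than this (for example, a small job may also run past the reservation start if it uses processors the large job does not need).

```python
def can_backfill(job_procs, job_walltime_hours, free_procs, hours_until_reservation):
    """Simplified backfill test (an illustration, not the Institute's scheduler).

    A small job can fill a gap if it fits on the currently idle processors
    and its requested wall clock time ends before the large job's
    reservation is due to start.
    """
    fits_on_idle_processors = job_procs <= free_procs
    finishes_before_reservation = job_walltime_hours <= hours_until_reservation
    return fits_on_idle_processors and finishes_before_reservation


if __name__ == "__main__":
    # Hypothetical gap: 8 idle processors for the next 3 hours.
    print(can_backfill(4, 2, 8, 3))   # requests 2 hours: fits in the gap
    print(can_backfill(4, 5, 8, 3))   # requests 5 hours: would delay the large job
```

Under this sketch, a job that really needs 2 hours but requests 5 misses the 3-hour gap entirely, which is why an overestimated wall clock time can leave a small job waiting longer than necessary.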
The Institute understands that no one wants to wait. It is also true that no scheduling policy can guarantee that no one will wait (only larger machines can guarantee that), so we use fairshare to try to get a mix of jobs from all users. We monitor the queues and often adjust parameters to improve job turnaround times. Your comments are always welcome.