Universal Allocation Program Manual

Note that this program is deprecated and replaced by the Generic Allocator.

This program was created during an internship at the chair in informatics 6 of the Julius-Maximilians-Universität Würzburg by Martins Avots and Kirill Djebko with assistance of Prof. Dr. Frank Puppe.

Given a set of supplies and a set of demands an allocation assigns each object of a subset of demands an element of supplies. Each supply can only be assigned to a single demand. A set of preferences contain for some demands one or more preferred supplies. A set of constraints describe some properties which should be met by an allocation. The rating of an allocation quantifies its compliance with the given preferences and constraints. The higher the rating the more constraints and preferences are not met by an allocation. A rating of zero means that every preference and constraint is met by an allocation. An optimal allocation is an allocation in which each demand is assigned to a supply and where the rating is minimal.

A classical example for such an assignment problem would be the Sudoku puzzle in which a set of numbers needs to be placed onto a grid following certain rules. Another problem is the graph coloring problem in which each node of a graph is colored and neighboring nodes have to have different colors.

This program solves given problems with the help of multiple algorithms. Nevertheless, there are problems that can be modeled as an assignment problem but cannot be solved by the Universal Allocation Program.

The problem which is solved by the program is stored inside an Excel file. The program can be started via the following shell command:

java -jar nap.jar -s <Path to the Excel file >

For further information of the command line options following command can be used:

java -jar nap.jar -h

On some platforms thejava -jarpart can be omitted. An alternative way to start the program is to double click at the nap.jar file which will start a very basic web server. In this case the standard web browser will open a site to which the Excel files can be uploaded. During the calculations the browser will show the loading page animation specified by the browser. When the calculations are done a download is started. Note that this method does not require an internet connection as everything is done on the local computer.

A pipeline of assignment problems can be used in order to describe more complex problems. Such a pipeline consists of multiple stages which have an absolute order. The set of supplies and the set of demands need to be explicitly stated for first assignment problem of such a pipeline. The subsequent problems only require an explicit enumeration of the supply elements. The respective set of demands is defined by the solution of the previous assignment problem of the pipeline. In the following, the stage number is the position of a problem inside the pipeline. A single assignment problem without connections to other problems is formally a pipeline with one element in this context.

The set of supplies is defined by a table. The supply header line defines the attributes of which each supply consists of. Each entry has a name and ends with(int)if the attribute is an integer. Otherwise the entry ends with(ref). The following rows of the table contain the supply elements and use the same columns as the header. Each row describes a unique element of the supply set but their content do not have to be unique. This table is stored into a sheet calledAngebot_0. The following example shows a supply definition which consists of a Group of 4. Note that the first and second supply have the same content but still are two different supplies. This example shows that a project with the idK1needs 4 peoples with different primary and secondary duties.

ProjectId(ref)
Primary Duty(ref)
Secondary Duty(ref)
K1
Programming
None
K1
Programming
None
K1
Testing
Planning
K1
Planning
Documentation

The demand definition has the same format as the supply definition. The only difference is that the sheet's name isNachrage_<stage>. Where <stage> is the number of the problem's stage. Note that most problems probably will only contain one stage. Which means that most users won't bother with stages at all as they can be ignored in such cases. The following table shows a demand definition suited for the previous supply definition. In this case the sheet's name isNachfrage_0because it is the first and only stage of this example. As you can see the people which need to be assigned to a Project do not have perfectly fit skills.

Name(ref)
Many experience in(ref)
Some experience in(ref)
Alpha
Programming
None
Bravo
Testing
Programming
Charlie
Programming
None
Delta
Programming
Documentation

The preference definition is stored into a sheet namedPräferenzen_<stage>where <stage> is the stage number of the corresponding assignment problem of the pipeline. The header of the preferences sheet is defined by the concatenation of the headers' of the supplies and the demands of respective stage and a weight attribute calledGewicht(lb). This means that the header of the preference is the header of the solution for a given stage plus theGewicht(lb)attribute.

The preferences table contains assignments which the user wishes to be part of the solution. TheGewicht(lb)attribute of a preference describes how important it is for the user that this assignment is located in the solution. It consists of a positive integer. The lower the number the more important the preference is. The preference of a demand with the highest number defines how important it is that this demand will get any of its preferences. The higher this number is the more it is likely that any preference of this demand will be met. In most cases theGewicht(lb)attribute should be a very low number which guarantees that constraints are more important than preferences. The following table shows a preferences definition based on the previous tables. This examples also show that nearly all persons want to get a role as a programmer which is not possible. Preferences are often less important than constraints because in the other case the preference would already be part of a correct solution and therefore the corresponding supply and demand could be removed from the input in order to get better results or to calculate the solution more quickly.

ProjectId(ref)
PrimaryDuty(ref)
SecondaryDuty(ref)
Name(ref)
Many experience in(ref)
Some experience in (ref)
Gewicht(lb)
K1
Programming
None
Alpha
Programming
None
1
K1
Testing
Planning
Bravo
Testing
Programming
1
K1
Programming
None
Charlie
Programming
None
1
K1
Programming
None
Delta
Programming
Documentation
1

The constraint definition of an assignment problem is stored into a sheet namedConstraints_<stage>where <stage> is the stage number of the concerned problem. It has the same header as the preference table of the same stage. Each row describes one constraint. A constraint defines the following abstract rule: the biggest assignment subset which complies with IF has also to comply with THEN. To put it in other words: every cluster which is created by the condition IF also must comply with the condition THEN.

Lets assume following solution in order to explain the functioning of constraints by example:

ProjectId(cell)
Primary Duty(ref)
Secondary Duty(ref)
Name(ref)
Many experience in(ref)
Some experience in(ref)
K1
Programming
None
Charlie
Programming
None
K1
Programming
None
Alpha
Programming
None
K1
Testing
Planning
Bravo
Testing
Programming
K1
Planning
Documentation
Delta
Programming
Documentation

Every entry of a constraint's row which has the format<value>is part of IF. Such an entry means that every element in each cluster, which is formed by this constraint, has to have the Value <value> at the attribute of the same column. The following constraint creates a cluster consisting of Charlie and Alpha because they are the only people who got the role as a programmer in the assumed solution. Note that this constraint does not have any effect as there is no entry which is part of THEN. Constraint which selects every assignment in the solution where the primary duty isProgramming:

ProjectId(ref)
Primary Duty(ref)
Secondary Duty(ref)
Name(ref)
Many experience in(ref)
Some experience in(ref)
Gewicht(lb)
Programming
1000

The cluster which is formed by the IF part of the previous constraint:

K1
Programming
None
Charlie
Programming
None
K1
Programming
None
Alpha
Programming
None

Every entry of a constraint which consists of*is part of IF. Such an entry means that the cluster which is created by this constraint is parted into multiple clusters. For each cluster all elements have the same value at the column of such an entry. The following tables show the clusters created by a constraint which consists of a*at thePrimary Duty(ref)attribute. Constraint which clusters the solution according to thePrimary Dutyattribute:

ProjectId(ref)
Primary Duty(ref)
Secondary Duty(ref)
Name(ref)
Many experience in(ref)
Some experience in(ref)
Gewicht(lb)
*
1000

First cluster which is formed by the IF part of the previous constraint:

K1
Programming
None
Charlie
Programming
None
K1
Programming
None
Alpha
Programming
None

Cluster 2 which is formed by the IF part of the previous constraint:

K1
Testing
Planning
Bravo
Testing
Programming

Last cluster which is formed by the IF part of the previous constraint:

K1
Planning
Documentation
Delta
Programming
Documentation

Every other entry which starts with#followed by a function call is part of THEN. Possible function calls are listed at the chapter constraint functions. These function calls describe which conditions has to be met by the clusters which are created by the IF part of the same constraint. The following table shows the constraints which were used in order to generate the solution mentioned at the beginning of this chapter:

ProjectId(ref)
Primary Duty(ref)
Secondary Duty(ref)
Name(ref)
Many experience in(ref)
Some experience in(ref)
Gewicht(lb)
Programming
#OR(Programming)
1 000 000 000
Testing
#OR(Testing)
1 000 000 000
Planning
#OR(Planning)
1 000 000
Planning
#OR(Planning)
1 000 000
Testing
#OR(Testing)
1 000 000

The first, second and third constraints say that the guy whose primary duty is X should also have many experience in X (i.e. that the people whose primary duty is programming should also have many experience in programming). Each of these constraints has the same value for theGewicht(lb)attribute which means that each of them is equally important. The last two constraints demand that the people whose secondary duty is planning or testing also should have some experience in it. These two constraints have a lower value at theGewicht(lb)attribute which means that these two constraints are less important than the first three constraints. TheGewicht(lb)is often set to very high values in order to ensure that the constraints are more important than the preferences and is limited to the values between 0 and 922337203685477.

The THEN part of each constraint consists of constraint functions which describe the conditions that have to be meet by the clusters formed by the IF part of the same constraint. All available functions are listed below.

Demands that each cluster defined by the IF part of a constraint has exactly <Integer> elements which have the value <Value> at the same column as the appearance of this function.

Demands that each cluster defined by the IF part of a constraint has at least <Integer> elements which have the value <Value> at the same column as the appearance of this function.

Demands that each cluster defined by the IF part of a constraint has at at most <Integer> many elements which have the value <Value> at the same column as the appearance of this function.

Demands that each cluster defined by the IF part of a constraint has only elements containing a value of the enumartion <Values...> at the same column as the appearance of this function. Each element of <Values...> is separated by a comma.

Demands that each cluster defined by the IF part of a constraint has only elements which has no value at the same column as the appearance of this function that is an element of <Values...>. Each element of <Values...> is separated by a comma.

Demands that each cluster defined by the IF part of a constraint has only elements which have a pairwise minimum distance of <Integer> or greater. The distance between two elements is defined by the difference of the two integer values of these two elements which are located at the same column as the appearance of this constraint function. Therefore this function can only meaningfully be used at columns which are of the attribute(int).

Demands that each cluster defined by the IF part of a constraint has only elements which have a pairwise maximum distance of <Integer> or less. The distance between two elements is defined by the difference of the two integer values of these two elements which are located at the same column as the appearance of this constraint function. Therefore this function can only meaningfully be used at columns which are of the attribute(int).

Demands that each cluster defined by the IF part of a constraint has only elements which are consecutive. This means that for each element in one cluster there is exactly one element at the same cluster which is adjacent to this element. Two elements are adjacent if their pairwise distance is exactly 1. The distance between two elements is defined by the difference of the two integer values of these two elements which are located at the same column as the appearance of this constraint function.

In other words this constraint function demands that the elements of a cluster form an ordered queue without any gap between two elements and therefore this function is only sensibly usable at columns which are of the attribute(int).

The configuration definition is stored in a table named "config_<stage>" where <stage> is the stage number of the affected assignment problem. This table consists of two the columns with the headerKeyandValue. In the next subsection the possible options are listed.

Value:an integer describing how many restarts are used during problem solving.

Default: false

Value:the flusher is enabled if set to "true" and in the other case set to "false". See chapter Flusher for information about the flusher.

Default: false

Value:random restarts are computed in parallel if set to "true" and in the other case set to "false".

Default: false

Value:maximal number of plateau steps which should be walked by the hill climber before aborting the computation.

Default: 10

Value:a list of column numbers which are separated via semicolon. The column numbering is starting with 1 and after each number there have to be a semicolon. The list describes in which order the columns of the solution are going to be sorted.

Default:Empty String (no sorting).

Value:a string which describes the temperature function which is used for simulated annealing. The following values are valid:

Default:simulated annealing is disabled

LOGTHREE:  f(x) = [log(3) / log(x^1.5)] - 0.05

Value:repeats hill climbing algorithm with enabled flusher and without restarted temperature function as long as there is an improvement found if set totrue.

Default: false

Value:the maximum number of threads which are used during Random Restart.

Default:3

The following chapters describe the problem solving components of the program in the order they are used in the program. Some of them can be disabled.

The Initializer assigns demands to supplies. At the first stage it assigns demands to supplies based on preferences only. It iterates multiple times over all not yet assigned demands. At the first iteration it tries to assign to each demand the most preferred supply if there is one available. At the second iteration it tries to assign to each demand the second most preferred supply if there is one available and so on. The first stage ends if all demands are assigned or if there are no preferred supplies left for not yet assigned demands. At the second stage supplies are assigned to demands based on proposals which are created via constraints. If there are multiple proposed supplies for one demand then the supply is assigned to the demand which complies best to the constraints. At the third stage the initializer assigns any supplies left to each free demand randomly.

The hill climber tries to improve the current solution by swapping. There are 3 types of swaps. The first attribute, which is used most of the time in general, selects two assigned demands and swaps their respective supplies. The next attribute replaces the supply of an assigned demand with a supply which is not assigned to any demand. The last attribute replaces the demand of an assigned supply with a demand that is not assigned to any supply. The hill climber works in two phases. At the first phase at each hill climbing step swaps are proposed by the constraints. The hill climber checks which proposed swap improves the solution the most. At the end of the step the best proposed swap is executed. If no proposed swap improves the solution a plateau step is done. This step changes the current assignment of demands to supplies by one step without downgrading the current assignment. The first phase runs as long it finds swaps which improve the current assignment, as long not too many consecutively plateau steps are done or if no swap can be found which does not downgrade the solution. At the second phase the hill climber uses proposed swaps which are generated by preferences but aside from that works like in the first phase.

If the flusher is enabled it removes some assignments of the solution created by the hill climber based on proposals of the constraints. This is only done if the removal of each assignment improves the compliance of the current solution with the given preferences and constraints. After that the initializer and hill climber are used again. This is repeated as long the flusher removes assignments from the solution provided by the hill climber and as long each iteration improves the solution.

If simulated annealing is activated there is a probability that a random swap is done during an hill climbing step. A downgrade by this random swap is accepted. This probability is defined by the temperature function and gets lower as the time progresses during problem solving.

If random restart is activated the complete calculation of the solution is repeated multiple times. The solution with the best rating is returned as the result. These calculations can be executed in parallel. If one of these solutions is complying with all constraints and preferences than every other remaining calculations are aborted.

If the program is used via console the output of the solution is stored in an excel table and the path of this file is printed to the console. If the program is used via web browser a download of the excel table is started.

There are multiple examples at the foldertestDataof the provided program of this program. At the end of each entry the corresponding command is listed in order to execute the example from the console.

Problem
Excel file name
Command for console
graph coloring
graph_coloring.xlsx
java -jar nap.jar -s ./testData/graph_coloring.xlsx
8-Queen problem
queen_8.xlsx
java -jar nap.jar -s ./testData/queen_8.xlsx
sudoku
Sudoku_hard.xlsx
java -jar nap.jar -s ./testData/Sudoku_hard.xlsx
class scheduling problem
class_scheduling.xlsx
java -jar nap.jar -s ./testData/class_scheduling.xlsx