Note that this program is deprecated and replaced by the Generic Allocator.
This program was created during an internship at the chair in informatics 6 of the Julius-Maximilians-Universität Würzburg by Martins Avots and Kirill Djebko with assistance of Prof. Dr. Frank Puppe.
Given a set of supplies and a set of demands an allocation assigns each object of a subset of demands an element of supplies. Each supply can only be assigned to a single demand. A set of preferences contain for some demands one or more preferred supplies. A set of constraints describe some properties which should be met by an allocation. The rating of an allocation quantifies its compliance with the given preferences and constraints. The higher the rating the more constraints and preferences are not met by an allocation. A rating of zero means that every preference and constraint is met by an allocation. An optimal allocation is an allocation in which each demand is assigned to a supply and where the rating is minimal.
A classical example for such an assignment problem would be the Sudoku puzzle in which a set of numbers needs to be placed onto a grid following certain rules. Another problem is the graph coloring problem in which each node of a graph is colored and neighboring nodes have to have different colors.
This program solves given problems with the help of multiple algorithms. Nevertheless, there are problems that can be modeled as an assignment problem but cannot be solved by the Universal Allocation Program.
The problem which is solved by the program is stored inside an Excel file. The program can be started via the following shell command:
java -jar nap.jar -s <Path to the Excel file >
For further information of the command line options following command can be used:
java -jar nap.jar -h
On some platforms thejava -jar
part can be omitted. An alternative way to start the program is to double click at
the nap.jar file which will start a very basic web server. In this case the standard
web browser will open a site to which the Excel files can be uploaded. During the
calculations the browser will show the loading page animation specified by the browser.
When the calculations are done a download is started. Note that this method does not
require an internet connection as everything is done on the local computer.
A pipeline of assignment problems can be used in order to describe more complex problems. Such a pipeline consists of multiple stages which have an absolute order. The set of supplies and the set of demands need to be explicitly stated for first assignment problem of such a pipeline. The subsequent problems only require an explicit enumeration of the supply elements. The respective set of demands is defined by the solution of the previous assignment problem of the pipeline. In the following, the stage number is the position of a problem inside the pipeline. A single assignment problem without connections to other problems is formally a pipeline with one element in this context.
The set of supplies is defined by a table. The supply header line defines the attributes
of which each supply consists of. Each entry has a name and ends with(int)
if the attribute is an integer. Otherwise the entry ends with(ref)
. The following rows of the table contain the supply elements and use the same columns
as the header. Each row describes a unique element of the supply set but their content
do not have to be unique. This table is stored into a sheet calledAngebot_0
. The following example shows a supply definition which consists of a Group of 4.
Note that the first and second supply have the same content but still are two different
supplies. This example shows that a project with the idK1
needs 4 peoples with different primary and secondary duties.
The demand definition has the same format as the supply definition. The only difference
is that the sheet's name isNachrage_<stage>
. Where <stage> is the number of the problem's stage. Note that most problems probably
will only contain one stage. Which means that most users won't bother with stages
at all as they can be ignored in such cases. The following table shows a demand definition
suited for the previous supply definition. In this case the sheet's name isNachfrage_0
because it is the first and only stage of this example. As you can see the people
which need to be assigned to a Project do not have perfectly fit skills.
The preference definition is stored into a sheet namedPräferenzen_<stage>
where <stage> is the stage number of the corresponding assignment problem of the pipeline.
The header of the preferences sheet is defined by the concatenation of the headers'
of the supplies and the demands of respective stage and a weight attribute calledGewicht(lb)
. This means that the header of the preference is the header of the solution for a
given stage plus theGewicht(lb)
attribute.
The preferences table contains assignments which the user wishes to be part of the
solution. TheGewicht(lb)
attribute of a preference describes how important it is for the user that this assignment
is located in the solution. It consists of a positive integer. The lower the number
the more important the preference is. The preference of a demand with the highest
number defines how important it is that this demand will get any of its preferences.
The higher this number is the more it is likely that any preference of this demand
will be met. In most cases theGewicht(lb)
attribute should be a very low number which guarantees that constraints are more important
than preferences. The following table shows a preferences definition based on the
previous tables. This examples also show that nearly all persons want to get a role
as a programmer which is not possible. Preferences are often less important than constraints
because in the other case the preference would already be part of a correct solution
and therefore the corresponding supply and demand could be removed from the input
in order to get better results or to calculate the solution more quickly.
The constraint definition of an assignment problem is stored into a sheet namedConstraints_<stage>
where <stage> is the stage number of the concerned problem. It has the same header
as the preference table of the same stage. Each row describes one constraint. A constraint
defines the following abstract rule: the biggest assignment subset which complies
with IF has also to comply with THEN. To put it in other words: every cluster which
is created by the condition IF also must comply with the condition THEN.
Lets assume following solution in order to explain the functioning of constraints by example:
Every entry of a constraint's row which has the format<value>
is part of IF. Such an entry means that every element in each cluster, which is formed
by this constraint, has to have the Value <value> at the attribute of the same column.
The following constraint creates a cluster consisting of Charlie and Alpha because
they are the only people who got the role as a programmer in the assumed solution.
Note that this constraint does not have any effect as there is no entry which is part
of THEN. Constraint which selects every assignment in the solution where the primary
duty isProgramming
:
The cluster which is formed by the IF part of the previous constraint:
Every entry of a constraint which consists of*
is part of IF. Such an entry means that the cluster which is created by this constraint
is parted into multiple clusters. For each cluster all elements have the same value
at the column of such an entry. The following tables show the clusters created by
a constraint which consists of a*
at thePrimary Duty(ref)
attribute. Constraint which clusters the solution according to thePrimary Duty
attribute:
First cluster which is formed by the IF part of the previous constraint:
Cluster 2 which is formed by the IF part of the previous constraint:
Last cluster which is formed by the IF part of the previous constraint:
Every other entry which starts with#
followed by a function call is part of THEN. Possible function calls are listed at
the chapter constraint functions. These function calls describe which conditions has
to be met by the clusters which are created by the IF part of the same constraint.
The following table shows the constraints which were used in order to generate the
solution mentioned at the beginning of this chapter:
The first, second and third constraints say that the guy whose primary duty is X should
also have many experience in X (i.e. that the people whose primary duty is programming
should also have many experience in programming). Each of these constraints has the
same value for theGewicht(lb)
attribute which means that each of them is equally important. The last two constraints
demand that the people whose secondary duty is planning or testing also should have
some experience in it. These two constraints have a lower value at theGewicht(lb)
attribute which means that these two constraints are less important than the first
three constraints. TheGewicht(lb)
is often set to very high values in order to ensure that the constraints are more
important than the preferences and is limited to the values between 0 and 922337203685477.
The THEN part of each constraint consists of constraint functions which describe the conditions that have to be meet by the clusters formed by the IF part of the same constraint. All available functions are listed below.
Demands that each cluster defined by the IF part of a constraint has exactly <Integer> elements which have the value <Value> at the same column as the appearance of this function.
Demands that each cluster defined by the IF part of a constraint has at least <Integer> elements which have the value <Value> at the same column as the appearance of this function.
Demands that each cluster defined by the IF part of a constraint has at at most <Integer> many elements which have the value <Value> at the same column as the appearance of this function.
Demands that each cluster defined by the IF part of a constraint has only elements containing a value of the enumartion <Values...> at the same column as the appearance of this function. Each element of <Values...> is separated by a comma.
Demands that each cluster defined by the IF part of a constraint has only elements which has no value at the same column as the appearance of this function that is an element of <Values...>. Each element of <Values...> is separated by a comma.
Demands that each cluster defined by the IF part of a constraint has only elements
which have a pairwise minimum distance of <Integer> or greater. The distance between
two elements is defined by the difference of the two integer values of these two elements
which are located at the same column as the appearance of this constraint function.
Therefore this function can only meaningfully be used at columns which are of the
attribute(int)
.
Demands that each cluster defined by the IF part of a constraint has only elements
which have a pairwise maximum distance of <Integer> or less. The distance between
two elements is defined by the difference of the two integer values of these two elements
which are located at the same column as the appearance of this constraint function.
Therefore this function can only meaningfully be used at columns which are of the
attribute(int)
.
Demands that each cluster defined by the IF part of a constraint has only elements which are consecutive. This means that for each element in one cluster there is exactly one element at the same cluster which is adjacent to this element. Two elements are adjacent if their pairwise distance is exactly 1. The distance between two elements is defined by the difference of the two integer values of these two elements which are located at the same column as the appearance of this constraint function.
In other words this constraint function demands that the elements of a cluster form
an ordered queue without any gap between two elements and therefore this function
is only sensibly usable at columns which are of the attribute(int)
.
The configuration definition is stored in a table named "config_<stage>" where <stage>
is the stage number of the affected assignment problem. This table consists of two
the columns with the headerKey
andValue
. In the next subsection the possible options are listed.
Value:an integer describing how many restarts are used during problem solving.
Default:
false
Value:the flusher is enabled if set to "true" and in the other case set to "false". See chapter Flusher for information about the flusher.
Default:
false
Value:random restarts are computed in parallel if set to "true" and in the other case set to "false".
Default:
false
Value:maximal number of plateau steps which should be walked by the hill climber before aborting the computation.
Default:
10
Value:a list of column numbers which are separated via semicolon. The column numbering is starting with 1 and after each number there have to be a semicolon. The list describes in which order the columns of the solution are going to be sorted.
Default:Empty String (no sorting).
Value:a string which describes the temperature function which is used for simulated annealing. The following values are valid:
Default:simulated annealing is disabled
ONEDIVX:
f(x) = [log(2) / log(x^1.5)] - 0.05
LOGTWO:
f(x) = [log(2) / log(x^1.5)] - 0.05
LOGTHREE:
f(x) = [log(3) / log(x^1.5)] - 0.05
LOGFOUR:
f(x) = [log(4) / log(x^2)] - 0.05
LOGFIVE:
f(x) = [log(5) / log(x^2)] - 0.05
Value:repeats hill climbing algorithm with enabled flusher and without restarted temperature
function as long as there is an improvement found if set totrue
.
Default:
false
Value:the maximum number of threads which are used during Random Restart.
Default:3
The following chapters describe the problem solving components of the program in the order they are used in the program. Some of them can be disabled.
The Initializer assigns demands to supplies. At the first stage it assigns demands to supplies based on preferences only. It iterates multiple times over all not yet assigned demands. At the first iteration it tries to assign to each demand the most preferred supply if there is one available. At the second iteration it tries to assign to each demand the second most preferred supply if there is one available and so on. The first stage ends if all demands are assigned or if there are no preferred supplies left for not yet assigned demands. At the second stage supplies are assigned to demands based on proposals which are created via constraints. If there are multiple proposed supplies for one demand then the supply is assigned to the demand which complies best to the constraints. At the third stage the initializer assigns any supplies left to each free demand randomly.
The hill climber tries to improve the current solution by swapping. There are 3 types of swaps. The first attribute, which is used most of the time in general, selects two assigned demands and swaps their respective supplies. The next attribute replaces the supply of an assigned demand with a supply which is not assigned to any demand. The last attribute replaces the demand of an assigned supply with a demand that is not assigned to any supply. The hill climber works in two phases. At the first phase at each hill climbing step swaps are proposed by the constraints. The hill climber checks which proposed swap improves the solution the most. At the end of the step the best proposed swap is executed. If no proposed swap improves the solution a plateau step is done. This step changes the current assignment of demands to supplies by one step without downgrading the current assignment. The first phase runs as long it finds swaps which improve the current assignment, as long not too many consecutively plateau steps are done or if no swap can be found which does not downgrade the solution. At the second phase the hill climber uses proposed swaps which are generated by preferences but aside from that works like in the first phase.
If the flusher is enabled it removes some assignments of the solution created by the hill climber based on proposals of the constraints. This is only done if the removal of each assignment improves the compliance of the current solution with the given preferences and constraints. After that the initializer and hill climber are used again. This is repeated as long the flusher removes assignments from the solution provided by the hill climber and as long each iteration improves the solution.
If simulated annealing is activated there is a probability that a random swap is done during an hill climbing step. A downgrade by this random swap is accepted. This probability is defined by the temperature function and gets lower as the time progresses during problem solving.
If random restart is activated the complete calculation of the solution is repeated multiple times. The solution with the best rating is returned as the result. These calculations can be executed in parallel. If one of these solutions is complying with all constraints and preferences than every other remaining calculations are aborted.
If the program is used via console the output of the solution is stored in an excel table and the path of this file is printed to the console. If the program is used via web browser a download of the excel table is started.
There are multiple examples at the foldertestData
of the provided program of this program. At the end of each entry the corresponding
command is listed in order to execute the example from the console.