Performance

23 January 2022 | Nicolas Justen


The performance of the algorithm has improved significantly over the last few weeks. At the beginning of October, it was still impossible to compute larger data sets within a few days. At the beginning of November, a data set containing 82 users and 215,677 authorizations with 4 active target criteria was successfully computed for the first time in less than 48 hours.

Computing time

Thanks to several updates, the required computing time (excluding the meaningfulness rating) was reduced by a factor of about 10 over the last month. Among other things, the tracking of deviations from the input matrix of solution concepts was improved. Instead of recalculating the deviations after each change, they are now written to a separate matrix whenever a user gains or loses an authorization, as shown in the sketch below. The addRole methods, which integrate the automatically generated roles into the individual solution concepts, were also made more efficient through several adjustments.
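The following minimal sketch illustrates the idea behind the improved deviation tracking; the names DeviationTracker and onChange are illustrative assumptions, not identifiers from the actual code base:

```java
// Hypothetical sketch: maintain a separate deviation matrix and a running
// count instead of re-diffing the whole solution after every change.
public class DeviationTracker {
    private final boolean[][] input;      // original user x authorization matrix
    private final boolean[][] deviation;  // true where the solution differs from the input
    private int deviationCount;           // running total, no full recomputation needed

    public DeviationTracker(boolean[][] input, boolean[][] solution) {
        int users = input.length, auths = input[0].length;
        this.input = input;
        this.deviation = new boolean[users][auths];
        for (int u = 0; u < users; u++)
            for (int a = 0; a < auths; a++)
                if (input[u][a] != solution[u][a]) {
                    deviation[u][a] = true;
                    deviationCount++;
                }
    }

    /** Call whenever a user gains or loses an authorization in the solution. */
    public void onChange(int user, int auth, boolean nowGranted) {
        boolean deviates = input[user][auth] != nowGranted;
        if (deviates != deviation[user][auth]) {
            deviation[user][auth] = deviates;
            deviationCount += deviates ? 1 : -1;
        }
    }

    public int deviations() { return deviationCount; }
}
```

Keeping a running count means a fitness evaluation can read the number of deviations in constant time instead of comparing the full matrices again after every mutation.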

Evolutionary Algorithms

In general, the runtime of the Evolutionary Algorithm depends heavily on the size of the matrices to be computed; it grows exponentially with the number of users and authorizations. In a first pre-processing step, the matrix containing all users and authorizations to be calculated is therefore clustered into small, similar matrices. Next, the individual matrices resulting from the clustering are reduced further by grouping users that require the same authorizations as well as authorizations that are assigned to the same users (see the sketch below). Through these two mechanisms, the entire problem is separated into many smaller sub-problems, which the Evolutionary Algorithm can compute much more efficiently. After the sub-problems have been calculated, they are recombined, and the finished result can be examined in the user interface. This two-step pre-processing has now been carried out for the first time on real trace data sets. Thanks to this significantly more efficient calculation, it is now possible to investigate the result quality for larger real data sets.
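As a rough sketch of the second pre-processing step, users with identical authorization rows can be merged into a single representative row (columns, i.e. authorizations assigned to the same users, are handled analogously). The class and method names below are hypothetical:

```java
import java.util.*;

// Hypothetical sketch: group users whose authorization rows are identical,
// so the Evolutionary Algorithm only has to process one row per group.
public class RowGrouping {
    /** Returns groups of user indices with identical authorization rows. */
    public static Collection<List<Integer>> groupIdenticalUsers(boolean[][] ua) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int u = 0; u < ua.length; u++) {
            String key = Arrays.toString(ua[u]);   // row pattern used as grouping key
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(u);
        }
        return groups.values();
    }

    public static void main(String[] args) {
        boolean[][] ua = {
            {true, false, true},
            {true, false, true},    // same row as user 0 -> merged into one group
            {false, true, false}
        };
        // Prints [[0, 1], [2]]: the algorithm then only sees two distinct rows.
        System.out.println(groupIdenticalUsers(ua));
    }
}
```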

Performance killer

Currently, by far the biggest performance killer is the meaningfulness rating. For each role randomly generated by the Evolutionary Algorithm, a neural network must evaluate how meaningful the identified role is. The more meaningful a role is, the more likely it is that a meaningful name can be generated for it at the end of the calculation. Several adjustments have already accelerated this calculation significantly, but it remains by far the most computationally intensive step.
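One plausible adjustment of this kind, sketched below under the assumption that identical roles recur during the evolution (and not necessarily what was implemented here), is to cache the network's score per unique role so that repeated candidates are only evaluated once. All names are illustrative:

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: memoize the expensive neural-network score per role.
// A role is modeled as the set of authorization indices it bundles.
public class MeaningfulnessCache {
    private final Map<Set<Integer>, Double> cache = new HashMap<>();
    private final ToDoubleFunction<Set<Integer>> network; // stands in for the NN call

    public MeaningfulnessCache(ToDoubleFunction<Set<Integer>> network) {
        this.network = network;
    }

    public double score(Set<Integer> role) {
        // Only roles not seen before reach the network.
        return cache.computeIfAbsent(role, network::applyAsDouble);
    }

    public static void main(String[] args) {
        // Dummy stand-in for the expensive network evaluation.
        MeaningfulnessCache cache = new MeaningfulnessCache(role -> {
            System.out.println("network evaluates " + role);
            return role.size() / 10.0;
        });
        Set<Integer> role = Set.of(1, 4, 7);
        cache.score(role);  // triggers one network evaluation
        cache.score(role);  // served from the cache, no second evaluation
    }
}
```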