This page complements the first page: that page describes the process the program adopts, whereas here we get more into the program logic.
This is how the latest version of the program works. It's many orders of magnitude more productive than the original 2012 version.
The file (udefile) of previously generated UDEs (actually UDE/unused-entry-point sets) is read into an array in memory.
Each UDE (shorthand hereinafter for UDE/unused EPs set) is held as 20 4-byte integers, each integer containing 13 values, each value being 0, 1, 2, 3 or 4, with 4 representing an unused entry point (EP).
As well as the 20 4-byte integers, for each UDE there's a 1-byte field which is the UDE's number of unused EPs, and a 1-byte flag: 1 if the UDE has been analysed, 0 if it hasn't. So 82 bytes are needed to store each UDE in memory arrays (and 103 on disk, including spaces between fields).
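As a sketch, the packing described above might look like this. This is illustrative Python, not the original program's code, and the exact digit order within each word is an assumption; the point is that 13 base-5 digits fit in a 4-byte integer (5^13 is about 1.22 billion, which is under 2^31), so 20 words cover 256 values with 4 spare slots.

```python
# Hypothetical sketch of the base-5 packing: 256 values (each 0-4) into
# 20 integers, 13 values per integer. Digit order within a word is assumed.
def pack_ude(values):
    """values: list of 256 ints, each 0-4."""
    words = []
    for i in range(0, 260, 13):            # 20 words of 13 digits
        chunk = values[i:i + 13]
        chunk += [0] * (13 - len(chunk))   # pad the final word's 4 spare slots
        word = 0
        for v in chunk:
            word = word * 5 + v            # accumulate base-5 digits
        words.append(word)                 # word < 5**13, fits in 32 bits
    return words

def unpack_ude(words):
    values = []
    for word in words:
        chunk = []
        for _ in range(13):
            chunk.append(word % 5)         # peel digits off, least first
            word //= 5
        values.extend(reversed(chunk))
    return values[:256]                    # drop the 4 padding slots
```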
The udefile is already sorted such that UDEs with most unused EPs come first. As the file is read into the array, a note is kept of where in the array UDEs of each number of unused EPs starts and how many of them there are. For example, UDEs with 9 unused EPs might start at the 20,135th place in the array and there might be 1,204,371 of them.
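Building that note of band positions is a single pass over the sorted array. A minimal sketch, assuming the array is sorted descending by number of unused EPs (names here are illustrative, not the program's own):

```python
# Hypothetical sketch: record where each "n unused EPs" band starts in the
# sorted array and how many UDEs it holds.
def build_band_index(ude_counts):
    """ude_counts: per-UDE number of unused EPs, in array order (descending)."""
    index = {}  # n unused EPs -> [start position, count]
    for pos, n in enumerate(ude_counts):
        if n not in index:
            index[n] = [pos, 0]   # first UDE of this band
        index[n][1] += 1
    return index
```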
The program has a form of checkpoint restart. After processing a certain number of UDEs from the udefile, the array, including new sets found, is written back to the udefile.
In each of the program's "superloops" a number of UDEs from udefile are processed, the number being specified by parameter (hardcoded at the top of the program - this turned out to be less hassle than reading it, and other parameters, from a file).
There is an option (another parameter) to process the udefile - actually the array - either starting at the front where the UDEs with the most unused EPs are found, or the back end where those with the fewest are found.
The superloop now begins. We'll describe the process assuming we are starting at the beginning of the file. We find the first UDE in the array that hasn't yet been processed (flag=0) and set its flag to 1. This "starting UDE" is decompressed into a 256 element array.
We zip through the 256 elements to find which EPs are unused (value 4) and make a list of these unused EPs. We make a copy of the 256 element UDE. Into this copy, which we will call "newUDE" we put random values of 0, 1, 2, or 3 into each unused EP.
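A sketch of this "randoms" step, assuming the 256-element representation described above (value 4 marks an unused EP):

```python
import random

# Illustrative sketch only: find the unused EPs (value 4) and overwrite
# each with a random value 0-3 to form newUDE.
def make_new_ude(ude, rng=random):
    unused = [i for i, v in enumerate(ude) if v == 4]
    new_ude = list(ude)
    for i in unused:
        new_ude[i] = rng.randint(0, 3)   # inclusive bounds
    return new_ude, unused
```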
This newUDE will now be analysed to see if it can breed more UDEs.
What follows is the method for taking the newUDE (with values 0-3 in each of its 256 positions) and generating new UDEs from it.
We determine for newUDE which unary functor each substitution simulates. There are 369 substitutions but only 256 unary functors (UFs) so some UFs will be simulated by more than one substitution. We build a table showing for each UF which substitution(s) simulate it.
For UFs with only one substitution, the substitution's 4 entry points will obviously have to be used. They are marked as such on a simple 256 element array where 0 means not used, 1 means used. These UFs are noted as defined.
We examine UFs with more than one substitution. For each of these UFs we see if there is a substitution that only uses EPs that were marked as used in the paragraph above. If so, these UFs can be marked as defined without using up any more unused EPs - which is good.
For each UF that is still undefined, we see if ALL of its substitutions use a common EP. Those EPs are marked as used. For example, if there are 3 substitutions and they all use EP 127, EP 127 will have to be used.
We now re-examine the undefined UFs. If any has a substitution that uses only EPs that are now marked as used, they are marked as defined. (Performing this check both before and after the 'common EPs' check is actually faster than doing it only after the common EPs check.)
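The pruning steps above can be sketched as follows. This is a hedged reconstruction, not the program's own code: `subs_for_uf` maps each unary functor to the substitutions (sets of 4 EPs) that simulate it, and the names are illustrative.

```python
# Sketch of the forced-EP pruning: single-substitution UFs force their EPs,
# a UF is defined once some substitution needs no new EPs, and EPs common
# to all of a UF's substitutions are forced as used.
def prune(subs_for_uf):
    used = set()      # entry points known to be needed
    defined = set()   # UFs already accounted for

    # 1. UFs with a single substitution: its 4 EPs must be used.
    for uf, subs in subs_for_uf.items():
        if len(subs) == 1:
            used |= subs[0]
            defined.add(uf)

    # A UF is defined if one of its substitutions uses only used EPs.
    def sweep():
        for uf, subs in subs_for_uf.items():
            if uf not in defined and any(s <= used for s in subs):
                defined.add(uf)

    sweep()  # check before forcing common EPs (faster, per the text)

    # 2. EPs common to ALL of a UF's substitutions are forced as used.
    for uf, subs in subs_for_uf.items():
        if uf not in defined:
            used |= set.intersection(*subs)

    sweep()  # re-check with the newly forced EPs

    still_undefined = [uf for uf in subs_for_uf if uf not in defined]
    return used, defined, still_undefined
```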
We now have a 'hardcore' of undefined unary functors, each with more than one substitution. We move these 'hardcore' UFs into a table. Of course, for each of these hardcore UFs, any one of its substitutions could be used to define it.
So, we now select one substitution for each of these hardcore UFs, and mark all the EPs they need as used. Our list of used/unused 256 EPs will now show most EPs have been used. But some will remain unused.
Some UDEs can end up with as many as 50 hardcore UFs at this point. Typically each will have 2 or 3 substitutions that could be used to define it (and exceptionally up to a dozen or so). Trying to do all 2^50 or 3^50 permutations of these substitutions would be well nigh impossible.
So a limit is imposed. For UDEs with 16 or fewer hardcore UFs all combinations of substitutions are examined. But for UDEs with more than 16 hardcore UFs, the following process is adopted. For the 17th and above UFs a random substitution is selected, and then all combinations of the other 16 UFs' substitutions are processed. The number of sets of random substitutions for UFs 17+ can be varied.
This would seem to favour the first 16 UFs. A run was tried in which the hardcore UFs list was inverted so that the last 16 UFs were fully permed. This yielded hardly any additional UDEs. This is not too surprising. If the first UFs tended to use, say, substitutions with low numbers and later UFs substitutions with higher numbers, then processing these later UFs would tend to yield additional UDEs. But it is not so: a later UF is just as likely to use a low-numbered substitution as an earlier one, and vice versa. In other words, perming the later UFs is likely to yield combinations of substitutions much like those yielded by the earlier UFs - and thus the same sets of unused EPs.
The real value of doing a few sets of random combinations of substitutions for UFs 17+ is that no UDE is ignored. Previously the program ignored any UDEs that ended up with more than 16 hardcore UFs.
A recent (2018) improvement sorts the hardcore UFs such that those with the most substitutions are in the lowest positions of the array and thus examined in the innermost of the 16 loops. This reduces the number of executions of outer loops. This change alone resulted in a significant performance improvement.
For the first 16 UFs we make all the permutations of substitutions by 16 nested DO loops, each running from 1 to the number of substitutions for that UF. (Originally the program had two rather more elegant nested loops that could handle any number of UFs, but it was found that hardcoded loops run faster.)
Say there are 7 UFs. We select a substitution for the first of these 7 UFs. A subroutine is called which determines which EPs this substitution uses, and deletes those EPs from the list of unused EPs. We then ask: does the now-depleted list still have enough unused EPs? "Enough" is set by a parameter. In some runs we might want to find new UDEs with just 1 unused EP, in which case this parameter will be set to 1. In other runs we may be looking only for new UDEs with 28 unused EPs. So, if the depleted list of unused EPs does not have the requisite number of EPs, no more loops are executed.
In the first (more elegant looking) version of the program, the EPs used by the substitutions for all 7 (in our example) UFs would have been found and only then would the question be asked: are there enough (or indeed any) unused EPs remaining? The latest version of the program runs much faster as it so often obviates the need to execute all the nested loops.
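The early-pruning idea can be sketched with recursion standing in for the hardcoded nested loops. A hedged illustration, with assumed names and data shapes: each hardcore UF contributes a list of candidate substitutions (EP sets), and a branch is abandoned the moment the remaining pool of unused EPs drops below the `recordeps` threshold.

```python
# Sketch of the pruned enumeration over substitution combinations.
# hardcore_subs: one list of candidate EP-sets per hardcore UF (assumed shape).
def search(hardcore_subs, unused, recordeps, found):
    def recurse(level, remaining):
        if len(remaining) < recordeps:
            return                          # prune: too few unused EPs left
        if level == len(hardcore_subs):
            found.append(sorted(remaining))  # a candidate line of unused EPs
            return
        for sub in hardcore_subs[level]:     # try each substitution for this UF
            recurse(level + 1, remaining - sub)
    recurse(0, unused)
```

The pruning test sits at every level, so an unpromising choice in an outer loop kills the whole subtree beneath it, which is exactly why it beats checking only at the innermost level.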
If we reach the innermost loop we have found a new UDE with enough unused EPs. We are now at the part of the program that is executed most, and its code has consequently been mercilessly tuned.
Reaching the start of the innermost loop, we have a line of unused EPs for newUDE. For example: 17 97 104 208 211. (If you've been following, you'll realise that in this run we're happy to record new UDEs with 5 unused EPs - so we could be running with the parameter 'recordeps' set to 1, 2, 3, 4 or 5. Let's say recordeps is 5: we're hunting for new UDEs with 5 or more unused EPs.)
In essence this 'inner sanctum' code answers a simple question: is the new UDE/unused EP set which is represented by this line of unused EPs actually new, or is it a duplicate or subset of a previously found set?
First we see if this set of unused EPs is a subset of the starting UDE's unused EPs. If so we ignore this line and cycle to have a look at the next combination of substitutions and see what EPs it leaves unused.
If this line isn't a subset of the starting UDE's, we examine a table of lines of unused EPs that have already been produced from previous combinations of substitutions for newUDE.
Each line in the table is a set of unused EPs for this newUDE. For example, for this newUDE the table might be:
17 24 97 104 208 211
15 23 97 104 223
85 92 97 203 254
We check the new set against the table. If the new set is a duplicate or subset, it's ignored and we cycle (to the end of the innermost substitutions loop and thus see if the next permutation of substitutions will be more successful).
If you're on the ball you'll have spotted that our new line, 17 97 104 208 211, is actually a subset of the first line in the table. So our newline is ignored and we cycle.
If this new line of unused EPs wasn't a subset or a duplicate, is it a superset?
Let's say our newline had been
15 23 97 101 104 223. It's a superset of line 2. So line 2 (and any other subsets of newline) is deleted from the table.
If newline is a superset, or just a new set, it is added to the table.
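The table maintenance just described keeps only maximal lines. A minimal sketch (illustrative Python, not the original code):

```python
# Sketch of the newlines table: ignore a line that is a duplicate or subset
# of an existing line; a superset evicts its subsets before being added.
def add_line(table, newline):
    new = set(newline)
    for old in table:
        if new <= old:       # duplicate or subset of an existing line: ignore
            return table
    table = [old for old in table if not old < new]  # evict strict subsets
    table.append(new)
    return table
```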
When we've executed all the nested loops we have a table of lines of unused EPs. We make a copy of newUDE and into this we put each line of unused EPs and see if the resultant UDE/EP set is new, or a duplicate of an old one.
For example suppose newUDE is:
3 0 2 1 1 0 0 2 1 3 0 1 1 3 3 3 2 2 0 1 etc to 256 digits, and our line of unused EPs is 2, 7, 12, 15, 19. Our new UDE/EP set would be
3 4 2 1 1 0 4 2 1 3 0 4 1 3 4 3 2 2 4 1 etc to 256 digits.
This new UDE/EP set is compressed into its 20-integer form and added to the array of UDE/EP sets found in this superloop.
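Writing a line of unused EPs back into newUDE is a one-liner per EP. A sketch, assuming (as the worked example above suggests) that EP numbers are 1-based positions in the 256-digit string:

```python
# Sketch: write value 4 (unused) into each listed EP position of newUDE.
def apply_line(new_ude, line):
    out = list(new_ude)
    for ep in line:
        out[ep - 1] = 4   # assumed 1-based EP numbering, per the example
    return out
```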
(Some newUDEs may have thousands of combinations of substitutions and may produce thousands of lines of unused EPs. In a previous version of the program, each of these was converted into a UDE and compared to the new and old UDE arrays. But, now that the newlines table is subsetted, typically only a few dozen lines may emerge from the loops, so only this much smaller number have to be converted and compared to previously discovered UDEs.)
We then go round the 'randoms' loop again and put another set of random values into the starting UDE's unused EPs to make a new newUDE.
To summarise the above. We have three nested loops:
Loop 1 (superloop): Read the udefile into memory, process a number of UDEs to create new UDEs, write UDEs back to file.
Loop 2: Take a UDE from the udefile and put sets of random values into its unused EPs to create (many) "newUDEs". Take newUDE, find which UFs each substitution simulates, find the 'hardcore' UFs each of which could be defined by two or more substitutions.
Loop 3: Examine all combinations of substitutions that could be used to define the hardcore UFs. Build a table of lines of unused EPs for newUDE. Exit loop 3 and execute the remainder of loop 2.
Loop 2 remainder: check if each line of unused EPs for newUDE represents a new UDE/EP set or is a duplicate of one already found. Add new UDE/EP sets to the array of UDE/EP sets found in this superloop.
To illustrate the above with an example. In loop 1 the following UDE/EP set is taken from the UDE array. It has 3 unused EPs.
(We'll just show the first 20 of the 256 of the UDE's digits.)
3 0 2 1 4 0 0 2 1 3 0 1 1 4 4 3 2 2 0 1...
In loop 2, random values are put into the starting UDE's unused EPs. Putting the first set of random values in might give us:
3 0 2 1 3 0 0 2 1 3 0 1 1 2 0 3 2 2 0 1...
Then in loop 2 we find what sets of unused EPs this unique newUDE can produce, and then in loop 3 we thrash through all combinations of substitutions for this newUDE's 'hardcore' UFs and then generate some new UDE/EP sets and check them against previous UDE/EP sets to see if the new ones really are new. Then we go round loop 2 again with another set of random values in the starting UDE's EPs, say
3 0 2 1 1 0 0 2 1 3 0 1 1 3 3 3 2 2 0 1...
And so on until we've done the requisite number of sets of randoms. And when that's done we take the next unprocessed UDE/EP set from the array and repeat the process for it.
Which all sounds straightforward enough.
When we've been round the superloop (loop 1) the requisite number of times (set by a parameter), we come to end-of-superloop processing.
The new UDEs are sorted within number of unused EPs. Even if there are 1,500,000 new sets, this sort takes maybe 10 seconds. The sort saves bucketloads of time in the duplicate hunt and in the subset hunt.
The new UDE/EP sets are examined and duplicates are deleted. The remaining UDE/EP sets are compared to UDE/EP sets on the udefile and duplicates are deleted. Having the UDE/EP sets sorted makes this duplicate deletion process fast. Whereas checking, say, 30,000 new UDEs vs the udefile sometimes took 20 minutes, it now takes 1 or 2 seconds.
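With both arrays sorted, the duplicate hunt collapses to a merge-style single pass. A hedged sketch of the idea (UDEs shown as comparable values for brevity; the real program compares the packed integer fields):

```python
# Sketch of the sorted duplicate check: one pass over the old array serves
# all new UDEs, instead of one full scan per new UDE.
def drop_duplicates(new_sorted, old_sorted):
    kept, i = [], 0
    for u in new_sorted:
        while i < len(old_sorted) and old_sorted[i] < u:
            i += 1                               # advance the old-array cursor
        if i < len(old_sorted) and old_sorted[i] == u:
            continue                             # duplicate of an old UDE
        if kept and kept[-1] == u:
            continue                             # duplicate within the new batch
        kept.append(u)
    return kept
```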
We then examine the remaining new UDEs to see if any is a subset of any other in the new UDEs array and if so set subsets' flags to 8.
We then see if any of the remaining new UDE/EP sets are subsets of old UDE/EP sets and if so, flag them for deletion. If a new UDE has, say, 10 unused EPs, we only need examine old UDEs which have 11 or more unused EPs.
We then check whether any new UDE is a superset of UDEs in the old file. If a new UDE/EP set has, say, 10 unused EPs it must be compared to old ones with 9 or fewer unused EPs. There can be a huge number of these and this process can take quite a time - and find very few subsets. In some runs this superset check is skipped, which means subsets remain on the udefile. The penalty of this is only that we end up processing a subset and its superset - the first being unnecessary. This doesn't really matter as long as there is a relatively small number of subsets. An external subsetter has been written which periodically cleans the file, but it's rarely used in practice.
The old and new UDEs are merged such that they are sorted on UDE within number of unused EPs. After a certain number (set by parameter) of superloops the UDEs are written back to the udefile.
After a certain number of UDEs have been processed, a file called "udestopper" is read.
If it contains a 1, the program goes to end-of-superloop processing, does the subsetting, writes to the udefile and stops.
Thus the program can be brought to a controlled stop at any time by using Notepad to put a 1 in the stopper file.
A UDE with 30 unused EPs is equivalent to 2^30 unique UDEs. Ideally all of them would be analysed to see what new UDE/EP sets they might produce. Impossible!
So, our first limitation is to only try a small number of sets of values in a UDE's unused EPs. Experience has shown that under a hundred sets of random values is usually more than enough to give good results. Indeed, with lower order UDEs, say with 15 unused EPs or fewer, analysing just 1 set of random values on average produces more than 1 new UDE/EP set. Only when we get to the top of the food chain do we try thousands of sets of random values to squeeze everything we can out of these rarer and less fecund beasties.
The second major limit, as mentioned above, is only trying a few combinations of random substitutions for UFs 17 and above when there are more than 16 hardcore UFs. (Another version of the program has been written which tries random substitutions for all UFs, but this does not produce better or faster results.)
A version of the main program which can fully perm up to 36 hardcore UFs' substitutions has also been tried. This takes a very long time to run and yields hardly any additional UDEs.
So, the program has the above two major limits.
A further limit is simply the number of UDEs that can be handled by the program. For example, some vast and unknown number of UDE/unused EP sets with only one unused EP could be generated. Once the number of UDEs on the udefile gets above 12 million or so processing slows and the laptop starts to struggle. So once the file gets to around 15 million, UDEs with the lowest EPs are dropped. How much of a limitation is this? In a run we usually hold UDEs of 3 or 4 levels. For example, 30s, 29s, 28s and 27s. There might be 40 times more 29s than 30s, 40 times more 28s than 29s, etc. So the number of 27s is the problem. But by restricting their number (a parameter says what the maximum number of any level of UDE will be) we might be excluding genetic material that might be the magic gateway to UDEs with the very highest number of unused EPs.
Originally the program could only handle a few hundred thousand UDEs. The current practical limit of around 15 million
is testament to the program's improvement.
A possible further improvement would be to hold every nth UDE in full and subsequent ones only as "what's different" data.
This would clearly
help the file size and array size problem. But the time taken finding if a new UDE is a subset of an old one is directly
proportional to the number of old UDEs, and subsetting against a few million UDEs is already a struggle. A parallel computing
approach, where a separate engine handles the elimination of subsets for example, would be great. But rather beyond the
capabilities of a laptop. In long runs, if no subsets at all are eliminated the file tends to end up dominated by subsets.
Notes on why the program now achieves so much more.
In March and April 2012 the initial version of the program took weeks to produce 100,000 UDE/EP sets. Today's version of the program can produce that number in minutes - on the same trusty old laptop. The latest version is 4 or 5 orders of magnitude faster than the first version. Though in defence of V1, at that time there was nothing to suggest a need for speed or the need to store large numbers of UDEs.
Some of the program enhancements (and dead ends) have been hinted at above.
Looking back to the initial runs in 2012, it took several weeks of running to produce UDEs with 27 unused entry points. It is a measure of the improvement in the program's performance that, starting with one UDE with 1 unused entry point, UDEs with 27 can today be produced in a run taking ten minutes. What has contributed to this improvement in performance?
One contributor to enhanced speed is testing. How much faster does a loop run if it refers to a single dimension array udej(i) rather than a two dimensional array ude(i,j) - and how many loops make the overhead of assigning ude(i,j) to udej(i) worthwhile? Are 16 hardcoded nested loops faster than two (elegant) loops that do the same thing? (Yes.) Many such tweaks to the code have made a big difference.
In the very first version of the program, UDE/EP sets were held in a 256 character array and a 256 integer array, and a list of unused EPs was also held. And each UDE had a serial number. Which all seemed like good ideas at the time. Now each UDE/EP set is held in 82 bytes: 20 4-byte elements (each holding 13 EP values), a 1-byte count of how many unused EPs the UDE has, and a 1-byte processed/unprocessed flag.
A major improvement was the switch to random values in unused EPs. Initially, if a UDE had, say, 16 unused EPs, the program generated 2^16 UDEs from it by putting all combinations of values in those unused EPs. However, it has been found that a mere 20 or so sets of random values in those 16 EPs produces the vast majority of the UDEs that trying all 2^16 sets would. Clearly, this is much faster.
As mentioned above, doing all combinations of substitutions only for 16 hardcore UFs - rather than potentially up to 50 or so - made a big difference. (Very roughly half of UDEs end up with fewer than 16 hardcore UFs, so these at least are fully permed.)
Subsetting the lines of unused EPs for a given newUDE, and only transforming the remaining (non-duplicate, non-subset) lines into UDEs to be compared to old UDEs in the hunt for duplicates also made a marked improvement.
The code at the heart of the program had been successively tuned to such an extent that the time spent checking whether a new UDE/EP set was already in the file became disproportionately large.
Perhaps the most significant recent improvement has been to sort the UDEs on their first 9 (of 20) elements. Previously, to find if a new UDE was a duplicate of an old one meant searching through maybe millions of old (unsorted) UDEs. With say 10,000 new UDEs, that meant 10,000 passes through the millions-strong old UDE array. Slow. Now, running through a sorted table of new UDEs and a sorted table of old UDEs means essentially a single pass through the old UDEs. A superloop that yields say 5000 new (non-duplicate, non-subset) UDEs may have generated a million UDEs - most of the other 995,000 being duplicates amongst themselves or duplicates of old UDEs (and some being subsets), so efficient duplicate hunting turns out to be absolutely key to performance.
Over the past four and a half years there have been three or four splurges of activity, lasting a few weeks, during which improvements have been made. The program has been put away a few times in the belief that no more improvement was possible - only to be resurrected (after a gap of two and a half years prior to October 2016) and improved further.
Now, December 2016, the program is going back into cold storage as it seems the limit of what can be achieved on a laptop
may well have been reached.
Webpages written: May 9th 2012 - December 2016 (on and off)
Copyright M Harding Roberts 2012 - 2016