Stable Marriage algorithm for matching lists of patterns
This is for - RHOL-533Getting issue details... STATUS
Why not keep it in JIRA
This document found its place in the wiki in result of the grief after 30 minutes of unsaved work being lost after pressing the wrong keyboard button in JIRA. I will not make this mistake again.
Currently, listMatch(patterns, terms) implements non-deterministic matching which can lead to backtracking and exponential matching cost.
By using the stable marriage algorithm with rankings for patterns and terms that lean on the total order we have for them, we can provide deterministic with O(M*N) time complexity (M = #patterns, N = #terms)
Stages (conceptual, actual implementation path may be different):
1. No wildcards, no remainders
The algorithm is (men ~ patterns, women ~ terms):
while ∃ free man m who still has a woman w to propose to { w = first woman on m’s list to whom m has not yet proposed if w is free (m, w) become engaged else some pair (m', w) already exists if w prefers m to m' m' becomes free (m, w) become engaged else (m', w) remain engaged
This assumes that there are M patterns and N terms and that M = N.
If M < N, there will not be a match unless there is a wildcard or a remainder (see their sections below) and we should short-circuit unless there is one.
Definition
For the purpose of this document, and general coherence of mental models across the teams, wildcards and remainders should not be counted into the pattern count. Their handling is special-case enough to justify this definition by the algorithm simplifications it allows.
The candidate rankings (preferences) in our setting are defined based on following criteria:
For patterns: 1) matching terms before non-matching ones 2) lower term score first
For terms: 1) matching patterns before non-matching ones 2) lower pattern score first
where score is determined using
*SortMatcher.sortMatch
Notice that in this statement of the algo, all participants always end up married. This is OK, as the marriage is guaranteed to be stable, meaning that there will be no (x, y) where x and y would both prefer to be together according to their candidate rankings. This means, given our ranking functions, that we'll always get a solution with maximal number of patterns matched to terms (and vice versa, as the Matches: <Pattern, Term> relation is symmetric). If that solution turns out to contain a pattern-term pair that doesn't match, then this set of patterns and terms can't be matched.
2. Optimization: short-circuit if a pattern has no matching terms
In such case, the algo above would return an assignment where at least one pattern-term pair is 'married' despite not matching each other. Any such solution means the list of patterns we're matching can't be matched to the list of terms.
We can detect such solutions early, at the end of each iteration of the main loop. At the point where the algo finishes the main loop iteration for a given man m, there's no chance m will get a better option later. In fact, it can only get worse b/c of the swap that happens in lines 6-8. This means, in our case, that once a pattern ends its loop and doesn't have a matching term assigned, we won't be able to match the patterns to the terms and can terminate.
3. Wildcards support
Is basically for free. We just run the algorithm for M (#patterns) <= N (#terms) and declare the result a match iff all patterns have a term assigned and there's at least one wildcard in the pattern list. E.g.
{ 1 | 2 | 3 } matches { 2 | _ | 1 | _ }
is matching N = 3 terms to a pattern list of M = 2 patterns and K = 2 wildcards, and the match will be successful.
4. Remainders support
If M (#patterns, not including the remainder!) <= N (#terms), we populate the pattern pool with M - N 'remainder proxies' that match any term that is concrete. A term is concrete iff its locallyFree set isEmpty. If the algo finds a match, the remainder variable gets assigned the composition of whatever matched all the remainder proxies. This includes the case were there were M = N patterns - in such case, the remainder is assigned the zero (Nil / empty list / empty set / empty map)
Why do we restrict matches to concrete terms only?
This is to avoid capturing by the pattern a value that would be undefined. So far, known examples include capturing values of variables not available in the pattern's scope. For example, the pattern:
for (x <- @0) { y }
must not match the term
for (x <- @0) { *x }
because a match would have to return on the variable y the value of *x, which is impossible until the read's body is executed.
This can be guarded against by restricting to concrete terms, like:
for (x <- @0) { 42 }
which would, in turn, match our pattern.
Isn't that too restrictive?
What about capturing free variables that are common in the pattern's and the term's scope? E.g. one would correctly expect
new z in { for (@x <- Nil) { z } matches for (@x <- _) { s } //the pattern is on the rhs of `matches` keyword }
the matches expression to evaluate to true above, as the 'z' value is known in both the pattern and the target term. The match target is not concrete though, as it contains a free variable 'z'. Shouldn't we then just restrict to terms that could be evaluated in the pattern's scope?
The above match will though be successful, as one would expect. This is because, before we reach spatial matching phase, all the match target terms (and patterns) are fully substituted. In place of each free variable that was defined in the target term's scope, there's now a concrete term.