ideas presented in [Peter Seymour's Efficient Lexicographic Encoding of
Numbers](elen.pdf). The paper's original source can be found at
[http://www.zanopha.com/docs/elen.pdf](http://www.zanopha.com/docs/elen.pdf),
but it is re-hosted [here](elen.pdf) for posterity.

Numbers are ordered. That's part of their whole thing, what with them being
numbers and all. 1 comes before 2, and 2 comes before 3, and so on, and so
forth. If you sort a list of integers like `[1, 9, 4, 10, 13, 31]` you'll get
them back in their normal ordering: `[1, 4, 9, 10, 13, 31]`. If each of those
were to be strings, such that the input is `['1', '9', '4', '10', '13', '31']`,
sorting them lexically would not produce a proper numerical sorting; you would
wind up with `['1', '10', '13', '31', '4', '9']`. That is quite annoying when
doing things like appending numbers to otherwise identical file names, or
basically anything where you wish to put a number in the middle of what is
otherwise a string.
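
To make the difference concrete, here is a small sketch using only Go's standard library. The helper names `sortNumerically` and `sortLexically` are made up for this illustration and are not part of lexnum.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// sortNumerically returns a sorted copy of nums.
func sortNumerically(nums []int) []int {
	out := append([]int(nil), nums...)
	sort.Ints(out)
	return out
}

// sortLexically converts each number to its decimal string and sorts the
// strings byte by byte, which is what sort.Strings does.
func sortLexically(nums []int) []string {
	out := make([]string, len(nums))
	for i, n := range nums {
		out[i] = strconv.Itoa(n)
	}
	sort.Strings(out)
	return out
}

func main() {
	nums := []int{1, 9, 4, 10, 13, 31}
	fmt.Println(sortNumerically(nums)) // [1 4 9 10 13 31]
	fmt.Println(sortLexically(nums))   // [1 10 13 31 4 9]
}
```

The second line of output is the annoying one: `10`, `13` and `31` all land ahead of `4` because only the first differing byte matters.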

You could always just pad your numbers with zeroes, so the prior example would
have an input more like `['01', '09', '04', '10', '13', '31']`. This works
fine, but it requires that you know something about your input in advance.
Namely, it requires that you know the range of the input, which may not always
be the case. More subtly, it also requires that you know that you're only
dealing with non-negative integers; even with zero padding, negative numbers
would lexically sort backwards (e.g., `'+1'` lexically precedes `'+2'`, which
makes sense because 1 is less than 2, but it's also the case that `'-1'`
precedes `'-2'`, which makes no sense, since -2 is less than -1). Doubly subtle
is the fact that the only reason a negative number string precedes an unsigned
positive number string is that the `-` character precedes the digit characters
in the ASCII table. Triply subtle is that if you just always put a sign in
front of your numbers, the positive numbers will precede the negative numbers
because the `+` character precedes the `-` character in ASCII. The end result
is that a list like `['+1', '+4', '-9', '+10', '-13', '+31']` would lexically
sort to `['+1', '+10', '+31', '+4', '-13', '-9']`.
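
The three cases above can be reproduced with `fmt.Sprintf` and `sort.Strings`. This is only an illustration; `sortFormatted` is a hypothetical helper written for this example, not something lexnum provides.

```go
package main

import (
	"fmt"
	"sort"
)

// sortFormatted renders each number with the given fmt verb and sorts the
// resulting strings lexically.
func sortFormatted(format string, nums []int) []string {
	out := make([]string, len(nums))
	for i, n := range nums {
		out[i] = fmt.Sprintf(format, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Zero padding to a known width fixes the non-negative case.
	fmt.Println(sortFormatted("%02d", []int{1, 9, 4, 10, 13, 31}))
	// [01 04 09 10 13 31]

	// Always writing a sign puts every positive value ahead of every
	// negative one, because '+' (0x2B) sorts before '-' (0x2D).
	fmt.Println(sortFormatted("%+d", []int{1, 4, -9, 10, -13, 31}))
	// [+1 +10 +31 +4 -13 -9]

	// Adding zero padding on top of the sign still leaves the negative
	// values backwards: -09 sorts before -13.
	fmt.Println(sortFormatted("%+03d", []int{1, 4, -9, 10, -13, 31}))
	// [+01 +04 +10 +31 -09 -13]
}
```

No combination of padding and sign characters here yields a correct ordering across both positive and negative values, which is the gap the encoding is meant to close.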

For a full description of how the problem is solved, read [the white
paper](elen.pdf).

## example

The following program counts from -20 to 20 and prints each number's lexnum
string:

```go
package main

import (
	"fmt"

	"github.com/jordanorelli/lexnum"
)

func main() {
	// Judging by the output below, '=' prefixes positive numbers and '-'
	// prefixes negative ones.
	e := lexnum.NewEncoder('=', '-')
	for i := -20; i <= 20; i++ {
		// Left-align the encoded string in a 12-character column.
		fmt.Printf("%-12s%d\n", e.EncodeInt(i), i)
	}
}
```

Running it produces the following output:

```
--779       -20
--780       -19
--781       -18
--782       -17
--783       -16
--784       -15
--785       -14
--786       -13
--787       -12
--788       -11
--789       -10
-0          -9
-1          -8
-2          -7
-3          -6
-4          -5
-5          -4
-6          -3
-7          -2
-8          -1
0           0
=1          1
=2          2
=3          3
=4          4
=5          5
=6          6
=7          7
=8          8
=9          9
==210       10
==211       11
==212       12
==213       13
==214       14
==215       15
==216       16
==217       17
==218       18
==219       19
==220       20
```
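
The pattern in the output above can be reproduced in a few lines. What follows is a rough sketch inferred from that output alone, not the code of the `lexnum` package: `encode` and `complement` are hypothetical helpers, the `'='` and `'-'` characters are hard-coded, and the most negative `int` (whose negation overflows) is not handled. It assumes that a multi-digit number is prefixed with the encoding of its digit count, and that a negative number is the digit-wise complement of its positive counterpart.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// encode is a hypothetical re-creation of the pattern in the output above,
// using '=' for positive and '-' for negative numbers. It is not the lexnum
// package's code, and it does not handle the most negative int.
func encode(n int) string {
	switch {
	case n == 0:
		return "0"
	case n < 0:
		return complement(encode(-n))
	}
	digits := strconv.Itoa(n)
	if len(digits) == 1 {
		return "=" + digits
	}
	// Longer numbers are prefixed with the encoding of their digit count.
	return "=" + encode(len(digits)) + digits
}

// complement mirrors a positive encoding: '=' becomes '-' and each digit d
// becomes 9-d, so larger magnitudes sort earlier.
func complement(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r == '=' {
			b.WriteRune('-')
		} else {
			b.WriteRune('9' - (r - '0'))
		}
	}
	return b.String()
}

func main() {
	var encoded []string
	for i := -20; i <= 20; i++ {
		encoded = append(encoded, encode(i))
	}
	fmt.Println(encode(-20), encode(-9), encode(0), encode(9), encode(20))
	// --779 -0 0 =9 ==220

	// The strings were generated in numeric order; if the encoding works,
	// they are already in lexical order too.
	fmt.Println(sort.StringsAreSorted(encoded)) // true
}
```

The ordering falls out of the ASCII values of the chosen characters: `-` sorts before the digits, and the digits sort before `=`.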