Jordan Orelli 460eeb1257 | 9 years ago | |
---|---|---|
LICENSE | 9 years ago | |
README.md | 9 years ago | |
elen.pdf | 9 years ago | |
lexnum.go | 9 years ago | |
lexnum_test.go | 9 years ago |
README.md
lexnum
lexnum provides a lexicographic encoding of numbers. It is based off of the ideas presented in Peter Seymour's Efficient Lexicographic Encoding of Numbers. The paper's original source can be found at http://www.zanopha.com/docs/elen.pdf, but it is re-hosted here for posterity.
Numbers are ordered. That's part of their whole thing, what with them being
numbers and all. 1 comes before 2, and 2 comes before 3, and so on, and so
forth. If you sort a list of integers like [1, 9, 4, 10, 13, 31]
you'll get
them back in their normal ordering: [1, 4, 9, 10, 13, 31]
. If each of those
were to be strings, such that the input is ['1', '9', '4', '10', '13', '31']
,
sorting them lexically would not produce a proper numerical sorting, you would
wind up with ['1', '10', '13', '31', '4', '9']
. That is quite annoying when
doing things like appending numbers to identical file names or basically
anything where you wish to put a number in the middle of what is otherwise a
string.
You could always just pad your numbers with zeroes, so the prior example would
have an input more like ['01', '09', '04', '10', '13', '31']
. This works
fine, but it requires that you know something about your input in advance.
Namely, it requires that you know the range of the input, which may not always
be the case. More subtly, it also requires that you know that you're only
dealing with positive integers; even with zero padding, negative numbers would
lexically sort backwards (e.g., '+1'
lexically precedes '+2'
, which makes
sense because 1 is less than 2, but it's also the case that '-1'
precedes
'-2'
, which makes no sense, since -2 is less than -1). Doubly subtle is the
fact that the only reason a negative number string precedes a positive number
string is that the negative character precedes the zero character in the ascii
table. Triply subtle is that if you just always put a sign in front of your
numbers, the positive numbers will precede the negative numbers because the +
character precedes the -
character in ascii. The end result is that a list
like ['+1', '+4', '-9', '+10', '-13', '+31']
would lexically sort to ['+1', '+10', '+31', '+4', '-13', '-9']
.
For a full description of how the problem is solved, read the white paper.
example
The following program would count from -20 to 20 and print their lexnum strings:
package main
import (
"fmt"
"github.com/jordanorelli/lexnum"
)
func main() {
e := lexnum.NewEncoder('=', '-')
for i := -20; i <= 20; i++ {
fmt.Printf("%-12s%d\n", e.EncodeInt(i), i)
}
}
Running it would produce the following output:
--779 -20
--780 -19
--781 -18
--782 -17
--783 -16
--784 -15
--785 -14
--786 -13
--787 -12
--788 -11
--789 -10
-0 -9
-1 -8
-2 -7
-3 -6
-4 -5
-5 -4
-6 -3
-7 -2
-8 -1
0 0
=1 1
=2 2
=3 3
=4 4
=5 5
=6 6
=7 7
=8 8
=9 9
==210 10
==211 11
==212 12
==213 13
==214 14
==215 15
==216 16
==217 17
==218 18
==219 19
==220 20