
How a classic Dijkstra paper saved me from range boundary hell

Mar 27, 2025

I have worked on a number of programming exercises and CTFs recently, and I noticed I often lose time on this tradeoff:

When I need to keep track of a range of values (e.g. in an array), should my index boundaries be inclusive, exclusive, or a mix?

I didn't have a strong answer. Sometimes I'd model out the index variations (slow); under time pressure I'd just go with my gut (risky). Either way, it slowed my coding down. I've never found a reliable rule for which choice is "better", but I'd know I'd made the wrong one when +1 and -1 index adjustments started appearing all over my code.

If you've ever run into the same problem, I have good news for you. I found a classic Dijkstra paper that concisely argues for a specific range boundary choice. It's EWD831: Why numbering should start at zero.

Dijkstra argues that [a, b) is the best way to represent a range boundary and gives the following properties as evidence:

| range  | length    | next range | edge values |
| ------ | --------- | ---------- | ----------- |
| [a, b) | b - a     | [b, c)     |  0, n       |
| [a, b] | b - a + 1 | [b+1, c]   |  0, n-1     |
| (a, b) | b - a - 1 | (b-1, c)   | -1, n       |
| (a, b] | b - a     | (b, c]     | -1, n-1     |

(Edge values assume a sequence of n elements indexed from 0; the [a, b) row is the simplest across the board.)

With the [a, b) range boundary, the length of the range, the next adjacent range, and the edge values are all trivial to compute: no +1 or -1 adjustments, and no -1 lower bound.
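
To make the table concrete, here's a small sketch of my own in JavaScript (assuming an array of length n = 5, indices 0 through 4) showing why the first row needs no adjustments:

const n = 5                   // an array of length 5, indices 0..4
const a = 0, b = 3, c = 5

console.log(b - a)            // length of [a, b): 3, no +1 needed
console.log(`[${b}, ${c})`)   // next range [b, c) starts exactly where [a, b) ends
console.log(`[0, ${n})`)      // the whole array is [0, n): edges 0 and n, never -1 or n-1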

This paper won't win any SEO awards, since the choice of the best range boundary is only an intermediate argument supporting his zero-indexing claim. But honestly, the range boundary argument is impactful and deserves a mention in the title. Otherwise you might skip the paper, because nearly every programming language and programmer already prefers 0-indexing, even if only by convention.

The [a, b) pattern shows up in multiple places.

Array slicing

It's typical for languages to slice arrays with the [a, b) pattern.

a = 0, b = 3

""" Python """
arr = [1, 2, 3, 4, 5]
print(arr[0:3])
# [1, 2, 3]

const arr = [1, 2, 3, 4, 5]
console.log(arr.slice(0, 3))
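
A nice consequence (my own small example, not from the paper): adjacent [a, b) slices share their boundary, so they stitch back together with no gaps and no overlaps.

const arr = [1, 2, 3, 4, 5]
const left = arr.slice(0, 3)      // [1, 2, 3] -- the range [0, 3)
const right = arr.slice(3, 5)     // [4, 5]    -- the next range [3, 5)
console.log([...left, ...right])  // [1, 2, 3, 4, 5] -- no gap, no overlap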

API pagination

Cursors are a common pattern in API pagination: they mark the position where the next fetch of data should start. A cursor is essentially a pointer, so we can model it as an array index (in practice it's usually an encoded reference to a database record, such as an id).

# a minimal FastAPI-style endpoint (assuming `app` is a FastAPI instance)
from fastapi import FastAPI

app = FastAPI()

table = ['apple', 'banana', 'apricot', 'date', 'fig', 'kiwi', 'lime']

@app.get("/api/fruits")
def get_fruit(cursor: int = 0, count: int = 2):
    # the returned page is the half-open range [cursor, next_cursor)
    next_cursor = cursor + count
    fruits = table[cursor:min(next_cursor, len(table))]

    return {
        "fruits": fruits,
        # only hand back a cursor if there is more data to fetch
        "next_cursor": next_cursor if next_cursor < len(table) else None,
    }

Here, next_cursor always points at the first element outside the returned set of fruits. It's the exclusive b of the [a, b) range. Clients can use next_cursor to fetch the next page and expect [b, b + count) to be the next range.
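
As a sketch of how a client might walk the pages (my own example, assuming it runs somewhere fetch can reach the endpoint above and that the JSON shape is exactly as written):

async function fetchAllFruits() {
    const fruits = []
    let cursor = 0
    while (cursor !== null) {
        // each page covers the half-open range [cursor, cursor + count)
        const res = await fetch(`/api/fruits?cursor=${cursor}&count=2`)
        const page = await res.json()
        fruits.push(...page.fruits)
        cursor = page.next_cursor  // null once the final page has been returned
    }
    return fruits
}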

Scanning/Lexical analysis

I've been working on a lexical analyzer while going through Crafting Interpreters. Lexers typically tokenize the input source code with a sliding window that follows the [a, b) range boundary. Here is a simple implementation that can tokenize let x.

const source = 'let x'
const tokens = []

// a and b form the [a, b) window over the source: a marks the start of the
// current token, b is always one past the last consumed character
let a = 0
let b = 0

// helpers the snippet assumes: consume one character, look at the next one
// without consuming it, and build a simple token object
function advance() {
    return source[b++]
}

function peek() {
    return source[b]
}

function isAlphabeticCharacter(char) {
    return typeof char === 'string' && /[a-z]/i.test(char)
}

function Token(type, lexeme) {
    return { type, lexeme }
}

while (b < source.length) {
    const char = advance()
    switch (char) {
        case ' ':
            break
        default:
            if (isAlphabeticCharacter(char)) {
                // consume the rest of the word; b ends up one past it
                while (isAlphabeticCharacter(peek())) {
                    advance()
                }
                const text = source.slice(a, b)
                if (text === 'let') {
                    tokens.push(Token('keyword', 'let'))
                } else {
                    tokens.push(Token('identifier', text))
                }
            }
            break
    }
    a = b
}


Notice how the b pointer always ends up one past the last character of the token, so source.slice(a, b) needs no adjustment. And a is reset to b before starting the next token.
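
For reference, logging tokens after the loop (with the Token helper above, which just wraps its arguments into an object) gives:

console.log(tokens)
// [ { type: 'keyword', lexeme: 'let' }, { type: 'identifier', lexeme: 'x' } ]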

Conclusion

I've been using the [a, b) range boundary for a while now and I haven't looked back. It's a simple and effective way to keep track of range boundaries, it saves me time, and it cuts down on off-by-one errors. Thank you, Dijkstra!

References

Why numbering should start at zero by Edsger W. Dijkstra

Dijkstra's preferred interval notation by Daniel Farlow

Reddit discussion on Dijkstra's Why numbering should start at zero