Patterns Look Different When a Machine Has to Actually Check Them
Published at March 16, 2026 ... views
One thing that clicked for me recently is how different it feels to describe a pattern versus checking whether something matches it.
Describing is easy. You can say "any number of a's followed by any number of b's" and a human immediately gets it. But if you had to build a machine — a very simple machine, with no scratch paper, no memory beyond where it currently is — could it still check that pattern?
It turns out, yes. And the machine that does it is called a deterministic finite automaton, or DFA. It's the simplest computation model that exists, and yet it can recognize a surprisingly wide range of patterns.
In the previous post, I covered the building blocks: alphabets, strings, languages, and the three operations (union, concatenation, Kleene star). This post picks up right where that left off — first making regular expressions formal, then introducing the machines that actually do the checking.
Regular expressions are recursive — you build them from the bottom up
The formal definition of a regular expression is elegant because it's recursive. You start from three atomic pieces, and you combine them using three operations. That's it. Everything else is built from those.
The rules:
- Any single character from the alphabet is a regular expression
- ε (the empty string) is a regular expression
- ∅ (the empty set) is a regular expression
- If R₁ and R₂ are regular expressions, then R₁ ∪ R₂ is too
- If R₁ and R₂ are regular expressions, then R₁R₂ (concatenation) is too
- If R is a regular expression, then R* is too
That's the complete definition. Every regular expression you'll ever write can be traced back to these six rules applied some number of times.
Think of it like cooking. You start with raw ingredients (individual characters, ε, ∅). Then you combine them using three techniques (union, concatenation, Kleene star) — and you can keep combining the results with more of the same techniques, layering complexity until you have something sophisticated. A Beurre blanc is just butter, shallots, and wine — combined in a specific recursive sequence of reductions.

Shorthand you'll sometimes see
There are two common shortcuts that aren't part of the formal definition but are built from it:
- R⁺ = RR* (one or more repetitions of R)
- Rᵏ = R concatenated with itself k times
These are just conveniences. They don't add any power — anything you can write with R⁺ or Rᵏ, you can write with the six rules above.
The L(R) notation keeps descriptions and languages separate
This is a small but important point. A regular expression R is a string of symbols — it's a description. The language it describes is written L(R).
So when you write 0*1, that's a pattern. When you write L(0*1), that's the actual set of strings matching that pattern.
Some textbooks blur this distinction and write things like 0*1 = #123;1, 01, 001, ...#125;. That's technically sloppy because the left side is a description and the right side is a set — two different types of objects. But you'll see it in the wild.

Order of operations matters more than you'd think
Just like arithmetic has "multiplication before addition," regular expressions have a precedence order:
- Kleene star (highest — applied first)
- Concatenation
- Union (lowest — applied last)
This means ab* is NOT (ab)*. It's a(b*) — the string a followed by zero or more b's.
This is exactly like how 2 + 3 × 4 equals 14 (not 20) because multiplication binds tighter than addition. Parentheses override the default order when you need them.
The empty set is a black hole for concatenation
Here's something that tripped me up at first. When you concatenate anything with the empty set, you get the empty set back.
L(R ∘ ∅) = ∅ — no matter what R is.
Why? Because concatenation needs a string from both operands. If one of them has no strings, you can't build any concatenation at all. It's like trying to make a sandwich when you have bread but no filling — and the recipe requires both.
But the empty string works differently:
L(R ∘ ε) = L(R) — concatenating with the empty string changes nothing.
∅ destroys concatenation. ε is invisible to it. ∅ is invisible to union.

Now let's push it — how far can regular expressions go?
This is the question I find most interesting. We've built up this notation with three operations. Can it describe any language? Or are there patterns too complex for it?
Keep that question in mind. We won't fully answer it until we get to the pumping lemma in a later post. But the hint is: yes, there are languages that no regular expression can describe. And proving that turns out to be one of the most elegant arguments in computer science.
For now, let's shift from describing patterns to checking them.
A DFA is a machine that reads one character at a time and never looks back
Imagine you're a security guard checking IDs at a door. You have a very specific set of rules:
- You start in a "waiting" state
- When someone shows their ID, you check one piece of information at a time
- Based on what you see and what state you're currently in, you transition to a new state
- At the end, you're either in an "allow entry" state or a "deny entry" state
- You can't go back and re-check something you already saw
That's a DFA. It's a machine with:
- A finite set of states (your possible mental states while checking)
- An alphabet (the characters it can read)
- A transition function (for every state + character combination, there's exactly one next state)
- A start state (where computation begins)
- A set of accept states (the "yes" outcomes)
The formal definition (from Sipser's Definition 1.5) writes it as a 5-tuple: (Q, Σ, δ, q₀, F).
The transition function δ is the heart of the machine. It's a function that takes a state and a character and returns exactly one next state. No ambiguity, no choices — that's what makes it deterministic.

DFAs look like directed graphs with a special vocabulary
When we draw a DFA, it's a directed graph where:
- Circles are states
- Arrows are transitions (labeled with characters)
- The start state gets an incoming arrow from nowhere
- Accept states get double circles
- Every state has exactly one outgoing arrow per character in the alphabet
Here's a concrete example. This DFA recognizes strings over a, b where a's are never followed by b's that are then followed by a's again. In other words: all the a's come first, then all the b's.
States q0, q1, and q2 are accept states (shown as double circles in actual diagrams). q3 is a "trap state" — once you get there, you can never leave, and the string will be rejected.
Computation is just following arrows
Given an input string, computation works like this:
- Start at the start state
- Read the first character of the input
- Follow the arrow labeled with that character to the next state
- Read the next character, follow the corresponding arrow
- Repeat until you've consumed the entire string
- If you end in an accept state → accept. Otherwise → reject.
Let's trace through the input aabb on the DFA above:
Wait — that ends in q3 (reject). But aabb should be "a's followed by b's." Let me re-examine the DFA... Actually, q1 transitions to q3 on b, meaning that once you've seen a's and then encounter a b, you go to the trap state.
That's the wrong DFA for this language. Let me use the correct one from the lecture: a DFA where q1 → q4 on b, and q4 is the accept state for the "seen b after a" phase.
Here's the correct DFA for "a's followed by b's" (at least one a, at least one b):
Here q4 is the only accept state. q2 is a "dead" state — once a pattern violation is detected, you end up there permanently.
Tracing aabb:
| Step | State | Read | Next State |
|---|---|---|---|
| Start | q1 | a | q3 |
| 2 | q3 | a | q3 |
| 3 | q3 | b | q4 |
| 4 | q4 | b | q4 |
End state: q4 (accept). The string aabb matches the pattern.
Tracing abab:
| Step | State | Read | Next State |
|---|---|---|---|
| Start | q1 | a | q3 |
| 2 | q3 | b | q4 |
| 3 | q4 | a | q2 |
| 4 | q2 | b | q2 |
End state: q2 (reject). The string abab doesn't match because an a appeared after a b.
Every state in a DFA has a "job"
This is something I picked up that makes DFAs much easier to reason about. Instead of thinking about states as abstract labels, assign each one a role — a description of what being in that state "means" about the input so far.
For the DFA above:
| State | Role |
|---|---|
| q1 | Haven't seen anything yet |
| q3 | Seen one or more a's, no b's yet |
| q4 | Seen a's then b's — pattern is valid so far |
| q2 | Pattern violated — dead state |
This is like how a recipe has stages. You're either "prepping ingredients," "cooking the base," "adding sauce," or "it's burnt and there's no going back." Each stage determines what you should do next. A DFA works the same way — your current state encodes everything you need to know about what you've seen so far.
The key insight: the current state is the DFA's only form of memory. It can't remember the exact string it's seen — only which state it ended up in. That's what makes it "finite." And that's also what limits it — but more on that later.

The language of a DFA is the set of all strings it accepts
Every DFA defines a language: the set of all strings it accepts. We write this as L(M) where M is the DFA.
This connects directly to regular expressions. The language of the DFA above is the same language described by the regular expression a⁺b⁺ — one or more a's followed by one or more b's.
And this raises the big question from the previous post: is there a regular expression for every DFA? And conversely, is there a DFA for every regular expression?
The answer to both is yes. That equivalence is one of the foundational results of this field, and we'll prove it formally in a future post. But for now, just know that these two formalisms — one algebraic (regular expressions), one mechanical (DFAs) — describe exactly the same class of languages: the regular languages.
Different DFAs can recognize different languages — and the proof is just one string
How do you prove two DFAs recognize different languages? You find a single string that one accepts and the other rejects.
That's it. One counterexample is enough to show L(M₁) ≠ L(M₂).
But proving two DFAs recognize the same language is much harder. You'd need to show that every string is handled identically by both machines. One counterexample breaks equality; confirming equality requires exhaustive proof. There are algorithms for this (we'll see them later), but the asymmetry is worth noting.
Complementing a DFA is as easy as flipping accept states
Here's something beautiful. Say you have a DFA that recognizes "strings where ab appears as a substring." What if you want the opposite — strings where ab never appears?
Just take the same DFA and swap the accept states with the non-accept states. Every string that was accepted is now rejected, and vice versa. The complement language is recognized by the complemented DFA.
This works because a DFA processes every string to completion — it always lands in some state. So flipping which states count as "accept" perfectly inverts the language.
Think of it like a music audition. The judges have a scorecard. If you flip the pass/fail threshold — what used to be "pass" becomes "fail" and vice versa — you've complemented the selection criteria without changing the audition process itself.
This property is called closure under complementation, and it's one of the reasons regular languages are so well-behaved mathematically.

DFAs in the real world are everywhere — you just don't notice them
The concept of a DFA shows up constantly in systems you interact with daily:
- Traffic lights cycle through green → yellow → red → green based on timers (input signals). Each color is a state, and the transitions are deterministic.
- Vending machines track how much money you've inserted. Each amount is a state. Inserting a coin transitions you to the next state. The "accept state" is when you have enough to dispense.
- Elevator controllers respond to floor requests. The current floor is the state, and button presses are the input symbols that drive transitions.
- Email validation — when you type an email address and a website highlights it red or green, that's often a DFA checking the pattern character by character.

The connection is that all these systems have finite memory (a fixed number of states), deterministic rules (the same input always produces the same transition), and process inputs sequentially (one event at a time). That's exactly the DFA model.
Where this is heading
We now have two ways to describe regular languages: regular expressions (algebraic descriptions) and DFAs (mechanical checkers). Next up, we'll introduce nondeterministic finite automata (NFAs) — machines that can be in multiple states at once — and show that they're equivalent to DFAs despite being seemingly more powerful.
Then comes the pumping lemma, which gives us a way to prove that certain languages are not regular — that they're genuinely beyond what any DFA or regular expression can handle. That's where the real limits start to appear.
A few things I'm taking away
- Regular expressions are defined recursively — you build them from atoms (characters, ε, ∅) using three operations (union, concatenation, Kleene star), and the result can be combined again using those same operations
- The L(R) notation keeps the description (the regular expression) separate from the thing being described (the language) — mixing them up is like confusing a recipe with the dish it makes
- Order of operations in regular expressions — Kleene star first, then concatenation, then union — can completely change what language is described
- Concatenating with the empty set destroys everything (like multiplying by zero), but concatenating with the empty string changes nothing (like multiplying by one)
- A DFA is a 5-tuple: states, alphabet, transition function, start state, and accept states — and the transition function is what makes it deterministic
- Every state in a DFA has a "job" — assigning roles to states makes it much easier to understand and build DFAs
- The current state is a DFA's only memory — it can't remember the exact input it's seen, only the state it ended up in
- Different DFAs can recognize different languages, and a single counterexample string is enough to prove it
- Complementing a DFA just means flipping which states are accept states — the machine stays the same, only the judgment changes
- Regular expressions and DFAs describe exactly the same class of languages (regular languages) — this equivalence is a foundational result we'll prove later
- DFAs show up everywhere in daily life: traffic lights, vending machines, elevators, email validators — any system with finite states and deterministic rules
That second-to-last point is the one that stays with me. Two completely different formalisms — one that looks like algebra, one that looks like a machine with arrows — and they have exactly the same expressive power. Not "roughly the same" or "overlapping." Exactly the same. That kind of perfect equivalence between seemingly unrelated models is what makes this field feel like it's uncovering something fundamental, not just inventing notation.
Part 2 of 3 in "Theory of Computation"