Regular Languages Stay Regular No Matter What You Do to Them
Published at March 16, 2026 ... views
One thing I keep finding satisfying about this field is how often a proof boils down to: "just build the machine."
You want to show that some class of languages has a nice property? Don't argue abstractly — construct the machine that does the thing, then show it works. That's what this post is about.
In the previous post, we introduced DFAs — the simplest machines that can check whether a string matches a pattern. Now we're going to do two things: get better at building DFAs from scratch, and then prove that the class of languages they recognize is closed under complementation, union, and intersection.
"Closed" means that if you start with regular languages and apply these operations, you always get another regular language. You can't escape. It's like a kitchen where every dish you make from the ingredients on hand is guaranteed to still be food — you can't accidentally create something inedible by combining valid recipes.

Building a DFA starts with assigning each state a job
The single most useful technique for building DFAs is to think of each state as having a role — a description of what it "means" to be in that state, given what you've seen so far in the input.
Let's build a DFA that recognizes the language "all strings with at least two a's" over the alphabet a, b.
The roles are obvious once you think about it:
| State | Role | Accept? |
|---|---|---|
| q0 | Haven't seen any a's yet | No |
| q1 | Seen exactly one a so far | No |
| q2 | Seen at least two a's | Yes |
Notice that b-transitions always loop back to the same state. That's because b's don't affect the count of a's — they don't change what we "know" about the input so far. And q2 is a "sink" in the positive sense: once you've seen two a's, you'll always have seen at least two a's, no matter what comes next.
This is like inspecting a property for a real estate deal. You have a checklist. State q0 means "haven't found any of the required features yet." State q1 means "found one, still looking for the second." State q2 means "both requirements met — this property qualifies." Once you've checked both boxes, nothing else on the walk-through can un-check them.

Flipping the question gives you a new DFA for free
Now, what about "all strings with at most two a's"? The roles change slightly:
| State | Role | Accept? |
|---|---|---|
| q0 | Seen zero a's so far | Yes |
| q1 | Seen exactly one a | Yes |
| q2 | Seen exactly two a's | Yes |
| q3 | Seen three or more a's | No |
q3 is a trap state — once you get there, you're stuck forever. It's the DFA equivalent of burning the dish. You've seen too many a's, and no amount of additional input can fix that.
The key insight: q3 traps bad inputs permanently. Every DFA character must go somewhere (the transition function is total), so you need a state to absorb inputs that have already violated the pattern. Trap states are that solution.
Tracking a substring requires remembering progress
Here's a more interesting example: build a DFA that recognizes strings containing baba as a substring. This is harder because you need to track how much of the target word you've built so far.
The roles:
| State | Role |
|---|---|
| q0 | No progress toward "baba" |
| q1 | Seen "b" — first character matched |
| q2 | Seen "ba" — two characters matched |
| q3 | Seen "bab" — three characters matched |
| q4 | Seen "baba" — complete match (accept) |
The tricky transitions are the ones where progress is "partially lost." From q2 (seen "ba"), if you see another a, you've seen "baa" — the pattern is broken, so you go back to q0. But from q3 (seen "bab"), if you see another b, you've seen "babb" — which ends in b, so you still have the start of a potential new baba. That's why q3 goes to q1 on b, not q0.
This is like learning a song on piano. You're trying to play a specific four-note sequence. If you hit the wrong note partway through, you don't always restart from scratch — sometimes the wrong note is actually the first note of a new attempt. Your "state" (how far into the sequence you are) needs to account for these overlaps.

If you want to see this overlap-recovery idea taken to the extreme — matching many patterns at once with a single automaton — Aho–Corasick is the canonical algorithm. Same trick, just generalized:
Regular languages are defined by what DFAs can recognize
Now for the big definition. A regular language is any language that can be recognized by some DFA.
Every DFA we've built so far proves that the corresponding language is regular. The language "at least two a's" is regular because we built a DFA for it. The language "contains baba" is regular because we built a DFA for it.
Later, we'll show that regular expressions describe exactly the same class of languages. But for now, "regular language = recognized by a DFA" is our working definition.
Closure means you can't escape the class
A class of languages is closed under an operation if applying that operation to languages in the class always produces another language in the class.
This is a powerful property. It means that regular languages are self-contained under these operations. You can combine them, negate them, intersect them — and you always stay within the same class.
Think of it like a music genre. If you take two jazz songs and blend them together, you still get jazz. If you take a jazz song and play it backwards, it's... still jazz (arguably). The genre is "closed" under these operations. Regular languages work the same way — they're closed under complement, union, intersection, concatenation, and Kleene star.

Complement: just flip the accept states
This is the simplest closure proof, and it's beautiful because the construction is so clean.
Theorem: If A is a regular language, then its complement Ā is also regular.
Proof idea: Take the DFA for A. Swap all accept states with reject states. Done.
Formally: given DFA M = (Q, Σ, δ, q₀, F), construct M' = (Q, Σ, δ, q₀, Q − F). Everything stays the same except the accept states.
Why it works: Every string either ends in an accept state (and gets accepted by M) or doesn't (and gets rejected by M). Flipping which states are "accept" perfectly inverts the outcome. If M accepted a string, M' rejects it, and vice versa. So L(M') = complement of L(M).
The formal proof is a bidirectional subset argument (showing L(M') ⊆ Ā and Ā ⊆ L(M')), but the intuition is immediate: same machine, opposite verdict.
Remember the "baba" substring DFA from earlier? Its complement — "strings where baba does NOT appear" — uses the exact same states and transitions. You just make q0, q1, q2, q3 the accept states and q4 the reject state. The machine that checks for a pattern is also, with one tweak, the machine that checks for its absence.
Union: run two machines in parallel
This one is more involved. We need to show that if A₁ and A₂ are regular languages, then A₁ ∪ A₂ is also regular.
The idea: Build a new DFA that simulates both original DFAs at the same time. At every step, it tracks where both machines would be if they were each processing the same input independently.
The Cartesian product construction
Given:
- M₁ = (Q₁, Σ, δ₁, q₁, F₁) recognizing A₁
- M₂ = (Q₂, Σ, δ₂, q₂, F₂) recognizing A₂
Construct M = (Q, Σ, δ, q₀, F) where:
| Component | Definition |
|---|---|
| States Q | Q₁ × Q₂ (all ordered pairs) |
| Start state | (q₁, q₂) |
| Accept states F | {(r,s) : r ∈ F₁ OR s ∈ F₂} |
| Transition δ((r,s), a) | (δ₁(r,a), δ₂(s,a)) |
The transition function just runs both machines independently. If you're in state (r, s) and read character a, the first component goes where M₁ would send r on a, and the second component goes where M₂ would send s on a.
The accept condition captures "or" — if either component ends in an accept state of its respective machine, the combined machine accepts.
A concrete example
Say M₁ recognizes "strings starting with a" and M₂ recognizes "the string b" (just the single character b and nothing else).
M₁ has states q0, q1, q2 (q1 is accept — seen a first; q2 is trap — started with b). M₂ has states p0, p1, p2 (p1 is accept — read exactly "b"; p2 is trap).
The cross-product machine has states like (q0,p0), (q1,p2), etc. But not all pairs are reachable from the start state. You only need to draw the ones you can actually reach.
Accept states: (q1,p2) because q1 ∈ F₁, and (q2,p1) because p1 ∈ F₂. The machine accepts "strings starting with a" OR "the string b" — which is exactly A₁ ∪ A₂.
Notice that the cross-product could have 3 × 3 = 9 states, but only 4 are reachable. The unreachable ones exist in theory but don't need to be drawn.
This is like having two food critics judge the same restaurant meal simultaneously. Each critic has their own scoring rubric (their own DFA). A dish "passes" if either critic gives it a thumbs up. The Cartesian product tracks both critics' opinions at every course, and the final verdict is "pass if at least one critic approved."

Intersection comes free from complement and union
Here's a neat trick. We've shown closure under complementation and union. Can we get intersection for free?
Yes — via De Morgan's law:
A₁ ∩ A₂ = complement of (complement(A₁) ∪ complement(A₂))
Since we know regular languages are closed under complement and union, we can chain the operations:
- Complement A₁ → regular (closure under complement)
- Complement A₂ → regular (closure under complement)
- Take their union → regular (closure under union)
- Complement the result → regular (closure under complement)
Each step preserves regularity, so the final result is regular. Intersection is closed.

You could also prove this directly by modifying the Cartesian product construction — just change the accept condition from "either accepts" to "both accept": F = {(r,s) : r ∈ F₁ AND s ∈ F₂}. That gives you intersection instead of union, using the same parallel-simulation idea.
The proof structure is always the same
Every closure proof follows this pattern:

Step 3 is the creative part — the construction. Step 4 is the rigorous part — the correctness proof. For complement, the construction is trivial (flip accept states) and the proof is a simple bidirectional argument. For union, the construction is the Cartesian product and the proof tracks how the parallel simulation preserves the accept conditions.
This "prove by constructing" approach is one of the hallmarks of computability theory. You don't just show something exists — you show how to build it. And that's deeply satisfying.
What's still missing
We've shown closure under complement, union, and intersection. But what about concatenation and Kleene star?
These are harder. The Cartesian product trick doesn't work for concatenation because you'd need to know where to split the input string between the two machines. A DFA reads left to right and can't go back — so it can't try different split points.
To handle concatenation and Kleene star, we'll need a new tool: nondeterminism. That's the topic of the next post.
A few things I'm taking away
- The most effective strategy for building a DFA is assigning each state a role — a plain-language description of what that state "knows" about the input so far
- Trap states absorb inputs that have permanently violated the pattern — they're necessary because every state must have a transition for every character
- Substring matching in a DFA requires tracking partial progress and handling overlapping patterns carefully
- A regular language is defined as any language that some DFA can recognize — this is our formal starting point
- Closure under an operation means you can't escape the class by applying that operation — regular languages are remarkably self-contained
- Complementing a DFA is trivial: same machine, flip the accept states — the proof is a clean bidirectional subset argument
- The Cartesian product construction proves closure under union by running two DFAs in parallel and accepting if either would accept
- Intersection comes free from De Morgan's law (or directly, by changing the Cartesian product's accept condition from "or" to "and")
- Every closure proof follows the same structure: assume regularity → DFAs exist → construct new DFA → prove correctness
- Concatenation and Kleene star can't be handled by the Cartesian product approach — they require nondeterminism, which is next
That last point is what I'm most looking forward to. We've been working entirely in the deterministic world — every state has exactly one transition per character, no ambiguity, no choices. Nondeterminism breaks that constraint, and it turns out to be the key that unlocks the remaining closure properties without making the proofs painfully complex.
Part 3 of 3 in "Theory of Computation"