DNS-PERSIST-01 in Production: Architecture Decisions and Lessons from a Multi-Tenant SaaS Platform
Large-scale certificate automation in SaaS and IoT environments has always been fraught with DNS bottlenecks, brittle automation, and security trade-offs. When we migrated our multi-tenant SaaS platform to use DNS-PERSIST-01 for domain validation, the architecture changed in ways that surprised even our most seasoned DevOps engineers. This post walks through the real-world design, integration challenges, and operational lessons from adopting this new persistent DNS challenge model, going beyond the protocol mechanics already covered in our deep-dive on DNS-PERSIST-01.
Key Takeaways:
- See a realistic SaaS platform architecture using DNS-PERSIST-01 for ACME validation at scale
- Understand how persistent DNS TXT records change your automation and incident response model
- Learn about security hardening, delegation pitfalls, and real-world gotchas encountered in production
- Get actionable troubleshooting steps and operational tips for managing persistent DNS authorizations
Architecture Overview: Why DNS-PERSIST-01?
Our SaaS platform issues thousands of certificates per month, each bound to customer-owned domains. Previously, every issuance or renewal forced a DNS-01 challenge—requiring a dynamic TXT record update, propagation delay, and a short-lived DNS API credential with write access. This model had three pain points:
- Credential risk: Broad DNS privileges distributed to automation pipelines
- Operational drag: Frequent, time-sensitive DNS writes and cleanup jobs
- Customer friction: DNS API integration required on every tenant onboarding
DNS-PERSIST-01, as formalized in the IETF draft and adopted by Let’s Encrypt, flips this on its head:
- You set a persistent TXT record at _validation-persist.<domain> once, delegating issuance rights to a specific CA and ACME account
- No more per-issuance DNS writes: certificate automation runs without ongoing DNS changes or API credentials
- Revocation or onboarding is just a DNS update, with no code change or secret rotation needed
Here’s a simplified view of the architecture in production:
- Customer Onboarding: User sets a TXT record at _validation-persist.customer-domain.com with our ACME account and CA ID
- Certificate Service: Our backend requests certificates via ACME using the DNS-PERSIST-01 challenge
- Central Monitoring: Automated checks verify the presence and correctness of all authorization records nightly
This approach sharply reduced DNS API exposure and simplified our pipeline. We no longer require tenants to grant us ongoing DNS write access, only a one-time setup.
| Validation Model | Operational Overhead | Credential Scope | Revocation Path |
|---|---|---|---|
| DNS-01 (classic) | Frequent DNS writes, cleanup, propagation waits | Broad, write access to DNS zone | Remove TXT, rotate DNS credentials |
| DNS-PERSIST-01 | One-time DNS setup, rare updates for rotation | Read-only after initial TXT is set | Remove or update TXT to disable |
For a broader ACME automation architecture comparison, see our review of IaC tools—the trade-offs are similar when it comes to credential management and automation complexity.
Further Considerations for DNS-PERSIST-01
While DNS-PERSIST-01 offers significant advantages, it is essential to consider the implications of DNS provider reliability and the potential for record drift. Regular audits of DNS records can help ensure that the persistent records remain intact and correctly configured. Additionally, organizations should evaluate their DNS provider’s capabilities to support this model effectively.
Implementation Steps: End-to-End Integration
Let’s walk through the actual deployment flow in our environment, using the official syntax and steps from the IETF draft and Let’s Encrypt’s documentation.
Step 1: ACME Account Setup
- Create your ACME account for the issuing CA (e.g., Let’s Encrypt)
- Note the account identifier issued by the CA (required for the TXT record)
Step 2: Publish the Persistent TXT Record
Your customer (or you, if managing DNS) must create a TXT record at _validation-persist.customer-domain.com:
_validation-persist.customer-domain.com. IN TXT "letsencrypt.org;acc=123456789abcdef;"
- letsencrypt.org: The CA identifier
- acc=123456789abcdef: The ACME account ID to authorize
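Important: This record is static. You do not update it for each certificate issuance, only to rotate or revoke authorization. The value shown here is simplified: the IETF draft identifies the account by its full ACME account URI in an accounturi= parameter, so copy the exact syntax from the current draft or your CA’s documentation when publishing real records.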
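To make the record layout from Step 2 concrete, here is a minimal Python sketch that builds and parses a value in the "CA;acc=ID;" shape used in this post's example. The names PersistAuthorization, record_name, build_value, and parse_value are hypothetical helpers, and the layout is assumed from the example above rather than taken from the draft's formal grammar.

```python
from typing import NamedTuple


class PersistAuthorization(NamedTuple):
    ca: str       # CA identifier, e.g. "letsencrypt.org"
    account: str  # ACME account ID as written in this post's example


def record_name(domain: str) -> str:
    """Return the fully qualified owner name of the persistent TXT record."""
    return f"_validation-persist.{domain.rstrip('.')}."


def build_value(auth: PersistAuthorization) -> str:
    """Serialize to the "CA;acc=ID;" layout assumed from the example above."""
    return f"{auth.ca};acc={auth.account};"


def parse_value(value: str) -> PersistAuthorization:
    """Parse a value in the same layout; raise ValueError if it is malformed."""
    parts = [p.strip() for p in value.strip().strip('"').split(";") if p.strip()]
    if not parts or "=" in parts[0]:
        raise ValueError(f"missing CA identifier: {value!r}")
    # Everything after the CA identifier is treated as key=value parameters.
    params = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    if "acc" not in params:
        raise ValueError(f"missing acc= parameter: {value!r}")
    return PersistAuthorization(ca=parts[0], account=params["acc"])
```

Generating the onboarding instructions and validating customer input from the same two functions keeps the published record and the expected record from drifting apart through copy-paste errors.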
Step 3: Initiate Certificate Request via ACME Client
Support for DNS-PERSIST-01 is emerging in tools like certbot and custom automation. The workflow is:
# Example (pseudo-code; --challenge is a placeholder, not a confirmed certbot flag,
# so check your client's documentation for the real option)
certbot --server https://acme-v02.api.letsencrypt.org/directory \
--challenge dns-persist-01 \
-d customer-domain.com
If your ACME client doesn’t yet support DNS-PERSIST-01, refer to the official Let’s Encrypt documentation for updates and compatible clients.
Step 4: Automated Monitoring and Drift Detection
With the record set once, it’s easy to lose track of its status. We run a nightly job:
dig TXT _validation-persist.customer-domain.com +short
# Should return: "letsencrypt.org;acc=123456789abcdef;"
Any mismatch triggers an alert—critical for incident response and compliance tracking.
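The nightly check can be sketched in Python roughly as follows. This is an illustration under assumptions, not our production job: Status, dig_txt, check_record, and find_drift are hypothetical names, the lookup shells out to the same dig command shown above, and the expected value uses this post's simplified record layout.

```python
import subprocess
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    OK = "ok"
    MISSING = "missing"    # no TXT record found at the name
    MISMATCH = "mismatch"  # a record exists but not the expected value


def dig_txt(name: str) -> List[str]:
    """Look up TXT values via dig (requires dig on PATH and network access)."""
    out = subprocess.run(
        ["dig", "TXT", name, "+short"],
        capture_output=True, text=True, timeout=15, check=True,
    ).stdout
    # dig prints each value in double quotes; multi-string answers are not
    # reassembled in this simple version.
    return [line.strip().strip('"') for line in out.splitlines() if line.strip()]


def check_record(domain: str, expected: str,
                 lookup: Callable[[str], List[str]] = dig_txt) -> Status:
    """Classify one domain's authorization record against its expected value."""
    values = lookup(f"_validation-persist.{domain}")
    if not values:
        return Status.MISSING
    return Status.OK if expected in values else Status.MISMATCH


def find_drift(expected_by_domain: Dict[str, str],
               lookup: Callable[[str], List[str]] = dig_txt) -> Dict[str, Status]:
    """Return only the domains whose record is missing or mismatched."""
    report = {}
    for domain, expected in expected_by_domain.items():
        status = check_record(domain, expected, lookup)
        if status is not Status.OK:
            report[domain] = status
    return report
```

Passing the lookup as a parameter lets the same logic run against a stub in tests and against real DNS in the scheduled job. Anything returned by find_drift is what would feed the alert.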
Step 5: Revoking or Rotating Authorization
To revoke access, simply remove or update the TXT record:
# Remove authorization
# (delete the TXT at _validation-persist.customer-domain.com)
There’s no secret rotation or code change required—just a DNS update.
Security and Operational Tradeoffs in Practice
DNS-PERSIST-01 offloads most operational risk from the issuance pipeline to the initial DNS setup and ongoing record monitoring. Here’s what we learned:
Security Hardening Steps
- Principle of Least Privilege: No more long-lived DNS write credentials in CI/CD. After onboarding, only DNS read access is needed for monitoring.
- Delegated Authorization: Customers can authorize our CA/account for just their subdomain, instead of their whole DNS zone.
- Revocation Is Simpler: To disable our access, a customer deletes or modifies a DNS TXT. No need to rotate shared secrets or touch our infrastructure.
Operational Considerations
- Onboarding UX: The biggest friction is initial DNS setup—especially for non-technical customers. We provide cut-and-paste instructions and preflight checks.
- Record Drift: Persistent records are “set and forget”—until someone forgets. We’ve seen accidental deletions during unrelated DNS updates.
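- Incident Response: If a compromise occurs, removing the TXT stops new validations from succeeding once cached copies of the record expire. Certificates that were already issued stay valid until they are revoked or expire, and detection still depends on monitoring.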
This model fits tightly regulated or multi-tenant environments, but depends on DNS integrity. If your DNS is hijacked, all bets are off—same as with DNS-01.
For teams automating across hundreds of domains, the churn and API sprawl of DNS-01 is unmanageable. DNS-PERSIST-01 lets you scale with far less operational risk.
Edge Cases and Lessons Learned
Adopting DNS-PERSIST-01 in a real SaaS environment surfaced several practical issues not obvious from the spec:
Multi-Tenant Delegation
- Some customers wanted to delegate only a subdomain (like app.customer-domain.com), but their DNS provider didn’t easily support per-subdomain TXT records.
- Workaround: Documented provider-specific recipes and fallback to CNAME delegation when possible.
Provider Inconsistencies
- Certain DNS providers strip quotes or mangle TXT values—breaking the “CA;acc=” format required by the spec.
- Mitigation: Always verify with dig as seen above, not just via provider UI.
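Because providers and tools present TXT data differently, it helps to normalize what dig returns before comparing it to the expected value. The sketch below is one way to do that in Python. The function name normalize_txt is hypothetical, and the cases it handles (values split into several quoted strings, backslash-escaped semicolons, stray spaces, missing quotes) are assumptions about common presentation quirks rather than an exhaustive list.

```python
import re

# One double-quoted character-string, allowing backslash escapes inside it.
_CHUNK = re.compile(r'"((?:[^"\\]|\\.)*)"')
# A backslash followed by three decimal digits, or by any single character.
_ESCAPE = re.compile(r"\\(\d{3}|.)")


def _unescape(match: "re.Match[str]") -> str:
    token = match.group(1)
    # \DDD is a decimal byte value in zone-file presentation format.
    if len(token) == 3 and token.isdigit():
        return chr(int(token))
    return token


def normalize_txt(presentation: str) -> str:
    """Reduce a TXT answer line to the canonical "CA;acc=ID;" form used here."""
    chunks = _CHUNK.findall(presentation)
    # Several quoted strings belong to one record and are concatenated;
    # if the quotes were stripped entirely, fall back to the raw text.
    raw = "".join(chunks) if chunks else presentation.strip()
    raw = _ESCAPE.sub(_unescape, raw)
    fields = [f.strip() for f in raw.split(";") if f.strip()]
    return ";".join(fields) + ";" if fields else ""
```

Comparing normalize_txt(observed) with normalize_txt(expected) avoids false drift alerts caused purely by formatting, while a genuinely different CA or account still shows up as a mismatch.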
ACME Client Compatibility
- Not all clients support DNS-PERSIST-01 as of this writing. We contributed patches to add support in our preferred automation tool and track upstream progress closely.
Bulk Operations
- Batch onboarding is much simpler—set up records in advance, then issue dozens or hundreds of certificates without further DNS change windows.
- But: Revocation is all-or-nothing per TXT. If you need to remove authorization for a specific ACME account but not all, you must rotate the record.
For Kubernetes-heavy environments, see our GitOps automation guide for strategies to manage DNS and certificates declaratively.
Production Troubleshooting and Pro Tips
Common Errors
- Validation Fails: Usually the TXT record is missing, has an incorrect value, or hasn’t propagated. Always check with dig from a public resolver.
- Stale Authorizations: If you rotate ACME accounts but forget to update the TXT, issuance will fail with a “not authorized” error.
- Provider Caching: Some managed DNS platforms cache TXT records for hours. Keep the record’s TTL at 1 hour or less, expect changes to take at least that long to become visible, and communicate this to your customers.
Debugging Steps
# Check actual DNS record
dig TXT _validation-persist.customer-domain.com +short
# Should match expected CA and account:
# "letsencrypt.org;acc=123456789abcdef;"
- Use multiple global DNS resolvers (Google, Cloudflare, Quad9) to ensure propagation.
- Document and test every provider’s quirks—some will require escaping or formatting tweaks.
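The multi-resolver check can be scripted along these lines. This is a sketch under assumptions: dig_txt_at, propagation_report, and fully_propagated are hypothetical names, the resolver addresses are the well-known public ones for Google, Cloudflare, and Quad9, and the lookup relies on dig's @server syntax.

```python
import subprocess
from typing import Callable, Dict, FrozenSet

# Public resolver addresses for the three services named above.
RESOLVERS = {"google": "8.8.8.8", "cloudflare": "1.1.1.1", "quad9": "9.9.9.9"}


def dig_txt_at(name: str, resolver_ip: str) -> FrozenSet[str]:
    """Query one resolver for TXT values (requires dig and network access)."""
    out = subprocess.run(
        ["dig", f"@{resolver_ip}", "TXT", name, "+short"],
        capture_output=True, text=True, timeout=15, check=True,
    ).stdout
    return frozenset(
        line.strip().strip('"') for line in out.splitlines() if line.strip()
    )


def propagation_report(
    name: str,
    expected: str,
    query: Callable[[str, str], FrozenSet[str]] = dig_txt_at,
    resolvers: Dict[str, str] = RESOLVERS,
) -> Dict[str, bool]:
    """Map each resolver label to whether it currently serves the expected value."""
    return {label: expected in query(name, ip) for label, ip in resolvers.items()}


def fully_propagated(report: Dict[str, bool]) -> bool:
    """True only if at least one resolver was checked and all of them agree."""
    return bool(report) and all(report.values())
```

A report such as {"google": True, "cloudflare": True, "quad9": False} points at caching or propagation lag rather than a wrong record, which changes what you tell the customer.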
Pro Tips
- Automate nightly validation of all persistent TXT records—alert on drift, expiration, or unexpected changes.
- For incident response, maintain a playbook for quickly revoking authorization by deleting/updating the TXT.
- Maintain a list of all currently authorized ACME accounts per domain and rotate periodically as part of your security policy.
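The inventory of authorized accounts can be as simple as a list of dated entries. The Python sketch below shows one possible shape: the Authorization class, the helper names, and the 365-day rotation window are all hypothetical choices for illustration, not a recommendation from the draft or from Let’s Encrypt.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Authorization:
    domain: str
    ca: str
    account: str
    authorized_on: date  # when the TXT record for this account was published


def accounts_by_domain(inventory: List[Authorization]) -> Dict[str, List[str]]:
    """Group authorized account IDs by domain, preserving inventory order."""
    grouped: Dict[str, List[str]] = {}
    for entry in inventory:
        grouped.setdefault(entry.domain, []).append(entry.account)
    return grouped


def due_for_rotation(
    inventory: List[Authorization],
    today: date,
    max_age: timedelta = timedelta(days=365),  # assumed policy, adjust to yours
) -> List[Authorization]:
    """Return entries whose authorization is older than the rotation window."""
    return [a for a in inventory if today - a.authorized_on > max_age]
```

Feeding the output of due_for_rotation into the same alerting path as the drift check keeps rotation from becoming a manual calendar task.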
For advanced DNS troubleshooting at scale, see our approach to peer-to-peer connectivity troubleshooting, which shares similar operational patterns.
Conclusion & Next Steps
DNS-PERSIST-01 fundamentally changes both the risk model and the operational workflow for ACME-based certificate issuance in multi-tenant and automated environments. By shifting to persistent, account-bound DNS authorization, you dramatically reduce credential sprawl and unlock smoother, safer automation at scale. But success depends on robust monitoring, customer-friendly onboarding, and a deep understanding of your DNS provider’s quirks.
If you’re considering DNS-PERSIST-01, start with a single domain, automate drift detection, and plan your incident response workflow up front. For protocol details and a formal implementation guide, see our in-depth DNS-PERSIST-01 guide. For Kubernetes or cloud-native scenarios, combine this with declarative deployment strategies for end-to-end, auditable automation.
For further reading, monitor Let’s Encrypt’s official blog and the active IETF draft for evolving best practices and client compatibility updates.




