Skip to content

Conversation

@lalitb
Copy link
Member

@lalitb lalitb commented Dec 5, 2025

Adds mutual TLS (mTLS) client certificate authentication to OTLP/OTAP receivers with parallel handshake processing and timeout protection.

What's New

1. mTLS Client Certificate Verification

  • Client CA loading from file, PEM, or system pool
  • Standards-compliant X.509 validation via WebPkiClientVerifier
  • Mandatory authentication when CA is configured

2. Parallel Handshake Processing

.buffer_unordered(MAX_CONCURRENT_HANDSHAKES)  // 64 concurrent handshakes
Prevents head-of-line blocking from slow/malicious clients.

3. Handshake timeout

  tokio::time::timeout(handshake_timeout, acceptor.accept(conn))
  Default 10s, configurable. DoS protection against slowloris-style attacks.

Configuration

Enable mTLS

  receivers:
    otlp:
      tls:
        cert_file: /path/to/server.crt
        key_file: /path/to/server.key
        client_ca_file: /path/to/client-ca.crt  # NEW
        handshake_timeout: "10s"  # NEW (default: 10s)

System CA Pool

  receivers:
    otlp:
      tls:
        include_system_ca_certs_pool: true  # NEW

What's Deferred

  • Client CA certificate reload (separate PR)
  • Client-side TLS for exporters (separate PR)
  • TLS version/cipher configuration

This PR adds mutual TLS (mTLS) support to the OTLP and OTAP gRPC receivers,
enabling client certificate authentication for enhanced security.

Features:
- Client CA certificate configuration (client_ca_file, client_ca_pem)
- Optional system CA certificate pool inclusion
- Configurable handshake timeout for DoS protection
- Concurrent TLS handshake support (up to 64 parallel handshakes)
- Certificate hot-reloading with configurable reload interval

Configuration options:
- tls.cert_file/cert_pem: Server certificate
- tls.key_file/key_pem: Server private key
- tls.client_ca_file/client_ca_pem: CA for client cert verification (enables mTLS)
- tls.include_system_ca_certs_pool: Include system CAs for client verification
- tls.handshake_timeout: Timeout for TLS handshake (default: 10s)
- tls.reload_interval: Interval for certificate reload checks (default: 5m)

Tests added:
- mTLS client certificate verification tests
- Handshake timeout enforcement test
- Concurrent handshake non-blocking test
- TLS reload integration test fixes
This PR adds mutual TLS (mTLS) support to the OTLP and OTAP gRPC receivers,
enabling client certificate authentication for enhanced security.

Features:
- Client CA certificate configuration (client_ca_file, client_ca_pem)
- Optional system CA certificate pool inclusion
- Configurable handshake timeout for DoS protection
- Concurrent TLS handshake support (up to 64 parallel handshakes)
- Certificate hot-reloading with configurable reload interval

Configuration options:
- tls.cert_file/cert_pem: Server certificate
- tls.key_file/key_pem: Server private key
- tls.client_ca_file/client_ca_pem: CA for client cert verification (enables mTLS)
- tls.include_system_ca_certs_pool: Include system CAs for client verification
- tls.handshake_timeout: Timeout for TLS handshake (default: 10s)
- tls.reload_interval: Interval for certificate reload checks (default: 5m)

Tests added:
- mTLS client certificate verification tests
- Handshake timeout enforcement test
- Concurrent handshake non-blocking test
- TLS reload integration test fixes
@lalitb lalitb requested a review from a team as a code owner December 5, 2025 23:42
@github-actions github-actions bot added the rust Pull requests that update Rust code label Dec 5, 2025
@codecov
Copy link

codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 78.03030% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.47%. Comparing base (dfca971) to head (69437a9).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1539      +/-   ##
==========================================
- Coverage   83.50%   83.47%   -0.04%     
==========================================
  Files         428      428              
  Lines      119410   119479      +69     
==========================================
+ Hits        99713    99733      +20     
- Misses      19163    19212      +49     
  Partials      534      534              
Components Coverage Δ
otap-dataflow 84.64% <78.03%> (-0.06%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.26% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

arc-swap = { workspace = true, optional = true }
rustls = { workspace = true, optional = true }
rustls-pemfile = { workspace = true, optional = true }
rustls-pki-types = { workspace = true, optional = true }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustls-pemfile is no longer maintained, so replaced by rustls-pki-types, which is maintained by rustls project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we can either revert that as part of this PR, or I can do it in a followup after this is merged. Thanks for updating the unmaintained dependency @lalitb!

match maybe_tls_acceptor {
Some(tls_acceptor) => {
let tls_stream = create_tls_stream(listener_stream, tls_acceptor);
let tls_stream = create_tls_stream(listener_stream, tls_acceptor, handshake_timeout);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handshake_timeout is added to protect against slow/malicious clients that could hold connection slots indefinitely by not completing the TLS handshake.

Without a timeout, a malicious client could:

  • Open a TCP connection
  • Never send TLS ClientHello (or send it very slowly)
  • Hold the connection slot forever, preventing legitimate clients from connecting
  • Repeat this to exhaust server resources (DoS attack)

/// - Allows concurrent handshakes to prevent slow clients from blocking others
/// - Limits memory overhead for pending handshake state
/// - May need adjustment based on actual workload characteristics
const MAX_CONCURRENT_HANDSHAKES: usize = 64;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not mTLS-specific - this is a general TLS hardening to prevent DoS via connection flooding. Limits concurrent handshakes to 64, applying backpressure when exceeded. Without this limit, an attacker could overwhelm the server by initiating thousands of handshakes simultaneously.

Comment on lines +457 to +465
// If system CAs were requested but no certs were loaded (and no user CA provided),
// warn the user that mTLS is effectively disabled or might not work as expected.
// Note: client_ca_pem is empty here.
if config.include_system_ca_certs_pool == Some(true) {
log::warn!(
"include_system_ca_certs_pool is true, but no CA certificates were loaded. mTLS will be disabled."
);
}
builder.with_no_client_auth()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to return an error here? It seems like it would be better from a security perspective not to fail open like this

Comment on lines +125 to +127
// TLS handshake failed - log and continue
log::debug!("TLS handshake failed: {}", e);
None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is important enough to log at warn level? Or would that just create too much noise in the event that there were many failed handshakes?

Comment on lines +75 to +77
if skip_if_no_openssl() {
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests look great!

Could we avoid skipping them if we used something like rcgen instead of openssl? https://docs.rs/rcgen/latest/rcgen/

I'm thinking it'd nice to ensure that these tests could always run. That way we can rely on them to catch regressions w/out worrying about if openssl accidentally got removed from some test runner.

If this is feasible, we can make the change a future PR (I don't want to block this PR from getting merged for this)

Comment on lines +142 to +143
// Allow concurrent handshakes to prevent slow/malicious clients from blocking others
.buffer_unordered(MAX_CONCURRENT_HANDSHAKES)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, how this works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants