Always use system TLS defaults #706

Open: rhysparry wants to merge 12 commits into main from rhys/eft-157/os-tls-config

@rhysparry

Contributor

Background

While we were rolling out the change to make Halibut use system TLS defaults, we provided a mechanism for consumers of Halibut (i.e. Tentacle and Octopus) to override these defaults if necessary.

Now that we have confirmed that switching to system defaults has not caused issues and has improved security (by allowing older TLS protocols to be disabled by the operating system), we are removing the ability to configure these protocols directly.

Results

  • Removes the ISslConfigurationProvider and its provided implementations
  • Exposes the SslProtocols value used internally via SslConfiguration.SupportedProtocols. This value has been set to None, which "Allows the operating system to choose the best protocol to use, and to block protocols that are not secure." (see docs)
  • Updates usages of the ISslConfigurationProvider to access this static property.
  • Removes the ability to supply SSL configuration within the HalibutRuntimeBuilder
  • Resolves EFT-157
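As a rough illustration of the second bullet, the sketch below shows how a static `SslProtocols.None` value might be handed to the platform TLS APIs. `SslConfigurationSketch` is a hypothetical stand-in rather than the real Halibut type, the host name is a placeholder, and `SslClientAuthenticationOptions` assumes a modern .NET target (it is not available on .NET Framework 4.8).

```csharp
using System;
using System.Net.Security;
using System.Security.Authentication;

// Hypothetical stand-in for the SslConfiguration.SupportedProtocols property
// described above; the real type lives in Halibut and may look different.
public static class SslConfigurationSketch
{
    // None defers protocol selection to the operating system.
    public static SslProtocols SupportedProtocols => SslProtocols.None;
}

public static class Program
{
    public static void Main()
    {
        // Build client options without pinning any TLS version in the application.
        var options = new SslClientAuthenticationOptions
        {
            TargetHost = "example.invalid", // placeholder host, never contacted here
            EnabledSslProtocols = SslConfigurationSketch.SupportedProtocols,
        };

        Console.WriteLine(options.EnabledSslProtocols);
    }
}
```

With this shape there is nothing left for a consumer to override: any protocol restrictions come from operating system policy rather than from the application.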

Before

  • TLS Protocols supported by Halibut could be overridden by the application.

After

  • TLS Protocols supported by Halibut are configured at the Operating System level.

How to review this PR

Quality ✔️

Pre-requisites

  • I have read How we use GitHub Issues for help deciding when and where it's appropriate to make an issue.
  • I have considered informing or consulting the right people, according to the ownership map.
  • I have considered appropriate testing for my change.

@rhysparry rhysparry force-pushed the rhys/eft-157/os-tls-config branch from cf44e2e to 3d39af8 Compare April 1, 2026 01:10
rhysparry added 10 commits April 8, 2026 09:38
…ollisions on net48

With SslProtocols.None on .NET Framework 4.8, Windows SChannel uses a per-process
TLS session cache keyed on certificate + hostname. Reusing the same static certs
(Octopus, TentacleListening, TentaclePolling) across tests in the same process causes
SChannel to incorrectly reuse session entries when connecting to localhost, producing
AuthenticationExceptions that cause ~202 test failures on net48.

Fix by generating fresh unique certificates per test in:
- LatestClientAndLatestServiceBuilder (Listening/Polling/PollingOverWebSocket factories)
- SecureClientFixture (SetUp + SecureClientClearsPoolWhenAllConnectionsCorrupt)
- ClientServerLifecycleTests (ListeningConfiguration/PollingConfiguration/ListeningThenPollingConfiguration)

Tests that explicitly call WithCertificates(...) are unaffected.

…cert tests

Add ServiceThumbprint property to ClientAndService so bad-certificate tests
can reference the service's actual cert thumbprint rather than the client's
configured trusted thumbprint (which differs in WithClientTrustingTheWrongCertificate
tests). Also fixes DiscoveryClientFixture and backwards compatibility builders.

The unique-cert-per-test fix (added to work around SChannel session-cache
collisions under SslProtocols.None on net48) was applied to all target
frameworks. On net80 this defeats SChannel TLS session resumption, so every
handshake becomes a slow full handshake. With a short receive timeout that
bleeds into the handshake (ssl.ReadTimeout), this made
ReceiveResponseTimeoutTests.WhenRpcExecutionIsWithinReceiveResponseTimeout_ButSubsequentDataIsDelayed
fail consistently on net80 Windows.

Move all of the conditional logic into a single TestCertificates helper that
generates fresh certs per test only on net48, and otherwise returns the shared
static certs (Octopus/TentacleListening/TentaclePolling) used on main. This
restores fast resumed handshakes on net80/Linux while keeping the net48
collision fix intact.

The client-only path (LatestClientBuilder.ForServiceConnectionType, used by
CreateClientOnlyTestCaseBuilder) still used the shared static Octopus cert on
net48. Under SslProtocols.None this intermittently triggers the same SChannel
session-cache collision as before, surfacing as an AuthenticationException
('a call to SSPI failed ... they do not possess a common algorithm'). That is
classified as UnknownError rather than IsNetworkError, causing
ExceptionReturnedByHalibutProxyExtensionMethodFixture.BecauseTheListeningTentacleIsNotResponding
to flake on net48 Windows.

Generate a fresh per-test client certificate on net48 (Polling/Listening) and
dispose its TmpDirectory with the client. PollingOverWebSocket keeps the Ssl
cert (bound via netsh). No-op on net80/Linux.

WhenPollingMultipleClientsWithOneService builds two client-only polling
clients plus one service that polls both. The service builder trusted a
single static thumbprint and dialled every listening client expecting it,
which only worked while all client-only builders shared the static Octopus
cert. Generating a unique client cert per test on net48 broke that contract,
failing RequestsShouldBeTakenFromAnyClient on every net48 build.

Carry each client's own certificate thumbprint from the client-only builder
through to the service builder so the polling service dials each listening
client with that client's thumbprint:
- expose LatestClient.ClientThumbprint (via IClient)
- WithListeningClient/WithListeningClients now take (uri, thumbprint)

Identical behaviour on net80/Linux (every client resolves to the shared
Octopus thumbprint); on net48 the service trusts each unique client cert.

The previous change had the combined builder dial each listening client
with that client's actual certificate thumbprint. That overrode the
service's configured serviceTrustsThumbprint, which the wrong-certificate
test helpers deliberately set to a non-matching value. As a result the
polling service accepted connections it should have rejected, breaking the
bad-certificate negative tests on every platform:
- BadCertificatesTests.FailWhenClientPresentsWrongCertificateToPollingService
- ConnectionObserverFixture.ObserveUnauthorizedPollingWebSocketConnections

Make the per-client thumbprint optional. The combined builder passes none, so
the polling loop falls back to serviceTrustsThumbprint (its previous, known-
good behaviour). Only the standalone multi-client test, which has two clients
with distinct certs and no wrong-certificate setup, passes explicit per-client
thumbprints via WithListeningClients.

Comment on lines +172 to +177
// pollingServer intentionally uses Certificates.TentacleListening rather than
// Certificates.Octopus (which server uses). This keeps the two certificates in distinct
// TLS roles within this process: Octopus is used only as a TLS server cert (by server),
// and TentacleListening is used only as a TLS client cert (here, and in RunListeningClient).
// Using the same cert in both roles would trigger an SChannel session-cache collision on
// Windows with SslProtocols.None (see declaration comment above).

Remove this comment; it is explained above.

// Trust the listening tentacle certificate for inbound connections.
// This runtime only accepts connections — it never makes outbound polling connections —
// keeping it in a pure TLS server role (see declaration comment above).
server.Trust(Certificates.TentacleListeningPublicThumbprint);

The comment here probably doesn't help.


namespace Halibut.Tests.Support.BackwardsCompatibility
{
public class SchannelProbeBinaryRunner

It is worth explaining what this is for.

/// On other frameworks, returns the supplied <paramref name="staticCert"/> so static certificates are
/// shared (enabling TLS session resumption).
/// </summary>
public static CertAndThumbprint CertFor(CertAndThumbprint staticCert, TmpDirectory? tmpDirectory)

Maybe make CertAndThumbprint disposable, and then on net48 we create a new temp dir, build the cert in it, and have the disposable clean up the temp dir. Then the TmpDirectory? parameter doesn't need to be exposed outside of this.

The return could also be a CertAndThumbprintHolder or GeneratedCertAndThumbprint if we want a new type that is disposable.

Or return a tuple (CertAndThumbprint, IDisposable).
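A rough sketch of the disposable-holder idea, to make the suggestion concrete. Everything here is hypothetical: `CertAndThumbprintSketch` stands in for the real `CertAndThumbprint`, `generateInto` is an assumed callback that writes a fresh certificate into a directory, and the `NETFRAMEWORK` symbol is assumed to be defined for the net48 target (as it is in SDK-style projects).

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical stand-in for the test suite's CertAndThumbprint type.
public sealed class CertAndThumbprintSketch
{
    public CertAndThumbprintSketch(string pfxPath, string thumbprint)
    {
        PfxPath = pfxPath;
        Thumbprint = thumbprint;
    }

    public string PfxPath { get; }
    public string Thumbprint { get; }
}

// Hypothetical disposable holder: it owns a temp directory only when it created one.
public sealed class GeneratedCertAndThumbprint : IDisposable
{
    readonly string? ownedDirectory;

    GeneratedCertAndThumbprint(CertAndThumbprintSketch cert, string? ownedDirectory)
    {
        Cert = cert;
        this.ownedDirectory = ownedDirectory;
    }

    public CertAndThumbprintSketch Cert { get; }

    // generateInto is an assumed callback that creates a fresh certificate
    // inside the given directory and returns it.
    public static GeneratedCertAndThumbprint CertFor(
        CertAndThumbprintSketch staticCert,
        Func<string, CertAndThumbprintSketch> generateInto)
    {
#if NETFRAMEWORK
        // net48: fresh certificate per test, stored in a directory this holder owns.
        var directory = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(directory);
        return new GeneratedCertAndThumbprint(generateInto(directory), directory);
#else
        // Other frameworks: share the static certificate; nothing to clean up.
        return new GeneratedCertAndThumbprint(staticCert, null);
#endif
    }

    public void Dispose()
    {
        if (ownedDirectory != null && Directory.Exists(ownedDirectory))
        {
            Directory.Delete(ownedDirectory, recursive: true);
        }
    }
}
```

A caller would then write something like `using var holder = GeneratedCertAndThumbprint.CertFor(staticCert, generate);` and read `holder.Cert`, so the temp directory never appears in the builder signatures.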

{
public interface IAsyncClientSayHelloService
{
Task<string> SayHelloAsync(string name);

I am surprised the echo service isn't enough. Why make this new service?

.WithHalibutTimeoutsAndLimits(new HalibutTimeoutsAndLimitsForTestsBuilder().Build())
.Build();

await using var _ = new AsyncDisposableAction(async () => await octopus.DisposeAsync());

Is this needed?

using Halibut.ServiceModel;
using Halibut.TestUtils.Contracts;

namespace Halibut.TestUtils.SampleProgram.SchannelProbe

Why don't the existing compat bins that run Halibut work for the test?

/// </summary>
[TestFixture]
[NonParallelizable]
public class SchannelSessionCacheFixture : BaseTest

Are these tests valuable? Do they get run as part of the normal backwards compat tests?

@LukeButters left a comment

I think we can drop the SchannelSessionCacheFixture, since it might not be testing anything of value.
