-
Bug
-
Resolution: Fixed
-
P3
-
11.0.21, 17.0.8.1, 17.0.8, 21
-
b26
-
Not verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8324691 | 21.0.4-oracle | Prajwal Kumaraswamy | P3 | Resolved | Fixed | b04 |
JDK-8330643 | 21.0.4 | Andrew Lu | P3 | Resolved | Fixed | b01 |
JDK-8324713 | 17.0.12-oracle | Prajwal Kumaraswamy | P3 | Resolved | Fixed | b04 |
JDK-8330651 | 17.0.12 | Andrew Lu | P3 | Resolved | Fixed | b01 |
JDK-8324711 | 11.0.24-oracle | Prajwal Kumaraswamy | P3 | Resolved | Fixed | b04 |
JDK-8330648 | 11.0.24 | Andrew Lu | P3 | Resolved | Fixed | b01 |
We recently upgrade to OpenJDK 17.0.8.1 and started observing crashes
resulting from double-frees via `gss_delete_sec_context()`.
Adding `-Djdk.spnego.cache=false` to our java invocations stops the
crashes. We believe this is due to a race condition that has long been
in `HttpURLConnection` that was mostly harmless before but which with
the addition of this commit:
16843770b5a 8303809: Dispose context in SPNEGO NegotiatorImpl
became a use-after-free / double-free bug, leading to crashes.
This happens with stock 17.0.8.1+1 from Adoptium.
Attached is a reproducer, Test.java, which you can run like this:
$ # credendials are kinit user@jgssbug.twosigma.com, password password
$ #
$ "$JAVA_HOME/bin/java" \
-Dsun.security.jgss.native=true \
-Dsun.security.jgss.lib=/usr/lib/libgssapi_krb5.so \
Test.java https://kerberos.club/tmp/jgssbug.txt
It doesn't crash every time, but it crashes often.
A typical Java stack trace upon crashing looks like:
j sun.security.jgss.wrapper.GSSLibStub.deleteContext(J)J+0 java.security.jgss@17.0.8.1
j sun.security.jgss.wrapper.NativeGSSContext.dispose()V+76 java.security.jgss@17.0.8.1
j sun.security.jgss.GSSContextImpl.dispose()V+16 java.security.jgss@17.0.8.1
j sun.net.www.protocol.http.spnego.NegotiatorImpl.disposeContext()V+11 java.security.jgss@17.0.8.1
J 4456 c1 sun.net.www.protocol.http.NegotiateAuthentication.disposeContext()V java.base@17.0.8.1 (24 bytes) @ 0x00007f909dc33834 [0x00007f909dc337c0+0x0000000
000000074]
j sun.net.www.protocol.http.HttpURLConnection.getInputStream0()Ljava/io/InputStream;+1923 java.base@17.0.8.1
j sun.net.www.protocol.http.HttpURLConnection.getInputStream()Ljava/io/InputStream;+62 java.base@17.0.8.1
j sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream()Ljava/io/InputStream;+4 java.base@17.0.8.1
j Test.lambda$main$0()V+16
j Test$$Lambda$207+0x00007f9020144208.run()V+0
j java.lang.Thread.run()V+11 java.base@17.0.8.1
v ~StubRoutines::call_stub
On the C side the crash typically happens in the allocator, typically in
`free()`:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fb6e1185535 in __GI_abort () at abort.c:79
#2 0x00007fb6e11dc648 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fb6e12e62a0 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007fb6e11e2d6a in malloc_printerr (str=str@entry=0x7fb6e12e444e "free(): invalid pointer") at malloc.c:5359
#4 0x00007fb6e11e459c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=<optimized out>) at malloc.c:4180
#5 0x00007fb582ba5b6d in krb5_free_address (context=context@entry=0x7fb5d80cbf60, val=0x7fb5d8000080) at kfree.c:62
#6 0x00007fb582b93d9e in krb5_auth_con_free (context=context@entry=0x7fb5d80cbf60, auth_context=0x7fb5d80778a0) at auth_con.c:60
#7 0x00007fb6440a3eec in krb5_gss_delete_sec_context (minor_status=0x7fb586868504, context_handle=0x7fb5d80e22d0, output_token=<optimized out>)
at delete_sec_context.c:77
#8 0x00007fb6440964b0 in gssint_delete_internal_sec_context (minor_status=0x7fb586868504, mech_type=<optimized out>,
internal_ctx=internal_ctx@entry=0x7fb5d80e22d0, output_token=0x0) at g_glue.c:606
#9 0x00007fb6440946dd in gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb5d80d2be8, output_token=<optimized out>)
at g_delete_sec_context.c:91
#10 0x00007fb6440b8dfe in spnego_gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb5d807bd70, output_token=<optimized out>)
at spnego_mech.c:2161
#11 0x00007fb6440964b0 in gssint_delete_internal_sec_context (minor_status=0x7fb586868504, mech_type=<optimized out>,
internal_ctx=internal_ctx@entry=0x7fb5d807bd70, output_token=0x0) at g_glue.c:606
#12 0x00007fb6440946dd in gss_delete_sec_context (minor_status=<optimized out>, context_handle=0x7fb586868508, output_token=<optimized out>)
at g_delete_sec_context.c:91
#13 0x00007fb6dc0559ff in Java_sun_security_jgss_wrapper_GSSLibStub_deleteContext () from /home/nico/tmp/jdk-17.0.8.1+1/lib/libj2gss.so
#14 0x00007fb6c89a653a in ?? ()
#15 0x0000000000000000 in ?? ()
We've seen other variations, but always they can be traced to
`HttpURLConnection`.
(We do maintain local patches to OpenJDK JGSS code as well. But that's
a story for another time. We do want to try again to upstream those
patches. Because we maintain such patches we developed an interim fix
where we use `synchronized (lock)` in the JGSS `dispose()` methods and
also `Reference.reachabilityFence(this)`, which worked to prevent the
crashes, but that was before we learned that `-Djdk.spnego.cache=false`
also works. In any case, stock, unpatched OpenJDK 17.0.8.1 evinces this
crasher.)
# Root cause
What 8303809 did was add calls to `dispose()` on `GSSContext` objects in
NegotiatorImpl.java in a new method called `disposeContext()` that all
HTTP authentication mechanisms now implement and which 8303809 also
added.
It took us a while to realize that the use of the `AuthCache` in
`HttpURLConnection` implies concurrent access to the cached contexts via
concurrent HTTP requests to the same origin.
Concurrent access to and disposal of those context objects then leads to
use-after-free / double-free errors when using sun.security.jgss.native.
There's really several issues going on here:
- The `dispose()` methods in 17.x in the JGSS native C GSS bindings
clases are dangerous, as are finalizers generally (and these
`dispose()` methods are called from finalizers).
This is fixed in later OpenJDK releases by making use of `Cleaner`,
so there's not much more to say about this here.
- The `AuthCache` in `src/java.base/share/classes/sun/net/www/protocol/http/`
implies concurrency, and it's not clear that concurrency is allowable
in all cases.
- In the case of RFC 4559 "Negotiate" and GSS concurrency is very much
not permitted. The GSS APIs can and should be thread-safe, but it is
still an error to invoke `initSecContext()` concurrently on the same
`GSSContext`!
GSS-API `initSecContext()`/`acceptSecContext()` are strictly
synchronous and serial.
- The `AuthCache` in `src/java.base/share/classes/sun/net/www/protocol/http/`
is likely a misfeature altogether, at least as far as sharing GSS-
like HTTP authentication mechanism contexts goes, and probably for
all other HTTP auth methods too.
It's not ok to take a reply token from a `WWW-Authenticate:` response
header and consume it together with a "context" created by a
unrelated request [to the same origin].
Every `WWW-Authenticate:`/`Authorization:` token must be matched to
the same [serial train of] request[s].
Thus if we have an RFC 4559 Negotiate exchange requiring 5 tokens, so
something like:
0: request..
1: 401 response w/ WWW-Authenticate: Negotiate
2: request w/ Authorization: Negotiate ... (continue needed)
3: 200 response w/ WWW-Authenticate: Negotiate ... (continue needed)
4: request w/ Authorization: Negotiate ... (continue needed)
5: 200 response w/ WWW-Authenticate: Negotiate ... (complete)
it would not be permissible for one of those reply tokens to be fed
to another request's GSSContext, nor would it be OK to have more than
one serial set of requests share the same GSSContext.
- The foregoing also applies to SCRAM, DIGEST-MD5, and just about any
and all HTTP authentication methods that I know. I believe that
there is no value in caching HTTP authentication mechanism contexts
on the client side.
HTTP Authentication mechanisms like SCRAM and DIGEST-MD5 are
three-message mechanisms when counting the initial challenge:
0: request...
1: 401 response w/ WWW-Authenticate: DIGEST-MD5 <challenge>
2: request w/ Authorization: DIGEST-MD5 ...
3: 200 w/ WWW-Authenticate: DIGEST-MD5 ... (complete)
There is no case in which caching and sharing a SCRAM or DIGEST-MD5
authentication context is useful because there is only one request/
response in which there is a context needed at all: between 2 and 3
inclusive in the above diagram.
Perhaps a single challenge can be shared with multiple subsequent
requests that will optimistically use SCRAM or DIGEST-MD5 knowing
that one response demanded it, but one should probably not do this,
though I could find nothing in RFC 5802 and 7677.
- Negotiate is really a bearer token system, albeit with the
possibility of using channel bindings. Meaning we never use the
resulting GSSContext to encrypt or sign any part of the HTTP requests
and responses authenticated with that context. Therefore there's
never any point in sharing a GSSContext in HTTP, not even once it's
fully-established -- there is no point in doing anything other than
disposing of a fully-established GSSContext in HTTP/Negotiate.
(Attempts to develop standards for encrypting/signing of HTTP
requests and responses at the HTTP layer rather than in TLS have
historically failed. If one attempt to standardize such a thing
succeeds, _then_ there will be something worth sharing in a cache
with care to prevent use-after-dispose.)
- Because of the preceding, in HTTP/Negotiate it is desirable to "free"
any GSSContext immediately after it is complete.
This is presumably how you came to add the `disposeContext()` method:
because those cached contexts otherwise sit there using lots of
memory for no reason; someone must have noticed. However, rather
than have an explicit `disposeContext()` these contexts should just
immediately be disposed of when complete or failed.
# Possible fixes
- Backport the use of `Cleaner` in the JGSS `dispose()` methods.
This would not be very satisfying because the `HttpURLConnection`
`AuthCache` would remain buggy, but at least the JVM wouldn't crash.
- Default `jdk.spnego.cache` to `false`.
- Rethink or remove the `AuthCache` in `HttpURLConnection`.
- Something else?
- backported by
-
JDK-8324691 HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809
- Resolved
-
JDK-8324711 HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809
- Resolved
-
JDK-8324713 HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809
- Resolved
-
JDK-8330643 HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809
- Resolved
-
JDK-8330648 HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809
- Resolved
-
JDK-8330651 HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809
- Resolved
- relates to
-
JDK-8303809 Dispose context in SPNEGO NegotiatorImpl
- Resolved
- links to
-
Commit openjdk/jdk11u-dev/64990670
-
Commit openjdk/jdk17u-dev/ed419ef0
-
Commit openjdk/jdk21u-dev/686399a5
-
Commit openjdk/jdk/f1a24f6d
-
Review openjdk/jdk11u-dev/2671
-
Review openjdk/jdk17u-dev/2418
-
Review openjdk/jdk21u-dev/508
-
Review openjdk/jdk/16347