In a nutshell: I work in a very sensitive, security-intensive environment. We decided to use ACLs (instead of RBAC/ABAC) for authorization to get finer-grained control over the storage account. For our Databricks service we use only job clusters, and each job runs as a service principal (jobs are orchestrated with Azure Functions). When a job running as the SP writes to Delta for the first time, it works fine:
    if diff_count > 0:
        (
            df_diff.write
            .partitionBy(partition)
            .format(self.trg_format.lower())
            .mode(mode)
            .option("overwriteSchema", overwrite_flg)
            .save(self.trg_location)
        )
But when I write to the same dataset a second time (in the case above we first append and then close the interval, something like SCD type 2), that second write/append already fails with:
Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://[REDACTED].dfs.core.windows.net/lake?upn=false&resource=filesystem&maxResults=5000&directory=[REDACTED]&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:82f621f0-301f-004f-60c7-9b5197000000 Time:2024-05-01T13:02:06.8819444Z"
Funny enough, the service principal should be authorized: it has ACL r-x on all superordinate directories (including root) and rwx on <dataset_dir>/Delta/ and all its subdirectories, such as the partition directories and _delta_log, as well as rwx on all the files beneath (we set the ACL recursively on <dataset_dir>). ACL example from portal
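To make the expectation above concrete, here is a minimal sketch of the check I assume applies to a directory listing: execute on every ancestor directory (root included) and read plus execute on the listed directory itself. The function name `can_list`, the `acl` mapping and the example paths are hypothetical, and the sketch deliberately ignores the ACL mask and the difference between access and default ACLs.

```python
from pathlib import PurePosixPath


def can_list(directory, acl):
    """Return True if `acl` grants what a directory listing is assumed to need.

    `acl` is a hypothetical mapping from an absolute directory path to the
    permission string ("rwx", "r-x", ...) the service principal has on it.
    """
    target = PurePosixPath("/") / directory
    # Every ancestor, root included, must allow traversal (execute).
    for parent in target.parents:
        if "x" not in acl.get(str(parent), "---"):
            return False
    # The listed directory itself needs read and execute.
    perms = acl.get(str(target), "---")
    return "r" in perms and "x" in perms


# Hypothetical layout mirroring the setup described above.
acl = {
    "/": "r-x",
    "/raw": "r-x",
    "/raw/dataset": "rwx",
    "/raw/dataset/Delta": "rwx",
}
print(can_list("raw/dataset/Delta", acl))
```

With the ACLs as I described them, a model like this says the listing should be allowed, which is why the 403 is surprising.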
For an SP that has RBAC (Blob contributor), such a write to Delta works fine, but for an SP that is granted only the ACLs described above, it fails.
I was able to replicate the API call that Databricks makes in the background while writing to Delta. It is a simple GET, and it already fails when run as the SP that is authorized with ACLs: Code snippet from ADB failure
If run as the SP authorized with RBAC, it works fine: Code snippet from ADB success
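For anyone who wants to reproduce the call outside Databricks, this is roughly how the URL of that GET can be rebuilt from the query parameters visible in the 403 message. It is only a sketch: `build_list_paths_url`, the account name `myaccount` and the directory value are placeholders (the real ones are redacted above), and actually sending the request would additionally need an OAuth bearer token for the service principal, which is not shown here.

```python
from urllib.parse import urlencode


def build_list_paths_url(account, filesystem, directory,
                         max_results=5000, timeout=90):
    """Rebuild the directory-listing URL seen in the 403 error message."""
    query = urlencode({
        "upn": "false",
        "resource": "filesystem",
        "maxResults": max_results,
        "directory": directory,   # placeholder; redacted in the real error
        "timeout": timeout,
        "recursive": "false",
    })
    return f"https://{account}.dfs.core.windows.net/{filesystem}?{query}"


# Placeholder account and directory, not the real values.
url = build_list_paths_url("myaccount", "lake", "some/dataset/Delta/_delta_log")
print(url)
```

Note that this is a non-recursive listing of one directory, so the permissions that matter are the ones on that directory and its ancestors, not on the files inside it.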
My original idea was that some Databricks process overwrites something in the Delta log and that this somehow invalidates the ACLs of the Delta log files. But when I check, the SP has rwx on all directories and files beneath the dataset directory (and r-x on all superordinate ones). So is this a Microsoft bug, are ACLs not designed for this kind of operation, or what is going on? Very grateful for any advice!
Update: After running the MS diagnostics on Azure, the recommendation is indeed something that I already have in place, but the failure shows that access is denied because RBAC is missing. So this operation does not seem to follow Microsoft's security model (where ACLs should also be evaluated after RBAC), see: Details from Diagnostics