MWAA: How to sandbox the tasks from each other?


Hi all! Brand new MWAA user here. I'm evaluating MWAA as a solution to provide integration/workflow services for multiple tenants, and one risk area I'm trying to mitigate is the '/tmp' local disk storage. Tasks for multiple tenants will commonly write data to the /tmp dir, and it would be a really big deal if they somehow conflicted and one tenant's data was exposed to another tenant. These workflows will be written by a team of, shall we say, not super-strong developers, so conflicts are likely to happen eventually, and proper tempfile cleanup is likely to be missed. Are there any common strategies or infrastructure options to deal with this risk?

I was thinking of installing a task instance mutation cluster policy hook that would run "rm -rf /tmp/*" before every task run, but of course that's sketchy and would very possibly break other tasks running concurrently on the same worker.

To automate cleanup (so worker disks don't eventually fill up), the only other thing I can think of is to provide a library with a get_temp_file() method that generates timestamped temp file names, so we can automatically delete files more than a day old (or whatever). But of course this relies on the team diligently using our library method rather than the standard Python method, or (god forbid) hardcoding their own filenames.
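Something like the following is what I have in mind for that library; it's only a rough sketch. The base directory name, the tenant_id argument, and purge_old_files() are all hypothetical names I made up for illustration. It also only prevents accidental filename collisions: if every task runs as the same OS user on the worker, it is not a real security boundary between tenants.

```python
import os
import re
import tempfile
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical base directory for all tenant temp files (an assumption).
BASE_DIR = Path(tempfile.gettempdir()) / "tenant_tmp"


def get_temp_file(tenant_id: str, suffix: str = "") -> Path:
    """Create an empty temp file under a per-tenant directory and return its path."""
    # Reject anything that could escape the tenant directory (e.g. "../other").
    if not re.fullmatch(r"[A-Za-z0-9_-]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    tenant_dir = BASE_DIR / tenant_id
    tenant_dir.mkdir(parents=True, exist_ok=True)
    # mkstemp picks a unique random name and creates the file atomically;
    # the timestamp prefix is only there to make the age visible to humans.
    fd, path = tempfile.mkstemp(
        prefix=f"{int(time.time())}_", suffix=suffix, dir=str(tenant_dir)
    )
    os.close(fd)
    return Path(path)


def purge_old_files(
    max_age_seconds: float = 86400, now: Optional[float] = None
) -> List[Path]:
    """Delete files under BASE_DIR last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    if not BASE_DIR.exists():
        return removed
    # Materialize the listing first so deletions don't disturb the directory walk.
    for path in list(BASE_DIR.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

The purge goes by modification time rather than by parsing the filename, so it still works if somebody renames a file. It would have to run on each worker (for example from a scheduled maintenance task), since each worker has its own local disk.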

Any thoughts or insights are appreciated. Thanks!

redec
asked 2 years ago, 589 views
1 Answer

Airflow is not multitenant. Anyone who can write a DAG can see any other DAG or environment information. See AIP-1 for how the Airflow community is working towards multitenancy and other security improvements.

As such, the only true data isolation comes from running multiple environments. A secondary alternative is a DAG factory, where you don't allow users to write DAGs directly but instead have them specify their DAGs via YAML or JSON, so that you control exactly what they can do.
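To illustrate the DAG factory idea, here is a minimal sketch of the validation step only, assuming a JSON spec format. The field names and the allowed task types are hypothetical, not part of Airflow or MWAA. A real factory would go on to map each approved task type to an operator that the platform team owns, which is where temp file handling could be enforced.

```python
import json

# Hypothetical spec schema: these keys and task types are assumptions.
REQUIRED_KEYS = {"dag_id", "tenant", "tasks"}
ALLOWED_KEYS = REQUIRED_KEYS | {"schedule"}
ALLOWED_TASK_TYPES = {"s3_copy", "sql_query"}


def validate_spec(raw: str) -> dict:
    """Parse a tenant-supplied JSON DAG spec and reject anything not allowlisted."""
    spec = json.loads(raw)
    if not isinstance(spec, dict):
        raise ValueError("spec must be a JSON object")
    unknown = set(spec) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    missing = REQUIRED_KEYS - set(spec)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(spec["tasks"], list) or not spec["tasks"]:
        raise ValueError("tasks must be a non-empty list")
    for task in spec["tasks"]:
        # Tenants can only pick from task types the platform team has approved.
        if not isinstance(task, dict) or task.get("type") not in ALLOWED_TASK_TYPES:
            raise ValueError(f"task type not allowed: {task!r}")
    return spec
```

Because tenants never write Python, they cannot reach the worker's file system except through the task types you choose to expose.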

AWS
John_J
answered 2 years ago
