
On running untrusted code in AWS Lambda

Nick Zavaritsky edited this page Jun 1, 2022 · 5 revisions

The appeal of AWS Lambda

The core feature of luajit.me is running untrusted Lua code submitted via the web UI.

AWS Lambda is appealing because it would drive costs down to zero and enable auto scaling without any effort on my part. Last but not least, it supports both amd64 and aarch64!

The need for sandboxing

Sandboxing is a must. Even though it is presumably possible to isolate a Lambda function from the rest of the infrastructure, an exploit could leave it in a weird state. This is concerning because Lambda execution environments are reused when requests arrive in quick succession.

For the luajit.me use case it is sufficient to sandbox a single process, i.e. being able to filter syscalls would be enough.
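To illustrate the reuse concern, here is a minimal sketch of a Python Lambda handler. It is not part of luajit.me; the handler name and the returned fields are made up for illustration. It relies only on the fact that module-level state is initialised once per execution environment rather than once per request.

```python
import os

# Module-level state is set up once per execution environment,
# so it survives across warm invocations of the same function.
_invocations = 0


def handler(event, context):
    """Hypothetical handler: report how many requests this environment served."""
    global _invocations
    _invocations += 1
    return {"pid": os.getpid(), "invocations": _invocations}
```

If two requests land on the same warm environment, the second one sees the counter left behind by the first. Anything an exploit leaves behind (files in /tmp, a patched process) would persist the same way.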

Sandboxing in AWS Lambda

As it turns out, chroot, seccomp and ptrace all fail with EPERM in AWS Lambda. This makes sandboxing rather tricky.
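A check of this kind can be sketched in a few lines of Python. This is an illustration, not the probe actually used for luajit.me: the `probe` helper is a made-up name, and only chroot is exercised because `chroot("/")` is harmless when it succeeds. Probing seccomp or ptrace the same way would need extra care, since a successful call changes the state of the calling process.

```python
import errno
import os


def probe(name, attempt):
    """Run `attempt` and report whether the kernel allowed it."""
    try:
        attempt()
    except OSError as exc:
        # Translate the numeric errno into its symbolic name, e.g. EPERM.
        code = errno.errorcode.get(exc.errno, str(exc.errno))
        return f"{name}: failed with {code}"
    return f"{name}: ok"


if __name__ == "__main__":
    # chroot("/") changes nothing when permitted, so it is a safe probe.
    print(probe("chroot", lambda: os.chroot("/")))
```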

Another option considered was QEMU.

As KVM is not available, QEMU has to run in dynamic binary translation mode. Unlike the KVM-based virtualiser, this mode has not undergone a security audit and is not recommended for isolating untrusted code. It is probably still fine for a low-profile project.

QEMU can virtualise the whole system or a single user-space process. The latter is utterly insecure, so we went for whole-system virtualisation. Here is some timing data.

Baseline

+ time luajit m.lua
real	0m 0.04s
user	0m 0.00s
sys	0m 0.00s
+ time luajit m.lua 1000
real	0m 0.35s
user	0m 0.10s
sys	0m 0.00s

Mandelbrot program, rendering 100x100 and 1000x1000 bitmaps.

Running inside a VM

+ vmwrap --no-kvm sh -c 'time luajit m.lua > /dev/null'
real	0m 2.64s
user	0m 0.17s
sys	0m 1.34s
+ vmwrap --no-kvm sh -c 'time luajit m.lua 1000 > /dev/null'
real	0m 7.61s
user	0m 5.06s
sys	0m 1.46s

Same program, run through vmwrap, a simple tool for running a workload in a QEMU VM.

Time to init a VM

Approx. 30 seconds.

Conclusion

The only working approach to sandboxing untrusted code in AWS Lambda was QEMU. The overall slowdown compared to running unsandboxed was between 22x and 66x in wall-clock time. This is unacceptable, as an instant response would be replaced with staring at a spinning wheel. I strongly believe that being blazingly fast is an important feature of luajit.me.

The long VM init time is also a problem; restoring from a QEMU snapshot could help here.

Therefore, regrettably, I am discarding AWS Lambda as a prospective option for luajit.me.
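The slowdown range follows directly from the "real" timings listed above. A small sketch of the arithmetic (the `runs` table and the `slowdown` helper are illustrative names; the numbers are copied from the measurements):

```python
# (bitmap size, baseline seconds, seconds inside the VM), "real" time only
runs = [
    ("100x100", 0.04, 2.64),
    ("1000x1000", 0.35, 7.61),
]


def slowdown(baseline, in_vm):
    """Wall-clock slowdown factor of the VM run relative to the baseline."""
    return in_vm / baseline


for label, baseline, in_vm in runs:
    print(f"{label}: {slowdown(baseline, in_vm):.0f}x slower")
```

The short run is hit hardest because the fixed overhead inside the VM dominates it, while the longer run amortises that overhead and mostly pays for binary translation.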

