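https://github.com/AlexIoannides/pyspark-example-project/blob/13d6fb2f5fb45135499dbd1bc3f1bdac5b8451db/tests/test_etl_job.py#L64 The actual values here should be computed from `data_transformed`, not `expected_data`. Otherwise the assertions compare the expected data with itself and never check the output of the transformation.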
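
To illustrate the point, here is a minimal plain-Python sketch of why it matters which variable feeds the "actual" side. It is not the project's code: rows are dicts standing in for DataFrame rows, `summarize` is a hypothetical helper, and this `transform_data` is a deliberately broken stand-in that ignores `steps_per_floor`.

```python
# Plain-Python stand-in for the Spark test (assumption: dict rows replace DataFrames).

def transform_data(rows, steps_per_floor):
    """Hypothetical, deliberately buggy transform: ignores steps_per_floor."""
    return [{"id": r["id"], "steps_to_desk": r["floor"]} for r in rows]


def summarize(rows):
    """Hypothetical helper: return (row count, column count, mean steps_to_desk)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    avg_steps = sum(r["steps_to_desk"] for r in rows) / n_rows if rows else 0.0
    return n_rows, n_cols, avg_steps


input_data = [{"id": 1, "floor": 1}, {"id": 2, "floor": 3}]
expected_data = [{"id": 1, "steps_to_desk": 21}, {"id": 2, "steps_to_desk": 63}]

data_transformed = transform_data(input_data, 21)

# Vacuous check: both sides are derived from expected_data,
# so it passes even though the transform is broken.
vacuous_passes = summarize(expected_data) == summarize(expected_data)

# Meaningful check: the actual side is derived from data_transformed,
# so the broken transform is caught.
real_passes = summarize(data_transformed) == summarize(expected_data)
```

With the actual side taken from `data_transformed`, the comparison fails for the broken transform, which is the behaviour a test should have.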