Questions tagged with Amazon EMR Serverless


EMR Serverless 6.6.0 Python SWIG Lib dependency

I'm trying to create an isolated Python virtual environment to package the Python libraries needed for a PySpark job. I got this working by following these steps: https://github.com/aws-samples/emr-serverless-samples/tree/main/examples/pyspark/dependencies

However, one Python dependency (a SWIG-based library) fails to install because it requires additional system packages such as gcc, gcc-c++, and python3-devel.

Library: https://github.com/51Degrees/Device-Detection/tree/master/python

So I added `RUN yum install -y gcc gcc-c++ python3-devel` to the sample Dockerfile (https://github.com/aws-samples/emr-serverless-samples/blob/main/examples/pyspark/dependencies/Dockerfile), the library installed successfully, and I packaged the virtual environment. However, the EMR job fails because that library's Python modules are not found, which makes me think python3-devel is not present in EMR Serverless 6.6.0.

Since I don't have control over the serverless environment, is there any way around this? Or am I missing something?

stderr:

```
An error occurred while calling o198.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 0.0 failed 4 times, most recent failure: Lost task 19.3 in stage 0.0 (TID 89) ([2600:1f18:153d:6601:bfcc:6ff:50bc:240e] executor 7): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/environment/lib64/python3.7/site-packages/FiftyOneDegrees/fiftyone_degrees_mobile_detector_v3_wrapper.py", line 15, in swig_import_helper
    return importlib.import_module(mname)
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'FiftyOneDegrees._fiftyone_degrees_mobile_detector_v3_wrapper'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 619, in main
    process()
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 611, in process
    serializer.dump_stream(out_iter, outfile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 259, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "./jobs.zip/jobs/parsed_events_orc_processor/etl.py", line 360, in enrich_events
    event['device'] = calculate_device_data(event)
  File "./jobs.zip/jobs/parsed_events_orc_processor/etl.py", line 152, in calculate_device_data
    device_data = mobile_detector.match(user_agent)
  File "/home/hadoop/environment/lib64/python3.7/site-packages/fiftyone_degrees/mobile_detector/__init__.py", line 225, in match
    else settings.DETECTION_METHOD)
  File "/home/hadoop/environment/lib64/python3.7/site-packages/fiftyone_degrees/mobile_detector/__init__.py", line 63, in instance
    cls._INSTANCES[method] = cls._METHODS[method]()
  File "/home/hadoop/environment/lib64/python3.7/site-packages/fiftyone_degrees/mobile_detector/__init__.py", line 98, in __init__
    from FiftyOneDegrees import fiftyone_degrees_mobile_detector_v3_wrapper
  File "/home/hadoop/environment/lib64/python3.7/site-packages/FiftyOneDegrees/fiftyone_degrees_mobile_detector_v3_wrapper.py", line 18, in <module>
    _fiftyone_degrees_mobile_detector_v3_wrapper = swig_import_helper()
  File "/home/hadoop/environment/lib64/python3.7/site-packages/FiftyOneDegrees/fiftyone_degrees_mobile_detector_v3_wrapper.py", line 17, in swig_import_helper
    return importlib.import_module('_fiftyone_degrees_mobile_detector_v3_wrapper')
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_fiftyone_degrees_mobile_detector_v3_wrapper'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:545)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:703)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:685)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:498)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hasNext(Unknown Source)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:954)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:142)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:133)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2559)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2508)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2507)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2507)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1149)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1149)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1149)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2747)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2689)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2678)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.sql.execution.adaptive.AdaptiveExecutor.checkNoFailures(AdaptiveExecutor.scala:154)
    at org.apache.spark.sql.execution.adaptive.AdaptiveExecutor.doRun(AdaptiveExecutor.scala:88)
    at org.apache.spark.sql.execution.adaptive.AdaptiveExecutor.tryRunningAndGetFuture(AdaptiveExecutor.scala:66)
    at org.apache.spark.sql.execution.adaptive.AdaptiveExecutor.execute(AdaptiveExecutor.scala:57)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:241)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:240)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:509)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:471)
    at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:3053)
    at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:3052)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3770)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3768)
    at org.apache.spark.sql.Dataset.count(Dataset.scala:3052)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/environment/lib64/python3.7/site-packages/FiftyOneDegrees/fiftyone_degrees_mobile_detector_v3_wrapper.py", line 15, in swig_import_helper
    return importlib.import_module(mname)
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'FiftyOneDegrees._fiftyone_degrees_mobile_detector_v3_wrapper'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 619, in main
    process()
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 611, in process
    serializer.dump_stream(out_iter, outfile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 259, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "./jobs.zip/jobs/parsed_events_orc_processor/etl.py", line 360, in enrich_events
    event['device'] = calculate_device_data(event)
  File "./jobs.zip/jobs/parsed_events_orc_processor/etl.py", line 152, in calculate_device_data
    device_data = mobile_detector.match(user_agent)
  File "/home/hadoop/environment/lib64/python3.7/site-packages/fiftyone_degrees/mobile_detector/__init__.py", line 225, in match
    else settings.DETECTION_METHOD)
  File "/home/hadoop/environment/lib64/python3.7/site-packages/fiftyone_degrees/mobile_detector/__init__.py", line 63, in instance
    cls._INSTANCES[method] = cls._METHODS[method]()
  File "/home/hadoop/environment/lib64/python3.7/site-packages/fiftyone_degrees/mobile_detector/__init__.py", line 98, in __init__
    from FiftyOneDegrees import fiftyone_degrees_mobile_detector_v3_wrapper
  File "/home/hadoop/environment/lib64/python3.7/site-packages/FiftyOneDegrees/fiftyone_degrees_mobile_detector_v3_wrapper.py", line 18, in <module>
    _fiftyone_degrees_mobile_detector_v3_wrapper = swig_import_helper()
  File "/home/hadoop/environment/lib64/python3.7/site-packages/FiftyOneDegrees/fiftyone_degrees_mobile_detector_v3_wrapper.py", line 17, in swig_import_helper
    return importlib.import_module('_fiftyone_degrees_mobile_detector_v3_wrapper')
  File "/usr/lib64/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_fiftyone_degrees_mobile_detector_v3_wrapper'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:545)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:703)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:685)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:498)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution
```
2 answers · 0 votes · 169 views · asked 3 months ago