Following along with this blog post, I'm attempting to debug (with breakpoints) my Glue jobs running in VS Code using amazon/aws-glue-libs:glue_libs_3.0.0_image_01.
The job starts, and I can step through the code right up until I try to connect to RDS to fetch data. As soon as I do, I get back:
An error occurred while calling o47.getDynamicFrame.
: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.amazonaws.services.glue.util.JDBCUtils.loadDriver(JDBCUtils.scala:214)
at com.amazonaws.services.glue.util.JDBCUtils.loadDriver$(JDBCUtils.scala:212)
at com.amazonaws.services.glue.util.MySQLUtils$.loadDriver(JDBCUtils.scala:490)
at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:746)
at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:1006)
at com.amazonaws.services.glue.JDBCDataSource.$anonfun$getJdbcJobBookmark$1(DataSource.scala:878)
at scala.collection.MapLike.getOrElse(MapLike.scala:131)
at scala.collection.MapLike.getOrElse$(MapLike.scala:129)
at scala.collection.AbstractMap.getOrElse(Map.scala:63)
at com.amazonaws.services.glue.JDBCDataSource.getJdbcJobBookmark(DataSource.scala:878)
at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:953)
at com.amazonaws.services.glue.DataSource.getDynamicFrame(DataSource.scala:99)
at com.amazonaws.services.glue.DataSource.getDynamicFrame$(DataSource.scala:99)
at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:714)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
I'm not sure how to solve this problem. The blog post mentions that I can pass in extra libraries; however, when I look in /home/glue_user/aws-glue-libs/jars
I can see a jar named mssql-jdbc-7.0.0.jre8.jar,
so I'm not sure that's the problem. I should mention that this job runs without a problem when deployed to AWS.
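One thing worth checking: mssql-jdbc is Microsoft's SQL Server driver, while com.mysql.cj.jdbc.Driver is the class name used by MySQL Connector/J 8.x, so the presence of that jar on its own may not rule the problem out. A minimal stdlib-only sketch for checking whether any jar in the container actually contains the missing class is below. It assumes only that jars are zip archives with classes stored as com/mysql/cj/jdbc/Driver.class; the helper name find_class_in_jars is hypothetical.

```python
import sys
import zipfile
from pathlib import Path


def find_class_in_jars(jar_dir, class_name):
    """Return the jars under jar_dir (searched recursively) that contain class_name.

    class_name is a fully qualified Java name such as "com.mysql.cj.jdbc.Driver";
    inside a jar it is stored as "com/mysql/cj/jdbc/Driver.class".
    (Hypothetical helper, written for this check.)
    """
    root = Path(jar_dir)
    if not root.is_dir():
        return []
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(root.rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are named *.jar but are not valid archives.
            continue
    return matches


if __name__ == "__main__":
    # Defaults to the jars directory inspected above; pass another path to override.
    jar_dir = sys.argv[1] if len(sys.argv) > 1 else "/home/glue_user/aws-glue-libs/jars"
    hits = find_class_in_jars(jar_dir, "com.mysql.cj.jdbc.Driver")
    print("\n".join(hits) if hits else "com.mysql.cj.jdbc.Driver not found under " + jar_dir)
```

Running it inside the container with no arguments scans /home/glue_user/aws-glue-libs/jars; an empty result would suggest no jar there provides the MySQL driver class.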
I'm currently starting up amazon/aws-glue-libs:glue_libs_3.0.0_image_01
using a very basic docker-compose file:
version: "3.8"
services:
  glue:
    container_name: "glue-local-development"
    image: amazon/aws-glue-libs:glue_libs_3.0.0_image_01
    ports:
      - "4040:4040"
      - "18080:18080"
    environment:
      - DISABLE_SSL=true
      - AWS_PROFILE=my_profile
    volumes:
      - ~/.aws:/home/glue_user/.aws
      - ${PWD}:/home/glue_user/workspace/
    stdin_open: true
I then connect to it as described in the blog post. Is there something else I have to do here?
I don't think I should have to manually load the MySQL jars, should I?
I've been stuck at this point for a while, so I'd really appreciate any help or suggestions.
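If the driver does turn out to be missing, one option (an assumption, not something the blog post prescribes) is to download MySQL Connector/J into the mounted workspace and put it on the driver's class path when the job is launched. The trace shows the lookup going through sun.misc.Launcher$AppClassLoader, the JVM's system class path, which is why this sketch passes the jar to spark-submit's --driver-class-path as well as --jars. For a script started by plain python (for example from the VS Code debugger), PySpark reads launch options from the PYSPARK_SUBMIT_ARGS environment variable, which must end with pyspark-shell. The jar and script paths are hypothetical placeholders, and if the image already sets PYSPARK_SUBMIT_ARGS the options would need to be merged rather than replaced.

```python
import shlex


def driver_jar_args(jars):
    """spark-submit options that ship `jars` and add them to the driver's class path."""
    return [
        "--jars", ",".join(jars),               # --jars takes a comma-separated list
        "--driver-class-path", ":".join(jars),  # class path uses ':' on Linux
    ]


def spark_submit_command(script, jars):
    """Full spark-submit argv for running a Glue script from a terminal in the container."""
    return ["spark-submit", *driver_jar_args(jars), script]


def pyspark_submit_args(jars):
    """Value for PYSPARK_SUBMIT_ARGS when the script is started by plain `python`."""
    return shlex.join(driver_jar_args(jars) + ["pyspark-shell"])


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the Connector/J jar was downloaded.
    jars = ["/home/glue_user/workspace/jars/mysql-connector-j.jar"]
    print(shlex.join(spark_submit_command("/home/glue_user/workspace/my_job.py", jars)))
    print("PYSPARK_SUBMIT_ARGS=" + pyspark_submit_args(jars))
```

The PYSPARK_SUBMIT_ARGS value could then go into the environment section of the docker-compose file or the VS Code launch configuration; whether either is enough for Glue's JDBCUtils.loadDriver to find the class is something to verify.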
Edit:
Interestingly, when I attempt to run amazon/aws-glue-libs:glue_libs_2.0.0_image_01,
it fails with a similar but different error:
: An error occurred while calling o49.getDynamicFrame.
: java.io.FileNotFoundException: (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at com.amazonaws.glue.jdbc.commons.CustomCertificateManager.importCustomJDBCCert(CustomCertificateManager.java:127)
at com.amazonaws.services.glue.util.JDBCWrapper$.connectionProperties(JDBCUtils.scala:947)
at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties$lzycompute(JDBCUtils.scala:734)
at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties(JDBCUtils.scala:734)
at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:747)
at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:996)
at com.amazonaws.services.glue.JDBCDataSource$$anonfun$33.apply(DataSource.scala:868)
at com.amazonaws.services.glue.JDBCDataSource$$anonfun$33.apply(DataSource.scala:868)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at com.amazonaws.services.glue.JDBCDataSource.getJdbcJobBookmark(DataSource.scala:868)
at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:943)
at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:97)
at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:707)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
It seems a bit odd: the driver already exists in the container, and I'm reading from the catalog via glue_context.create_dynamic_frame.from_catalog, which works in production. Do I really have to go and change all my code just to debug a Glue job in a Docker container?
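For the Glue 2.0 error, the trace shows CustomCertificateManager.importCustomJDBCCert trying to open a file with an empty name while the JDBC connection properties are built. Because from_catalog takes those properties from the Data Catalog connection behind the table, it may help to look at what that connection specifies for the driver, SSL and certificates. This is a hedged sketch: the key names come from the Glue ConnectionProperties API, the helper names are hypothetical, and whether any of these settings explains the local failure is an assumption to verify.

```python
# Glue ConnectionProperties keys related to the JDBC driver, SSL and custom certificates.
DRIVER_AND_SSL_KEYS = (
    "JDBC_CONNECTION_URL",
    "JDBC_DRIVER_JAR_URI",
    "JDBC_DRIVER_CLASS_NAME",
    "JDBC_ENFORCE_SSL",
    "CUSTOM_JDBC_CERT",
    "CUSTOM_JDBC_CERT_STRING",
    "SKIP_CUSTOM_JDBC_CERT_VALIDATION",
)


def driver_and_ssl_settings(connection_properties):
    """Keep only the driver/SSL-related entries of a ConnectionProperties dict (hypothetical helper)."""
    return {k: connection_properties[k] for k in DRIVER_AND_SSL_KEYS if k in connection_properties}


def fetch_connection_properties(name):
    """Fetch a Data Catalog connection's properties with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so driver_and_ssl_settings works without boto3 installed

    glue = boto3.client("glue")
    response = glue.get_connection(Name=name, HidePassword=True)
    return response["Connection"]["ConnectionProperties"]
```

For example, driver_and_ssl_settings(fetch_connection_properties("my-rds-connection")), with a placeholder connection name and the same AWS_PROFILE the container uses, would show whether the connection enforces SSL or points at a custom certificate that the local image cannot find.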