
Questions tagged with AWS Glue DataBrew


Browse through the questions and answers listed below or filter and sort to narrow down your results.

AWS Glue image certificate-related issue

Hello Team, I have created the Docker Compose file shown below:

```
version: "2"
services:
  spark:
    image: glue/spark:latest
    container_name: spark
    build: ./spark
    hostname: spark
    ports:
      - "8888:8888"
      - "4040:4040"
    entrypoint: sh
    command: -c "/home/glue_user/jupyter/jupyter_start.sh"
    volumes:
      - ../app/territoryhub-replication:/home/glue_user/workspace/jupyter_workspace
```

The Dockerfile referenced by the build section is as follows:

```
FROM amazon/aws-glue-libs:glue_libs_3.0.0_image_01
USER root
RUN mkdir -p /root/.aws
RUN echo "[default]\nregion=us-east-1" >> /root/.aws/config
```

The container starts, but Jupyter fails. Sharing the logs:

```
Starting Jupyter with SSL
/home/glue_user/jupyter/jupyter_start.sh: line 4: livy-server: command not found
[I 2022-05-12 15:41:33.032 ServerApp] jupyterlab | extension was successfully linked.
[I 2022-05-12 15:41:33.044 ServerApp] nbclassic | extension was successfully linked.
[I 2022-05-12 15:41:33.046 ServerApp] Writing Jupyter server cookie secret to /root/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2022-05-12 15:41:33.541 ServerApp] sparkmagic | extension was found and enabled by notebook_shim. Consider moving the extension to Jupyter Server's extension paths.
[I 2022-05-12 15:41:33.541 ServerApp] sparkmagic | extension was successfully linked.
[I 2022-05-12 15:41:33.541 ServerApp] notebook_shim | extension was successfully linked.
[W 2022-05-12 15:41:33.556 ServerApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
[I 2022-05-12 15:41:33.558 ServerApp] notebook_shim | extension was successfully loaded.
[I 2022-05-12 15:41:33.560 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.7/site-packages/jupyterlab
[I 2022-05-12 15:41:33.560 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 2022-05-12 15:41:33.565 ServerApp] jupyterlab | extension was successfully loaded.
[I 2022-05-12 15:41:33.569 ServerApp] nbclassic | extension was successfully loaded.
[I 2022-05-12 15:41:33.569 ServerApp] sparkmagic extension enabled!
[I 2022-05-12 15:41:33.569 ServerApp] sparkmagic | extension was successfully loaded.
Traceback (most recent call last):
  File "/usr/local/bin/jupyter-lab", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/jupyter_server/extension/application.py", line 584, in launch_instance
    serverapp = cls.initialize_server(argv=args)
  File "/usr/local/lib/python3.7/site-packages/jupyter_server/extension/application.py", line 557, in initialize_server
    find_extensions=find_extensions,
  File "/usr/local/lib/python3.7/site-packages/traitlets/config/application.py", line 88, in inner
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jupyter_server/serverapp.py", line 2421, in initialize
    self.init_httpserver()
  File "/usr/local/lib/python3.7/site-packages/jupyter_server/serverapp.py", line 2251, in init_httpserver
    max_buffer_size=self.max_buffer_size,
  File "/usr/local/lib64/python3.7/site-packages/tornado/util.py", line 288, in __new__
    instance.initialize(*args, **init_kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tornado/httpserver.py", line 191, in initialize
    read_chunk_size=chunk_size,
  File "/usr/local/lib64/python3.7/site-packages/tornado/tcpserver.py", line 134, in __init__
    'certfile "%s" does not exist' % self.ssl_options["certfile"]
ValueError: certfile "/home/glue_user/.certs/my_key_store.pem" does not exist
```

The key failure is the last line: **ValueError: certfile "/home/glue_user/.certs/my_key_store.pem" does not exist**. Please help in resolving this as soon as possible. Many thanks.
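A hedged suggestion rather than a confirmed fix: jupyter_start.sh in the aws-glue-libs images starts Jupyter with SSL and looks for a certificate under /home/glue_user/.certs, which this build does not provide. AWS's local-development examples for these images pass DISABLE_SSL=true when launching jupyter_start.sh; assuming the glue_libs_3.0.0 image honors that variable, the Compose service could set it, for example:

```
# Sketch: add DISABLE_SSL so jupyter_start.sh skips the SSL startup path that needs
# /home/glue_user/.certs/my_key_store.pem. Assumes the glue_libs_3.0.0 image honors
# this variable, as in AWS's documented docker run examples for local development.
services:
  spark:
    # ...existing spark service settings from the Compose file above...
    environment:
      - DISABLE_SSL=true
```

Alternatively, the Dockerfile could generate a self-signed certificate at that path, but disabling SSL is the smaller change for local use.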
1 answer · 0 votes · 6 views · asked 9 days ago

Data Quality Framework in AWS

I am trying to implement a data quality framework for an application that ingests data from various systems (batch, near real time, and real time). A few items I want to highlight:

* The data pipelines vary widely and ingest very high volumes of data. They are built with Spark, Python, EMR clusters, Kafka, and Kinesis streams.
* Any new system we onboard into the framework should be able to include the data quality checks with minimal coding, so some sort of metadata-driven framework might help; for example, storing the business rules in DynamoDB so the checks can run automatically against different feeders or newly created pipelines (a rough sketch of this idea appears after this question).
* Our tech stack includes AWS, Python, Spark, and Java, so kindly advise related services (AWS Glue DataBrew, PyDeequ, Great Expectations, and various Lambda event-driven services are some I want to focus on).
* I am also looking for some sort of audit, balance, and control mechanism: auditing the source data, balancing record counts between two points, and having an automated mechanism to remediate (control) discrepancies.
* I am looking for testing frameworks for the different data pipelines, and for data profiling, kindly advise tools/libraries; AWS Glue DataBrew and Pandas are some I am exploring.

I know there won't be one specific solution, so I appreciate any and all ideas. A flow diagram covering audit, balance, and control with an automated data validation and testing mechanism for data pipelines would be very helpful.
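A minimal, hypothetical sketch of the metadata-driven idea above (not from the original post): business rules are stored in a DynamoDB table and applied to a Spark DataFrame as SQL predicates. The table name "dq_rules", its item layout, and the input path are assumptions for illustration only.

```
# Hypothetical metadata-driven data quality check: rules live in DynamoDB and are
# evaluated against a Spark DataFrame. The table name and rule item layout
# ({"dataset": ..., "rule_id": ..., "expression": <Spark SQL predicate>}) are assumptions.
import boto3
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

def load_rules(dataset_name: str) -> list:
    """Fetch the quality rules registered for one dataset/feeder."""
    table = boto3.resource("dynamodb").Table("dq_rules")
    resp = table.scan()  # fine for small rule tables; query on a dataset key for larger ones
    return [r for r in resp["Items"] if r["dataset"] == dataset_name]

def run_checks(df, dataset_name: str) -> list:
    """Return one result dict per rule: how many rows violate the rule's predicate."""
    results = []
    for rule in load_rules(dataset_name):
        violations = df.filter(~F.expr(rule["expression"])).count()
        results.append({
            "rule_id": rule["rule_id"],
            "expression": rule["expression"],
            "violations": violations,
            "passed": violations == 0,
        })
    return results

# Example: a rule item {"dataset": "orders", "rule_id": "amount_non_negative",
# "expression": "amount >= 0"} would flag rows with negative amounts.
df = spark.read.parquet("s3://my-bucket/orders/")  # hypothetical input path
for outcome in run_checks(df, "orders"):
    print(outcome)
```

Libraries such as PyDeequ or Great Expectations could replace the hand-rolled filter with richer constraint and profiling checks; the same DynamoDB-driven rule registry pattern would still apply.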
1 answer · 0 votes · 11 views · asked 11 days ago

AWS Glue connection to RDS PostgreSQL fails

I failed to connect to an RDS PostgreSQL database from Glue; the connection test returned the following message:

```
Check that your connection definition references your JDBC database with correct URL syntax, username, and password. The authentication type 10 is not supported. Check that you have configured the pg_hba.conf file to include the client's IP address or subnet, and that it is using an authentication scheme supported by the driver. Exiting with error code 30
```

My connection settings:

* Type: JDBC
* JDBC URL: jdbc:postgresql://xxx.xxx.us-west-2.rds.amazonaws.com:5432/xxx
* VPC ID: vpc-xxx
* Subnet: xxx
* Security group: sg-xxx
* SSL connection required: false

I have checked the above configuration against the PostgreSQL database and there should be no problem. My Glue IAM permissions:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "rds:*",
      "Resource": "*",
      "Condition": {
        "BoolIfExists": {
          "aws:MultiFactorAuthPresent": "true"
        }
      }
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "s3:GetAccessPoint",
        "ec2:DescribeAddresses",
        "ec2:DescribeByoipCidrs",
        "s3:GetBucketPolicy",
        "glue:*",
        "kms:*",
        "s3:GetAccessPointPolicyStatus",
        "s3:GetBucketPolicyStatus",
        "s3:GetBucketPublicAccessBlock",
        "s3:GetMultiRegionAccessPointPolicyStatus",
        "rds-db:*",
        "s3:GetMultiRegionAccessPointPolicy",
        "s3:ListAccessPoints",
        "s3:GetMultiRegionAccessPoint",
        "rds-data:*",
        "s3:ListMultiRegionAccessPoints",
        "s3:GetBucketAcl",
        "s3:DescribeMultiRegionAccessPointOperation",
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetAccountPublicAccessBlock",
        "s3:ListAllMyBuckets",
        "ec2:DescribeVpcs",
        "ec2:DescribeVpcEndpoints",
        "s3:GetBucketLocation",
        "s3:GetAccessPointPolicy"
      ],
      "Resource": "*"
    }
  ]
}
```

The error message suggests three possible directions:

1. Check that your connection definition references your JDBC database with correct URL syntax, username, and password. I don't think this is wrong.
2. The authentication type 10 is not supported. I'm not exactly sure what this error means, and all my Google results say to modify pg_hba.conf; however, RDS does not allow modifying this file.
3. Check that you have configured the pg_hba.conf file to include the client's IP address or subnet, and that it is using an authentication scheme supported by the driver. I don't understand what this means.
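A hedged note rather than a confirmed answer: "authentication type 10" generally refers to PostgreSQL's scram-sha-256 password authentication, which older PostgreSQL JDBC drivers (before roughly 42.2.x) do not support. Assuming that is the cause here, one workaround is to switch the database's password_encryption parameter back to md5 in a custom RDS parameter group and then reset the affected user's password; the parameter group name below is hypothetical.

```
# Sketch, assuming the failure is scram-sha-256 vs. an older JDBC driver and the
# instance uses a custom, modifiable parameter group named "my-postgres-params".
import boto3

rds = boto3.client("rds", region_name="us-west-2")

# Switch password hashing to md5 so older PostgreSQL JDBC drivers can authenticate.
rds.modify_db_parameter_group(
    DBParameterGroupName="my-postgres-params",  # hypothetical parameter group name
    Parameters=[{
        "ParameterName": "password_encryption",
        "ParameterValue": "md5",
        "ApplyMethod": "immediate",
    }],
)

# After the parameter takes effect, reset the database user's password
# (e.g. ALTER USER glue_user WITH PASSWORD '...') so it is re-hashed with md5.
# Alternatively, attach a newer postgresql JDBC driver jar to the Glue job instead.
```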
1 answer · 0 votes · 11 views · asked 2 months ago