@RobertoH,
if you are reading from a relational database, you can push a query down to the database using the connection option sampleQuery, as described here.
Hope this helps,
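In case a concrete shape helps: a minimal sketch of what that might look like, reusing the placeholder connection details from the thread. The `create_dynamic_frame.from_options` call is commented out because it only runs inside a Glue job where `glueContext` exists.

```python
# Sketch of a pushed-down read via the Glue JDBC "sampleQuery" option.
# Connection details are placeholders from the thread; the WHERE clause
# in sampleQuery is executed by the database, not by Spark.
connection_options = {
    "url": "jdbc:postgresql://ip:5432/db",
    "user": "xxx",
    "password": "xxx",
    "sampleQuery": "SELECT * FROM pushed_checkpoints WHERE pushed_at > '2022-12-01'",
}

# Inside a Glue job (with a glueContext available) this would be:
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="postgresql",
#     connection_options=connection_options,
# )
```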
The DataFrame returned by toDF() has a show method that displays only a limited number of rows. You can use that if you want to see a subset of the data. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html
Hi. I have a table with more than 300 million rows, and I only need to retrieve records after a given date, but if I use toDF() it tries to fetch all the records.
There's also a filter method mentioned in the documentation that you can use instead of count. Try that. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-filter
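One caveat worth noting: filter runs in Spark after the rows have been read, so it does not reduce what is pulled from the database. A sketch of the call and its predicate, where `dyf` is a hypothetical DynamicFrame:

```python
# DynamicFrame.filter applies a per-record predicate in Spark (client side),
# after the data has already been read from the source:
# filtered = dyf.filter(f=lambda row: row["pushed_at"] > "2022-12-01")

# The predicate itself is plain Python; demonstrated here on stand-in records:
rows = [{"pushed_at": "2022-11-15"}, {"pushed_at": "2022-12-05"}]
kept = [r for r in rows if r["pushed_at"] > "2022-12-01"]  # drops the earlier row
```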
Thanks jschwar313, but that doesn't apply the filter at the database server level. I'm starting to think it isn't really possible.
Thanks. I tried it, but it didn't work. It still issues a SELECT * FROM the table.
```python
DataSource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options={
        "url": "jdbc:postgresql://ip:5432/db",
        "user": "xxx",
        "password": "xxx",
        "dbtable": "pushed_checkpoints",
        "query": "SELECT * FROM pushed_checkpoints where pushed_at>'2022-12-01'",
    },
)
```
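For what it's worth, plain Spark JDBC reads accept a parenthesized subquery with an alias in place of a table name for dbtable. Whether Glue's postgresql connector passes that option through unchanged is an assumption on my part, but it may be worth trying alongside sampleQuery:

```python
# Assumption: Glue forwards "dbtable" to the Spark JDBC reader, which
# accepts a derived table "(SELECT ...) AS alias" instead of a table name,
# so the WHERE clause would run on the database side.
connection_options = {
    "url": "jdbc:postgresql://ip:5432/db",
    "user": "xxx",
    "password": "xxx",
    "dbtable": "(SELECT * FROM pushed_checkpoints WHERE pushed_at > '2022-12-01') AS t",
}

# In the Glue job this would replace the failing call:
# DataSource0 = glueContext.create_dynamic_frame.from_options(
#     connection_type="postgresql",
#     connection_options=connection_options,
# )
```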