- Newest
- Most votes
- Most comments
In AWS Glue ETL, you're correct that the primary options for converting a string column to a date and time format are typically timestamp or date. While it may seem unusual coming from a Python/Pandas background where datetime is common, AWS Glue follows conventions more aligned with Apache Spark.
For your specific use case where you need to represent date and time but don't require second-level precision, you have a few options:
-
Use the timestamp type: Despite your concerns, the timestamp type in Glue (which is based on Spark) is actually quite versatile and doesn't have the 2038 problem you might be thinking of. It can represent dates well beyond 2035.
-
Use a string format: You can convert your string to a formatted timestamp string that includes only the precision you need. This can be done using the "Format timestamp" transformation in Glue.
-
Use separate date and time columns: If you prefer, you could split your data into separate date and time columns.
If you decide to use the timestamp type, you can control the precision when you format it for output. For example, you can format a timestamp to show only up to minutes using a pattern like "yyyy-MM-dd HH:mm".
Remember, even if you use a timestamp internally, you have control over how it's formatted when you write the data out. You can use the "Format timestamp" transformation to convert a timestamp column to a formatted string based on your specific needs.
If you still prefer to work with a datetime-like object, you might consider creating a custom UDF (User Defined Function) in PySpark. However, for most use cases, the built-in timestamp type should be sufficient and is well-integrated with Glue's transformations and optimizations.
Ultimately, while it might feel different from the Python/Pandas approach, using timestamp in Glue/Spark is a standard practice and should meet your needs for representing date and time data, including cases where you need minute-level precision without seconds.
Sources
Convert a column to timestamp type - AWS Glue
Convert a timestamp column to a formatted string - AWS Glue
Relevant content
- asked 2 years ago
- AWS OFFICIALUpdated 9 months ago
