1 Answer
Hi, if I understood correctly, you would like to achieve something like this:
Name | Color | Size | Value | Time |
---|---|---|---|---|
Alpha | Blue | Large | 2 | 01-04-2022 09:58:30 |
Alpha | Blue | Large | 1 | 01-04-2022 08:58:30 |
Bravo | Red | Small | 5 | 01-04-2022 09:58:30 |
Is that right?
I am assuming you will only have one duplicate for any original row, and that the duplicate's Value should be increased by 1.
You could look into something like this:
import pyspark.sql.functions as F
from pyspark.sql.window import Window

# Number the rows within each group of identical rows, ordered by time
windowSpec = Window.partitionBy("Name", "Color", "Size", "Value").orderBy("Time_col")
df3 = df.withColumn("row_num", F.row_number().over(windowSpec))
# Add 1 to Value on the second row of each group, then drop the helper column
df4 = df3.withColumn("Value", F.when(df3.row_num == 2, df3.Value + 1).otherwise(df3.Value)).drop("row_num")
My output:
root
|-- Name: string (nullable = true)
|-- Color: string (nullable = true)
|-- Size: string (nullable = true)
|-- Value: long (nullable = true)
|-- Time_col: string (nullable = true)
+-----+-----+-----+-----+-------------------+
|Name |Color|Size |Value|Time_col |
+-----+-----+-----+-----+-------------------+
|Alpha|Blue |Large|1 |01-04-2022 08:58:30|
|Alpha|Blue |Large|2 |01-04-2022 09:58:30|
|Bravo|Red |Small|5 |01-04-2022 09:58:30|
+-----+-----+-----+-----+-------------------+
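It should also work without the Time column. Note that row_number() still needs an ordered window, so in that case order by another column instead.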
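If it helps to see what the window is doing, here is a minimal plain-Python sketch of the same logic. It is not Spark code, only an emulation of "partition, order, number the rows, bump row 2". The input rows are my assumption of what the original data looked like (two identical Alpha rows with different times), and the helper name `bump_duplicates` and its `step` parameter are hypothetical, added only for illustration.

```python
from collections import defaultdict


def bump_duplicates(rows, key_cols, order_col, value_col="Value", step=1):
    """Emulate row_number() over (partition by key_cols, order by order_col)
    and add `step` to the value of the second row in each partition."""
    # Group rows that share the same values in key_cols (the "partition").
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[c] for c in key_cols)].append(row)

    result = []
    for group in partitions.values():
        # Order within the partition, then number rows starting at 1.
        ordered = sorted(group, key=lambda r: r[order_col])
        for row_num, row in enumerate(ordered, start=1):
            new_row = dict(row)  # copy so the input rows are not modified
            if row_num == 2:
                new_row[value_col] = row[value_col] + step
            result.append(new_row)
    return result


# Assumed input: the Alpha row appears twice with the same Value.
rows = [
    {"Name": "Alpha", "Color": "Blue", "Size": "Large", "Value": 1, "Time_col": "01-04-2022 08:58:30"},
    {"Name": "Alpha", "Color": "Blue", "Size": "Large", "Value": 1, "Time_col": "01-04-2022 09:58:30"},
    {"Name": "Bravo", "Color": "Red", "Size": "Small", "Value": 5, "Time_col": "01-04-2022 09:58:30"},
]

result = bump_duplicates(rows, ["Name", "Color", "Size", "Value"], "Time_col")
for row in result:
    print(row)
```

As in the Spark version, Time_col is compared as a string here, which only orders correctly while the timestamps share the same date prefix. Changing `step` would cover the "increment by a certain amount" case.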
Hope this helps.
I need to keep the duplicate row and just increment the "Value" column by a certain amount.
Name | Color | Size | Value |
---|---|---|---|
Alpha | Blue | Large | 1 |
Alpha | Blue | Large | 2 |
Bravo | Red | Small | 5 |
would be my output.
Also, I tried to use a function to at least start iterating over rows, and it doesn't seem to work. I am basically doing this; here is a small section of my code:
So all of this is inside the if clause. All of it executes but not my function.
@bfeeny, thank you for the clarification. I am going to update my answer; please check if it helps.