How to use sagemaker-pyspark in batch inference

Question

I am trying to execute the code below:

```
ENDPOINT_NAME = "my-endpoint"
from sagemaker_pyspark import SageMakerModel
from sagemaker_pyspark import EndpointCreationPolicy
from sagemaker_pyspark.transformation.serializers import ProtobufRequestRowSerializer
from sagemaker_pyspark.transformation.deserializers import ProtobufResponseRowDeserializer
from pyspark.sql.types import StructType, StructField, MapType, StringType, IntegerType, ArrayType, FloatType

attachedModel = SageMakerModel.fromEndpoint(
    endpointName = ENDPOINT_NAME,
    requestRowSerializer=ProtobufRequestRowSerializer(
        featuresColumnName = "col1"
    ),
    responseRowDeserializer=ProtobufResponseRowDeserializer(schema=StructType([
        StructField('prediction', MapType(StringType(), FloatType()))
    ]))
)

data=SageMakerModel.transform(attachedModel, df['col1'])
```
However, I keep getting the following error:

```
py4j.protocol.Py4JError: An error occurred while calling o58.__getstate__. Trace:
py4j.Py4JException: Method __getstate__([]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
```

Any ideas?

Answer

Hi there,

It seems to me that there may be a discrepancy in how one of your outputs is returned: it should only be returned after an if/else statement and be defined within that if/else statement. [Here](https://stackoverflow.com/questions/66711209/pandas-udf-method-getstate-does-not-exist-error) is a resource that addresses the same `Method __getstate__([]) does not exist` error in a pandas UDF; applying a similar strategy may resolve your problem.
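A related mechanism that may explain the trace above: the `o58.__getstate__` call means Python's `pickle` tried to serialize a Py4J handle to a JVM object. In the question's code, `SageMakerModel.transform(attachedModel, df['col1'])` passes `df['col1']`, which is a PySpark `Column`, not a `DataFrame`. Assuming the wrapper falls back to pickling arguments it does not recognise (as PySpark's `pyspark.ml.common._py2java` does for anything that is not a `DataFrame`, RDD, list, primitive, or Java object), the Column and its JVM handle would be pickled and fail in exactly this way. The sketch below reproduces that failure in plain Python; `JvmHandle`, `FakeDataFrame`, `FakeColumn`, `to_java_arg`, and the ids `o42`/`o58` are hypothetical stand-ins, not sagemaker-pyspark or PySpark APIs.

```python
import pickle


class JvmHandle:
    """Hypothetical stand-in for a py4j JavaObject (e.g. o58).

    A real py4j proxy forwards attribute lookups to the JVM, so when pickle
    asks it for __getstate__ the call reaches the JVM and fails. Here we
    simply raise to simulate that failure."""

    def __init__(self, object_id):
        self.object_id = object_id

    def __getstate__(self):
        raise RuntimeError(
            f"An error occurred while calling {self.object_id}.__getstate__"
        )


class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (wraps a JVM handle)."""

    def __init__(self, jdf):
        self._jdf = jdf


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (also wraps a JVM handle)."""

    def __init__(self, jc):
        self._jc = jc


def to_java_arg(obj):
    """Simplified py2java-style conversion (an assumption, not the real code):
    a DataFrame is passed by its JVM handle, anything else is pickled."""
    if isinstance(obj, FakeDataFrame):
        return obj._jdf
    # A Column lands here, and pickling it also pickles its JVM handle.
    return pickle.dumps(obj)


df = FakeDataFrame(JvmHandle("o42"))
col = FakeColumn(JvmHandle("o58"))

print(to_java_arg(df).object_id)  # the DataFrame's handle is passed through
try:
    to_java_arg(col)
except RuntimeError as err:
    print(err)  # same shape as the reported Py4JError
```

If this is the cause, the corresponding change in the real code would be to pass the whole DataFrame, for example `predictions = attachedModel.transform(df)`, where `df` is assumed to contain a `col1` column of `pyspark.ml.linalg` vectors, since `ProtobufRequestRowSerializer(featuresColumnName = "col1")` reads the features from that column.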

Hopefully this will provide some insight into your problem.

Regards,
NN
