I'm trying to do a partial string match on a inner join



I'm trying to do an innor join on a substring in athena. The substring of the barcode (example: 3SOLGH3413663) is the partycode. And in the partycodelijst_nanette there is a corresponding name (partycode = OLGH klantnaam = OLGHEXAMPLE). So I want to join those databases. But I can't figure out how to use the like operator in this way. Could somebody help me?

SELECT "week"("a"."waarneming_waarndt") "week" , "a"."barcd" "barcode" , SUBSTRING("a"."barcd", 3, 4) AS "partycode" , "b"."first_klantnaam" "klantnaam"

FROM (("collo_bup_nrs"."nrs" "a"

LEFT JOIN "retaildcc"."partycodelijst_nanette" "b" ON ("b"."partycode" LIKE '%' + "a"."barcd" LIKE '%'+"b"."partycode"+'%')

asked 2 years ago291 views
2 Answers
Accepted Answer

As far as I understood, your use case is to join two tables where in the condition for join is only if a part of the column's data is equal to the data of another column in the other table. If this is what you wanted to achieve, then I think you can directly make use of the substr() string function. Please note that the the correct string function supported by Athena is SUBSTR() but not SUBSTRING(). As you might already know, Athena uses presto in the backend so the supported string functions can be found here.

If you already know the position of the partycode in your barcode, then I think you can directly make use of the SUBSTR() method instead of LIKE. By this I mean that, in the ON clause of your statement you can make use of SUBSTR() to get the partycode and then compare it directly with the partycode column data of the other table. Your ON clause would look something like this:

SUBSTR("a"."barcd", 3, 4) = "b"."partycode"

If you need, you can find the other presto functions here.

I hope that this will be able to help your use case.

profile pictureAWS
answered 2 years ago
reviewed 2 years ago

The first thing I'd like to mention and the most important is that you shouldn't do (A like B like C). You should do ((A like B) and (B like C)). I don't think stringing likes is a good practice.

The second thing I'd like to mention, and it may not be a problem at all since you are using aliases, is that maybe you should rename partycode when you use it here: SUBSTRING("a"."barcd", 3, 4) AS "partycode". Maybe call it partycode2 so it's unique. Just a thought.

answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions