I have transcribed audio with two speakers. I chose the audio identification option, but the JSON produced just has the text without speakers, followed by a whole bunch of data on the timings of each speaker that is not associated with the text.
For me, this is less than ideal.