
Titan Multimodal Embeddings with graphs


Hello,

I've looked through the available Bedrock models to find a multimodal one. The only one I found was the Amazon Titan Multimodal Embeddings G1 model. The documentation says: "The Multimodal Embeddings G1 model is used for use cases like searching images by text, by image for similarity, or by a combination of text and image."
I granted myself access to this model (Model access -> Access status: Access granted), but I could not find it in the Management Console playgrounds. Can it only be used programmatically, or is it in a region I haven't looked at?
Can it be used for this use case:

  • provide as input a picture containing a simple graph
  • provide as input details about the graph's content
  • ask Titan questions about the graph (e.g., what theory applies to this type of graph; comment on the graph's content from a business point of view)

Thank you,

Asked 4 months ago · Viewed 63 times
1 Answer
Accepted Answer

Amazon Titan Multimodal Embeddings G1 is designed for generating embeddings rather than for interactive question answering, which explains why you don't see it in the Bedrock playground. This model can only be used programmatically through the Bedrock API.

The Titan Multimodal Embeddings G1 model translates text and image inputs into numerical representations (embeddings) that capture their meaning in a shared semantic space. It's specifically designed for use cases like searching images by text, searching by image for similarity, or using a combination of text and image inputs.
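For reference, here is a minimal sketch of calling the model programmatically with boto3. The region, file name, and input text are placeholder assumptions; the model ID and request shape follow the Bedrock Runtime InvokeModel API for this model:

```python
# Minimal sketch (boto3): generating an embedding with Titan Multimodal
# Embeddings G1. Region, file name, and input text are placeholders.
import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Base64-encode the image (assumed local file "chart.png")
with open("chart.png", "rb") as f:
    input_image = base64.b64encode(f.read()).decode("utf-8")

body = json.dumps({
    "inputText": "a line chart of monthly revenue",     # optional text input
    "inputImage": input_image,                          # optional image input
    "embeddingConfig": {"outputEmbeddingLength": 1024}  # 256, 384, or 1024
})

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-image-v1",
    body=body,
    contentType="application/json",
    accept="application/json",
)

result = json.loads(response["body"].read())
embedding = result["embedding"]  # a vector of floats, not a text answer
print(len(embedding))            # 1024
```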

Regarding your specific use case of analyzing graphs in images and answering questions about them:

The Titan Multimodal Embeddings G1 model is not suitable for this purpose. It doesn't provide direct question answering capabilities about image content. Instead, it generates vector embeddings that represent the semantic meaning of images and text, which are primarily useful for search and recommendation systems.

For your use case of analyzing a graph image and answering questions about it, you would need a generative multimodal model that can interpret visual content and generate text responses based on that interpretation. The Titan Multimodal Embeddings G1 model doesn't have this capability as it only produces embeddings, not text responses.
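As an illustration only (the right choice depends on your region and requirements), a generative multimodal model available on Bedrock, such as Anthropic Claude 3, can take an image plus a question through the Converse API. The model ID, region, and file name below are assumptions:

```python
# Illustrative sketch: question answering about a chart image with a
# generative multimodal model via the Bedrock Converse API. The model ID,
# region, and file name are assumptions; availability varies by region.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("chart.png", "rb") as f:
    image_bytes = f.read()

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "What type of chart is this, and how would you "
                     "comment on its content from a business point of view?"},
        ],
    }],
)

print(response["output"]["message"]["content"][0]["text"])
```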

The model accepts text and image inputs and outputs vector embeddings of size 1,024 (default), 384, or 256 dimensions. These embeddings can be used for similarity matching but not for direct question answering about image content.
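To make the "similarity matching" point concrete, here is a small hypothetical sketch of comparing two embeddings with cosine similarity; real Titan vectors would have 256, 384, or 1,024 dimensions:

```python
# Hypothetical sketch: cosine similarity between two embedding vectors,
# the typical operation behind search and recommendation use cases.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real Titan embeddings would come from
# invoke_model calls like the one sketched above.
print(cosine_similarity([0.1, 0.8, 0.3], [0.2, 0.7, 0.4]))
```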
Sources

  • OfferUp improved local results by 54% and relevance recall by 27% with multimodal search on Amazon Bedrock and Amazon OpenSearch Service | AWS Machine Learning Blog
  • Amazon Titan Multimodal Embeddings G1 model - Amazon Bedrock
  • Amazon Titan Multimodal Embeddings G1 - Amazon Bedrock

Answered 4 months ago
Expert
Reviewed 4 months ago

