
Titan Multimodal Embeddings with graphs


Hello,

I've looked through the models available in Bedrock to find a multimodal one. The only one I found was the Amazon Titan Multimodal Embeddings G1 model. The documentation says: "The Multimodal Embeddings G1 model is used for use cases like searching images by text, by image for similarity, or by a combination of text and image."
I granted myself access to this model (Model access -> Access status: Access granted), but I could not find it in the Management Console playgrounds. Can it only be used programmatically, or is it available in a region I haven't checked?
Can it be used for this use case:

  • provide as input one picture that contains a simple graph
  • provide as input details about the graph's content
  • ask Titan questions about the graph (e.g., what is the theory behind this type of graph, or comment on the graph's content from a business perspective)

Thank you,

Asked 4 months ago · 56 views
1 Answer
Accepted Answer

Amazon Titan Multimodal Embeddings G1 is designed for generating embeddings rather than for interactive question answering, which explains why you don't see it in the Bedrock playground. This model can only be used programmatically through the Bedrock API.

The Titan Multimodal Embeddings G1 model translates text and image inputs into numerical representations (embeddings) that capture their meaning in a shared semantic space. It's specifically designed for use cases like searching images by text, searching by image for similarity, or searching with a combination of text and image inputs.
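As a minimal sketch of the programmatic path, invoking the model through the Bedrock runtime API with boto3 could look like the snippet below. The model ID, request fields, and local file name reflect the Bedrock documentation as I understand it; verify them against the current docs for your region.

```python
import base64
import json

import boto3

# Assumes AWS credentials are configured and Bedrock is available in the
# chosen region (model availability varies by region).
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Encode the graph image as base64, as the API expects.
with open("graph.png", "rb") as f:  # hypothetical local file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

request_body = json.dumps({
    "inputText": "simple line graph of monthly sales",   # optional text input
    "inputImage": image_b64,                              # optional image input
    "embeddingConfig": {"outputEmbeddingLength": 1024},   # 1024, 384, or 256
})

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-image-v1",  # Titan Multimodal Embeddings G1
    body=request_body,
    contentType="application/json",
    accept="application/json",
)

result = json.loads(response["body"].read())
embedding = result["embedding"]  # a vector of floats, not a text answer
print(len(embedding))            # 1024
```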

Regarding your specific use case of analyzing graphs in images and answering questions about them:

The Titan Multimodal Embeddings G1 model is not suitable for this purpose. It doesn't provide direct question answering capabilities about image content. Instead, it generates vector embeddings that represent the semantic meaning of images and text, which are primarily useful for search and recommendation systems.

For your use case of analyzing a graph image and answering questions about it, you would need a generative multimodal model that can interpret visual content and generate text responses based on that interpretation. The Titan Multimodal Embeddings G1 model doesn't have this capability as it only produces embeddings, not text responses.
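For illustration only: assuming you have access to a multimodal generative model on Bedrock (for example, an Anthropic Claude 3 model; the exact model ID and its regional availability are assumptions you should confirm), asking a question about a graph image could look roughly like this:

```python
import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("graph.png", "rb") as f:  # hypothetical local file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Claude models on Bedrock use the Anthropic Messages request format;
# the model ID below is an assumption to verify in your region.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": image_b64}},
            {"type": "text",
             "text": "What type of graph is this, and what does it "
                     "suggest from a business point of view?"},
        ],
    }],
})

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=body,
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])  # a generated text answer about the graph
```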

The model accepts text and image inputs and outputs vector embeddings of size 1,024 (default), 384, or 256 dimensions. These embeddings can be used for similarity matching but not for direct question answering about image content.
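To show what similarity matching means in practice, here is a small self-contained sketch that ranks stored image embeddings against a text-query embedding using cosine similarity. The index contents and vector values are placeholders for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative index: image embeddings previously obtained from the model.
image_index = {
    "bar_chart.png": [0.1, 0.3, 0.2],    # placeholder vectors; real ones
    "line_graph.png": [0.2, 0.1, 0.4],   # have 1,024 / 384 / 256 dimensions
}

query_embedding = [0.15, 0.12, 0.38]  # embedding of a text query

# Rank images by similarity to the query embedding.
ranked = sorted(image_index.items(),
                key=lambda kv: cosine_similarity(query_embedding, kv[1]),
                reverse=True)
print(ranked[0][0])  # best match: "line_graph.png"
```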
Sources

  • OfferUp improved local results by 54% and relevance recall by 27% with multimodal search on Amazon Bedrock and Amazon OpenSearch Service | AWS Machine Learning Blog
  • Amazon Titan Multimodal Embeddings G1 model - Amazon Bedrock
  • Amazon Titan Multimodal Embeddings G1 - Amazon Bedrock

Answered 4 months ago
Expert
Reviewed 4 months ago
