
Titan Multimodal Embeddings with graphs


Hello,

I've looked through the available Bedrock models to find a multimodal one. The only one I found was the Amazon Titan Multimodal Embeddings G1 model. The documentation says: "The Multimodal Embeddings G1 model is used for use cases like searching images by text, by image for similarity, or by a combination of text and image."
I granted myself access to this model (Model access -> Access status: Access granted), but I could not find it in the Management Console playgrounds. Can it only be used programmatically, or is it available in a region I haven't checked?
Can it be used for this use case:

  • provide as input a picture that contains a simple graph
  • provide additional input describing the graph content
  • ask Titan questions about the graph (e.g., what the theory behind this graph type is, or a comment on the graph content from a business perspective)

Thank you,

1 Answer

Accepted Answer

Amazon Titan Multimodal Embeddings G1 is designed for generating embeddings rather than for interactive question answering, which explains why you don't see it in the Bedrock playground. This model can only be used programmatically through the Bedrock API.
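
For reference, a programmatic call looks roughly like the following. This is a minimal sketch using boto3, assuming the model ID amazon.titan-embed-image-v1, a region where you have been granted access (us-east-1 here), and a placeholder image file graph.png; check the Bedrock documentation for the exact request schema.

```python
import base64
import json

import boto3

# Bedrock Runtime client; the model must be enabled in this region.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# "graph.png" is a placeholder for your input image.
with open("graph.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

body = json.dumps({
    "inputText": "a simple line graph of monthly revenue",  # optional text input
    "inputImage": image_b64,                                # optional image input
    "embeddingConfig": {"outputEmbeddingLength": 1024},     # 256, 384, or 1024
})

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-image-v1",
    body=body,
    accept="application/json",
    contentType="application/json",
)

embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # -> 1024
```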

The Titan Multimodal Embeddings G1 model translates text and image inputs into numerical representations (embeddings) that capture their meaning in a shared semantic space. It's specifically designed for use cases like searching images by text, searching by image for similarity, or searching with a combination of text and image inputs.

Regarding your specific use case of analyzing graphs in images and answering questions about them:

The Titan Multimodal Embeddings G1 model is not suitable for this purpose. It doesn't provide direct question answering capabilities about image content. Instead, it generates vector embeddings that represent the semantic meaning of images and text, which are primarily useful for search and recommendation systems.

For your use case of analyzing a graph image and answering questions about it, you would need a generative multimodal model that can interpret visual content and generate text responses based on that interpretation. The Titan Multimodal Embeddings G1 model doesn't have this capability as it only produces embeddings, not text responses.
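
On Bedrock, that would mean pairing the embeddings model with (or replacing it by) a vision-capable generative model. As an illustration only, here is a minimal sketch using the Bedrock Converse API; the model ID anthropic.claude-3-sonnet-20240229-v1:0 and the file name graph.png are assumptions, and model availability varies by account and region:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("graph.png", "rb") as f:  # placeholder image file
    image_bytes = f.read()

response = bedrock.converse(
    # Assumed model ID; any vision-capable Bedrock model you
    # have access to is called the same way via Converse.
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "What type of graph is this, and how would you "
                     "interpret it from a business point of view?"},
        ],
    }],
)

print(response["output"]["message"]["content"][0]["text"])
```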

The model accepts text and image inputs and outputs vector embeddings of size 1,024 (default), 384, or 256 dimensions. These embeddings can be used for similarity matching but not for direct question answering about image content.
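
To illustrate that similarity matching, here is a minimal sketch in plain Python that scores one embedding against another, for example a text-query embedding against stored image embeddings returned by calls like the one above:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Because text and image embeddings live in the same semantic space,
# a text-query embedding can be scored directly against stored image
# embeddings, e.g.:
#   scores = [cosine_similarity(query_embedding, e) for e in image_embeddings]
```
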
Sources

  • OfferUp improved local results by 54% and relevance recall by 27% with multimodal search on Amazon Bedrock and Amazon OpenSearch Service | AWS Machine Learning Blog
  • Amazon Titan Multimodal Embeddings G1 model - Amazon Bedrock
  • Amazon Titan Multimodal Embeddings G1 - Amazon Bedrock

Answered 4 months ago
Expert
Reviewed 4 months ago
