To limit the scope of your SharePoint data source in Amazon Bedrock Knowledge Base to a specific set of directories, you can use metadata filtering with include/exclude regex patterns. For your specific case of targeting files in the "/Customer Success/Customers/" directory, you'll need to use the correct syntax for URL-style paths.
Here's a suggested approach:
- In the "Metadata and Filtering" section when setting up your SharePoint data source, use an include pattern to target the specific directory structure.
- Try a regex pattern like this: ^/sites/[^/]+/Customer%20Success/Customers/.*
  This pattern does the following:
  - Starts with "/sites/" (typical for SharePoint URLs)
  - Allows for any site name
  - Targets the "Customer Success/Customers" path (note the URL encoding of spaces as %20)
  - Matches any files or subdirectories within that path
- To refine the selection further, add more include or exclude patterns. For example, to include only specific file types, you might add: .*\.(docx|xlsx|pdf)$
- Remember that SharePoint uses URL-encoded paths, so spaces should be written as %20 in your regex patterns.
- If you still get too many or too few results, adjust the pattern. Consider starting with a more permissive pattern and tightening it as you confirm that the correct file paths are matched.
- After changing your filtering patterns, re-sync the data source to apply the new configuration.
If you continue to have issues, you may want to examine the ingestion logs closely to understand exactly how the file paths are being represented. This can help you refine your regex patterns for more accurate filtering.
Remember that the exact syntax may vary depending on your SharePoint site structure and the specific implementation of the Bedrock Knowledge Base SharePoint connector. You might need to experiment with different patterns to find the one that works best for your particular setup.
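Before re-syncing, the patterns above can be checked locally against a few candidate paths. The following is a minimal Python sketch; the sample paths and the site name "Contoso" are hypothetical, and the file-type pattern is written with the dot escaped. It only tests the regexes themselves. It does not show which string (full URL, relative path, or bare filename) the SharePoint connector actually matches against.

```python
import re

# Include pattern from the answer above. The /sites/<name>/ layout and the
# %20 encoding are assumptions about how paths are represented.
INCLUDE = re.compile(r"^/sites/[^/]+/Customer%20Success/Customers/.*")

# File-type pattern with the literal dot escaped.
FILE_TYPES = re.compile(r".*\.(docx|xlsx|pdf)$")

# Hypothetical sample paths, for illustration only.
SAMPLE_PATHS = [
    "/sites/Contoso/Customer%20Success/Customers/Acme/plan.docx",
    "/sites/Contoso/Customer%20Success/Internal/notes.docx",
    "/sites/Contoso/Customer Success/Customers/Acme/plan.docx",
    "/sites/Contoso/Customer%20Success/Customers/Acme/logo.png",
]


def is_included(path: str) -> bool:
    """Return True if the path passes both the folder and file-type patterns."""
    return bool(INCLUDE.match(path)) and bool(FILE_TYPES.match(path))


for path in SAMPLE_PATHS:
    print(is_included(path), path)
```

Only the first sample path passes both patterns: the second is in a different folder, the third uses an unencoded space, and the fourth has an excluded extension.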
Sources
Integrate Amazon Bedrock Knowledge Bases with Microsoft SharePoint as a data source | AWS Machine Learning Blog
The AI-generated solution does not work. The suggested inclusion pattern, ^/sites/[^/]+/Customer%20Success/Customers/.*, finds 0 source files when used with the "File" type. With the "Page" type, it returns every file in the library, including those outside the specified folder.
The source cited for the answer only says "In the Metadata and Filtering section, optionally select any content types that you want to include or exclude." and does not provide any details.
The answer also does not say which metadata type the regex should be applied to.

After doing a lot of testing, I am beginning to think this is not possible.
- Metadata type File: Only applies to the filename of a file in the document library. The path is not examined.
- Metadata type Page: Document libraries are not considered pages in the SharePoint world.
- Metadata type Event: Not applicable to document libraries.
Therefore, there is no type that can be used to restrict the folders considered in a document library.
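The observed results are consistent with the filter seeing only the filename. The Python sketch below models that behavior as an assumption drawn from the testing described above, not from documented connector internals. The paths, the site name "Contoso", and the helper `match_on_filename` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Folder-scoped pattern from the earlier answer.
PATH_PATTERN = re.compile(r"^/sites/[^/]+/Customer%20Success/Customers/.*")
# A pattern that only looks at the file extension.
NAME_PATTERN = re.compile(r".*\.(docx|pdf)$")

# Hypothetical document library contents.
PATHS = [
    "/sites/Contoso/Customer%20Success/Customers/Acme/plan.docx",
    "/sites/Contoso/Customer%20Success/Internal/notes.docx",
    "/sites/Contoso/Customer%20Success/Customers/Acme/logo.png",
]


def match_on_filename(pattern: re.Pattern, paths: list[str]) -> list[str]:
    """Model a filter that tests only the final path component (assumed behavior)."""
    return [p for p in paths if pattern.match(PurePosixPath(p).name)]


# A filename such as "plan.docx" never starts with "/sites/", so nothing matches.
print(match_on_filename(PATH_PATTERN, PATHS))

# An extension pattern matches files regardless of which folder they are in.
print(match_on_filename(NAME_PATTERN, PATHS))
```

Under this model, the folder-scoped pattern selects no files at all, while a filename pattern selects matching files from every folder, so neither can restrict ingestion to one directory.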