How to prevent Google from indexing CloudFront-served content


Hello all, I have a customer whose CloudFront-served pages are also being indexed and cached by Google, so a Google search for them returns two entries:

  1. https://www.domain.com/
  2. https://d12345.cloudfront.net/

Is there any way to keep the second (CloudFront) location from being indexed and cached? I've read on Stack Overflow about blocking bots to avoid that: https://stackoverflow.com/questions/60123731/google-indexing-cloudfront-distribution and about using DNS redirection to avoid that: https://medium.com/tensult/how-to-do-site-redirection-using-aws-522a4002c645

Is there an easier way to do this?

AWS
Alok_T
asked 3 years ago · 2430 views
1 Answer
Accepted Answer

Yes, there are a few things to do to prevent this:

  1. Lock the origin down so that it is reachable only from CloudFront IP addresses, not from the rest of the internet. It appears that Google is able to reach the origin directly.
  2. Check whether any *.cloudfront.net URLs appear anywhere in the published code. If so, replace them with the domain names that CloudFront fronts.
  3. Put a robots.txt on Amazon S3, served at something like www.domain.com/robots.txt.
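
For step 2, a quick scan of the published files can show where the distribution hostname leaks into the markup. The following is a minimal Python sketch, not an official AWS tool; the function names `find_cloudfront_hosts` and `scan_tree`, the list of file suffixes, and the directory argument are assumptions made for illustration.

```python
import re
import sys
from pathlib import Path
from typing import Dict, List

# Matches hostnames such as d12345.cloudfront.net (d12345 is the placeholder from the question).
CLOUDFRONT_HOST = re.compile(r"\b[a-z0-9]+\.cloudfront\.net\b", re.IGNORECASE)


def find_cloudfront_hosts(text: str) -> List[str]:
    """Return the distinct *.cloudfront.net hostnames in text, in order of first appearance."""
    seen: List[str] = []
    for match in CLOUDFRONT_HOST.finditer(text):
        host = match.group(0).lower()
        if host not in seen:
            seen.append(host)
    return seen


def scan_tree(root: str, suffixes=(".html", ".xml", ".js", ".css")) -> Dict[str, List[str]]:
    """Map each file under root that has a matching suffix to the hostnames found in it."""
    hits: Dict[str, List[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hosts = find_cloudfront_hosts(text)
            if hosts:
                hits[str(path)] = hosts
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python scan.py ./public
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for file_name, hosts in scan_tree(target).items():
        print(file_name, ", ".join(hosts))
```

Any file this reports is a candidate for replacing the distribution hostname with the customer's own domain name (for example www.domain.com).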

I would not recommend setting up origin redirection and the like, because there is no reason for anyone to hit the origin directly. An exposed origin is also an open backdoor for malicious actors.
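
For step 1, if the origin sits behind an IP-based firewall, the CloudFront address ranges can be taken from the ip-ranges.json file that AWS publishes. Below is a minimal Python sketch, assuming the JSON keeps its current layout (a `prefixes` list whose entries carry `ip_prefix` and `service`). The helper names `cloudfront_prefixes` and `fetch_ip_ranges` are hypothetical, and how the resulting CIDR blocks are applied depends on the origin.

```python
import json
from typing import List
from urllib.request import urlopen

# Public list of AWS IP ranges; the IPv4 entries live under the "prefixes" key.
IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def cloudfront_prefixes(ip_ranges: dict) -> List[str]:
    """Return the sorted, de-duplicated IPv4 CIDR blocks tagged with service CLOUDFRONT."""
    return sorted({
        entry["ip_prefix"]
        for entry in ip_ranges.get("prefixes", [])
        if entry.get("service") == "CLOUDFRONT"
    })


def fetch_ip_ranges(url: str = IP_RANGES_URL) -> dict:
    """Download and parse the published ranges file (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print one CIDR block per line, ready to feed into an allow list.
    for cidr in cloudfront_prefixes(fetch_ip_ranges()):
        print(cidr)
```

The ranges change over time, so an allow list built this way has to be refreshed periodically.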

Once you have done steps 1, 2, and 3, the indexing of the origin URLs will gradually go away. I also believe the customer can then ask Google to remove those URLs from its search results.

AWS
answered 3 years ago
EXPERT
reviewed 22 days ago
