Correct usage of S3 for dating app (photos and videos) - how to control access appropriately? (i.e., for "private" chat media vs "public" profile media)


I am building a dating app (no web access, just mobile app) where I wish to store all user photos/videos in S3. This would mean I would have essentially two types of media:

  1. Profile photos/videos - these can be generally "public". I probably don't need to track precisely who accesses them, but I don't want them scraped off the Internet either, so ideally I would detect someone downloading 50,000+ photos in a day and likely ban them. Otherwise there is perhaps no need to track or block access.

  2. Chat photos/videos - these must be "private" - only the two chat partners (or an admin) should be able to see them. Just as with a WhatsApp chat or SMS, random other people must not be able to see your private photos by guessing a URL.

For reference, I am handling all my user credentials and validation in my own user database with JWT (no current need for AWS user authentication otherwise).

SIMPLEST PHOTO/VIDEO HANDLING

When a user uploads a photo/video, it will be transmitted to my server, where I will resize/transcode/validate/rename it, and then I will transfer that finished file into S3 according to my folder/name convention for that user.
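As a sketch of the folder/name convention described above (the prefixes and naming scheme here are my own assumptions, not a fixed requirement):

```python
import uuid

def s3_key(user_id: str, media_kind: str, ext: str) -> str:
    """Build an S3 object key for a processed media file.

    media_kind is "profile" (broadly readable) or "chat" (private);
    keeping them under separate prefixes makes it easy to apply
    different bucket policies or CloudFront behaviours to each later.
    A random UUID keeps individual keys unguessable.
    """
    if media_kind not in ("profile", "chat"):
        raise ValueError("media_kind must be 'profile' or 'chat'")
    return f"{media_kind}/{user_id}/{uuid.uuid4().hex}.{ext}"
```

Separating "profile/" and "chat/" prefixes up front costs nothing and keeps the door open for per-prefix access rules.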

I would then put a link to that photo's filename/path in that user's profile record in my database.

Thus when someone views a profile, the client can use the URL stored there to fetch the photo from S3 by that filename/path. Alternatively, if I wanted temporary expiring S3 URLs, the client could request one from my server, which would generate and return it for the requested file.
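To make the expiring-URL idea concrete, here is a simplified stand-in for what a presigned URL does: the URL carries an expiry timestamp plus an HMAC signature over (key, expiry), so the client can neither forge nor extend it. Real S3 presigned URLs work on the same principle but use AWS Signature Version 4 (e.g. boto3's `generate_presigned_url`); the secret, hostname, and parameter names below are illustrative assumptions.

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # assumption: known only to my server

def make_expiring_url(key: str, ttl_seconds: int, now=None) -> str:
    """Return a URL valid for ttl_seconds, signed so it can't be altered."""
    expires = (now if now is not None else int(time.time())) + ttl_seconds
    sig = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"https://media.example.com/{key}?expires={expires}&sig={sig}"

def check_expiring_url(key: str, expires: int, sig: str, now: int) -> bool:
    """Reject tampered signatures and expired timestamps."""
    expected = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and now < expires
```

In practice you would simply call boto3 rather than roll this yourself; the sketch only shows why an expiring URL cannot be extended client-side.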

PROBLEMS

I'm not sure if this is the correct way to handle it - it would mean I am basically running a mostly public bucket. Filenames will be somewhat randomized, but anyone could access photos out of it at any time.

I suppose this is not too unusual: on a typical dating website you can crawl through profiles and see everyone's profile photos, so those are effectively "public" too.

For minimal protection, I could create and cache time-limited URLs on request, so my users only ever get URLs valid for 7 days.

This might be a good enough solution for profile photos/videos, or maybe unnecessary. It would require constantly creating new temporary URLs and checking whether existing ones have expired whenever someone re-downloads a profile. Perhaps not worth it at all for the "public" media.

NEED TO PROTECT PRIVATE PHOTOS (CHAT)

While it is probably okay for profile photos to sit "publicly" (via auto-expiring or permanent link), if someone sends a photo to another user in a chat, certainly only those two users should be able to see it.

I wouldn't want someone to be able to guess the URL of a private chat photo and access it.

Could I create specific rules for each file to say who is allowed to read it or not? Could I send a user token to S3 for setting/checking these accesses?

Or could I create some kind of access account (IAM?) for every user of my app, synced with my own account creation/validation process, record somewhere in AWS which photos each user is or isn't allowed to view, and grant them an AWS access token when they log in to my system?

ROUTE "PRIVATE" MEDIA DIRECTLY THROUGH MY SERVER

A simpler solution for keeping chat media private might be having only my server read the private/chat media from S3. My server could fetch the file, then deliver it to the correct users directly over their server-validated websockets. This guarantees only the correct users can access these media files, but requires two data transits (S3 > my server > user) rather than one direct transit (S3 > user).
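The server-side gate in that option boils down to a membership check before the S3 read. A minimal sketch, where `chat_members` and `fetch_from_s3` are hypothetical stand-ins for a database lookup and an S3 GetObject call:

```python
def fetch_chat_media(requester_id, chat_id, object_key, chat_members, fetch_from_s3):
    """Return the media bytes only if the requester belongs to the chat.

    chat_members(chat_id) -> set of user IDs in the chat (DB lookup).
    fetch_from_s3(object_key) -> raw object bytes (S3 GetObject).
    """
    members = chat_members(chat_id)
    if members is None or requester_id not in members:
        raise PermissionError("not a participant in this chat")
    return fetch_from_s3(object_key)
```

Because authorization happens entirely in server code, S3 never needs per-user permissions here; the trade-off is the double transit noted above.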

IDEAS?

I have never managed such a system before and I am not sure what a good convention for this type of issue would be.

Ideally, I could require certain photos/videos to need my server's validation token before they can be downloaded from AWS, track who is downloading what, and make certain media accessible only to holders of certain tokens. That would protect the bucket from scrapers and also protect photos against inappropriate access.

I presume there is some way to do this, or there must be some reasonable way to manage this, but I'm not sure how.

Is there any solution? What would you do? Thanks for any help or ideas.

1 Answer
Accepted Answer

There are going to be many different solutions here and I'd encourage you to reach out to your local AWS Solutions Architect to discuss. But in the interests of answering the question here's what I would do - noting again that there are going to be other ways to do this.

If it were me, I would use CloudFront. Presumably access to the content is primarily going to be through your application, and each request for content is going to carry some sort of authorisation header or JWT. Use Lambda@Edge on each request to determine whether the header is authentic and grants access to the content being requested, and if so, allow it.
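A minimal sketch of the viewer-request Lambda@Edge shape being described (the `verify_jwt` and `is_allowed` helpers are hypothetical placeholders for your own JWT validation and permission lookup; they are passed in as parameters here only so the logic is easy to test, whereas a real handler takes just `(event, context)`):

```python
def handler(event, context, verify_jwt, is_allowed):
    """Viewer-request Lambda@Edge sketch.

    Returning the request object tells CloudFront to continue to the
    origin (S3); returning a response dict short-circuits with a 403.
    """
    request = event["Records"][0]["cf"]["request"]
    headers = request.get("headers", {})
    auth = headers.get("authorization", [{}])[0].get("value", "")
    claims = verify_jwt(auth.removeprefix("Bearer "))
    if claims and is_allowed(claims["sub"], request["uri"]):
        return request  # pass through to the origin
    return {"status": "403", "statusDescription": "Forbidden"}
```

Note that CloudFront lower-cases header names in the event, which is why the lookup uses "authorization".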

If you have public content (i.e. content that doesn't require authentication) then the Lambda@Edge function can simply approve access; or you might put those resources in a path which doesn't invoke the Lambda@Edge in order to save costs. I'd argue though that almost all access should be authenticated - you really wouldn't want random parties on the internet scraping content from the site.

That still won't stop someone with credentials doing the scraping - for that you could monitor the CloudFront access logs, although that might have too much lag involved. Another way would be to have the Lambda@Edge function above (or another) track the number of requests coming from a specific client (use the header again) and deny access if it looks "suspicious". You might also use WAF here and look at the number of requests from a specific IP address and block based on that.
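The "deny if it looks suspicious" idea is essentially a per-client sliding-window counter. A sketch of just that logic (the thresholds are arbitrary; in Lambda@Edge this state would have to live somewhere shared such as DynamoDB, so a single in-memory dict is only an illustration):

```python
from collections import defaultdict, deque

class RequestRateTracker:
    """Deny a client once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # client_id -> request timestamps

    def allow(self, client_id: str, now: float) -> bool:
        q = self.hits[client_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # looks like scraping: deny
        q.append(now)
        return True
```

A WAF rate-based rule does the same thing keyed on IP address; the advantage of doing it per JWT/client header is that it follows the account rather than the connection.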

As above: There are many different ways of doing this; these are just a few.

Answered 6 months ago by AWS (EXPERT); reviewed a month ago (EXPERT).
  • Thank you. That is incredibly helpful. My usage I imagine then might be:

    1. User signs in and is provided a JWT by my server.
    2. JWT is passed along in header of my user's HTTP request to Cloudfront.
    3. CloudFront triggers a Lambda@Edge function that has the public key for my JWTs hardcoded into it, and validates that the JWT was issued by my server and has not expired.
    4. The Lambda@Edge function must then (i) validate that the user has rights to the given photo/video if it is private, and (ii) possibly block abusers/scrapers.
    5. The Lambda@Edge function returns the photo/video to the user, presumably as data, or as a temp link, if permitted.
  • For validating permission to private photos/videos, rather than having Lambda send another message back to another server, I am thinking an easy solution would be to write a text file (a "meta" file) alongside each photo/video, with the same name, that stores its permissions.

    For example, "photo1.jpg" has "photo1.txt" written at same time/location when added to S3 by my server, and "photo1.txt" contains inside (JSON or whatever) saying "users A & B can access it" (private photo for just them). If public, something saying "all valid users can access" or no meta file at all.
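The permission check against such a sidecar file could be as simple as the sketch below. The field names ("public", "allowed_users") are assumptions, not a fixed schema; `meta_json` is the sidecar file's contents, or None when no sidecar exists:

```python
import json

def can_access(meta_json, user_id: str) -> bool:
    """Decide access from an optional JSON sidecar 'meta' file.

    No sidecar (None) or an explicit "public" flag means any valid
    user may access; otherwise the user must appear in allowed_users.
    """
    if meta_json is None:
        return True  # no meta file at all: treat as public
    meta = json.loads(meta_json)
    if meta.get("public"):
        return True
    return user_id in meta.get("allowed_users", [])
```

One caveat worth noting: reading a second S3 object per request doubles the GET count, which is part of why the answer below suggests DynamoDB or KeyValueStore for the lookup instead.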

  • This would save me having another database maintenance/access just for permissions, and I could also store data like height/width of photos in the meta files for reference. I am not sure if this is wise. It is just what came to mind.

    For scrapers/spammers I'd have to log per-user requests somewhere (likely the main database), and on every Lambda request refuse to return the file if the user has been added to a "block" list.

    That means a message back to my server on each request, but only to log the request and check the user is permitted. If I'm doing that anyway, I could also store the per-photo permissions on the server. Either way.

  • Does that all make sense? Am I understanding correctly? Any further thoughts appreciated. Thanks again.

  • Short answer: Yes, that makes sense. Longer: The Lambda@Edge can do pretty much whatever you like but watch out for operations which are high latency - the function is being executed within the timeframe of the user request - there is a timeout of 5 seconds in any case. Using S3 is probably ok; DynamoDB may be better; and we also just launched KeyValueStore which might be helpful. Again, a chat with an AWS Solutions Architect will be very handy.
