I am building a dating app (no web access, just mobile app) where I wish to store all user photos/videos in S3. This would mean I would have essentially two types of media:
- Profile photos/videos - these can generally be "public". I probably don't need to track precisely who can access them, but I don't want people scraping them off the Internet either, so ideally I would detect if someone downloaded 50,000+ photos in a day and likely ban them. Otherwise there is perhaps no need to track or block access.
- Chat photos/videos - these must be "private". Only the two chat partners (or an admin) should be able to see them. Just like a WhatsApp chat or SMS, you can't have random other people seeing your private photos by guessing a URL.
For reference, I am handling all my user credentials and validation in my own user database with JWT (no current need for AWS user authentication otherwise).
SIMPLEST PHOTO/VIDEO HANDLING
When a user uploads a photo/video, it will be transmitted to my server, where I will resize/transcode/validate/rename it, and then I will transfer that finished file into S3 according to my folder/name convention for that user.
I would then put a link for the filename/path of that photo in that user's user profile in my database.
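To make the folder/name convention concrete, here is a minimal sketch of how my server might build the S3 object key before uploading. The function name, the "profile"/"chat" prefixes, and the layout `<kind>/<user_id>/<random>.<ext>` are all just assumptions for illustration, not an established convention.

```python
import secrets

# Hypothetical top-level prefixes; the real convention may differ.
ALLOWED_KINDS = {"profile", "chat"}


def make_object_key(user_id: str, kind: str, extension: str) -> str:
    """Build an object key such as 'profile/<user_id>/<random>.jpg'."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown media kind: {kind!r}")
    # 16 random bytes -> 32 hex characters. Hard to guess, but a random
    # name is not a substitute for real access control on private media.
    random_name = secrets.token_hex(16)
    clean_ext = extension.lstrip(".").lower()
    return f"{kind}/{user_id}/{random_name}.{clean_ext}"
```

The returned key is what I would store in the user's profile row, rather than a full URL, so the way the file is served can change later without touching the database.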
Thus when someone is given their user profile to view, they can use the link URL stored in there to access the photo from S3 via that filename/path. Or if I wanted temporary expiring S3 URLs, I could have them request one from my server, and generate/return it to them based on the file they want.
PROBLEMS
I'm not sure if this is the correct way to handle it - this would mean I am basically just running a mostly public bucket. Filenames will be somewhat randomized but someone could randomly access photos out of there any time.
I suppose this is not too unusual. If you go on a dating website, you can auto-crawl through profiles and see everyone's photo profiles. They are "public" too.
For minimal protection, I can create and cache time-limited URLs upon request, so my users only ever get URLs that are valid for 7 days.
This might be a good enough solution for profile photos/videos. Or maybe unnecessary. It would require constant creation of new temporary URLs on an ongoing basis and checking of whether existing URLs are expired when people need to download profiles again. Maybe not worth it at all for the "public" media.
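The "create and cache" part could be sketched roughly like this. `SignedUrlCache` and the `signer` callable are hypothetical names; in practice the signer would probably wrap something like boto3's `generate_presigned_url`, which I have left out so the sketch stays self-contained. As far as I know, 7 days is also the maximum lifetime for a SigV4 presigned URL, so that is used as the default.

```python
import time
from typing import Callable, Dict, Tuple

SEVEN_DAYS = 7 * 24 * 60 * 60  # seconds


class SignedUrlCache:
    """Caches time-limited URLs and re-signs them shortly before expiry."""

    def __init__(
        self,
        signer: Callable[[str, int], str],  # (object_key, ttl_seconds) -> URL
        ttl_seconds: int = SEVEN_DAYS,
        refresh_margin: int = 3600,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._signer = signer
        self._ttl = ttl_seconds
        self._margin = refresh_margin
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (url, expires_at)

    def get(self, object_key: str) -> str:
        now = self._clock()
        cached = self._entries.get(object_key)
        # Reuse the cached URL only while it has more than the margin left.
        if cached is not None and cached[1] - self._margin > now:
            return cached[0]
        url = self._signer(object_key, self._ttl)
        self._entries[object_key] = (url, now + self._ttl)
        return url
```

This keeps the "is it expired yet?" check in one place on the server, at the cost of holding the cache somewhere (in memory here; a shared store would be needed with more than one server).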
NEED TO PROTECT PRIVATE PHOTOS (CHAT)
While it is probably okay for profile photos to sit "publicly" (via auto-expiring or permanent link), if someone sends a photo to another user in a chat, certainly only those two users should be able to see it.
I wouldn't want someone to randomly guess a URL to their private chat photo they shared and access it.
Could I create specific rules for each file to say who is allowed to read it or not? Could I send a user token to S3 for setting/checking these accesses?
Or could I create an access account of some kind (IAM?) for every user of my app synced with my own account creation/validation process, then add somewhere in AWS which photos they are or aren't allowed to view, and on login to my system also grant them an AWS token for access?
ROUTE "PRIVATE" MEDIA DIRECTLY THROUGH MY SERVER
A simpler solution for keeping chat media private might be making only my server read the private/chat media from S3. My server could request it, then provide it to the correct users directly through their server validated websockets. This guarantees only the correct users can access these media files but requires two data transits (S3 > my server > user) rather than direct (S3 > user).
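A rough sketch of the check my server would run before relaying a chat file is below. `ChatMedia`, `can_view_chat_media`, and `load_for_user` are hypothetical names, and `read_object` is a stand-in for whatever actually reads the bytes from S3.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class ChatMedia:
    object_key: str
    participants: FrozenSet[str]  # user IDs of the two chat partners


def can_view_chat_media(media: ChatMedia, requester_id: str, is_admin: bool = False) -> bool:
    """Only the chat partners (or an admin) may see the file."""
    return is_admin or requester_id in media.participants


def load_for_user(
    media: ChatMedia,
    requester_id: str,
    read_object: Callable[[str], bytes],  # hypothetical S3 read, injected
    is_admin: bool = False,
) -> bytes:
    # Check permission first, so the object is never fetched for a
    # requester who is not allowed to see it.
    if not can_view_chat_media(media, requester_id, is_admin):
        raise PermissionError(f"{requester_id} may not view {media.object_key}")
    return read_object(media.object_key)
```

The same check would work just as well in front of a short-lived signed URL instead of relaying the bytes, which would avoid the second data transit.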
IDEAS?
I have never managed such a system before and I am not sure what a good convention for this type of issue would be.
Ideally, some photos/videos would require my server's validation token to download from AWS, so I could track who is downloading what and make certain photos/videos accessible only to people with certain tokens. That would protect the bucket from scrapers and also protect photos against inappropriate access.
I presume there is some way to do this, or there must be some reasonable way to manage this, but I'm not sure how.
Is there any solution? What would you do? Thanks for any help or ideas.
Thank you. That is incredibly helpful. I imagine my usage might then be:
For validating permission to private photo/videos, rather than having Lambda send another message back to another server, I am thinking an easy solution would be to write a text file ("meta" file) for each photo/video with the same name that stores the permission for it in it.
For example, "photo1.jpg" has "photo1.txt" written at same time/location when added to S3 by my server, and "photo1.txt" contains inside (JSON or whatever) saying "users A & B can access it" (private photo for just them). If public, something saying "all valid users can access" or no meta file at all.
This would save me having another database maintenance/access just for permissions, and I could also store data like height/width of photos in the meta files for reference. I am not sure if this is wise. It is just what came to mind.
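To show what I mean by the meta file, here is a minimal sketch. The JSON field names (`public`, `allowed_users`, `width`, `height`) are just what I made up for illustration, and a missing meta file is treated as "public" as described above.

```python
import json
from typing import Iterable, Optional


def build_meta(allowed_users: Optional[Iterable[str]], width: int, height: int) -> str:
    """Serialize the meta file; allowed_users=None means any valid user."""
    meta = {
        "public": allowed_users is None,
        "allowed_users": sorted(allowed_users) if allowed_users is not None else [],
        "width": width,
        "height": height,
    }
    return json.dumps(meta)


def is_allowed(meta_json: Optional[str], user_id: str) -> bool:
    """Decide access from the meta file contents (None = no meta file)."""
    if meta_json is None:
        return True  # no meta file: treated as public by this convention
    meta = json.loads(meta_json)
    return bool(meta.get("public")) or user_id in meta.get("allowed_users", [])
```

One thing I would have to watch with this layout is that treating a missing meta file as "public" fails open: if writing "photo1.txt" ever fails, a private photo would silently become readable by any valid user.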
For spammers, I'd have to log each user's requests somewhere (likely my main database), and on every Lambda request, refuse to return the file if the user has been added to the "block" list.
That means a message back to my server on each request, but only to log the request and check whether the user is permitted. If I'm doing that anyway, I could also store the per-photo permissions on my server. Either way works.
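The logging and block-list check might look something like this sketch. `DownloadTracker` is a hypothetical name, and it keeps everything in memory purely for illustration; the real thing would presumably be a counter in my main database. The 50,000-per-day default is the threshold from my original question.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set


class DownloadTracker:
    """Counts downloads per user in a sliding window and blocks heavy users."""

    def __init__(
        self,
        limit: int = 50_000,
        window_seconds: float = 86_400,  # one day
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def record(self, user_id: str) -> bool:
        """Log one download request; return True if it should be served."""
        if user_id in self.blocked:
            return False
        now = self._clock()
        events = self._events[user_id]
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= now - self._window:
            events.popleft()
        events.append(now)
        if len(events) > self._limit:
            self.blocked.add(user_id)  # stays blocked until removed manually
            return False
        return True
```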
Does that all make sense? Am I understanding correctly? Any further thoughts appreciated. Thanks again.
Short answer: Yes, that makes sense. Longer: Lambda@Edge can do pretty much whatever you like, but watch out for high-latency operations, because the function executes within the timeframe of the user request. There is a timeout of 5 seconds for viewer-triggered functions in any case. Using S3 is probably OK; DynamoDB may be better; and we also just launched CloudFront KeyValueStore, which might be helpful. Again, a chat with an AWS Solutions Architect will be very handy.
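As a rough illustration only, a viewer-request Lambda@Edge function that rejects requests without a valid token from your server might look like the sketch below. It assumes the app's JWTs are HS256-signed and sent in an `Authorization: Bearer ...` header; `JWT_SECRET` and `verify_jwt_hs256` are hypothetical names, and the per-object permission lookup discussed above is only marked with a comment.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: HS256 secret shared with the app server. Lambda@Edge has no
# environment variables, so it would be bundled or fetched and cached.
JWT_SECRET = b"replace-with-real-secret"


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_jwt_hs256(token: str, secret: bytes, now: float) -> Optional[dict]:
    """Return the claims if the signature and 'exp' are valid, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        signed = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signed, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        if claims.get("exp", 0) <= now:
            return None
        return claims
    except (ValueError, TypeError, AttributeError):
        return None  # malformed token


def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    auth = request["headers"].get("authorization", [])
    value = auth[0]["value"] if auth else ""
    token = value[len("Bearer "):] if value.startswith("Bearer ") else ""
    claims = verify_jwt_hs256(token, JWT_SECRET, time.time())
    if claims is None:
        # Returning a response short-circuits the request at the edge.
        return {"status": "403", "statusDescription": "Forbidden", "body": "Forbidden"}
    # A per-object permission or block-list lookup would go here; keep it
    # fast, since it runs inside the viewer's request.
    return request  # pass the request through to CloudFront / S3
```

Verifying the token locally like this involves no network call, so it stays well inside the time budget; it is any permission lookup added afterwards that needs watching for latency.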