Instability issue on my wiki
Hi! I have a question. I'm building a wiki with the MediaWiki software and hosting it on AWS, but the wiki keeps going down with a connection error: ERR_CONNECTION_TIMED_OUT. I've seen the CPU get overloaded at times. I was using a t3.micro instance. What server configuration would be ideal for running MediaWiki stably, and what AWS instance requirements would you recommend?
Network bandwidth performance
We have deployed two r5.2xlarge EC2 instances in the same Availability Zone. According to the instance specs, this type has "up to 10 Gigabit" network performance, and I'm confused about how that bandwidth is calculated. Can someone clarify the points below? 1. When it says "up to 10 Gigabit" network performance, does that mean it will give a constant speed of 10 Gbps? 2. If I want to check the maximum network bandwidth between the two EC2 instances (Linux), how can I measure it? Thanks in advance, as this will help us long term.
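For the measurement question, a common approach is to run `iperf3 -s` on one instance and drive the client from the other. A hedged sketch, assuming `iperf3` is installed on both instances; the `parse_gbps` and `measure` names are my own:

```python
import json
import subprocess


def parse_gbps(iperf_json: str) -> float:
    """Extract the mean receive throughput (Gbps) from `iperf3 -J` output."""
    report = json.loads(iperf_json)
    bps = report["end"]["sum_received"]["bits_per_second"]
    return bps / 1e9


def measure(peer_ip: str, seconds: int = 10) -> float:
    """Run the client side; the peer must already be running `iperf3 -s`.
    -P 8 opens parallel streams, since a single TCP flow is capped lower
    than the instance's aggregate limit."""
    out = subprocess.run(
        ["iperf3", "-c", peer_ip, "-t", str(seconds), "-P", "8", "-J"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gbps(out)


if __name__ == "__main__":
    # Parsing a canned report, without actually running iperf3:
    sample = json.dumps({"end": {"sum_received": {"bits_per_second": 9.4e9}}})
    print(f"{parse_gbps(sample):.1f} Gbps")
```

Note that "up to" baseline/burst behavior means a short test can read higher than what the instance sustains, so it's worth running a longer test as well.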
AWS Backup - are hourly backups an effective approach?
Hello, I was reading through https://www.wellarchitectedlabs.com/reliability/200_labs/200_testing_backup_and_restore_of_data/2_configure_backup_plan/ and point 7 essentially says that hourly backups have a negative performance impact on systems. My question is to get clarification on hourly backups: are they good or bad? Has anyone applied hourly backups at large scale and seen poor performance of the underlying systems?
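For context, an hourly schedule in AWS Backup is just a cron rule on the backup plan. A hedged sketch of what such a plan request looks like with boto3; the plan name, vault name, and lifecycle are placeholders:

```python
# Hourly backup rule for AWS Backup. The schedule uses the six-field AWS
# cron syntax: cron(minute hour day-of-month month day-of-week year);
# "0 * * * ? *" fires at the top of every hour.
hourly_plan = {
    "BackupPlanName": "hourly-demo",          # placeholder name
    "Rules": [{
        "RuleName": "every-hour",
        "TargetBackupVaultName": "Default",   # placeholder vault
        "ScheduleExpression": "cron(0 * * * ? *)",
        "StartWindowMinutes": 60,
        "Lifecycle": {"DeleteAfterDays": 7},  # keep hourly points for a week
    }],
}

# To create it for real (requires AWS credentials):
# import boto3
# backup = boto3.client("backup")
# backup.create_backup_plan(BackupPlan=hourly_plan)
print(hourly_plan["Rules"][0]["ScheduleExpression"])
```

Whether hourly is "good or bad" then comes down to whether the snapshot mechanism behind each rule run (e.g. EBS snapshots vs. an agent-level dump) contends with your workload's I/O.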
ElastiCache Redis 6: is multi-threading enabled?
My organization is using the Redis Cluster implementation of ElastiCache 5.0.6. Redis 6 boasts multi-threading, which can improve performance, but it is disabled by default. If we upgrade to ElastiCache with Redis 6.x, will multi-threading be enabled? How many threads? Is there a way to configure multi-threading capabilities at migration time?
Poor query performance after upgrading from Aurora PostgreSQL 12 to 13
After upgrading AWS Aurora PostgreSQL from 12 to 13.6 last night, database performance went haywire. CPU load is through the roof, and simple queries that finished in under a second yesterday now take minutes. I feel like this is some sort of misconfiguration, or at least I hope some magic will resolve the case. Query performance is related to database size, but in a strangely exponential manner: on demo data the queries perform as they did previously, but on production data they are significantly worse. Could this be some resource limit on the workers? I don't even know where to look; I'm just running the same queries that worked yesterday. Could you help out?
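One well-known cause of exactly this symptom: a PostgreSQL major-version upgrade does not carry over the optimizer statistics, so the planner picks bad plans (which look fine on small demo data but blow up on production-sized tables) until `ANALYZE` has been run. A hedged sketch that just generates the statements; the table names are placeholders:

```python
def analyze_statements(tables):
    """Build ANALYZE statements to refresh planner statistics, which a
    major-version upgrade does not carry over."""
    return [f"ANALYZE VERBOSE {t};" for t in tables]


# Placeholders: substitute the tables behind the slow queries, or simply
# run a bare `ANALYZE;` once against the whole database.
for stmt in analyze_statements(["orders", "order_items"]):
    print(stmt)
```

Comparing `EXPLAIN ANALYZE` output for one slow query before and after would confirm whether stale statistics were the culprit.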
Odd Athena Query Performance Issue
Hello, I often use CTEs or views to create faux parameters as a way to improve code reusability. For example, if I had a bunch of CTEs or subqueries (or different queries/views altogether) that all used a common value in a WHERE clause, I would simply create a view/CTE containing the values and select them from a subquery wherever needed. This typically has no meaningful impact on performance; however, it seems to degrade execution speed quite a bit in Athena, and I'm not sure why that would be the case. What is the difference between `select * from table where foo = 'bar'` and `with param as (select 'bar' as p) select * from table where foo = (select p from param)`? When querying a substantial amount of data, the second pattern makes the query speed untenable. These tables are also partitioned by the hour.
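For what it's worth, a plausible explanation is that Athena can prune partitions only when the predicate is a literal visible at planning time; a scalar subquery like `(select p from param)` hides the value until execution. A hedged workaround sketch, inlining the parameter client-side before submitting the query; the helper name and table are illustrative:

```python
def inline_param(template: str, **params: str) -> str:
    """Substitute parameters as quoted SQL string literals, so the engine
    sees a constant predicate and can prune partitions at plan time.
    Single quotes are doubled to keep the literal valid."""
    quoted = {k: "'" + v.replace("'", "''") + "'" for k, v in params.items()}
    return template.format(**quoted)


template = "select * from my_table where foo = {foo}"  # illustrative
print(inline_param(template, foo="bar"))
# Submit the resulting string via
# boto3.client("athena").start_query_execution(QueryString=..., ...)
```

This keeps the reusability of a single template while giving Athena the literal it needs.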
App Runner actions are very slow (2-10 minutes) and the deployer gives an incorrect error message
App Runner actions are very slow for me. create/pause/resume may take 2-5 minutes for the simple demo image (`public.ecr.aws/aws-containers/hello-app-runner:latest`), and create-service when the image is not found takes ~10 minutes.

Example 1: 5 minutes to deploy the hello-app image

```
04-17-2022 05:59:55 PM [AppRunner] Service status is set to RUNNING.
04-17-2022 05:59:55 PM [AppRunner] Deployment completed successfully.
04-17-2022 05:59:44 PM [AppRunner] Successfully routed incoming traffic to application.
04-17-2022 05:58:33 PM [AppRunner] Health check is successful. Routing traffic to application.
04-17-2022 05:57:01 PM [AppRunner] Performing health check on port '8000'.
04-17-2022 05:56:51 PM [AppRunner] Provisioning instances and deploying image.
04-17-2022 05:56:42 PM [AppRunner] Successfully pulled image from ECR.
04-17-2022 05:54:56 PM [AppRunner] Service status is set to OPERATION_IN_PROGRESS.
04-17-2022 05:54:55 PM [AppRunner] Deployment started.
```

Example 2: 10 minutes when the image is not found

```
04-17-2022 05:35:41 PM [AppRunner] Failed to pull your application image. Be sure you configure your service with a valid access role to your ECR repository.
04-17-2022 05:25:47 PM [AppRunner] Starting to pull your application image.
```

Example 3: 10 minutes when the image is not found

```
04-17-2022 06:46:24 PM [AppRunner] Failed to pull your application image. Be sure you configure your service with a valid access role to your ECR repository.
04-17-2022 06:36:31 PM [AppRunner] Starting to pull your application image.
```

A 404 should be detected immediately and fail much faster, because there is no need to retry a 404 for 10 minutes, right? Additionally, the error message `Failed to pull your application image. Be sure you configure your service with a valid access role to your ECR repository` is very confusing: it shows neither the image name nor the actual cause. A 404 is not related to access errors like 401 or 403, correct?

Can App Runner actions' performance and error messages be improved?
Does JAR size affect the response time of a Lambda?
Hi, I'm running some experiments on the impact of JAR size on Lambda cold starts and response times. I created 3 JARs of different sizes:

| JAR size | Response time |
| --- | --- |
| 10 MB | 1 s |
| 64 MB | 6 s |
| 100 MB | 12 s |

In CloudWatch logs I can see that the init duration for all of these Lambdas is ~620 ms on average, yet the response time differs for each Lambda and I don't know why. The code is the same (there is no logic: all of them return a String); the differences in size are due to different Maven dependencies that I added, all of them from the AWS SDK. Is this normal?
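In case it helps reproduce the numbers, this is roughly how I'd measure the end-to-end latency per variant from outside Lambda. A hedged sketch: the function name is a placeholder, and the boto3 call is commented out since it needs credentials; only the timing helpers run locally:

```python
import statistics
import time


def time_invocation(invoke):
    """Time one call in milliseconds; `invoke` is any zero-arg callable,
    e.g. a closure around boto3's lambda_client.invoke(...)."""
    start = time.perf_counter()
    invoke()
    return (time.perf_counter() - start) * 1000.0


def summarize(durations_ms):
    """Median latency, which is more robust than the mean for a handful
    of samples that include one cold start."""
    return statistics.median(durations_ms)


# Real use (requires AWS credentials; function name is a placeholder):
# import boto3
# lam = boto3.client("lambda")
# samples = [time_invocation(lambda: lam.invoke(FunctionName="jar-100mb-test"))
#            for _ in range(5)]
# print(summarize(samples))
```

Measuring several invocations per variant, and separating the first (cold) sample from the rest, would show whether the gap persists after warm-up.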
DynamoDB first request is very slow
Hi, while building a simple serverless CRUD API using API Gateway, Lambda (Java 8), and DynamoDB, I came across a problem with a slow response on the first DynamoDB request. Generally, responses are in the tens of milliseconds, but when I execute my Lambda function for the first time I see a response time of around 7 s. In my Lambda function I'm using Java SDK v2, and I also applied all the practices from this post: [https://aws.amazon.com/blogs/developer/tuning-the-aws-java-sdk-2-x-to-reduce-startup-time]. I ran some tests; here are the results of the Lambda execution with a DynamoDB request:

| Configuration | Avg response time |
| --- | --- |
| Lambda 512 MB | 7 s |
| Lambda 1024 MB | 5 s |
| Lambda 2048 MB | 3.5 s |
| Creating only the DynamoDB client, 512 MB Lambda | 3 s |

Note: all of these responses are from a cold-start Lambda. The average cold-start time of my Java Lambdas is 1.5 s. Are there any official resources saying that the first DynamoDB request will be this slow? Thanks
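A common mitigation for this pattern is to create the client and issue one cheap request during initialization, so the first user request doesn't pay for TLS setup and credential resolution. The question uses Java, but here is a hedged Python sketch of the same priming idea; the table name is a placeholder and the boto3 lines are commented out since they need credentials:

```python
import time


def prime(call):
    """Run one throwaway request at init time, so later requests reuse
    the warmed connection. Returns the priming duration in seconds."""
    start = time.perf_counter()
    try:
        call()
    except Exception:
        pass  # priming is best-effort; real errors surface on real requests
    return time.perf_counter() - start


# At module scope (outside the handler), so it runs once per cold start:
# import boto3
# ddb = boto3.client("dynamodb")
# prime(lambda: ddb.describe_table(TableName="my-table"))  # placeholder


def handler(event, context):
    # A ddb.get_item(...) here would reuse the already-warmed connection.
    return {"statusCode": 200}
```

The cost moves from the first user-visible request into the init phase, which also benefits from features like provisioned concurrency if you enable them.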
AWS Glue - Glue Jobs - Glue 2.0: Worker Types
Hey guys, I've got several questions regarding **Glue 2.0** worker types for AWS Glue jobs. I have gone through this documentation https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html while trying to figure out the type and number of workers I need for my jobs, and the pricing as well, and I still have some questions left.

1. How many DPUs is one `Standard` worker type equivalent to?
2. How are resources sliced between the two executors provided by a single `Standard` worker?
3. What's the difference between having 2 executors (Standard) and 1 executor (G.1X)? In what situation should I use one over the other?
4. Assuming 1 Standard worker = 1 DPU (question 1 might answer this one too), am I being charged the same as for a G.1X worker?
5. The documentation mentions that with Glue 2.0 you need to specify a `worker type` and a `number of workers`. Does this mean that if I use the `Standard` worker type, all workers (and executors) are going to be active during the execution? The docs specify that with Glue 1.0 you just need to provide a `Max Number`, so I'm assuming not all workers are necessarily active there.

Really appreciate the help, guys. Regards
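On the pricing arithmetic: Glue bills per DPU-hour, and to my understanding Standard and G.1X workers each map to 1 DPU while G.2X maps to 2, so cost depends only on the DPU total and runtime. A small hedged sketch; the per-DPU-hour rate is a placeholder, so check the pricing page for your region:

```python
# Assumed worker-to-DPU mapping; verify against the Glue docs.
DPUS_PER_WORKER = {"Standard": 1, "G.1X": 1, "G.2X": 2}


def glue_job_cost(worker_type, num_workers, runtime_minutes,
                  price_per_dpu_hour=0.44):
    """Estimate job cost as DPUs * hours * rate. The 0.44 USD default is
    a placeholder rate, not an authoritative price."""
    dpus = DPUS_PER_WORKER[worker_type] * num_workers
    return dpus * (runtime_minutes / 60.0) * price_per_dpu_hour


# e.g. 10 Standard workers for 30 minutes at the placeholder rate:
print(round(glue_job_cost("Standard", 10, 30), 2))
```

Under that mapping, question 4's answer would be yes: a Standard worker and a G.1X worker cost the same per hour; they differ in how the DPU is split into executors.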
Setting MKL_NUM_THREADS to be more than 16 for m5 instances
Hey, I have a 32-core EC2 Linux m5 instance, with Python installed via Anaconda. I've noticed that NumPy cannot use more than 16 cores. It looks like my NumPy uses libmkl_rt.so:

```
: np.show_config()
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
```

When I set MKL_NUM_THREADS to 16 or below, it works:

```
(base) ec2-user@ip-172-31-18-3:~$ export MKL_NUM_THREADS=12 && python -c "import ctypes; mkl_rt = ctypes.CDLL('libmkl_rt.so'); print (mkl_rt.mkl_get_max_threads())"
12
```

When I set it to 24, it stops at 16:

```
(base) ec2-user@ip-172-31-18-3:~$ export MKL_NUM_THREADS=24 && python -c "import ctypes; mkl_rt = ctypes.CDLL('libmkl_rt.so'); print (mkl_rt.mkl_get_max_threads())"
16
```

But I do have 32 cores:

```
In : os.cpu_count()
Out: 32
```

Is there any other setting I need to check? Thanks, Bill
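One likely explanation worth checking: a 32-vCPU m5 instance exposes 16 physical cores with two hyperthreads each, and MKL caps its thread pool at the number of physical cores by default, since extra hyperthreads rarely help dense linear algebra. A sketch that counts both (the `/proc/cpuinfo` parsing is Linux-specific):

```python
import os


def physical_cores(cpuinfo_text: str) -> int:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text.
    Within each processor block, "physical id" precedes "core id"."""
    cores = set()
    phys = "0"
    for line in cpuinfo_text.splitlines():
        if line.startswith("physical id"):
            phys = line.split(":")[1].strip()
        elif line.startswith("core id"):
            cores.add((phys, line.split(":")[1].strip()))
    return len(cores)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        n_phys = physical_cores(f.read())
    print(f"logical CPUs: {os.cpu_count()}, physical cores: {n_phys}")
```

If this reports 16 physical cores, the observed cap is MKL behaving as designed rather than a misconfiguration.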
Performance issue: querying Athena using Boto3
I am querying Athena from a Python script using Boto3. Most of the queries return more than 1000 records, so I am using 'NextToken' in 'get_query_results' to fetch the subsequent records. I've observed that each query takes a significantly long time (minutes) to complete; fetching 1171 records took more than a minute. Surprisingly, the data size is small (in KB). I am not sure whether this behavior is a result of incorrect usage on my part or whether there are underlying technical reasons. Any specific areas I should look at? I'd appreciate any help with this.
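Two things worth separating here: the query's own execution time (which runs before any results can be fetched) and the paging, where each `get_query_results` page of up to 1000 rows is a separate API round trip. A hedged sketch using boto3's built-in paginator instead of manual 'NextToken' handling; the client call is commented out since it needs credentials, and `rows_from_page` is my own helper:

```python
def rows_from_page(page):
    """Flatten one get_query_results page into lists of cell strings.
    Missing cells (NULLs) come back without a VarCharValue key."""
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in page["ResultSet"]["Rows"]
    ]


# Real use (requires AWS credentials; the query execution id is a placeholder):
# import boto3
# athena = boto3.client("athena")
# paginator = athena.get_paginator("get_query_results")
# rows = []
# for page in paginator.paginate(QueryExecutionId="..."):
#     rows.extend(rows_from_page(page))
# Note: the first row of the first page is the header row.

sample_page = {"ResultSet": {"Rows": [
    {"Data": [{"VarCharValue": "id"}, {"VarCharValue": "name"}]},
    {"Data": [{"VarCharValue": "1"}, {"VarCharValue": "abc"}]},
]}}
print(rows_from_page(sample_page))
```

If the paging itself turns out to be the bottleneck, an alternative is to read the query's CSV result file directly from its S3 output location in one GET.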