I am trying to load data from an S3 bucket folder into a Snowflake table using Lambda. I set up an S3 trigger on the bucket where my files are ingested, and built an integration between Lambda and Snowflake so that each incoming file in S3 is loaded into a Snowflake table.

The problem is that around 1,000 files arrive in my S3 folder at once, and this creates congestion between Lambda and Snowflake. I'm looking for a way to make the Lambda process one file at a time when the files land in the bucket: only after the first file has finished loading (for example, after receiving confirmation from Snowflake) should it move on to the next file in sequence.
You can configure reserved concurrency on the Lambda function (see "Managing Lambda reserved concurrency" in the AWS Lambda documentation). By setting reserved concurrency to 1, only one instance of the function will run at a time. Depending on your Snowflake configuration, you may choose to increase this to 2 or 3 to process files faster without overwhelming Snowflake.
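As a sketch, reserved concurrency can be set with boto3 (the function name `s3-to-snowflake-loader` is a placeholder; substitute your own):

```python
import boto3

# Cap the function at a single concurrent execution so files are
# loaded into Snowflake one at a time.
lambda_client = boto3.client("lambda")
lambda_client.put_function_concurrency(
    FunctionName="s3-to-snowflake-loader",  # hypothetical function name
    ReservedConcurrentExecutions=1,
)
```

The same setting is available in the console under Configuration → Concurrency.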
You can also configure the batch size, which is the maximum number of events passed to the function per invocation. If S3 delivers multiple file notifications in the same event, your code can loop through the records and process each file on a single invocation.
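A minimal handler that loops through the records of an S3 notification event might look like this (the `load_into_snowflake` call is a hypothetical placeholder for your loading logic):

```python
import urllib.parse

def lambda_handler(event, context):
    """Process every S3 record delivered in a single invocation, in order."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (e.g. spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # load_into_snowflake(bucket, key)  # hypothetical: blocks until Snowflake confirms
        processed.append(key)
    return {"processed": processed}
```

Because the loop is sequential, the next file is only touched after the previous `load_into_snowflake` call returns.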
I'm just slightly concerned that if you create a large number of objects while the function's concurrency is capped at 1, S3's asynchronous invocations of the Lambda may be throttled and eventually dropped after multiple retries. If so, you should route the S3 event notifications to an Amazon SQS queue and have the Lambda function consume from the queue instead. This way, messages will be buffered safely in the queue instead of (possibly) being lost due to a large file backlog.
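With the SQS route, the handler receives SQS records whose bodies contain the original S3 notification JSON. A sketch of such a consumer, again with `load_into_snowflake` as a hypothetical placeholder:

```python
import json
import urllib.parse

def sqs_handler(event, context):
    """Consume S3 event notifications delivered via an SQS queue.

    With an event source mapping batch size of 1 and reserved concurrency
    of 1, files are effectively processed one at a time; if a load fails,
    the message returns to the queue for a retry.
    """
    loaded = []
    for sqs_record in event["Records"]:
        body = json.loads(sqs_record["body"])  # the original S3 notification
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(s3_record["s3"]["object"]["key"])
            # load_into_snowflake(bucket, key)  # hypothetical loader
            loaded.append(key)
    return loaded
```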