AWS Lambda 查询 Redshift Serverless

标签：Serverless status redshift format Redshift AWS query logger data

在应用程序中，经常在Lambda 中调用redshift data api 去查询 redshift serverless 的数据，以下描述具体实现过程：

1：给Lambda 创建一个执行Lambda的IAM Role，并具有访问redshift serverless 权限，同时需要指定获取临时凭证的策略（参考：授予对 Amazon Redshift 数据 API 的访问权限）

因为在Lambda 执行的Role 将被转化为redshift serverless 对应的用户，加速role 名是：test-redshift-role，则对应的用户是：IAMR:test-redshift-role，下面是创建的测试IAM Role

2：创建Lambda脚本，在脚本中使用redshift data api 访问redshift serverless：

import time
import traceback
import boto3
import logging
from collections import OrderedDict

print('boto3 version: ' + boto3.__version__)
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # input parameters passed from the caller event
    # cluster identifier for the Amazon Redshift cluster
    redshift_workgroup_name = event['redshift_workgroup_name']
    # database name for the Amazon Redshift cluster
    redshift_database_name = event['redshift_database']
    # IAM Role of Amazon Redshift cluster having access to S3
    redshift_iam_role = event['redshift_iam_role']
    # run_type can be either asynchronous or synchronous; try tweaking based on your requirement
    run_type = event['run_type']

    
    sql_statements = OrderedDict()
    res = OrderedDict()

    if run_type != "synchronous" and run_type != "asynchronous":
        raise Exception(
            "Invalid Event run_type. \n run_type has to be synchronous or asynchronous.")

    isSynchronous = True if run_type == "synchronous" else False

    # initiate redshift-data redshift_data_api_client in boto3
    redshift_data_api_client = boto3.client('redshift-data')


    sql_statements['SELECT'] = "select top 10 * from \"taxlot\".\"public\".\"lot\" where accountnumber = '7LD701824' and cusip = '256219106';"
    
    logger.info("Running sql queries in {} mode!\n".format(run_type))

    try:
        for command, query in sql_statements.items():
            logging.info("Example of {} command :".format(command))
            res[command + " STATUS: "] = execute_sql_data_api(redshift_data_api_client, redshift_database_name, command, query,redshift_workgroup_name, isSynchronous)

    except Exception as e:
        raise Exception(str(e) + "\n" + traceback.format_exc())
    return res


def execute_sql_data_api(redshift_data_api_client, redshift_database_name, command, query, redshift_workgroup_name, isSynchronous):

    MAX_WAIT_CYCLES = 20
    attempts = 0
    # Calling Redshift Data API with executeStatement()
    res = redshift_data_api_client.execute_statement(Database=redshift_database_name,  Sql=query, WorkgroupName=redshift_workgroup_name)
    logger.info("Query result{}!\n".format(res))
    
    query_id = res["Id"]
    desc = redshift_data_api_client.describe_statement(Id=query_id)
    query_status = desc["Status"]
    logger.info("Query status: {} .... for query-->{}".format(query_status, query))
    logger.info("Query status: {} ".format(desc))    
    done = False

    # Wait until query is finished or max cycles limit has been reached.
    while not done and isSynchronous and attempts < MAX_WAIT_CYCLES:
        attempts += 1
        logger.info("Query cycle: {} ".format(attempts)) 
        time.sleep(1)
        desc = redshift_data_api_client.describe_statement(Id=query_id)
        query_status = desc["Status"]
        logger.info("Query status: {} ".format(query_status))  
        
        if query_status == "FAILED":
            raise Exception('SQL query failed:' +
                            query_id + ": " + desc["Error"])

        elif query_status == "FINISHED":
            logger.info("query status is: {} for query id: {} and command: {}".format(query_status, query_id, command))
            done = True
            # print result if there is a result (typically from Select statement)
            if desc['HasResultSet']:
                response = redshift_data_api_client.get_statement_result(Id=query_id)
                logger.info("Printing response of {} query --> {}".format(command, response['Records']))
        else:
            logger.info(
                "Current working... query status is: {} ".format(query_status))

    # Timeout Precaution
    if done == False and attempts >= MAX_WAIT_CYCLES and isSynchronous:
        logger.info("Limit for MAX_WAIT_CYCLES has been reached before the query was able to finish. We have exited out of the while-loop. You may increase the limit accordingly. \n")
        raise Exception("query status is: {} for query id: {} and command: {}".format(
            query_status, query_id, command))

    return query_status

测试时传入参数：

至此，Lambda部分设置完，但在执行时将报权限拒绝错误，因为此时，Lambda 的执行Role：test-redshift-role 在数据库中对应的用户 IAMR:test-redshift-role 是没有访问权限的，所以需要在Redshift Serverless console中继续下面的第3步

"errorMessage": "SQL query failed:77d160f1-40ef-476b-bdc5-b5d64432e96d: ERROR: permission denied for relation lot\nTraceback (most recent call last):\n  File \"/var/task/lambda_function.py\", line 44

3：给IAM Role设置访问权限，此处是将用户加入到一个group中，并授予这个group访问全部表的select 权限，如下图所示

或者直接赋予IAM Role 相应的权限

grant all on table "taxlot"."public"."lot" to "IAMR:test-redshift-role";

设置完后，再回到lambda，执行redshift serverless 查询成功。

标签：Serverless,status,redshift,format,Redshift,AWS,query,logger,data
From： https://www.cnblogs.com/benfly/p/17120616.html

AWS Lambda 查询 Redshift Serverless

相关文章

赞助商

阅读排行