Parsing SQL statements for authorization and lineage
Background: the company runs a centralized data warehouse. Most business lines share a single Hive environment, and a small number exchange data through files exported from the warehouse.
Problem: every business line can currently read every Hive table, so access control is urgently needed, and parsing the SQL files each business line submits is the first step.
Candidate solutions
1. JSqlParser is a Java library that can parse simple SQL statements, but it chokes on dialect-specific syntax and functions.
2. Druid is Alibaba's connection-pool library, but it also exposes SQL-parsing utility classes and supports a dozen or so dialects (MySQL, Hive, ClickHouse, HBase, and more). What it produces is still a syntax tree, though, which has to be traversed recursively.
3. SparkSqlParser is Spark's parser utility; it turns SQL directly into a logical plan that can be dumped as JSON, which is not hard to post-process.
Below is a short example of each approach.
JSqlParser
pom.xml configuration
<dependency>
<groupId>com.github.jsqlparser</groupId>
<artifactId>jsqlparser</artifactId>
<version>1.4</version>
</dependency>
CODE (extracts the column names projected by a query)
import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.expression.Alias;
import net.sf.jsqlparser.expression.Function;
import net.sf.jsqlparser.parser.CCJSqlParserManager;
import net.sf.jsqlparser.parser.SimpleNode;
import net.sf.jsqlparser.schema.Column;
import net.sf.jsqlparser.statement.Statement;
import net.sf.jsqlparser.statement.select.*;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
public class Jsqlparser {
public static void main(String[] args) throws JSQLParserException {
String sql = "SELECT name,SUM(CASE WHEN sb.sblb = '1' THEN 1 ELSE 0 END) AS 待验证, SUM(CASE WHEN sb.sblb = '2' THEN 1 ELSE 0 END) AS 通过,SUM(CASE WHEN sb.sblb = '3' THEN 1 ELSE 0 END) AS 失效 FROM SBMP_SBXX sb JOIN FR_ZTJCXX ztjc ON ztjc.qylx is NOT NULL AND ztjc.qyzt = '1' AND ztjc.ssgq IS NOT NULL AND ztjc.ID = sb.ztid WHERE sb.rdsj = 'XXX'";
List<String> result = new ArrayList<String>();
CCJSqlParserManager parserManager = new CCJSqlParserManager();
Statement statement = parserManager.parse(new StringReader(sql));
if (statement instanceof Select) {
Select selectStatement = (Select) statement;
List<SelectItem> selectItemlist = new ArrayList<SelectItem>();
if (selectStatement.getSelectBody() instanceof PlainSelect) {
PlainSelect selectBody = (PlainSelect) selectStatement.getSelectBody();
selectItemlist = selectBody.getSelectItems();
} else if (selectStatement.getSelectBody() instanceof SetOperationList) {
SetOperationList operationList = (SetOperationList) selectStatement.getSelectBody();
List<SelectBody> list = operationList.getSelects();
for (SelectBody a : list) {
if (a instanceof PlainSelect) {
PlainSelect selectBody = (PlainSelect) a;
selectItemlist.addAll(selectBody.getSelectItems());
}
}
}
SelectItem selectItem = null;
SelectExpressionItem selectExpressionItem = null;
AllTableColumns allTableColumns = null;
Alias alias = null;
SimpleNode node = null;
if (selectItemlist != null) {
for (int i = 0; i < selectItemlist.size(); i++) {
selectItem = selectItemlist.get(i);
if (selectItem instanceof SelectExpressionItem) {
selectExpressionItem = (SelectExpressionItem) selectItemlist.get(i);
alias = selectExpressionItem.getAlias();
node = selectExpressionItem.getExpression().getASTNode();
String columnName = "";
if (node != null) {
Object value = node.jjtGetValue();
if (value instanceof Column) {
columnName = ((Column) value).getColumnName();
} else if (value instanceof Function) {
columnName = ((Function) value).toString();
} else {
// also handle literal projections such as SELECT 'aaa' FROM table;
if (value != null) {
columnName = value.toString();
columnName = columnName.replace("'", "");
columnName = columnName.replace("\"", "");
}
}
}
if (alias != null) {
columnName = alias.getName();
}
if (!result.contains(columnName)) {
result.add(columnName);
}
} else if (selectItem instanceof AllTableColumns) {
allTableColumns = (AllTableColumns) selectItemlist.get(i);
if (!result.contains(allTableColumns.toString())) {
result.add(allTableColumns.toString());
}
} else {
if (!result.contains(selectItem.toString())) {
result.add(selectItem.toString());
}
}
}
}
}
System.out.println(result.toString());
}
}
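For authorization, the table names matter more than the projected columns, and JSqlParser can extract those directly with TablesNamesFinder, so the manual walk above is only needed for column-level detail. A minimal sketch, assuming the same JSqlParser 1.4 dependency as above:
import net.sf.jsqlparser.parser.CCJSqlParserManager;
import net.sf.jsqlparser.statement.Statement;
import net.sf.jsqlparser.util.TablesNamesFinder;
import java.io.StringReader;
import java.util.List;
public class JsqlparserTables {
public static void main(String[] args) throws Exception {
String sql = "SELECT name FROM SBMP_SBXX sb JOIN FR_ZTJCXX ztjc ON ztjc.ID = sb.ztid";
Statement statement = new CCJSqlParserManager().parse(new StringReader(sql));
// TablesNamesFinder walks the whole statement and collects every referenced table
List<String> tables = new TablesNamesFinder().getTableList(statement);
System.out.println(tables); // [SBMP_SBXX, FR_ZTJCXX]
}
}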
Druid
pom.xml configuration
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<version>1.2.6</version>
</dependency>
CODE (extracts the INSERT target table name and the SELECT table names; note that the SELECT table names are still buried deep inside the query object)
import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLObject;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.sql.ast.statement.*;
import java.util.List;
public class DruidTest {
public static void main(String[] args) {
String selectSql = "INSERT OVERWRITE TABLE target_table PARTITION(dt='${dt}') SELECT /*+ REPARTITION(10) */ a.* FROM ( SELECT * FROM data WHERE dt='${dt}' ) a INNER JOIN ( SELECT * FROM con_tabl1 UNION ALL SELECT * FROM con_tabl2 UNION ALL SELECT pkg,cate1_gp AS cate FROM con_tabl3 ) b ON a.pkg=b.pkg";// SQL to parse
// parse the SQL statements (dialect: hive)
List<SQLStatement> stmtList = SQLUtils.parseStatements(selectSql, "hive");
// walk the parsed statements and handle each statement type accordingly
for (SQLStatement stmt : stmtList) {
if (stmt instanceof SQLSelectStatement) {
// handle SELECT statements
SQLSelectStatement selectStmt = (SQLSelectStatement) stmt;
System.out.println("The select table name is: " + selectStmt.getSelect().getQueryBlock().getFrom().findTableSource(0));
}
else if (stmt instanceof SQLInsertStatement) {
// handle INSERT statements
SQLInsertStatement insertStmt = (SQLInsertStatement) stmt;
System.out.println("The insert table name is: " + insertStmt.getTableSource().getName());
List<SQLObject> children = insertStmt.getChildren();
System.out.println(children);
SQLSelect query = insertStmt.getQuery();
System.out.println(query);
}
else if (stmt instanceof SQLUpdateStatement) {
// handle UPDATE statements
SQLUpdateStatement updateStmt = (SQLUpdateStatement) stmt;
System.out.println("The update table name is: " + updateStmt.getFrom().findTableSource(0));
}
else if (stmt instanceof SQLDeleteStatement) {
// handle DELETE statements
SQLDeleteStatement deleteStmt = (SQLDeleteStatement) stmt;
System.out.println("The delete table name is: " + deleteStmt.getTableName().getSimpleName());
}
}
}
}
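To avoid digging into the query object by hand, Druid also ships a SchemaStatVisitor that walks the AST and aggregates every table together with how it is used (Select, Insert, and so on), which covers the "select tables buried deep in the query" problem above. A minimal sketch, assuming the same druid 1.2.6 dependency:
import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.sql.visitor.SchemaStatVisitor;
import com.alibaba.druid.stat.TableStat;
import java.util.Map;
public class DruidVisitorTest {
public static void main(String[] args) {
String sql = "INSERT OVERWRITE TABLE target_table SELECT a.pkg FROM data a JOIN con_tabl1 b ON a.pkg=b.pkg";
SQLStatement stmt = SQLUtils.parseStatements(sql, "hive").get(0);
SchemaStatVisitor visitor = new SchemaStatVisitor();
stmt.accept(visitor);
// each entry maps a table name to its usage (Select/Insert/...), which is
// exactly the source/target split needed for authorization and lineage
for (Map.Entry<TableStat.Name, TableStat> e : visitor.getTables().entrySet()) {
System.out.println(e.getKey() + " -> " + e.getValue());
}
}
}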
SparkSqlParser
pom.xml configuration
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr4-runtime</artifactId>
<version>4.7.1</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.62</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
CODE (extracts the INSERT target tables and the SELECT source tables and drops them into target_table_list and source_table_list)
Most of the commented-out statements below are SQL syntax tests taken from the Spark source code; you can skim past them.
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.spark.sql.execution.SparkSqlParser;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Sql_type{
private String type;
private ArrayList<String> source_table_list = new ArrayList<String>();
private ArrayList<String> target_table_list = new ArrayList<String>();
public String getType() {return type;}
public void setType(String type) {if(this.type == null){this.type = type;}}
public ArrayList<String> getSource_table_list() {return source_table_list;}
public void add_Source_table_list(String source_table) {if(!source_table_list.contains(source_table)){this.source_table_list.add(source_table);}}
public ArrayList<String> getTarget_table_list() {return target_table_list;}
public void add_Target_table_list(String target_table) {if(!target_table_list.contains(target_table)){target_table_list.add(target_table);}}
public void reserve_target_source(){ // swap (reverse) the source and target lists
ArrayList<String> reserve = source_table_list;
source_table_list = target_table_list;
target_table_list = reserve;
}
public String toString() {
return "Sql_type{type='" + type + '\'' +", source_table_list=" + source_table_list +", target_table_list=" + target_table_list + '}';
}
}
public class TestSparkSqlParser {
public static String parseDbAndTable(String create_table_name) {
Pattern pattern = Pattern.compile("\\[(.*)\\]");
Matcher matcher = pattern.matcher(create_table_name);
String db_table = "";
if (matcher.find()) {
String content = matcher.group(1);
String[] parts = content.split(", ");
if(parts.length==2){
db_table = parts[0]+"."+parts[1];
}
if(parts.length==1){
db_table = parts[0];
}
if(parts.length>=3){
db_table = content;
}
}
return db_table;
}
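// Example: in the plan JSON a multipartIdentifier renders as "[trandw, dim_pub_app]",
// which parseDbAndTable turns into "trandw.dim_pub_app".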
public static void reserve_table_list(String jsonString,Sql_type sql_type) {
ArrayList<String> reserve_table_list = new ArrayList<String>();
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.AddColumns");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.AddPartitions");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.AlterColumn");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.AnalyzeColumn");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.AnalyzeTable");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.CacheTable");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DeleteFromTable");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DescribeColumn");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DescribeFunction");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DescribeFunction");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DescribeNamespace");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DescribeRelation");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DropColumns");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DropFunction");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DropNamespace");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DropPartitions");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DropTable");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.DropView");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.LoadData");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.RecoverPartitions");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.RenameColumn");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.RenamePartitions");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.RepairTable");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.SetNamespaceLocation");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.SetNamespaceProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.SetTableLocation");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.SetTableProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.SetTableProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.SetTableSerDeProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.SetViewProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.ShowColumns");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.ShowCreateTable");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.ShowPartitions");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.ShowTableProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.TruncatePartition");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.TruncateTable");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.UncacheTable");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.UnsetTableProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.UnsetTableProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.UnsetViewProperties");
reserve_table_list.add("org.apache.spark.sql.catalyst.plans.logical.UpdateTable");
if(reserve_table_list.contains(jsonString)){
sql_type.reserve_target_source();
}
}
public static void parse_special_source_target(Sql_type sql_type, JSONArray json){
if(json.getJSONObject(0).getString("class").equals("org.apache.spark.sql.catalyst.plans.logical.AlterViewAs")){
for (int i = 0; i < json.size(); i++) {
JSONObject jsonObject = json.getJSONObject(i);
String class_String = jsonObject.getString("class");
if(class_String.equals("org.apache.spark.sql.catalyst.plans.logical.AlterViewAs")){
sql_type.setType("ALTERVIEWAS");
}
if(class_String.equals("org.apache.spark.sql.catalyst.analysis.UnresolvedView")){
sql_type.add_Target_table_list(parseDbAndTable(jsonObject.getString("multipartIdentifier")));
}
if(class_String.equals("org.apache.spark.sql.catalyst.analysis.UnresolvedRelation")){
sql_type.add_Source_table_list(parseDbAndTable(jsonObject.getString("multipartIdentifier")));
}
}
}
}
public static void parse_normal_source_target(Sql_type sql_type, JSONArray json){
for (int i = 0; i < json.size(); i++) {
JSONObject jsonObject = json.getJSONObject(i);
String class_String = jsonObject.getString("class");
switch (class_String) {
case "org.apache.spark.sql.catalyst.plans.logical.AddColumns": sql_type.setType("ALTER_TABLE_ADDCOLUMNS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.AddPartitions": sql_type.setType("ADD_PARTITIONS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.Aggregate": sql_type.setType("SELECT"); break;
case "org.apache.spark.sql.catalyst.plans.logical.AlterColumn": sql_type.setType("ALTER_TABLE_ADDCOLUMNS_ALTERCOLUMN"); break;
case "org.apache.spark.sql.catalyst.plans.logical.AnalyzeColumn": sql_type.setType("ANALYZE_COLUMN"); break; //统计列分布信息
case "org.apache.spark.sql.catalyst.plans.logical.AnalyzeTable": sql_type.setType("ANALYZE_TABLE"); break; //统计表分布信息
case "org.apache.spark.sql.catalyst.plans.logical.CacheTable": sql_type.setType("CACHE_TABLE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.CacheTableAsSelect": sql_type.setType("CACHE_TABLE_ASSELECT");
sql_type.add_Target_table_list(jsonObject.getString("tempViewName"));break;
case "org.apache.spark.sql.catalyst.plans.logical.CreateFunction": sql_type.setType("ALTER_FUNCTION_AS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.CreateTable":sql_type.setType("CREATE_TABLE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect": sql_type.setType("CREATE_TABLE_ASSELECT"); break;
case "org.apache.spark.sql.catalyst.plans.logical.CreateView": sql_type.setType("ALTER_VIEW"); break;
case "org.apache.spark.sql.execution.command.CreateViewCommand":
sql_type.setType("CREATE_VIEW_COMMAND"); sql_type.add_Target_table_list(jsonObject.getJSONObject("name").getString("table"));break;
case "org.apache.spark.sql.catalyst.plans.logical.DeleteFromTable": sql_type.setType("DELETE_TABLE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DescribeColumn": sql_type.setType("DESC_TABLE_COLUMN"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DescribeFunction": sql_type.setType("DESCRIBE_FUNCTION"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DescribeNamespace": sql_type.setType("DESC_DB"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DescribeRelation": sql_type.setType("DESC_TABLE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DropColumns": sql_type.setType("ALTER_TABLE_DROPCOLUMNS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DropFunction": sql_type.setType("DROP_FUNCTION"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DropNamespace": sql_type.setType("DROP_NAMESPACE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DropPartitions": sql_type.setType("DROP_PARTITIONS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DropTable": sql_type.setType("DROP_TABLE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.DropView": sql_type.setType("DROP_VIEW"); break;
case "org.apache.spark.sql.catalyst.plans.logical.InsertIntoDir": sql_type.setType("INSERT_INTO_DIR"); break;
case "org.apache.spark.sql.catalyst.plans.logical.InsertIntoStatement": sql_type.setType("INSERT_INTO_TABLE");
sql_type.add_Target_table_list(parseDbAndTable(jsonObject.getJSONArray("table").getJSONObject(0).getString("multipartIdentifier")));break;
case "org.apache.spark.sql.catalyst.plans.logical.LoadData": sql_type.setType("LOAD_DATA"); break;
case "org.apache.spark.sql.catalyst.plans.logical.MergeIntoTable": sql_type.setType("MERGE_TABLE"); break; //未进行处理
case "org.apache.spark.sql.catalyst.plans.logical.Project": sql_type.setType("SELECT"); break;
case "org.apache.spark.sql.catalyst.plans.logical.RecoverPartitions": sql_type.setType("RECOVER_PARTITIONS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.RenameColumn": sql_type.setType("ALTER_TABLE_RENAMECOLUMN"); break;
case "org.apache.spark.sql.catalyst.plans.logical.RenamePartitions": sql_type.setType("RENAME_PARTITIONS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.RenameTable": sql_type.setType("RENAME_TABLE");
sql_type.add_Target_table_list(parseDbAndTable(jsonObject.getString("newName")));break;
case "org.apache.spark.sql.catalyst.plans.logical.RepairTable": sql_type.setType("REPAIR_TABLE"); break; //修复分区
case "org.apache.spark.sql.catalyst.plans.logical.ReplaceTable": sql_type.setType("REPLACE_TABLE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.ReplaceTableAsSelect": sql_type.setType("REPLACE_TABLE_ASSELECT"); break;
case "org.apache.spark.sql.catalyst.plans.logical.SetNamespaceLocation": sql_type.setType("ALTER_NAMESPACE_SETLOCATION"); break;
case "org.apache.spark.sql.catalyst.plans.logical.SetNamespaceProperties": sql_type.setType("ALTER_NAMESPACE_PROPERTIES"); break;
case "org.apache.spark.sql.catalyst.plans.logical.SetTableLocation": sql_type.setType("ALTER_TABLE_SETLOCATION"); break;
case "org.apache.spark.sql.catalyst.plans.logical.SetTableProperties": sql_type.setType("ALTER_TABLE_PROPERTIES"); break;
case "org.apache.spark.sql.catalyst.plans.logical.SetTableSerDeProperties": sql_type.setType("ALTER_TABLE_SETTABLESERDEPROPERTIES"); break;
case "org.apache.spark.sql.catalyst.plans.logical.SetViewProperties": sql_type.setType("ALTER_VIEW_PROPERTIES"); break;
case "org.apache.spark.sql.catalyst.plans.logical.ShowColumns": sql_type.setType("SHOW_PARTITIONS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.ShowCreateTable": sql_type.setType("SHOW_TABLES"); break;
case "org.apache.spark.sql.catalyst.plans.logical.ShowFunctions": sql_type.setType("SHOW_FUNCTIONS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.ShowPartitions": sql_type.setType("SHOW_PARTITIONS"); break;
case "org.apache.spark.sql.catalyst.plans.logical.ShowTableProperties": sql_type.setType("SHOW_TABLEPROPERTIES"); break;
case "org.apache.spark.sql.catalyst.plans.logical.ShowTables": sql_type.setType("SHOW_TABLES"); break; //未进行处理
case "org.apache.spark.sql.catalyst.plans.logical.ShowTableExtended": sql_type.setType("SHOW_TABLES"); break; //未进行处理
case "org.apache.spark.sql.catalyst.plans.logical.TruncatePartition": sql_type.setType("TRUNCATE_PARTITION"); break;
case "org.apache.spark.sql.catalyst.plans.logical.TruncateTable": sql_type.setType("TRUNCATE_TABLE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.UncacheTable": sql_type.setType("UNCACHE_TABLE"); break;
case "org.apache.spark.sql.catalyst.plans.logical.UnresolvedWith":
sql_type.setType("UNRESOLVED_WITH"); break;
case "org.apache.spark.sql.catalyst.plans.logical.UnsetTableProperties": sql_type.setType("DROP_TABLE_PROPERTIES"); break;
case "org.apache.spark.sql.catalyst.plans.logical.UnsetViewProperties": sql_type.setType("DROP_VIEW_PROPERTIES"); break;
case "org.apache.spark.sql.catalyst.plans.logical.UpdateTable": sql_type.setType("DELETE_TABLE"); break;
case "org.apache.spark.sql.catalyst.analysis.UnresolvedTableOrView":
case "org.apache.spark.sql.catalyst.analysis.UnresolvedRelation":
case "org.apache.spark.sql.catalyst.analysis.UnresolvedView":
case "org.apache.spark.sql.catalyst.analysis.UnresolvedTable":
case "org.apache.spark.sql.catalyst.analysis.UnresolvedNamespace":
case "org.apache.spark.sql.catalyst.analysis.UnresolvedFunc":
sql_type.add_Source_table_list(parseDbAndTable(jsonObject.getString("multipartIdentifier")));break;
case "org.apache.spark.sql.catalyst.plans.logical.CreateNamespace": sql_type.setType("CREATE_NAMESPACE");break;
case "org.apache.spark.sql.catalyst.analysis.UnresolvedDBObjectName":
sql_type.add_Target_table_list(parseDbAndTable(jsonObject.getString("nameParts")));break;
}
}
}
public static void parseLogicalPlan(String jsonString) {
JSONArray json = JSON.parseArray(jsonString);
Sql_type sql_type = new Sql_type();
parse_special_source_target(sql_type,json);// ALTER VIEW ... AS is handled separately because its plan layout differs
parse_normal_source_target(sql_type,json); // handle all other plan layouts in one place
reserve_table_list(json.getJSONObject(0).getString("class"),sql_type);// for DDL-style plans, swap target and source
System.out.println(sql_type.toString());
}
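// For example, the plan JSON of "INSERT INTO t1 SELECT * FROM t2" should print roughly
// Sql_type{type='INSERT_INTO_TABLE', source_table_list=[t2], target_table_list=[t1]}
// (the class names in the switch above assume Spark 3.2.x logical plans).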
public static void test_sql() {
SparkSqlParser sparkSqlParser = new SparkSqlParser();
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE my_tab(a INT COMMENT 'test', b STRING NOT NULL) USING parquet").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE my_tab(a INT COMMENT 'test', b STRING NOT NULL) USING parquet").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE IF NOT EXISTS my_tab(a INT, b STRING) USING parquet").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE my_tab(a INT comment 'test', b STRING) USING parquet PARTITIONED BY (a)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE my_tab (a INT, b STRING, ts TIMESTAMP) USING parquet PARTITIONED BY (a,bucket(16, b),years(ts),months(ts),days(ts),hours(ts),foo(a, \"bar\", 34))").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE my_tab(a INT, b STRING) USING parquet CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE my_tab(a INT, b STRING) USING parquet COMMENT 'abc'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE my_tab(a INT, b STRING) USING parquet TBLPROPERTIES('test' = 'test')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE my_tab(a INT, b STRING) USING parquet LOCATION '/tmp/file'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE 1m.2g(a INT) USING parquet").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TABLE IF NOT EXISTS mydb.page_view USING parquet COMMENT 'This is the staging page view table' LOCATION '/user/external/page_view' TBLPROPERTIES ('p1'='v1', 'p2'='v2') AS SELECT * FROM mydb.src").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE my_tab(a INT, b STRING) USING parquet TBLPROPERTIES('test' = 'test')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE my_tab(a INT, b STRING) USING parquet COMMENT 'abc'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE my_tab(a INT, b STRING) USING parquet CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE my_tab (a INT, b STRING, ts TIMESTAMP) USING parquet PARTITIONED BY (a,bucket(16, b),years(ts),months(ts),days(ts),hours(ts),foo(a, \"bar\", 34))").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE my_tab(a INT comment 'test', b STRING) USING parquet PARTITIONED BY (a)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE my_tab(a INT, b STRING) USING parquet LOCATION '/tmp/file'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE 1m.2g(a INT) USING parquet").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE my_tab(a INT COMMENT 'test', b STRING NOT NULL) USING parquet").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("REPLACE TABLE my_tab as select * from aa").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE NAMESPACE IF NOT EXISTS a.b.c WITH PROPERTIES ('a'='a', 'b'='b', 'c'='c') COMMENT 'namespace_comment' LOCATION '/home/user/db'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE DATABASE IF NOT EXISTS a.b.c WITH DBPROPERTIES ('a'='a', 'b'='b', 'c'='c') COMMENT 'namespace_comment' LOCATION '/home/user/db'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE NAMESPACE a.b.c WITH PROPERTIES('key_without_value'='x', 'key_with_value'='x')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE NAMESPACE IF NOT EXISTS a.b.c WITH PROPERTIES ('a'='a', 'b'='b', 'c'='c') ").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE NAMESPACE b.c LOCATION '/home/user/db' WITH PROPERTIES ('a'=1, 'b'=0.1, 'c'=TRUE)").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP TABLE IF EXISTS db.tab").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP TABLE IF EXISTS tab").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP TABLE db.tab").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP TABLE tab PURGE").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP VIEW IF EXISTS db.view").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP VIEW IF EXISTS view").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP VIEW db.view").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP VIEW view").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER VIEW table_name SET TBLPROPERTIES ('test' = 'test','comment' = 'new_comment')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER VIEW table_name UNSET TBLPROPERTIES ('comment', 'test')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER VIEW table_name UNSET TBLPROPERTIES IF EXISTS ('comment', 'test')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER VIEW a.b.c RENAME TO x.y.z").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name SET TBLPROPERTIES ('test' = 'test','comment' = 'new_comment')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name UNSET TBLPROPERTIES ('comment', 'test')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name UNSET TBLPROPERTIES IF EXISTS ('comment', 'test')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMN x int").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMNS x int, y string").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMNS x int").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMNS (x int)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMNS (x int COMMENT 'doc')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMN x int NOT NULL").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMN x int COMMENT 'doc'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMN x int FIRST").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMN x int AFTER y").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMN x.y.z int COMMENT 'doc'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ADD COLUMN x.y.z int COMMENT 'doc', a.b string FIRST").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c SET LOCATION 'new location'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c PARTITION(ds='2017-06-10') SET LOCATION 'new location'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name RENAME COLUMN a.b.c TO d").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ALTER COLUMN a.b.c TYPE bigint").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name CHANGE COLUMN a.b.c TYPE bigint").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name CHANGE COLUMN a.b.c COMMENT 'new comment'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name CHANGE COLUMN a.b.c FIRST").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ALTER COLUMN a.b.c SET NOT NULL").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name ALTER COLUMN a.b.c DROP NOT NULL").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name DROP COLUMN a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name DROP COLUMN x, y, a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name CHANGE COLUMN a.b.c c INT").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name CHANGE COLUMN a.b.c c INT COMMENT 'new_comment'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name CHANGE COLUMN a.b.c c INT AFTER other_col").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c RENAME TO x.y.z").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE t col").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE t `abc.xyz`").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE t abc.xyz").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE t `a.b`.`x.y`").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE TABLE t col").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE TABLE EXTENDED t col").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE TABLE FORMATTED t col").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE DATABASE EXTENDED a.b").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE DATABASE a.b").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("describe t").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("describe table t").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("describe table extended t").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("describe table formatted t").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT INTO TABLE testcat.ns1.ns2.tbl SELECT * FROM source").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT INTO testcat.ns1.ns2.tbl SELECT * FROM source").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT INTO TABLE testcat.ns1.ns2.tbl SELECT * FROM testcat2.db.tbl").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT INTO testcat.ns1.ns2.tbl PARTITION (p1 = 3, p2) SELECT * FROM source").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT INTO TABLE testcat.ns1.ns2.tbl PARTITION (p1 = 3) SELECT * FROM source").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT OVERWRITE TABLE testcat.ns1.ns2.tbl SELECT /*+ REPARTITION(100) */* FROM source").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT OVERWRITE TABLE testcat.ns1.ns2.tbl SELECT * FROM source").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT OVERWRITE testcat.ns1.ns2.tbl SELECT * FROM source").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT OVERWRITE TABLE testcat.ns1.ns2.tbl PARTITION (p1 = 3, p2) SELECT * FROM source").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT OVERWRITE DIRECTORY '/tmp/file' USING parquet SELECT 1 as a").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT OVERWRITE DIRECTORY '/tmp/file' USING json SELECT 1 as a").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("INSERT OVERWRITE DIRECTORY '/tmp/file' USING json SELECT * FROM source").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("DELETE FROM testcat.ns1.ns2.tbl").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DELETE FROM testcat.ns1.ns2.tbl AS t WHERE t.a = 2").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("UPDATE testcat.ns1.ns2.tbl SET a='Robert', b=32").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("UPDATE testcat.ns1.ns2.tbl AS t SET t.a='Robert', t.b=32 WHERE t.c=2").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("MERGE INTO testcat1.ns1.ns2.tbl AS target USING testcat2.ns1.ns2.tbl AS source ON target.col1 = source.col1 WHEN MATCHED AND (target.col2='delete') THEN DELETE WHEN MATCHED AND (target.col2='update') THEN UPDATE SET target.col2 = source.col2 WHEN NOT MATCHED AND (target.col2='insert') THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("MERGE INTO testcat1.ns1.ns2.tbl AS target USING (SELECT * FROM testcat2.ns1.ns2.tbl) AS source ON target.col1 = source.col1 WHEN MATCHED AND (target.col2='delete') THEN DELETE WHEN MATCHED AND (target.col2='update') THEN UPDATE SET target.col2 = source.col2 WHEN NOT MATCHED AND (target.col2='insert') THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("MERGE INTO testcat1.ns1.ns2.tbl AS target USING (WITH s as (SELECT * FROM testcat2.ns1.ns2.tbl) SELECT * FROM s) AS source ON target.col1 = source.col1 WHEN MATCHED AND (target.col2='delete') THEN DELETE WHEN MATCHED AND (target.col2='update') THEN UPDATE SET target.col2 = source.col2 WHEN NOT MATCHED AND (target.col2='insert') THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("MERGE INTO testcat1.ns1.ns2.tbl AS target USING testcat2.ns1.ns2.tbl AS source ON target.col1 = source.col1 WHEN MATCHED THEN UPDATE SET target.col2 = source.col2 WHEN NOT MATCHED THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("MERGE INTO testcat1.ns1.ns2.tbl AS target USING testcat2.ns1.ns2.tbl AS source ON target.col1 = source.col1 WHEN MATCHED AND (target.col2='delete') THEN DELETE WHEN MATCHED AND (target.col2='update') THEN UPDATE SET * WHEN NOT MATCHED AND (target.col2='insert') THEN INSERT *").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("MERGE INTO testcat1.ns1.ns2.tbl AS target USING testcat2.ns1.ns2.tbl AS source ON target.col1 = source.col1 WHEN MATCHED AND (target.col2='delete') THEN DELETE WHEN MATCHED AND (target.col2='update1') THEN UPDATE SET target.col2 = source.col2 WHEN MATCHED AND (target.col2='update2') THEN UPDATE SET target.col2 = source.col2 WHEN NOT MATCHED AND (target.col2='insert') THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("MERGE INTO testcat1.ns1.ns2.tbl AS target USING testcat2.ns1.ns2.tbl AS source ON target.col1 = source.col1 WHEN MATCHED AND (target.col2='delete') THEN DELETE WHEN MATCHED AND (target.col2='update1') THEN UPDATE SET target.col2 = source.col2 WHEN NOT MATCHED AND (target.col2='insert1') THEN INSERT (target.col1, target.col2) values (source.col1, source.col2) WHEN NOT MATCHED AND (target.col2='insert2') THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)").toJSON());
parseLogicalPlan(sparkSqlParser.parsePlan("INSERT OVERWRITE TABLE tranadm.mid_adm_audc_credit_risk_control_dd_02 PARTITION(dt='${dt}') SELECT /*+ REPARTITION(10) */ gazj,loan_app_remain_days,sum_1d_loan_app_use_num,social_app_remain_days, sum_30d_loan_app_use_days,sum_30d_loan_app_use_num,max_30d_loan_app_use_days, (sum_30d_loan_app_use_days+sum_31to90d_loan_app_use_days) AS sum_90d_loan_app_use_days, sum_90d_loan_app_use_num,max_90d_loan_app_use_days, (sum_30d_loan_app_use_days+sum_31to90d_loan_app_use_days+sum_91to150d_loan_app_use_days+sum_151to180d_loan_app_use_days) AS sum_180d_loan_app_use_days, sum_180d_loan_app_use_num,max_180d_loan_app_use_days,finance_app_remain_days,read_edu_app_remain_days,max_180d_loan_app_retain_days FROM ( SELECT gazj, MIN(IF(cate='贷款',remain_days,NULL)) AS loan_app_remain_days, COUNT(DISTINCT (IF(rev_1d_is_active AND cate='贷款',pkg,NULL))) AS sum_1d_loan_app_use_num, MIN(IF(cate='社交',remain_days,NULL)) AS social_app_remain_days, LENGTH( REPLACE( CONV( AGGREGATE( COLLECT_SET(IF(cate='贷款',30d_trace_bit_num,0)),CAST(0 AS BIGINT),(acc,x) -> x | acc ),10,2 ),'0','' ) ) AS sum_30d_loan_app_use_days, COUNT(DISTINCT (IF(sum_30d_is_active AND cate='贷款',pkg,NULL))) AS sum_30d_loan_app_use_num, MAX(IF(cate='贷款',sum_30d_use_days,0)) AS max_30d_loan_app_use_days, LENGTH( REPLACE( CONV( AGGREGATE( COLLECT_SET(IF(cate='贷款',31to90d_trace_bit_num,0)),CAST(0 AS BIGINT),(acc,x) -> x | acc ),10,2 ),'0','' ) ) AS sum_31to90d_loan_app_use_days, COUNT(DISTINCT (IF(sum_90d_is_active AND cate='贷款',pkg,NULL))) AS sum_90d_loan_app_use_num, MAX(IF(cate='贷款',sum_90d_use_days,0)) AS max_90d_loan_app_use_days, LENGTH( REPLACE( CONV( AGGREGATE( COLLECT_SET(IF(cate='贷款',91to150d_trace_bit_num,0)),CAST(0 AS BIGINT),(acc,x) -> x | acc ),10,2 ),'0','' ) ) AS sum_91to150d_loan_app_use_days, LENGTH( REPLACE( CONV( AGGREGATE( COLLECT_SET(IF(cate='贷款',151to180d_trace_bit_num,0)),CAST(0 AS BIGINT),(acc,x) -> x | acc ),10,2 ),'0','' ) ) AS sum_151to180d_loan_app_use_days, COUNT(DISTINCT (IF(sum_180d_is_active AND cate='贷款',pkg,NULL))) AS sum_180d_loan_app_use_num, MAX(IF(cate='贷款',sum_180d_use_days,0)) AS max_180d_loan_app_use_days, MIN(IF(cate='Finance',remain_days,NULL)) AS finance_app_remain_days, MIN(IF(cate='Books & Reference',remain_days,NULL)) AS read_edu_app_remain_days, MAX(IF(cate='贷款' AND INSTR(REVERSE(180d_active_trace),'1')!=0,LENGTH(180d_active_trace)-INSTR(REVERSE(180d_active_trace),'1')+1,NULL)) AS max_180d_loan_app_retain_days FROM ( SELECT /*+ BROADCAST(b) */ a.gazj, a.pkg, b.cate, INSTR(active_trace,'1')-1 AS remain_days, IF(INSTR(active_trace,'1')=1,TRUE,FALSE) AS rev_1d_is_active, CAST(CONV(RPAD(SUBSTR(active_trace,1,30),30,'0'),2,10) AS BIGINT) AS 30d_trace_bit_num, IF(INSTR(SUBSTR(active_trace,1,30),'1')=0,FALSE,TRUE) AS sum_30d_is_active, LENGTH(REPLACE(SUBSTR(active_trace,1,30),'0','')) AS sum_30d_use_days, CAST(CONV(RPAD(SUBSTR(active_trace,31,60),60,'0'),2,10) AS BIGINT) AS 31to90d_trace_bit_num, CAST(CONV(RPAD(SUBSTR(active_trace,91,60),60,'0'),2,10) AS BIGINT) AS 91to150d_trace_bit_num, CAST(CONV(RPAD(SUBSTR(active_trace,151,30),30,'0'),2,10) AS BIGINT) AS 151to180d_trace_bit_num, IF(INSTR(SUBSTR(active_trace,1,90),'1')=0,FALSE,TRUE) AS sum_90d_is_active, LENGTH(REPLACE(SUBSTR(active_trace,1,90),'0','')) AS sum_90d_use_days, IF(INSTR(SUBSTR(active_trace,1,180),'1')=0,FALSE,TRUE) AS sum_180d_is_active, LENGTH(REPLACE(SUBSTR(active_trace,1,180),'0','')) AS sum_180d_use_days, SUBSTR(active_trace,1,180) AS 180d_active_trace FROM ( SELECT gazj,pkg,REVERSE(active_trace) AS 
active_trace FROM trandw.dws_log_app_active_trace_dd WHERE dt='${dt}' ) a inner JOIN ( SELECT pkg,cate FROM ( SELECT DISTINCT pkg,app_cate_zh AS cate FROM trandw.cfg_audc_app WHERE app_cate_zh='贷款' ) UNION ALL ( SELECT pkg,cate1_zh AS cate FROM trandw.dim_pub_app WHERE cate1_zh='社交' ) UNION ALL ( SELECT pkg,cate1_gp AS cate FROM trandw.dim_pub_app WHERE cate1_gp IN ('Finance','Books & Reference') ) ) b ON a.pkg=b.pkg ) GROUP BY gazj )").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLES").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLES '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLES LIKE '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLES FROM testcat.ns1.ns2.tbl").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLES IN testcat.ns1.ns2.tbl").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLES IN ns1 '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLES IN ns1 LIKE '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLE EXTENDED LIKE '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLE EXTENDED FROM testcat.ns1.ns2 LIKE '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLE EXTENDED IN testcat.ns1.ns2 LIKE '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLE EXTENDED LIKE '*test*' PARTITION(ds='2008-04-09', hr=11)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLE EXTENDED FROM testcat.ns1.ns2 LIKE '*test*' PARTITION(ds='2008-04-09')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TABLE EXTENDED IN testcat.ns1.ns2 LIKE '*test*' PARTITION(ds='2008-04-09')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW VIEWS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW VIEWS '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW VIEWS LIKE '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW VIEWS FROM testcat.ns1.ns2.tbl").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW VIEWS IN testcat.ns1.ns2.tbl").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW VIEWS IN ns1 '*test*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW VIEWS IN ns1 LIKE '*test*'").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW DATABASES").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW DATABASES LIKE 'defau*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW DATABASES FROM testcat.ns1.ns2").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW DATABASES IN testcat.ns1.ns2").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW NAMESPACES").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW NAMESPACES FROM testcat.ns1.ns2").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW NAMESPACES IN testcat.ns1.ns2").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW NAMESPACES IN testcat.ns1 LIKE '*pattern*'").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW CREATE TABLE a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW PARTITIONS t1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW PARTITIONS db1.t1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW PARTITIONS t1 PARTITION(partcol1='partvalue', partcol2='partvalue')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW PARTITIONS a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW PARTITIONS a.b.c PARTITION(ds='2017-06-10')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW COLUMNS FROM t1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW COLUMNS IN db1.t1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW COLUMNS FROM t1 IN db1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW COLUMNS FROM db1.t1 IN db1").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP NAMESPACE a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP NAMESPACE IF EXISTS a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP NAMESPACE IF EXISTS a.b.c RESTRICT").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP NAMESPACE IF EXISTS a.b.c CASCADE").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP NAMESPACE a.b.c CASCADE").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER DATABASE a.b.c SET PROPERTIES ('a'='a', 'b'='b', 'c'='c')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER SCHEMA a.b.c SET PROPERTIES ('a'='a')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER NAMESPACE a.b.c SET PROPERTIES ('b'='b')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER DATABASE a.b.c SET DBPROPERTIES ('a'='a', 'b'='b', 'c'='c')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER SCHEMA a.b.c SET DBPROPERTIES ('a'='a')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER NAMESPACE a.b.c SET DBPROPERTIES ('b'='b')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER DATABASE a.b.c SET LOCATION '/home/user/db'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER SCHEMA a.b.c SET LOCATION '/home/user/db'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER NAMESPACE a.b.c SET LOCATION '/home/user/db'").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("analyze table a.b.c compute statistics").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("analyze table a.b.c compute statistics noscan").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("analyze table a.b.c partition (a) compute statistics nOscAn").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds='2008-04-09', hr=11) COMPUTE STATISTICS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds='2008-04-09', hr=11) COMPUTE STATISTICS noscan").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds='2008-04-09') COMPUTE STATISTICS noscan").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds='2008-04-09', hr) COMPUTE STATISTICS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds='2008-04-09', hr) COMPUTE STATISTICS noscan").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds, hr=11) COMPUTE STATISTICS noscan").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds, hr) COMPUTE STATISTICS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds, hr) COMPUTE STATISTICS noscan").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c COMPUTE STATISTICS FOR COLUMNS key, value").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds='2017-06-10') COMPUTE STATISTICS FOR COLUMNS key, value").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ANALYZE TABLE a.b.c PARTITION(ds='2017-06-10') COMPUTE STATISTICS FOR ALL COLUMNS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("MSCK REPAIR TABLE a.b.c").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("LOAD DATA INPATH 'filepath' INTO TABLE a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("LOAD DATA LOCAL INPATH 'filepath' INTO TABLE a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("LOAD DATA LOCAL INPATH 'filepath' OVERWRITE INTO TABLE a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("LOAD DATA LOCAL INPATH 'filepath' OVERWRITE INTO TABLE a.b.c PARTITION(ds='2017-06-10')").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("CACHE TABLE data as select * from aaaa").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CACHE LAZY TABLE a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CACHE LAZY TABLE a.b.c OPTIONS('storageLevel' 'DISK_ONLY')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("UNCACHE TABLE a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("UNCACHE TABLE IF EXISTS a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("TRUNCATE TABLE a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("TRUNCATE TABLE a.b.c PARTITION(ds='2017-06-10')").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c RECOVER PARTITIONS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c ADD IF NOT EXISTS PARTITION (dt='2008-08-08', country='us') LOCATION 'location1' PARTITION (dt='2009-09-09', country='uk')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c ADD PARTITION (dt='2008-08-08') LOCATION 'loc'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name PARTITION (dt='2008-08-08', country='us') RENAME TO PARTITION (dt='2008-09-09', country='uk')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c PARTITION (ds='2017-06-10') RENAME TO PARTITION (ds='2018-06-10')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name DROP IF EXISTS PARTITION (dt='2008-08-08', country='us'), PARTITION (dt='2009-09-09', country='uk')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name DROP PARTITION (dt='2008-08-08', country='us'), PARTITION (dt='2009-09-09', country='uk')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c DROP IF EXISTS PARTITION (ds='2017-06-10')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name SET SERDE 'org.apache.class'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name SET SERDE 'org.apache.class' WITH SERDEPROPERTIES ('columns'='foo,bar', 'field.delim' = ',')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name SET SERDEPROPERTIES ('columns'='foo,bar', 'field.delim' = ',')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name PARTITION (test=1, dt='2008-08-08', country='us') SET SERDE 'org.apache.class' WITH SERDEPROPERTIES ('columns'='foo,bar', 'field.delim' = ',')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE table_name PARTITION (test=1, dt='2008-08-08', country='us') SET SERDEPROPERTIES ('columns'='foo,bar', 'field.delim' = ',')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c SET SERDE 'org.apache.class' WITH SERDEPROPERTIES ('columns'='foo,bar', 'field.delim' = ',')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER TABLE a.b.c PARTITION (test=1, dt='2008-08-08', country='us') SET SERDEPROPERTIES ('columns'='foo,bar', 'field.delim' = ',')").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER VIEW a.b.c AS SELECT 1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("ALTER VIEW a.b.c AS SELECT * from data").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE VIEW view1 AS SELECT * FROM tab1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE TEMPORARY VIEW data AS SELECT * FROM tab1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE OR REPLACE VIEW view1 (col1, col3 COMMENT 'hello') TBLPROPERTIES('prop1Key'='prop1Val') COMMENT 'BLABLA' AS SELECT * FROM tab1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TBLPROPERTIES a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW TBLPROPERTIES a.b.c('propKey1')").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("DESC FUNCTION a").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE FUNCTION a").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE FUNCTION a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DESCRIBE FUNCTION EXTENDED a.b.c").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW FUNCTIONS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW USER FUNCTIONS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW SYSTEM FUNCTIONS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW ALL FUNCTIONS").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW FUNCTIONS LIKE 'funct*'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SHOW FUNCTIONS LIKE a.b.c").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP FUNCTION a").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP FUNCTION a.b.c").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("DROP FUNCTION IF EXISTS a.b.c").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE FUNCTION a as 'fun'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE FUNCTION a.b.c as 'fun'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE OR REPLACE FUNCTION a.b.c as 'fun'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE FUNCTION IF NOT EXISTS a.b.c as 'fun'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE FUNCTION a as 'fun' USING JAR 'j'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE FUNCTION a as 'fun' USING ARCHIVE 'a'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE FUNCTION a as 'fun' USING FILE 'f'").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("CREATE FUNCTION a as 'fun' USING JAR 'j', ARCHIVE 'a', FILE 'f'").toJSON());
//
//
// parseLogicalPlan(sparkSqlParser.parsePlan("with cte2 as (select * from a) INSERT OVERWRITE TABLE testcat.ns1.ns2.tbl select * from cte1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("with cte2 as (select * from a) select * from cte1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SELECT a, b, count(distinct a,b) as c FROM d GROUP BY a, b").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("select * from t1 cross join t2 join t3 on t3.id = t1.id join t4 on t4.id = t1.id").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("select * from t1 inner join (t2 inner join t3 on col3 = col2) on col3 = col1").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("select * from t1 inner join (t2 inner join t3) on col3 = col2").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("select * from t1 inner join (t2 inner join t3 on col3 = col2)").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("select * from t1, t3 join t2 on t1.col1 = t2.col2").toJSON());
//
// parseLogicalPlan(sparkSqlParser.parsePlan("SELECT /*+ REPARTITION(100) */ * FROM t").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SELECT /*+ COALESCE(10) */ * FROM t").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SELECT /*+ INDEX(t, emp_job_ix) */ * FROM t").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SELECT /*+ REPARTITION(c) */ * FROM t").toJSON());
// parseLogicalPlan(sparkSqlParser.parsePlan("SELECT /*+ REPARTITION(100, c) */ * FROM t").toJSON());
}
// scan forward from start_index until a token containing find_string is found and return the index
// just after it; with type "append", the tokens skipped over are appended to sb
public static int find_next_string(String find_string,String[] lines,int start_index,StringBuilder sb,String type) {
int return_index = lines.length - 1; // if find_string never appears, fall off the end instead of restarting at index 1
for (int i = start_index; i < lines.length; i++) {
String line = lines[i];
if(line.contains(find_string)){
return_index = i;
break;
}
else if(type.equals("append")){
sb.append(line.replaceAll(System.getProperty("line.separator"),"")+" ");
}
}
return return_index+1;
}
public static void replace_note() {
//
// sample input: a line comment, a set statement, and a multi-line comment mixed in with SQL
String input = " --This is a comment"+System.getProperty("line.separator")+
" set hive.exec.dynamic.partition.mode=nonstrict"+System.getProperty("line.separator")+
";"+System.getProperty("line.separator")+" SELECT name, age FROM "+System.getProperty("line.separator")+"persons WHERE age > 18; " +
"/* This is a comment with multiple "+System.getProperty("line.separator")+
"lines " + "*/ SELECT gender, COUNT(*) "+System.getProperty("line.separator")+"FROM persons GROUP BY gender; ";
String[] lines = input.split(" ");
StringBuilder sb = new StringBuilder();
for (int i = 0; i < lines.length; ) {
String line = lines[i];
if (line.contains("--")) {
i = find_next_string(System.getProperty("line.separator"),lines,i,sb,"drop");continue;// skip a line comment up to the end of the line
} else if (line.contains("/*")) {
i = find_next_string("*/",lines,i,sb,"drop");continue;// 如果是多行注释则一直跳过直到结束
} else if (line.equals("set")) {
i = find_next_string(";",lines,i,sb,"drop");continue;// 如果是set语句则一直跳过直到结束
} else if (line.trim().length()>0) {
sb.append(line.replaceAll(System.getProperty("line.separator"),"")).append(" ");// otherwise keep the token
i++;
}else{
i++;
}
}
// print the final result, with extra whitespace and line breaks removed
System.out.println(sb.toString());
}
public static void main(String[] args) {
replace_note();// strip comments and set statements first; SparkSqlParser does not yet handle set parameters
test_sql();// parse the SQL
}
}
Summary
The code above puts the FROM tables into source_table_list and the INSERT tables into target_table_list.
Combined with the statement type from getType(), that is enough to drive both permission checks and lineage afterwards.
One optimization for permission checking: record and validate the SQL before parameter substitution (while it still contains placeholders such as '${dt}'), so the same statement is not pointlessly re-validated every day just because the date changed.
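A minimal sketch of that optimization, with hypothetical names (checkPermissions stands in for whatever parser-based check is plugged in): cache the verdict keyed by a hash of the SQL template taken before '${dt}' is substituted.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class AuthCache {
private final Map<String, Boolean> verdicts = new ConcurrentHashMap<String, Boolean>();
public boolean isAllowed(String sqlTemplate) {
// key on the un-substituted template, so tomorrow's date still produces a cache hit
return verdicts.computeIfAbsent(sha256(sqlTemplate), k -> checkPermissions(sqlTemplate));
}
private boolean checkPermissions(String sqlTemplate) {
// hypothetical hook: parse the template and compare its source/target tables with the caller's grants
return true;
}
private static String sha256(String s) {
try {
byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
StringBuilder sb = new StringBuilder();
for (byte b : d) sb.append(String.format("%02x", b));
return sb.toString();
} catch (Exception e) {
throw new IllegalStateException(e);
}
}
}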