Dictionary Hot Update

Date: 2024-01-10 09:45:09
Tags: jdbc, word, String, null, update, dictionary, logger, NULL

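As a user of the search service, I want the system to provide flexible, UI-driven dictionary management for custom hot words, stopwords, and synonyms, so that users can add terms that fit their own business scenarios and thereby improve search accuracy.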

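Implementation plan

  • Modify the elasticsearch-analysis-ik plugin to store hot words and stopwords in a relational database.
  • Modify the elasticsearch-analysis-dynamic-synonym plugin to store synonyms in a relational database.
  • Add a term management feature so that users can edit or import hot words, stopwords, and synonyms for their own business through the UI.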

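Modifying the elasticsearch-analysis-ik plugin

Modify the source code of the ES IK plugin so that it periodically pulls dictionary updates from MySQL tables.

Table structure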

CREATE TABLE `es_extra_mainword`  (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'unique identifier',
  `main_word` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'hot word',
  `is_deleted` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'deleted flag',
  `create_user` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'created by',
  `create_time` datetime(0) NULL DEFAULT NULL COMMENT 'creation time',
  `update_user` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'updated by',
  `update_time` datetime(0) NULL DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE = InnoDB AUTO_INCREMENT = 25 CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'extended main dictionary' ROW_FORMAT = Dynamic;
 
CREATE TABLE `es_extra_stopword`  (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'unique identifier',
  `stop_word` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'stopword',
  `is_deleted` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'deleted flag',
  `create_user` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'created by',
  `create_time` datetime(0) NULL DEFAULT NULL COMMENT 'creation time',
  `update_user` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'updated by',
  `update_time` datetime(0) NULL DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE = InnoDB AUTO_INCREMENT = 25 CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'extended stopword dictionary' ROW_FORMAT = Dynamic;

Configuration changes

Add a new configuration file, jdbc.properties

jdbc.url=jdbc:mysql://localhost:3306/test?useAffectedRows=true&characterEncoding=UTF-8&autoReconnect=true&zeroDateTimeBehavior=convertToNull&useUnicode=true&serverTimezone=GMT%2B8&allowMultiQueries=true
jdbc.username=root
jdbc.password=root
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.update.main.dic.sql=SELECT main_word AS word, is_deleted, update_time FROM `es_extra_mainword` WHERE update_time > ? ORDER BY update_time ASC
jdbc.update.stopword.sql=SELECT stop_word AS word, is_deleted, update_time FROM `es_extra_stopword` WHERE update_time > ? ORDER BY update_time ASC
jdbc.update.interval=10
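
Since initial() later passes jdbc.update.interval straight to Long.parseLong, a missing or malformed value would fail at plugin startup. The following is a minimal sketch of reading the value defensively. The class JdbcIntervalDemo, its helper methods, and the 60-second fallback are assumptions made for illustration; they are not part of the IK plugin.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative helper (not part of the IK plugin): parse the polling interval defensively.
final class JdbcIntervalDemo {

    // Assumed fallback, in seconds, used when the key is missing or malformed.
    static final long DEFAULT_INTERVAL_SECONDS = 60L;

    static long readInterval(Properties props) {
        String raw = props.getProperty("jdbc.update.interval");
        if (raw == null) {
            return DEFAULT_INTERVAL_SECONDS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            // scheduleAtFixedRate rejects a period <= 0, so fall back in that case too.
            return value > 0 ? value : DEFAULT_INTERVAL_SECONDS;
        } catch (NumberFormatException e) {
            return DEFAULT_INTERVAL_SECONDS;
        }
    }

    // Parse properties from a string so the example needs no file on disk.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(readInterval(parse("jdbc.update.interval=10"))); // prints 10
        System.out.println(readInterval(parse("jdbc.username=root")));      // prints 60
    }
}
```

Falling back to a default keeps the remaining dictionaries loading even when the JDBC settings are incomplete; whether that is preferable to failing fast depends on the deployment.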

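Modify the POM file to add the JDBC driver dependency. The driver must match jdbc.driver: the snippet below pulls in the PostgreSQL driver, whereas the jdbc.properties above is configured for MySQL (com.mysql.cj.jdbc.Driver), which needs the MySQL Connector/J artifact (mysql:mysql-connector-java) instead.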

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.2.18</version>
</dependency>
 

Modify src/main/assemblies/plugin.xml to include the driver dependency; otherwise the driver jar will be missing from the packaged zip.

<dependencySets>
    <dependencySet>
        <outputDirectory/>
        <useProjectArtifact>true</useProjectArtifact>
        <useTransitiveFiltering>true</useTransitiveFiltering>
        <excludes>
            <exclude>org.elasticsearch:elasticsearch</exclude>
        </excludes>
    </dependencySet>
    <dependencySet>
        <outputDirectory/>
        <useProjectArtifact>true</useProjectArtifact>
        <useTransitiveFiltering>true</useTransitiveFiltering>
        <includes>
            <include>org.apache.httpcomponents:httpclient</include>
            <!-- add the JDBC driver here -->
            <include>org.postgresql:postgresql</include>
        </includes>
    </dependencySet>
</dependencySets>

Modify src/main/resources/plugin-security.policy to add permission java.lang.RuntimePermission "setContextClassLoader"; otherwise a permission-related exception is thrown at runtime. The file then looks like this:

grant {
  // needed because of the hot reload functionality
  permission java.net.SocketPermission "*", "connect,resolve";
  permission java.lang.RuntimePermission "setContextClassLoader";
};
 

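Code changes

Modify Dictionary

  • Load the jdbc.properties file in the constructor.
  • Change getProperty() to public.
  • Add several methods for adding and removing entries.
  • Have initial() start the custom database monitoring thread.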

private Dictionary(Configuration cfg) {
    this.configuration = cfg;
    this.props = new Properties();
    this.conf_dir = cfg.getEnvironment().configFile().resolve(AnalysisIkPlugin.PLUGIN_NAME);
    Path configFile = conf_dir.resolve(FILE_NAME);
 
    InputStream input = null;
    try {
        logger.info("try load config from {}", configFile);
        input = new FileInputStream(configFile.toFile());
    } catch (FileNotFoundException e) {
        conf_dir = cfg.getConfigInPluginDir();
        configFile = conf_dir.resolve(FILE_NAME);
        try {
            logger.info("try load config from {}", configFile);
            input = new FileInputStream(configFile.toFile());
        } catch (FileNotFoundException ex) {
            // We should report origin exception
            logger.error("ik-analyzer", e);
        }
    }
    if (input != null) {
        try {
            props.loadFromXML(input);
        } catch (IOException e) {
            logger.error("ik-analyzer", e);
        }
    }
 
    // load the jdbc.properties file
    loadJdbcProperties();
}
 
public String getProperty(String key){
    if(props!=null){
        return props.getProperty(key);
    }
    return null;
}
 
/**
 * Add a new entry to the main dictionary
 */
public static void addWord(String word) {
    singleton._MainDict.fillSegment(word.trim().toLowerCase().toCharArray());
}

/**
 * Remove (disable) an entry of the main dictionary
 */
public static void disableWord(String word) {
    singleton._MainDict.disableSegment(word.trim().toLowerCase().toCharArray());
}

/**
 * Add a new stopword
 */
public static void addStopword(String word) {
    singleton._StopWords.fillSegment(word.trim().toLowerCase().toCharArray());
}

/**
 * Remove (disable) a stopword
 */
public static void disableStopword(String word) {
    singleton._StopWords.disableSegment(word.trim().toLowerCase().toCharArray());
}

/**
 * Load jdbc.properties
 */
public void loadJdbcProperties() {
    Path file = PathUtils.get(getDictRoot(), DatabaseMonitor.PATH_JDBC_PROPERTIES);
    // try-with-resources closes the stream even if loading fails
    try (InputStream in = new FileInputStream(file.toFile())) {
        props.load(in);
        logger.info("====================================properties====================================");
        for (Map.Entry<Object, Object> entry : props.entrySet()) {
            // never write the database password to the log
            Object value = "jdbc.password".equals(entry.getKey()) ? "******" : entry.getValue();
            logger.info("{}: {}", entry.getKey(), value);
        }
        logger.info("====================================properties====================================");
    } catch (IOException e) {
        logger.error("failed to read file: " + DatabaseMonitor.PATH_JDBC_PROPERTIES, e);
    }
}
 
public static synchronized void initial(Configuration cfg) {
    if (singleton == null) {
        synchronized (Dictionary.class) {
            if (singleton == null) {
 
                singleton = new Dictionary(cfg);
                singleton.loadMainDict();
                singleton.loadSurnameDict();
                singleton.loadQuantifierDict();
                singleton.loadSuffixDict();
                singleton.loadPrepDict();
                singleton.loadStopWordDict();
 
                if(cfg.isEnableRemoteDict()){
                    for (String location : singleton.getRemoteExtDictionarys()) {
                        pool.scheduleAtFixedRate(new Monitor(location), 10, 60, TimeUnit.SECONDS);
                    }
                    for (String location : singleton.getRemoteExtStopWordDictionarys()) {
                        pool.scheduleAtFixedRate(new Monitor(location), 10, 60, TimeUnit.SECONDS);
                    }
                }
                 
                // start the database monitoring thread
                pool.scheduleAtFixedRate(new DatabaseMonitor(), 10, Long.parseLong(getSingleton().getProperty(DatabaseMonitor.JDBC_UPDATE_INTERVAL)), TimeUnit.SECONDS);
            }
        }
    }
}
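
The add/disable methods above delegate to fillSegment and disableSegment on IK's dictionary tree. To illustrate the general idea of flagging an entry as disabled rather than physically deleting it, here is a simplified stand-in trie. It is not IK's DictSegment implementation, and the class and method names (ToggleTrieDemo, enable, disable, contains) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a dictionary trie; NOT IK's DictSegment implementation.
final class ToggleTrieDemo {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true when the path from the root to this node is an enabled word
    }

    private final Node root = new Node();

    // Insert the word, or re-enable it if it was disabled earlier.
    void enable(String word) {
        walk(normalize(word), true).isWord = true;
    }

    // Disable the word by clearing its terminal flag; the nodes stay in place.
    void disable(String word) {
        Node node = walk(normalize(word), false);
        if (node != null) {
            node.isWord = false;
        }
    }

    boolean contains(String word) {
        Node node = walk(normalize(word), false);
        return node != null && node.isWord;
    }

    // Same normalization as addWord/disableWord: trim and lower-case.
    private static String normalize(String word) {
        return word.trim().toLowerCase();
    }

    // Follow the characters of the word; create missing nodes only when asked to.
    private Node walk(String word, boolean create) {
        Node node = root;
        for (char c : word.toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                if (!create) {
                    return null;
                }
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        return node;
    }

    public static void main(String[] args) {
        ToggleTrieDemo dict = new ToggleTrieDemo();
        dict.enable(" Elasticsearch ");
        System.out.println(dict.contains("elasticsearch")); // prints true
        dict.disable("elasticsearch");
        System.out.println(dict.contains("elasticsearch")); // prints false
    }
}
```

Clearing a flag keeps prefixes shared with other words intact, which is why disabling one entry does not affect longer words that pass through the same nodes.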

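Add DatabaseMonitor

  • lastUpdateTimeOfMainDic and lastUpdateTimeOfStopword record the update_time of the last row processed in the previous run.
  • Each run queries the rows that were added or deleted since then.
  • For each row the is_deleted field is checked: if it is true the entry is removed (disabled), if it is false the entry is added.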

package org.wltea.analyzer.dic;
 
import org.apache.logging.log4j.Logger;
import org.elasticsearch.SpecialPermission;
import org.wltea.analyzer.help.ESPluginLoggerFactory;
 
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.sql.*;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
 
/**
 * Updates the dictionaries from MySQL
 *
 * @author 赵丙双
 */
public class DatabaseMonitor implements Runnable {
 
    private static final Logger logger = ESPluginLoggerFactory.getLogger(DatabaseMonitor.class.getName());
    public static final String PATH_JDBC_PROPERTIES = "jdbc.properties";
 
    private static final String JDBC_URL = "jdbc.url";
    private static final String JDBC_USERNAME = "jdbc.username";
    private static final String JDBC_PASSWORD = "jdbc.password";
    private static final String JDBC_DRIVER = "jdbc.driver";
    private static final String SQL_UPDATE_MAIN_DIC = "jdbc.update.main.dic.sql";
    private static final String SQL_UPDATE_STOPWORD = "jdbc.update.stopword.sql";
    /**
     * Update interval, in seconds
     */
    public final static  String JDBC_UPDATE_INTERVAL = "jdbc.update.interval";
 
    private static final Timestamp DEFAULT_LAST_UPDATE = Timestamp.valueOf(LocalDateTime.of(LocalDate.of(2020, 1, 1), LocalTime.MIN));
 
    private static Timestamp lastUpdateTimeOfMainDic = null;
 
    private static Timestamp lastUpdateTimeOfStopword = null;
 
    public String getUrl() {
        return Dictionary.getSingleton().getProperty(JDBC_URL);
    }
 
    public String getUsername() {
        return Dictionary.getSingleton().getProperty(JDBC_USERNAME);
    }
 
    public String getPassword() {
        return Dictionary.getSingleton().getProperty(JDBC_PASSWORD);
    }
 
    public String getDriver() {
        return Dictionary.getSingleton().getProperty(JDBC_DRIVER);
    }
 
    public String getUpdateMainDicSql() {
        return Dictionary.getSingleton().getProperty(SQL_UPDATE_MAIN_DIC);
    }
 
    public String getUpdateStopwordSql() {
        return Dictionary.getSingleton().getProperty(SQL_UPDATE_STOPWORD);
    }
 
    /**
     * Loads the JDBC driver
     */
    public DatabaseMonitor() {
        SpecialPermission.check();
        AccessController.doPrivileged((PrivilegedAction<Void>) () -> {
            try {
                Class.forName(getDriver());
            } catch (ClassNotFoundException e) {
                logger.error("mysql jdbc driver not found", e);
            }
            return null;
        });
 
 
    }
 
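    @Override
    public void run() {
        SpecialPermission.check();
        AccessController.doPrivileged((PrivilegedAction<Void>) () -> {
            Connection conn = getConnection();
            if (conn == null) {
                // getConnection() already logged the failure; try again on the next run
                return null;
            }
            try {
                // update the main dictionary
                updateMainDic(conn);
                // update the stopwords
                updateStopword(conn);
            } catch (RuntimeException e) {
                // an uncaught exception would suppress all later runs of scheduleAtFixedRate
                logger.error("failed to update dictionaries", e);
            } finally {
                closeConnection(conn);
            }

            return null;
        });

    }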
 
    public Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(getUrl(), getUsername(), getPassword());
        } catch (SQLException e) {
            logger.error("failed to get connection", e);
        }
        return connection;
    }
 
    public void closeConnection(Connection conn) {
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                logger.error("failed to close Connection", e);
            }
        }
    }
 
    public void closeRsAndPs(ResultSet rs, PreparedStatement ps) {
        if (rs != null) {
            try {
                rs.close();
            } catch (SQLException e) {
                logger.error("failed to close ResultSet", e);
            }
        }
 
        if (ps != null) {
            try {
                ps.close();
            } catch (SQLException e) {
                logger.error("failed to close PreparedStatement", e);
            }
        }
 
    }
 
    /**
     * Main dictionary
     */
    public synchronized void updateMainDic(Connection conn) {

        logger.info("start update main dic");
        int numberOfAddWords = 0;
        int numberOfDisableWords = 0;
        PreparedStatement ps = null;
        ResultSet rs = null;

        try {
            String sql = getUpdateMainDicSql();

            Timestamp param = lastUpdateTimeOfMainDic == null ? DEFAULT_LAST_UPDATE : lastUpdateTimeOfMainDic;

            logger.info("param: " + param);

            ps = conn.prepareStatement(sql);
            ps.setTimestamp(1, param);

            rs = ps.executeQuery();

            while (rs.next()) {
                String word = rs.getString("word");
                // the column is nullable, so guard against NULL before trimming
                if (word == null || word.trim().isEmpty()) {
                    continue;
                }
                word = word.trim();

                lastUpdateTimeOfMainDic = rs.getTimestamp("update_time");

                if (rs.getBoolean("is_deleted")) {
                    logger.info("[main dic] disable word: {}", word);
                    // remove
                    Dictionary.disableWord(word);
                    numberOfDisableWords++;
                } else {
                    logger.info("[main dic] add word: {}", word);
                    // add
                    Dictionary.addWord(word);
                    numberOfAddWords++;
                }
            }

            logger.info("end update main dic -> addWord: {}, disableWord: {}", numberOfAddWords, numberOfDisableWords);

        } catch (SQLException e) {
            logger.error("failed to update main_dic", e);
        } finally {
            // close the ResultSet and PreparedStatement on success as well as on failure
            closeRsAndPs(rs, ps);
        }
    }
 
    /**
     * Stopwords
     */
    public synchronized void updateStopword(Connection conn) {

        logger.info("start update stopword");

        int numberOfAddWords = 0;
        int numberOfDisableWords = 0;
        PreparedStatement ps = null;
        ResultSet rs = null;
        try {
            String sql = getUpdateStopwordSql();

            Timestamp param = lastUpdateTimeOfStopword == null ? DEFAULT_LAST_UPDATE : lastUpdateTimeOfStopword;

            logger.info("param: " + param);

            ps = conn.prepareStatement(sql);
            ps.setTimestamp(1, param);

            rs = ps.executeQuery();

            while (rs.next()) {
                String word = rs.getString("word");
                // the column is nullable, so guard against NULL before trimming
                if (word == null || word.trim().isEmpty()) {
                    continue;
                }
                word = word.trim();

                lastUpdateTimeOfStopword = rs.getTimestamp("update_time");

                if (rs.getBoolean("is_deleted")) {
                    logger.info("[stopword] disable word: {}", word);

                    // remove
                    Dictionary.disableStopword(word);
                    numberOfDisableWords++;
                } else {
                    logger.info("[stopword] add word: {}", word);
                    // add
                    Dictionary.addStopword(word);
                    numberOfAddWords++;
                }
            }

            logger.info("end update stopword -> addWord: {}, disableWord: {}", numberOfAddWords, numberOfDisableWords);

        } catch (SQLException e) {
            logger.error("failed to update stopword", e);
        } finally {
            // close the ResultSet and PreparedStatement
            closeRsAndPs(rs, ps);
        }
    }
}
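
The incremental logic of DatabaseMonitor (select rows newer than a remembered timestamp, apply them in ascending order, remember the newest timestamp) can be tried out without a database. The sketch below replaces the table with an in-memory list and the IK dictionary with a Set. WatermarkSyncDemo, Row, and sync are hypothetical names introduced only for this illustration.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory sketch of the incremental sync performed by DatabaseMonitor (no database involved).
final class WatermarkSyncDemo {

    // Hypothetical row type mirroring the columns read from the result set.
    static final class Row {
        final String word;
        final boolean deleted;
        final Timestamp updateTime;

        Row(String word, boolean deleted, Timestamp updateTime) {
            this.word = word;
            this.deleted = deleted;
            this.updateTime = updateTime;
        }
    }

    // Apply all rows newer than the watermark and return the new watermark.
    static Timestamp sync(List<Row> table, Timestamp watermark, Set<String> dictionary) {
        List<Row> changed = new ArrayList<>();
        for (Row row : table) {
            // equivalent of "WHERE update_time > ?"
            if (row.updateTime.after(watermark)) {
                changed.add(row);
            }
        }
        // equivalent of "ORDER BY update_time ASC"
        changed.sort((a, b) -> a.updateTime.compareTo(b.updateTime));

        Timestamp latest = watermark;
        for (Row row : changed) {
            if (row.deleted) {
                dictionary.remove(row.word); // is_deleted = true: disable the entry
            } else {
                dictionary.add(row.word);    // is_deleted = false: add the entry
            }
            latest = row.updateTime;
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        Set<String> dictionary = new HashSet<>();
        Timestamp watermark = Timestamp.valueOf("2020-01-01 00:00:00");

        table.add(new Row("foo", false, Timestamp.valueOf("2024-01-01 10:00:00")));
        watermark = sync(table, watermark, dictionary);
        System.out.println(dictionary + " @ " + watermark);

        // A soft delete: the row stays in the table, only the flag and update_time change.
        table.add(new Row("foo", true, Timestamp.valueOf("2024-01-01 11:00:00")));
        watermark = sync(table, watermark, dictionary);
        System.out.println(dictionary + " @ " + watermark);
    }
}
```

One consequence of this design is that a removal only propagates if the row is soft-deleted and its update_time is advanced; a row that is physically deleted from the table is never returned by the query, so the plugin would keep the word until the node restarts.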

Packaging and testing

Run mvn package, then find the elasticsearch-analysis-ik-6.7.2.zip archive in the elasticsearch-analysis-ik/target/releases directory and unzip it into Elasticsearch's own plugins directory.

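Modifying the elasticsearch-analysis-dynamic-synonym plugin

Add a table to store synonyms, and modify the plugin source code so that it loads the synonyms from the database dynamically.

Table structure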

CREATE TABLE `es_extra_synonymword`  (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'unique identifier',
  `synonym_word` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'synonyms',
  `is_deleted` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'deleted flag',
  `create_user` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'created by',
  `create_time` datetime(0) NULL DEFAULT NULL COMMENT 'creation time',
  `update_user` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'updated by',
  `update_time` datetime(0) NULL DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE = InnoDB AUTO_INCREMENT = 25 CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'extended synonym dictionary' ROW_FORMAT = Dynamic;

Configuration changes

Add a new configuration file, jdbc-reload.properties

jdbc.url=jdbc:mysql://127.0.0.1:13306/test?serverTimezone=GMT&autoReconnect=true&useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useAffectedRows=true&useSSL=false
jdbc.user=root
jdbc.password=123456
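# query the synonyms (table and columns as defined in the table structure above)
jdbc.reload.synonym.sql=select synonym_word as words from es_extra_synonymword where is_deleted = 0
# query the synonym version number stored in the database; the gw_swith table is not defined in this article
# and must hold a numeric swith_state that increases whenever the synonyms change
jdbc.reload.swith.synonym.version=SELECT swith_state FROM gw_swith where swith_code = 'synonym_doc'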

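Modify the POM file to add the JDBC driver dependency. As with the IK plugin, the driver must match the database: the snippet below is the PostgreSQL driver, while the jdbc.url above points at MySQL, which needs MySQL Connector/J (mysql:mysql-connector-java) instead.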

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.2.18</version>
</dependency>

Modify plugin.xml

<?xml version="1.0"?>
<assembly>
    <id>-</id>
    <formats>
        <format>zip</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
 
    <fileSets>
        <fileSet>
            <directory>${project.basedir}/config</directory>
            <outputDirectory>config</outputDirectory>
        </fileSet>
    </fileSets>
 
    <files>
        <file>
            <source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
            <outputDirectory>/</outputDirectory>
            <filtered>true</filtered>
        </file>
        <file>
            <source>${project.basedir}/src/main/resources/plugin-security.policy</source>
            <outputDirectory>/</outputDirectory>
            <filtered>true</filtered>
        </file>
    </files>
...omitted...
</assembly>

Code changes

Add the file MySqlRemoteSynonymFile

package com.bellszhu.elasticsearch.plugin.synonym.analysis;
 
import com.bellszhu.elasticsearch.plugin.DynamicSynonymPlugin;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.env.Environment;
 
import java.io.*;
import java.nio.file.Path;
import java.sql.*;
import java.util.ArrayList;
import java.util.Properties;
 
/**
 * Loads remote synonyms from MySQL
 * @author huangjiayao
 */
public class MySqlRemoteSynonymFile implements SynonymFile{
 
    /**
     * Name of the database configuration file
     */
    private final static String DB_PROPERTIES = "jdbc-reload.properties";
    private static Logger logger = LogManager.getLogger("dynamic-synonym");
 
    private String format;
 
    private boolean expand;
 
    private boolean lenient;
 
    private Analyzer analyzer;
 
    private Environment env;
 
    // database configuration
    private String location;
 
    // property key of the database URL
    private static final String jdbcUrl = "jdbc.url";
    // property key of the database user name
    private static final String jdbcUser = "jdbc.user";
    // property key of the database password
    private static final String jdbcPassword = "jdbc.password";
 
    /**
     * Synonym version number held by the current node
     */
    private long thisSynonymVersion = -1L;
 
    private Connection connection = null;
 
    private Statement statement = null;
 
    private Properties props;
 
    private Path conf_dir;
 
    MySqlRemoteSynonymFile(Environment env, Analyzer analyzer,
                           boolean expand, boolean lenient, String format, String location) {
        this.analyzer = analyzer;
        this.expand = expand;
        this.format = format;
        this.lenient = lenient;
        this.env = env;
        this.location = location;
        this.props = new Properties();
 
        // resolve the config directory next to the plugin jar
        Path filePath = PathUtils.get(new File(DynamicSynonymPlugin.class.getProtectionDomain().getCodeSource()
                .getLocation().getPath())
                .getParent(), "config")
                .toAbsolutePath();
        this.conf_dir = filePath.resolve(DB_PROPERTIES);
 
        // load the database settings; try-with-resources closes the stream
        File configFile = conf_dir.toFile();
        try (InputStream input = new FileInputStream(configFile)) {
            props.load(input);
        } catch (FileNotFoundException e) {
            logger.error("database config file jdbc-reload.properties not found", e);
        } catch (IOException e) {
            logger.error("failed to load database config file jdbc-reload.properties", e);
        }
        isNeedReloadSynonymMap();
    }
 
    /**
     * Load the synonym dictionary into a SynonymMap
     * @return SynonymMap
     */
    @Override
    public SynonymMap reloadSynonymMap() {
        try {
            logger.info("start reload MySQL synonym from {}.", location);
            Reader rulesReader = getReader();
            SynonymMap.Builder parser = RemoteSynonymFile.getSynonymParser(rulesReader, format, expand, lenient, analyzer);
            return parser.build();
        } catch (Exception e) {
            // the exception goes last so that Log4j logs it with its stack trace
            logger.error("reload MySQL synonym {} error!", location, e);
            throw new IllegalArgumentException(
                    "could not reload MySQL synonyms to build synonyms", e);
        }
    }
 
    /**
     * Check whether the synonyms need to be reloaded
     * @return true or false
     */
    @Override
    public boolean isNeedReloadSynonymMap() {
        try {
            Long mysqlVersion = getMySqlSynonymVersion();
            if (thisSynonymVersion < mysqlVersion) {
                thisSynonymVersion = mysqlVersion;
                return true;
            }
        } catch (Exception e) {
            logger.error(e);
        }
        return false;
    }
 
    /**
     * Get the synonym version number stored in MySQL.
     * Used to decide whether the synonyms need to be reloaded.
     *
     * @return the version number, or 0 if it could not be read
     */
    public Long getMySqlSynonymVersion() {
        ResultSet resultSet = null;
        Long mysqlSynonymVersion = 0L;
        try {
            if (connection == null || statement == null) {
//                Class.forName(props.getProperty("jdbc.driver"));
                statement = getConnection(props, connection);
            }
            resultSet = statement.executeQuery(props.getProperty("jdbc.reload.swith.synonym.version"));
            while (resultSet.next()) {
                mysqlSynonymVersion = resultSet.getLong("swith_state");
                logger.info("current MySQL synonym version: {}, current node synonym version: {}", mysqlSynonymVersion, thisSynonymVersion);
            }
        } catch (SQLException e) {
            logger.error("failed to query the synonym version", e);
        } finally {
            try {
                if (resultSet != null) {
                    resultSet.close();
                }
            } catch (SQLException e) {
                logger.error("failed to close ResultSet", e);
            }
        }
        return mysqlSynonymVersion;
    }
 
    /**
     * Query the synonyms stored in the database
     * @return one synonym rule per list element
     */
    public ArrayList<String> getDBData() {
        ArrayList<String> arrayList = new ArrayList<>();
        ResultSet resultSet = null;
        try {
            if (connection == null || statement == null) {
//                Class.forName(props.getProperty("jdbc.driver"));
                statement = getConnection(props, connection);
            }
            logger.info("querying the synonym list, SQL: {}", props.getProperty("jdbc.reload.synonym.sql"));
            resultSet = statement.executeQuery(props.getProperty("jdbc.reload.synonym.sql"));
            while (resultSet.next()) {
                String theWord = resultSet.getString("words");
                arrayList.add(theWord);
            }
        } catch (SQLException e) {
            logger.error("failed to query synonyms", e);
        } finally {
            try {
                if (resultSet != null) {
                    resultSet.close();
                }
            } catch (SQLException e) {
                logger.error("failed to close ResultSet", e);
            }
 
        }
        return arrayList;
    }
 
    /**
     * Build the reader from which the synonym rules are parsed
     * @return Reader
     */
    @Override
    public Reader getReader() {
 
        // no synchronization is needed for a local buffer, so StringBuilder suffices
        StringBuilder sb = new StringBuilder();
        try {
            ArrayList<String> dbData = getDBData();
            for (int i = 0; i < dbData.size(); i++) {
                logger.info("loading synonym: {}", dbData.get(i));
                // one record per line; each record holds a group of words separated by ASCII commas
                sb.append(dbData.get(i))
                        .append(System.getProperty("line.separator"));
            }
        } catch (Exception e) {
            logger.error("failed to load synonyms", e);
        }
        return new StringReader(sb.toString());
    }
 
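    /**
     * Open the database connection and create a statement on it.
     * The connection is stored in the field so that later calls reuse it
     * instead of opening a new connection on every check.
     *
     * @param props database settings
     * @param conn  unused; kept so that the existing call sites still compile
     * @throws SQLException if the connection cannot be established
     */
    private Statement getConnection(Properties props, Connection conn) throws SQLException {
        this.connection = DriverManager.getConnection(
                props.getProperty(jdbcUrl),
                props.getProperty(jdbcUser),
                props.getProperty(jdbcPassword));
        return this.connection.createStatement();
    }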
}
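
Two ideas in MySqlRemoteSynonymFile can be isolated from JDBC: reloading only when the version stored in the database is newer than the local one, and joining the rows into one rule per line for the synonym parser. A minimal sketch of both follows; SynonymReloadDemo and its methods are hypothetical names, and skipping blank or NULL rows is an addition of this sketch, not something the class above does.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

// Sketch of the version gate and rule assembly used by MySqlRemoteSynonymFile (no JDBC involved).
final class SynonymReloadDemo {

    // Same starting value as thisSynonymVersion, so the first check always triggers a load.
    private long localVersion = -1L;

    // Reload only when the version read from the database is newer than the local one.
    boolean needsReload(long remoteVersion) {
        if (localVersion < remoteVersion) {
            localVersion = remoteVersion;
            return true;
        }
        return false;
    }

    // One rule per line; each rule is a comma-separated group of equivalent words.
    static String toRules(List<String> rows) {
        StringBuilder sb = new StringBuilder();
        for (String row : rows) {
            // Assumption of this sketch: NULL and blank rows are skipped.
            if (row == null || row.trim().isEmpty()) {
                continue;
            }
            sb.append(row.trim()).append('\n');
        }
        return sb.toString();
    }

    static Reader toReader(List<String> rows) {
        return new StringReader(toRules(rows));
    }

    public static void main(String[] args) {
        SynonymReloadDemo demo = new SynonymReloadDemo();
        System.out.println(demo.needsReload(1)); // prints true
        System.out.println(demo.needsReload(1)); // prints false
        System.out.print(toRules(Arrays.asList("tomato,love apple", " ", "potato,spud")));
    }
}
```

Because the gate compares with a strict less-than, whatever updates the synonym table also has to increase the stored version number, otherwise running nodes never pick up the change.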

Modify the getSynonymFile(Analyzer analyzer) method of the DynamicSynonymTokenFilterFactory class slightly, adding a custom location type that triggers the database query.

SynonymFile getSynonymFile(Analyzer analyzer) {
        try {
            SynonymFile synonymFile;
            if ("fromMySql".equals(location)) {
                synonymFile = new MySqlRemoteSynonymFile(environment, analyzer, expand, lenient, format, location);
            }else if (location.startsWith("http://") || location.startsWith("https://")) {
                synonymFile = new RemoteSynonymFile(
                        environment, analyzer, expand, lenient,  format, location);
            } else {
                synonymFile = new LocalSynonymFile(
                        environment, analyzer, expand, lenient, format, location);
            }
            if (scheduledFuture == null) {
                scheduledFuture = pool.scheduleAtFixedRate(new Monitor(synonymFile),
                                interval, interval, TimeUnit.SECONDS);
            }
            return synonymFile;
        } catch (Exception e) {
            logger.error("failed to get synonyms: " + location, e);
            throw new IllegalArgumentException("failed to get synonyms : " + location, e);
        }
    }

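Packaging and testing

Build from source with Maven, running clean, compile, and package in turn, then find the plugin installation package (.zip) in the target/releases directory.

Copy it to the \plugins\dynamic-synonym directory under the ES installation directory, unzip it, and delete the archive; then copy the JDBC driver jar into the same directory.

Finally, test with a custom analyzer: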

PUT synonyms_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index": {
      "analysis": {
        "filter": {
          "mysql_synonym": {
            "type": "dynamic_synonym",
            "synonyms_path": "fromMySql",
            "interval": 30
          }
        },
        "analyzer": {
          "ik_syno": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": [
              "mysql_synonym"
            ]
          },
          "ik_syno_max": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": [
              "mysql_synonym"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_syno_max",
        "search_analyzer": "ik_syno"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
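
Once the index exists, the analyzers can be checked with Elasticsearch's _analyze API. The sketch below builds such a request with the JDK 11+ HttpClient. The base URL http://localhost:9200, the sample text "tomato", and the class AnalyzeRequestDemo are assumptions for illustration; without arguments the program only prints the request, and it sends it only when a base URL is passed as the first argument.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: call the _analyze API of the index created above to inspect the emitted tokens.
final class AnalyzeRequestDemo {

    // Build the JSON body by hand; only quotes and backslashes are escaped here.
    static String buildBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static HttpRequest buildRequest(String baseUrl, String index, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a node is reachable at the given base URL, e.g. http://localhost:9200.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        HttpRequest request = buildRequest(baseUrl, "synonyms_index", "ik_syno", "tomato");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildBody("ik_syno", "tomato"));

        if (args.length > 0) {
            // Only talk to the cluster when a base URL was given explicitly.
            HttpResponse<String> response =
                    HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

If a word and its synonym from the database both appear among the returned tokens, the filter has picked up the rules; after changing the synonyms and the version number, the same request should reflect the change once the configured interval has passed.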

References

https://blog.csdn.net/qq_20919883/article/details/110502496

https://zhuanlan.zhihu.com/p/381936025

https://gitee.com/ykos/elasticsearch-analysis-ik/commits/master

https://github.com/YRREALLYCUTE/elasticsearch-analysis-dynamic-synonym-mysql

From: https://www.cnblogs.com/libin2015/p/17955857
