Why Inserting 100,000 Rows Takes Only 2 Seconds


Contents

I. Preface
II. Questions
    1. What is useServerPrepStmts in the URL?
    2. What is rewriteBatchedStatements in the URL?
    3. How do these two parameters affect statement execution?
    4. How much of a performance gain do these two parameters bring?
    5. Why does the SQL never execute if psts.addBatch() is commented out?
III. Summary
IV. Open question
V. References

I. Preface

To compare how different SQL statements perform on large data sets, I needed to bulk-insert several hundred thousand rows into the database. Since this is a very common requirement, ready-made code should be easy to find online. After some searching I found the code below, which inserts 100,000 rows in about 2 seconds.

package com.wave.checkin.wavecheckin.utils;

import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Date;

public class MysqlBatchUtil {

    private String sql = "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)";
    private String charset = "utf-8";
    private String connectStr = "jdbc:mysql://localhost:3306/test";
    private String username = "root";
    private String password = "";

    private void doStore() throws ClassNotFoundException, SQLException, IOException {
        Class.forName("com.mysql.jdbc.Driver");
        // These parameters enable the fast batch insert; remove them and it runs as an ordinary batch insert.
        connectStr += "?useUnicode=true&characterEncoding=utf8&useServerPrepStmts=true&rewriteBatchedStatements=true&serverTimezone=GMT";
        Connection conn = (Connection) DriverManager.getConnection(connectStr, username, password);
        // Commit manually
        conn.setAutoCommit(false);
        int count = 0;
        PreparedStatement psts = conn.prepareStatement(sql);
        String line = null;
        Date begin = new Date();
        long time = System.currentTimeMillis();
        for (int i = 0; i <= 100000; i++) {
            psts.setString(1, i + "var");
            psts.setInt(2, i);
            psts.setDate(3, new java.sql.Date(time));
            psts.setString(4, "1");
            // Add to the batch
            psts.addBatch();
            count++;
        }
        // Execute the batch
        psts.executeBatch();
        // Commit
        conn.commit();
        Date end = new Date();
        System.out.println("rows=" + count);
        System.out.println("elapsed ms=" + (end.getTime() - begin.getTime()));
        conn.close();
    }

    public static void main(String[] args) {
        try {
            new MysqlBatchUtil().doStore();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The code works, but the underlying principles are still worth understanding.

II. Questions

To dig into how this code actually works, I asked myself a few questions:

1. What is useServerPrepStmts in the URL?

Searching the MySQL official documentation for useServerPrepStmts turns up the following passage:

Two variants of prepared statements are implemented by Connector/J, the client-side and the server-side prepared statements. Client-side prepared statements are used by default because early MySQL versions did not support the prepared statement feature or had problems with its implementation. Server-side prepared statements and binary-encoded result sets are used when the server supports them. To enable usage of server-side prepared statements, set useServerPrepStmts=true.

In short, Connector/J (that is, the MySQL JDBC driver) implements two kinds of prepared statements: client-side and server-side. Client-side prepared statements are used by default, because early MySQL versions either did not support prepared statements or had problems with the implementation. To use server-side prepared statements, set useServerPrepStmts=true.

So what exactly is a server-side prepared statement? Let's keep looking.
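For reference, here is a minimal sketch (assuming the same local test database and root account as the utility class above) showing that the option can be passed either in the URL, as the utility class does, or through a Properties object:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ServerPrepStmtsDemo {
    public static void main(String[] args) throws Exception {
        // Connection properties are equivalent to the key=value pairs appended to the JDBC URL.
        Properties props = new Properties();
        props.setProperty("user", "root");
        props.setProperty("password", "");
        // Ask Connector/J to use server-side prepared statements (Prepare/Execute protocol commands).
        props.setProperty("useServerPrepStmts", "true");

        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", props)) {
            System.out.println("connected, server-side prepared statements requested");
        }
    }
}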

MySQL 8.0 provides support for server-side prepared statements. This support takes advantage of the efficient client/server binary protocol. Using prepared statements with placeholders for parameter values has the following benefits:
1. Less overhead for parsing the statement each time it is executed. Typically, database applications process large volumes of almost-identical statements, with only changes to literal or variable values in clauses such as WHERE for queries and deletes, SET for updates, and VALUES for inserts.
2. Protection against SQL injection attacks. The parameter values can contain unescaped SQL quote and delimiter characters.

In short, MySQL 8.0 supports server-side prepared statements. Such statements use placeholders for parameter values, which has two benefits:

1. Less parsing overhead on each execution. For example, select * from user where state = ? is a prepared statement: it is parsed only once, and no matter what value is later bound to the ? placeholder, no further parsing is needed, giving a "compile once, run many times" effect. An ordinary statement, by contrast, must be re-parsed whenever the SQL text is not 100% identical.
2. Protection against SQL injection attacks. Parameter values may contain unescaped SQL quote and delimiter characters.
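To make the "compile once, run many times" point concrete, here is a minimal sketch. It assumes a hypothetical user table with a state column and an already-open Connection; only the parameter value changes between executions, so the statement text is parsed (and, with useServerPrepStmts=true, server-prepared) only once.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PrepareOnceDemo {
    // Assumes a hypothetical `user` table with a `state` column; conn is an open Connection.
    static void queryByState(Connection conn) throws Exception {
        // The statement text is handled once; only parameter values vary afterwards.
        try (PreparedStatement ps = conn.prepareStatement("select * from user where state = ?")) {
            for (int state = 0; state < 3; state++) {
                // No re-parsing of the SQL text between these executions.
                ps.setInt(1, state);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}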

One more note: MySQL Server versions before 4.1 do not support prepared statements, and Connector/J (the JDBC driver) does not enable server-side prepared statements by default in versions 5.0.5 and later.

2. What is rewriteBatchedStatements in the URL?

Searching the MySQL official documentation for rewriteBatchedStatements turns up the following passage:

Should the driver use multiqueries (irregardless of the setting of “allowMultiQueries”) as well as rewriting of prepared statements for INSERT into multi-value inserts when executeBatch() is called? Notice that this has the potential for SQL injection if using plain java.sql.Statements and your code doesn’t sanitize input correctly. Notice that for prepared statements, server-side prepared statements can not currently take advantage of this rewrite option, and that if you don’t specify stream lengths when using PreparedStatement.set*Stream(), the driver won’t be able to determine the optimum number of parameters per batch and you might receive an error from the driver that the resultant packet is too large. Statement.getGeneratedKeys() for these rewritten statements only works when the entire batch includes INSERT statements. Please be aware using rewriteBatchedStatements=true with INSERT … ON DUPLICATE KEY UPDATE that for rewritten statement server returns only one value as sum of all affected (or found) rows in batch and it isn’t possible to map it correctly to initial statements; in this case driver returns 0 as a result of each batch statement if total count was 0, and the Statement.SUCCESS_NO_INFO as a result of each batch statement if total count was > 0.

Roughly translated: when executeBatch() is called, should the driver use multi-queries (regardless of the allowMultiQueries setting) and rewrite prepared INSERT statements into multi-value inserts? Note that this opens the door to SQL injection if you use plain java.sql.Statements and do not sanitize input correctly. Note also that server-side prepared statements currently cannot take advantage of this rewrite option, and that if you do not specify stream lengths when using PreparedStatement.set*Stream(), the driver cannot determine the optimal number of parameters per batch and may report that the resulting packet is too large. Statement.getGeneratedKeys() for these rewritten statements only works when the entire batch consists of INSERT statements. Finally, when rewriteBatchedStatements=true is used with INSERT ... ON DUPLICATE KEY UPDATE, the server returns only a single total of affected (or found) rows for the rewritten statement, which cannot be mapped back to the individual batched statements.

From this we can tell that rewriteBatchedStatements has no effect on server-side prepared statements, so it acts on the client side. When the parameter is set to true and executeBatch() is called, the prepared statements are rewritten into a multi-value insert before being sent to the server, instead of being sent as one SQL statement per row.
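As an illustration of what the rewrite produces, here is a minimal, purely conceptual sketch (it is not Connector/J's actual code): it joins one VALUES group per batched row into a single statement, which is exactly the multi-value form that shows up in the general log later in this article.

public class RewriteSketch {

    // Build the single multi-value INSERT that the driver sends when rewriteBatchedStatements=true.
    static String rewrite(String insertPrefix, String valuesGroup, int batchSize) {
        StringBuilder sb = new StringBuilder(insertPrefix);
        for (int i = 0; i < batchSize; i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(valuesGroup);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?),(?,?,?,?)
        // which matches the Prepare line seen in the general log for case 4 below.
        System.out.println(rewrite(
                "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ",
                "(?,?,?,?)",
                2));
    }
}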

3. How do these two parameters affect statement execution?

We now know that rewriteBatchedStatements rewrites client-side prepared statements, and that useServerPrepStmts turns on server-side prepared statements. Time to verify this. I used MySQL 5.7 (which supports prepared statements) with Connector/J 8.0.18 (which does not enable server-side prepared statements by default). In each case, statement execution is observed through the MySQL general query log (a sketch for turning the log on is given at the end of this section).

Case 1: set neither rewriteBatchedStatements nor useServerPrepStmts, and insert two rows in a loop.

?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT
2019-11-01T01:43:51.832520Z 79 Query SHOW WARNINGS
2019-11-01T01:43:51.836262Z 79 Query SET NAMES utf8mb4
2019-11-01T01:43:51.836420Z 79 Query SET character_set_results = NULL
2019-11-01T01:43:51.836653Z 79 Query SET autocommit=1
2019-11-01T01:43:51.841252Z 79 Query SET autocommit=0
2019-11-01T01:43:51.865821Z 79 Query SELECT @@session.transaction_read_only
2019-11-01T01:43:51.866303Z 79 Query INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1')
2019-11-01T01:43:51.891904Z 79 Query INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('1var',1,'2019-11-01','1')
2019-11-01T01:43:51.892136Z 79 Query commit
2019-11-01T01:43:51.977174Z 79 Query rollback
2019-11-01T01:43:51.981484Z 79 Quit

As you can see, the server did not prepare anything; it executed the two INSERT statements as plain Query commands.

Case 2: set useServerPrepStmts=true.

?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&useServerPrepStmts=true
2019-11-01T02:23:46.145078Z 89 Query SHOW WARNINGS
2019-11-01T02:23:46.148454Z 89 Query SET NAMES utf8mb4
2019-11-01T02:23:46.148588Z 89 Query SET character_set_results = NULL
2019-11-01T02:23:46.148809Z 89 Query SET autocommit=1
2019-11-01T02:23:46.153224Z 89 Query SET autocommit=0
2019-11-01T02:23:46.166311Z 89 Prepare INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)
2019-11-01T02:23:46.171129Z 89 Query SELECT @@session.transaction_read_only
2019-11-01T02:23:46.171392Z 89 Query SELECT @@session.transaction_read_only
2019-11-01T02:23:46.175887Z 89 Execute INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1')
2019-11-01T02:23:46.204478Z 89 Execute INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('1var',1,'2019-11-01','1')
2019-11-01T02:23:46.204878Z 89 Query commit
2019-11-01T02:23:46.487619Z 89 Query rollback
2019-11-01T02:23:46.492423Z 89 Quit

As you can see, the server prepared the statement (Prepare) and then executed the two INSERTs with Execute commands.

Case 3: set rewriteBatchedStatements=true.

?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&rewriteBatchedStatements=true
2019-11-01T02:40:19.644778Z 90 Query SHOW WARNINGS
2019-11-01T02:40:19.650422Z 90 Query SET NAMES utf8mb4
2019-11-01T02:40:19.650650Z 90 Query SET character_set_results = NULL
2019-11-01T02:40:19.650932Z 90 Query SET autocommit=1
2019-11-01T02:40:19.655743Z 90 Query SET autocommit=0
2019-11-01T02:40:19.680848Z 90 Query SELECT @@session.transaction_read_only
2019-11-01T02:40:19.681998Z 90 Query INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1'),('1var',1,'2019-11-01','1')
2019-11-01T02:40:19.700368Z 90 Query commit
2019-11-01T02:40:19.725935Z 90 Query rollback
2019-11-01T02:40:19.730085Z 90 Quit

As you can see, the server did no preparation; it executed a single multi-value INSERT as a plain Query command.

Case 4: set both rewriteBatchedStatements=true and useServerPrepStmts=true.

?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&rewriteBatchedStatements=true&useServerPrepStmts=true
2019-11-01T02:42:25.799594Z 93 Query SHOW WARNINGS
2019-11-01T02:42:25.803008Z 93 Query SET NAMES utf8mb4
2019-11-01T02:42:25.803138Z 93 Query SET character_set_results = NULL
2019-11-01T02:42:25.803354Z 93 Query SET autocommit=1
2019-11-01T02:42:25.807759Z 93 Query SET autocommit=0
2019-11-01T02:42:25.822828Z 93 Prepare INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)
2019-11-01T02:42:25.826277Z 93 Query SELECT @@session.transaction_read_only
2019-11-01T02:42:25.827500Z 93 Prepare INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?),(?,?,?,?)
2019-11-01T02:42:25.832597Z 93 Execute INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1'),('1var',1,'2019-11-01','1')
2019-11-01T02:42:25.851448Z 93 Close stmt
2019-11-01T02:42:25.851605Z 93 Query commit
2019-11-01T02:42:25.880361Z 93 Query rollback
2019-11-01T02:42:25.884408Z 93 Quit

As you can see, the server performed two Prepare steps and then executed a single multi-value INSERT with an Execute command.
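To reproduce these observations, the MySQL general query log has to be switched on first. A minimal sketch, assuming an account with the privileges needed to change global variables on the local test server:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EnableGeneralLog {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("SET GLOBAL log_output = 'FILE'"); // write the log to a file rather than a table
            stmt.execute("SET GLOBAL general_log = 'ON'");  // record every statement the server receives
            // The log file location can be checked with: SHOW VARIABLES LIKE 'general_log_file'
            // Remember to set general_log back to 'OFF' after testing.
        }
    }
}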

4. How much of a performance gain do these two parameters bring?

I tested each of the four cases above, inserting 100,000 rows per run and running each case 5 times, recording the elapsed time in milliseconds. The results:

Case 1 (neither rewriteBatchedStatements nor useServerPrepStmts): 10934, 8849, 8195, 11846, 9388
Case 2 (useServerPrepStmts=true): 9837, 7045, 7553, 7195, 7931
Case 3 (rewriteBatchedStatements=true): 2799, 1481, 1456, 1180, 1543
Case 4 (both rewriteBatchedStatements=true and useServerPrepStmts=true): 4211, 1974, 1867, 2322, 2381

It is easy to see that setting rewriteBatchedStatements=true alone brings a very substantial improvement, while setting useServerPrepStmts=true alone brings only a small one. Interestingly, setting both rewriteBatchedStatements=true and useServerPrepStmts=true performs slightly worse than setting rewriteBatchedStatements=true alone.

Combining these numbers with the MySQL general log, we can conclude:
1) For bulk inserts of large amounts of data, rewriteBatchedStatements=true rewrites the SQL, merging many statements into one before handing it to the server, which greatly reduces execution time.
2) Setting only useServerPrepStmts=true gives a slight improvement, because the Prepare-Execute execution model is faster than plain Query.
3) Setting both parameters is slightly slower than setting only rewriteBatchedStatements=true because Prepare itself has a cost, and when only a single (rewritten) statement needs to be executed, that cost is relatively large.
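For reference, here is a minimal sketch of a harness for this kind of comparison. It is not the exact test code used above; it assumes the same local test database, root account, and query_data table as the utility class at the top.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertBenchmark {
    private static final String BASE_URL = "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT";
    private static final String SQL = "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)";

    // Runs one batch insert of `rows` rows against the given URL suffix and returns elapsed milliseconds.
    static long run(String urlSuffix, int rows) throws Exception {
        long start = System.currentTimeMillis();
        try (Connection conn = DriverManager.getConnection(BASE_URL + urlSuffix, "root", "")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(SQL)) {
                for (int i = 0; i < rows; i++) {
                    ps.setString(1, i + "var");
                    ps.setInt(2, i);
                    ps.setDate(3, new java.sql.Date(start));
                    ps.setString(4, "1");
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        String[] cases = {
                "",                                                       // case 1: neither parameter
                "&useServerPrepStmts=true",                               // case 2
                "&rewriteBatchedStatements=true",                         // case 3
                "&rewriteBatchedStatements=true&useServerPrepStmts=true"  // case 4
        };
        for (String suffix : cases) {
            System.out.println((suffix.isEmpty() ? "(none)" : suffix) + " -> " + run(suffix, 100_000) + " ms");
        }
    }
}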

5. Why does the SQL never execute if psts.addBatch() is commented out?
// The loop from the utility class above:
for (int i = 0; i <= 100000; i++) {
    psts.setString(1, i + "var");
    psts.setInt(2, i);
    psts.setDate(3, new java.sql.Date(time));
    psts.setString(4, "1");
    // Add to the batch
    psts.addBatch();
    count++;
}

// In the Connector/J driver, PreparedStatement.addBatch() is implemented as:
public void addBatch() throws SQLException {
    try {
        synchronized (this.checkClosed().getConnectionMutex()) {
            QueryBindings<?> queryBindings = ((PreparedQuery) this.query).getQueryBindings();
            queryBindings.checkAllParametersSet();
            // Clone the current parameter bindings and append them to the batch list.
            this.query.addBatch(queryBindings.clone());
        }
    } catch (CJException var6) {
        throw SQLExceptionsMapping.translateException(var6, this.getExceptionInterceptor());
    }
}

// AbstractQuery class:
public void addBatch(Object batch) {
    if (this.batchedArgs == null) {
        this.batchedArgs = new ArrayList();
    }
    this.batchedArgs.add(batch); // batchedArgs is declared as: protected List<Object> batchedArgs;
}

In short, psts.addBatch() stores the parameters of each INSERT into the batched-parameter list (batchedArgs), and when psts.executeBatch() is called, that list is sent to the server. If addBatch() is never called, the list stays empty and there is nothing for executeBatch() to execute.
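A small sketch of that flow (assuming the same local database and query_data table as above): each addBatch() call snapshots the current parameter set, and executeBatch() returns one update count per batched parameter set, so with addBatch() removed there would be nothing for executeBatch() to send.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AddBatchDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)")) {
            ps.setString(1, "0var");
            ps.setInt(2, 0);
            ps.setDate(3, new java.sql.Date(System.currentTimeMillis()));
            ps.setString(4, "1");
            // Without this call the parameter set is never added to batchedArgs,
            // so executeBatch() has nothing to send to the server.
            ps.addBatch();
            int[] counts = ps.executeBatch(); // one update count per batched parameter set
            System.out.println("batched statements executed: " + counts.length);
        }
    }
}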

III. Summary

The main reason inserting 100,000 rows takes only 2 seconds is that the URL sets rewriteBatchedStatements=true.

IV. Open question

While running the performance tests, the first run of each case was always slower than the later runs, even though the MySQL general log was identical every time, which is puzzling. If I find the time, I will explore this in a separate post.

V. References

MySQL official documentation
