How to Deduplicate Data in MySQL: Practical Methods

MySQL deduplication should be chosen by scenario: use GROUP BY or DISTINCT to find duplicates; to delete duplicates, prefer ROW_NUMBER() (8.0+) or a self-join; to prevent duplicates, add a unique index combined with INSERT IGNORE / ON DUPLICATE KEY UPDATE.

There is no "one-click" way to deduplicate data in MySQL; you have to pick the method that fits the scenario: ad-hoc duplicate detection, keeping one copy, deleting duplicates outright, or preventing duplicate writes in the first place. The core ideas boil down to two: use GROUP BY or DISTINCT to query unique values, and use ROW_NUMBER() (8.0+) or a self-join/subquery to delete the redundant rows.

Finding duplicates: quickly locate duplicate records

First confirm which column combinations actually contain duplicates, then decide how to handle them. Common queries:

  • Count duplicate occurrences: SELECT name, email, COUNT(*) FROM users GROUP BY name, email HAVING COUNT(*) > 1;
  • Fetch the full rows of all duplicates: SELECT * FROM users WHERE (name, email) IN (SELECT name, email FROM users GROUP BY name, email HAVING COUNT(*) > 1);
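The two queries above can be tried end to end on a small sample table (the table definition and data here are hypothetical, for illustration only):

```sql
-- Hypothetical sample table and data
CREATE TABLE users (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(50),
  email VARCHAR(100),
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO users (name, email) VALUES
  ('alice', 'a@x.com'),
  ('alice', 'a@x.com'),   -- duplicate of the first row
  ('bob',   'b@x.com');

-- Count duplicate (name, email) combinations
SELECT name, email, COUNT(*) AS cnt
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1;
-- Returns one row: ('alice', 'a@x.com', 2)
```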

Deduplicate but keep one copy: delete the extras while keeping the newest/oldest row

For tables that already contain duplicates and need their historical data cleaned. Window functions (MySQL 8.0+) are the safer and clearer choice:

  • Keep the largest id (usually the newest): DELETE t1 FROM users t1 INNER JOIN users t2 ON t1.name = t2.name AND t1.email = t2.email AND t1.id < t2.id;
  • Precise control with a window function (recommended): DELETE FROM users WHERE id IN (SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id DESC) rn FROM users) t WHERE rn > 1);
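A cautious workflow is to preview the rows the DELETE would remove before actually deleting, and to wrap the delete in a transaction. A sketch, assuming the same users(id, name, email) table and an InnoDB engine (required for ROLLBACK to work):

```sql
-- Preview: which rows would the window-function DELETE remove?
SELECT id FROM (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id DESC) AS rn
  FROM users
) t
WHERE rn > 1;

-- Once the preview looks right, delete inside a transaction
START TRANSACTION;
DELETE FROM users WHERE id IN (
  SELECT id FROM (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id DESC) AS rn
    FROM users
  ) t WHERE rn > 1
);
-- Verify row counts, then COMMIT; use ROLLBACK; if anything looks wrong
COMMIT;
```

The derived table `t` is not optional style: MySQL refuses to DELETE from a table that the same statement also selects from directly, so the subquery must be wrapped one level deep.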

Deduplicate at query time: leave the table untouched, return only unique results

Well suited to read-only scenarios such as reports and APIs; simple and efficient:

  • Basic deduplication: SELECT DISTINCT name, email FROM users;
  • Combine with aggregates to pull key fields: SELECT name, email, MAX(created_at) AS latest_time FROM users GROUP BY name, email;
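Note that GROUP BY with MAX(created_at) returns only the aggregated column, not the rest of the newest row. If you need the entire latest row per (name, email) group at query time, a window function handles it without touching the table (a sketch; columns assumed from the examples above):

```sql
-- Return the full newest row for each (name, email) group
SELECT id, name, email, created_at
FROM (
  SELECT u.*,
         ROW_NUMBER() OVER (PARTITION BY name, email
                            ORDER BY created_at DESC) AS rn
  FROM users u
) t
WHERE rn = 1;
```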

Preventing duplicates: stop duplicate data at the source

This matters more than cleaning up after the fact. The key is constraints plus application logic:

  • Add a unique index (most effective): ALTER TABLE users ADD UNIQUE INDEX uk_name_email (name, email); inserting a duplicate then fails with an error
  • Use INSERT IGNORE or ON DUPLICATE KEY UPDATE to handle the conflict, e.g.: INSERT INTO users (name, email) VALUES ('张三','z@x.com') ON DUPLICATE KEY UPDATE updated_at = NOW();
  • Double protection: application-level checks plus the database constraint, e.g. run a SELECT check before registering a user, then insert, with the unique index on the table as the safety net
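Putting the prevention pieces together (a sketch; assumes the users table from earlier plus an updated_at column added here for the upsert):

```sql
-- Hypothetical column for the upsert example
ALTER TABLE users ADD COLUMN updated_at DATETIME NULL;

-- The constraint itself; this ALTER fails if duplicates already exist,
-- so clean the table first using the methods above
ALTER TABLE users ADD UNIQUE INDEX uk_name_email (name, email);

-- Option 1: silently skip rows that would violate the unique index
INSERT IGNORE INTO users (name, email) VALUES ('alice', 'a@x.com');

-- Option 2: turn the conflict into an update ("upsert")
INSERT INTO users (name, email)
VALUES ('alice', 'a@x.com')
ON DUPLICATE KEY UPDATE updated_at = NOW();
```

One caveat: INSERT IGNORE downgrades other errors (such as invalid values) to warnings as well, so ON DUPLICATE KEY UPDATE is usually the more predictable choice when you only want to absorb duplicate-key conflicts.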