corsasport.co.uk
 

Corsa Sport » Message Board » Off Day » Geek Day » Mysql Gurus - Duplicate Rows Question


New Topic

New Poll
  Subscribe | Add to Favourites

You are not logged in and may not post or reply to messages. Please log in or create a new account or mail us about fixing an existing one - register@corsasport.co.uk

There are also many more features available when you are logged in such as private messages, buddy list, location services, post search and more.


Author Mysql Gurus - Duplicate Rows Question
Rob_Quads
Member

Registered: 29th Mar 01
Location: southampton
User status: Offline
20th Sep 11 at 09:04   View User's Profile U2U Member Reply With Quote

I have a mysql database with a table with the following structure...

CREATE TABLE `perf_v8_ga_results` (
`Build` varchar(30) DEFAULT NULL,
`Host` varchar(50) DEFAULT NULL,
`OS` varchar(30) DEFAULT NULL,
`MQ` varchar(30) DEFAULT NULL,
`DB` varchar(30) DEFAULT NULL,
`Start` datetime DEFAULT NULL,
`End` datetime DEFAULT NULL,
`TestCase` varchar(100) DEFAULT NULL,
`State` varchar(30) DEFAULT NULL,
`Ignore` varchar(30) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT 'NO',
`Persistence` varchar(10) DEFAULT NULL,
`MsgSize` int(11) DEFAULT NULL,
`MsgPerSec` float DEFAULT NULL,
`MsgPerSec_FullStdDeviation` float NOT NULL,
`MsgPerSec_IQRange` float NOT NULL,
`Server_Machine_CPU_Pct` float DEFAULT NULL,
`Server_Machine_CPU_Pct_FullStdDeviation` float NOT NULL,
`Server_Machine_CPU_Pct_IQRange` float NOT NULL,
`CPUCostPerMsg` float DEFAULT NULL,
`CPUS` int(11) DEFAULT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `index2` (`Host`,`Start`,`End`,`Build`),
KEY `IGNORE` (`Ignore`)
)

For some reason the unique key does not seem to be working as I have a number of entries where the Host/Start/End/Build are all the same.

a) Anyone know why the unique key might not be working?

b) Anyone know a query which will delete all the duplicates (without using a temp table as it needs to be done on the live DB)

Seem stuff around the following but it does not seem to match anything

SELECT *
FROM perf_v8_ga_results, perf_v8_ga_results AS vtable
WHERE vtable.id > perf_v8_ga_results.id
AND perf_v8_ga_results.Host = vtable.Host
AND perf_v8_ga_results.Start = vtable.Start
AND perf_v8_ga_results.End = vtable.End
AND perf_v8_ga_results.Build = vtable.Build

Cheers
Ian
Site Administrator

Avatar

Registered: 28th Aug 99
Location: Liverpool
User status: Offline
20th Sep 11 at 10:18   View Garage View User's Profile U2U Member Reply With Quote

Is index2 a compound key?

Give me some rows of data that you have the issue with.
Rob_Quads
Member

Registered: 29th Mar 01
Location: southampton
User status: Offline
20th Sep 11 at 11:51   View User's Profile U2U Member Reply With Quote

How do I check if its a compound key?

Here is some data


INSERT INTO `perf_v8_ga_results` (`Build`, `Host`, `OS`, `MQ`, `DB`, `Start`, `End`, `TestCase`, `State`, `Ignore`, `Persistence`, `MsgSize`, `MsgPerSec`, `MsgPerSec_FullStdDeviation`, `MsgPerSec_IQRange`, `Server_Machine_CPU_Pct`, `Server_Machine_CPU_Pct_FullStdDeviation`, `Server_Machine_CPU_Pct_IQRange`, `CPUCostPerMsg`, `CPUS`, `id`) VALUES
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 05:59:49', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.47, 0.85, 1, 37.78, 1.61, 1.75, 67.25, 4, 710081),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 05:59:49', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.47, 0.85, 1, 37.78, 1.61, 1.75, 67.25, 4, 710748),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 065:15', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.47, 0.98, 1, 37.44, 3.01, 3.75, 66.65, 4, 710082),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 065:15', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.47, 0.98, 1, 37.44, 3.01, 3.75, 66.65, 4, 710749),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 068:31', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.36, 0.98, 1, 37.17, 4.98, 3, 66.49, 4, 710083),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 068:31', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.36, 0.98, 1, 37.17, 4.98, 3, 66.49, 4, 710750),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 06:11:46', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.18, 0.99, 1, 37.78, 1.96, 3.75, 68.13, 4, 710084),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 06:11:46', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.18, 0.99, 1, 37.78, 1.96, 3.75, 68.13, 4, 710751),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 06:152', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.1, 0.74, 1, 36.28, 1.34, 2.75, 65.67, 4, 710085),
('S000-L110914.1_P', 'lime11.hursley.ibm.com', 'WIN2008R2_SVR_GOLD', 'MQ_7_0_1_5', 'DB29_5_0', '2011-09-18 06:152', NULL, 'REG_A_V6LARGEMESSAGE', 'PASSED', 'NO', 'NONE', 204800, 22.1, 0.74, 1, 36.28, 1.34, 2.75, 65.67, 4, 710752);
Ian
Site Administrator

Avatar

Registered: 28th Aug 99
Location: Liverpool
User status: Offline
20th Sep 11 at 18:42   View Garage View User's Profile U2U Member Reply With Quote

Compound means in the index relies on all the fields together as one to make the decision on duplication.

So if all four fields are the same you have a duplicate, but if you have identical rows but for example the date stamp is different, that is not a duplicate row, even though some of the fields named in the index are duplicated.

You can count the duplicates using GROUP BY

SELECT host, start, end, build, COUNT(*) FROM table GROUP BY host, start, end, build

You can remove duplicates using CREATE TABLE AS SELECT DISTINCT field field field

There is no easy way to remove all duplicates with DELETE statements on the table, without caching the locations of the duplicates in a bit of programming. It can be done programmatically but the simpler solution is a new table based on a DISTINCT query on the old one.
Rob_Quads
Member

Registered: 29th Mar 01
Location: southampton
User status: Offline
21st Sep 11 at 10:21   View User's Profile U2U Member Reply With Quote

My understanding was that the index was compound - thats why I want it as I don't want the table to include multiple entries of the same result.

Not quite sure why its not working.

 
New Topic

New Poll

  Related Threads Author Forum Replies Views Last Post
Server Help Jodi_the_g Geek Day 7 1431
19th Jan 06 at 15:18
by Jodi_the_g
 
My SQL-Nearly there! Tom J Geek Day 31 3132
25th Dec 06 at 23:32
by Ian
 
php and mysql help please bigD21 Geek Day 4 422
30th Aug 08 at 16:46
by bigD21
 
PHP help again Dan Lewis Geek Day 25 1357
3rd Feb 09 at 21:26
by Dan Lewis
 
back work out Tom G Sports Day 9 312
15th Aug 11 at 12:19
by sc0ott
 

Corsa Sport » Message Board » Off Day » Geek Day » Mysql Gurus - Duplicate Rows Question 28 database queries in 0.0124571 seconds