NOT IN 与 NOT EXSIST 排除NULL

论坛 期权论坛 脚本     
已经匿名di用户   2022-5-29 19:11   930   0

原文出处: http://blog.csdn.net/dba_huangzj/article/details/31374037 转载请引用

之前在论坛中见到一个针对in/exists的讨论,原帖懒得找了,这里介绍一下最近的学习小结:

NOT IN和NOT EIXTS在对允许为null的列查询时会有一定的风险。特别是NOT IN,如果子查询包含了最少一个NULL,会出现非预期的结果。下面做一个演示。

  1. IF OBJECT_ID('ShipmentItems', 'U') IS NOT NULL
  2. DROP TABLE dbo.ShipmentItems;
  3. GO
  4. CREATE TABLE dbo.ShipmentItems
  5. (
  6. ShipmentBarcode VARCHAR(30) NOT NULL ,
  7. Description VARCHAR(100) NULL ,
  8. Barcode VARCHAR(30) NOT NULL
  9. );
  10. GO
  11. INSERT INTO dbo.ShipmentItems
  12. ( ShipmentBarcode ,
  13. Barcode ,
  14. Description
  15. )
  16. SELECT '123456' ,
  17. '1010203' ,
  18. 'Some cool widget'
  19. UNION ALL
  20. SELECT '123654' ,
  21. '1010203' ,
  22. 'Some cool widget'
  23. UNION ALL
  24. SELECT '123654' ,
  25. '1010204' ,
  26. 'Some cool stuff for some gadget';
  27. GO
  28. -- retrieve all the items from shipment 123654
  29. -- that are not shipped in shipment 123456
  30. SELECT Barcode
  31. FROM dbo.ShipmentItems
  32. WHERE ShipmentBarcode = '123654'
  33. AND Barcode NOT IN ( SELECT Barcode
  34. FROM dbo.ShipmentItems
  35. WHERE ShipmentBarcode = '123456' );
  36. /*
  37. Barcode
  38. ------------------------------
  39. 1010204
  40. */
IF OBJECT_ID('ShipmentItems', 'U') IS NOT NULL 
    DROP TABLE dbo.ShipmentItems; 
GO 
CREATE TABLE dbo.ShipmentItems 
    ( 
      ShipmentBarcode VARCHAR(30) NOT NULL , 
      Description VARCHAR(100) NULL , 
      Barcode VARCHAR(30) NOT NULL 
    ); 
GO 
INSERT  INTO dbo.ShipmentItems 
        ( ShipmentBarcode , 
          Barcode , 
          Description 
        ) 
        SELECT  '123456' , 
                '1010203' , 
                'Some cool widget' 
        UNION ALL 
        SELECT  '123654' , 
                '1010203' , 
                'Some cool widget' 
        UNION ALL 
        SELECT  '123654' , 
                '1010204' , 
                'Some cool stuff for some gadget'; 
GO 
-- retrieve all the items from shipment 123654 
-- that are not shipped in shipment 123456 
SELECT  Barcode 
FROM    dbo.ShipmentItems 
WHERE   ShipmentBarcode = '123654' 
        AND Barcode NOT IN ( SELECT Barcode 
                             FROM   dbo.ShipmentItems 
                             WHERE  ShipmentBarcode = '123456' ); 
/* 
Barcode 
------------------------------ 
1010204
*/


可以看出得到了期待结果。下面看看修改表结构,允许列为null的情况:

  1. ALTER TABLE dbo.ShipmentItems
  2. ALTER COLUMN Barcode VARCHAR(30) NULL;
  3. INSERT INTO dbo.ShipmentItems
  4. ( ShipmentBarcode ,
  5. Barcode ,
  6. Description
  7. )
  8. SELECT '123456' ,
  9. NULL ,
  10. 'Users manual for some gadget';
  11. GO
  12. SELECT Barcode
  13. FROM dbo.ShipmentItems
  14. WHERE ShipmentBarcode = '123654'
  15. AND Barcode NOT IN ( SELECT Barcode
  16. FROM dbo.ShipmentItems
  17. WHERE ShipmentBarcode = '123456' );
  18. /*
  19. Barcode
  20. ------------------------------
  21. */
ALTER TABLE dbo.ShipmentItems 
ALTER COLUMN Barcode VARCHAR(30) NULL; 
INSERT  INTO dbo.ShipmentItems 
        ( ShipmentBarcode , 
          Barcode , 
          Description 
        ) 
        SELECT  '123456' , 
                NULL , 
                'Users manual for some gadget'; 
GO 
SELECT  Barcode 
FROM    dbo.ShipmentItems 
WHERE   ShipmentBarcode = '123654' 
        AND Barcode NOT IN ( SELECT Barcode 
                             FROM   dbo.ShipmentItems 
                             WHERE  ShipmentBarcode = '123456' ); 
/* 
Barcode 
------------------------------
*/


很多人会觉得这是一个bug,有时候能查出数据,有时候却不能。但是实际上不是bug,当NOT IN子句返回最少一个NULL时,查询会返回空,下面的语句能更好地说明这个想法:

  1. SELECT CASE WHEN 1 NOT IN ( 2, 3 ) THEN 'True'
  2. ELSE 'Unknown or False'
  3. END ,
  4. CASE WHEN 1 NOT IN ( 2, 3, NULL ) THEN 'True'
  5. ELSE 'Unknown or False'
  6. END;
  7. /*
  8. ---- ----------------
  9. True Unknown or False
  10. */
SELECT  CASE WHEN 1 NOT IN ( 2, 3 ) THEN 'True' 
             ELSE 'Unknown or False' 
        END , 
        CASE WHEN 1 NOT IN ( 2, 3, NULL ) THEN 'True' 
             ELSE 'Unknown or False' 
        END; 
/* 
---- ---------------- 
True Unknown or False 
*/


实际上,由于IN的本质是OR操作,所以:


  1. SELECT CASE WHEN 1 IN ( 1, 2, NULL ) THEN 'True'
  2. ELSE 'Unknown or False'
  3. END ;
SELECT  CASE WHEN 1 IN ( 1, 2, NULL ) THEN 'True' 
             ELSE 'Unknown or False' 
        END ;

中,1 in 1,也就是为TRUE,所以返回true,这个语句的逻辑实际上是:

  1. SELECT CASE WHEN ( 1 = 1 )
  2. OR ( 1 = 2 )
  3. OR ( 1 = NULL ) THEN 'True'
  4. ELSE 'Unknown or False'
  5. END ;
SELECT  CASE WHEN ( 1 = 1 ) 
                  OR ( 1 = 2 ) 
                  OR ( 1 = NULL ) THEN 'True' 
             ELSE 'Unknown or False' 
        END ;
 


当使用NOT IN 时,如下面的语句:

  1. SELECT CASE WHEN 1 NOT IN ( 1, 2, NULL ) THEN 'True'
  2. ELSE 'Unknown or False'
  3. END ;
SELECT  CASE WHEN 1 NOT IN ( 1, 2, NULL ) THEN 'True' 
             ELSE 'Unknown or False' 
        END ;


会转变成:

  1. SELECT CASE WHEN NOT ( ( 1 = 1 )
  2. OR ( 1 = 2 )
  3. OR ( 1 = NULL )
  4. ) THEN 'True'
  5. ELSE 'Unknown or False' END ;
SELECT  CASE WHEN NOT ( ( 1 = 1 ) 
                        OR ( 1 = 2 ) 
                        OR ( 1 = NULL ) 
                      ) THEN 'True' 
             ELSE 'Unknown or False' END ;


根据离散数学的概念,可以转换为:

  1. SELECT CASE WHEN ( ( 1 <> 1 )
  2. AND ( 1 <> 2 )
  3. AND ( 1 <> NULL )
  4. ) THEN 'True'
  5. ELSE 'Unknown or False'
  6. END ;
SELECT  CASE WHEN ( ( 1 <> 1 ) 
                    AND ( 1 <> 2 ) 
                    AND ( 1 <> NULL ) 
                  ) THEN 'True' 
             ELSE 'Unknown or False' 
        END ;


谓词有短路特性,即在AND条件中,只要有一个条件为false,整个条件都为false,而1<>1是为false,所以后面的也不需要判断了,直接返回else部分。即使是1<>null,根据集合论的特性,NULL和实际数据的对比总是返回unknown,所以也是为false。如果你非要用NOT IN ,请确保子查询永远不会有NULL返回。或者需要额外处理去除NULL,比如:

  1. SELECT Barcode
  2. FROM dbo.ShipmentItems
  3. WHERE ShipmentBarcode = '123654'
  4. AND Barcode NOT IN ( SELECT Barcode
  5. FROM dbo.ShipmentItems
  6. WHERE ShipmentBarcode = '123456'
  7. AND Barcode IS NOT NULL ) ;
SELECT  Barcode 
FROM    dbo.ShipmentItems 
WHERE   ShipmentBarcode = '123654' 
  AND Barcode NOT IN ( SELECT Barcode 
                       FROM   dbo.ShipmentItems 
                       WHERE  ShipmentBarcode = '123456' 
                         AND Barcode IS NOT NULL ) ;


还有一种方法就是改写语句,用NOT EXISTS来等价替换:

  1. SELECT i.Barcode
  2. FROM dbo.ShipmentItems AS i
  3. WHERE i.ShipmentBarcode = '123654'
  4. AND NOT EXISTS ( SELECT *
  5. FROM dbo.ShipmentItems AS i1
  6. WHERE i1.ShipmentBarcode = '123456'
  7. AND i1.Barcode = i.Barcode );
  8. /*
  9. Barcode
  10. ------------------------------
  11. 1010204
  12. */
SELECT  i.Barcode 
FROM    dbo.ShipmentItems AS i 
WHERE   i.ShipmentBarcode = '123654' 
        AND NOT EXISTS ( SELECT * 
                         FROM   dbo.ShipmentItems AS i1 
                         WHERE  i1.ShipmentBarcode = '123456' 
                                AND i1.Barcode = i.Barcode ); 
/* 
Barcode 
------------------------------ 
1010204
*/


另外,基于SARG要求,一般不建议用NOT IN/NOT EXISTS这种反向扫描,避免影响性能。还有一个选择使用IN/EXISTS的要点,就是多列匹配的问题,在T-SQL中,多列同时匹配要用EXISTS,而单列匹配可以用EXISTS/IN。可能可以用其他写法来实现IN的多列匹配,但是一般我个人会选择使用EXISTS来匹配多列。


原文出自:CSDN博客:黄钊吉的博客

分享到 :
0 人收藏
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

积分:81
帖子:4969
精华:0
期权论坛 期权论坛
发布
内容

下载期权论坛手机APP