SQL syntax Language 

How to combine the values of two columns into one without use of UNION and JOIN?

    by S.Moiseenko (2008-08-22)

The problems of this kind are frequently discussed on the web sites of different forums. By the way, it is still not clear for me, why in such cases it is additionally required that UNION and/or JOIN clauses should not be used. I can only suppose that these are the problems, which are usually asked at an interview for a job.

Let me sidetrack a little bit. As far as I can guess, this question would be answered by Joe Celko in the following way: error of design is quite evident, as one attribute is split into two. But let us put aside the issues of designing and further consider the solution of such problem.

Let's create a test table and populate it with some data:


CREATE TABLE T (
col1 INT
, col2 INT
)
GO
INSERT INTO T
SELECT 1, 1
UNION ALL SELECT 1, 3
UNION ALL SELECT NULL, NULL
UNION ALL SELECT NULL, 2
GO

Thus, there is T table, which contains two columns of data of the same type:


SELECT col1, col2
FROM T
col1     col2
 1        1
 1        3
NULL     NULL
NULL      2

It is necessary to get the following result:

col 
1
1
NULL
NULL
1
3
NULL
2

I know of three methods of solution employing standard means of interactive SQL language.

1. UNION ALL

The solution is quite evident and not calling for any comments. It should be noted however that UNION cannot be used for solving of such a problem as it eliminates the duplicates.


SELECT col1 col FROM T
UNION ALL
SELECT col2 FROM T

2. FULL JOIN

In order to preserve the duplicates from various columns, let us make FULL JOIN using the wittingly false predicate, say, 1 = 2:


SELECT T.col1,T1.col2
FROM T FULL JOIN T AS T1 ON 1=2

Results:

col1  col2
  1   NULL
  1   NULL
NULL  NULL
NULL  NULL
NULL   1
NULL   3
NULL  NULL
NULL   2

Then we use COALESCE function, which will produce the desirable results:


SELECT COALESCE(T.col1,T1.col2) col
FROM T FULL JOIN T AS T1 ON 1=2

3. UNPIVOT

Constructions PIVOT and UNPIVOT were presented in the last releases of SQL standard and have been implemented in SQL Server beginning with 2005 release. The first of them enables us to present in a row the values of the column, and the second one will enable us to make the reverse operation:


SELECT col
FROM
   (SELECT col1, col2
   FROM T) p
UNPIVOT
   (col FOR xxx IN
      (col1, col2)
)AS unpvt

The values in columns col1 and col2 are grouped into one column (col) of the supplementary table unpvt. However there is a certain peculiar feature in the use of PIVOT and UNPIVOT clauses - they do not take into account NULL values. The result of the last query will be the following:

col
1
1
1
3
2

One can overcome such a difficulty in solution of this problem by replacing the NULL value by a pseudo-value on the entrance of UNPIVOT operator, i.e. by a value that could not be found in the initial data, and further on the reverse transformation should be made:


SELECT  NULLIF(col,777)
FROM
   (SELECT COALESCE(col1,777) col1, COALESCE(col2,777) col2
   FROM T) p
UNPIVOT
   (col FOR xxx IN
      (col1, col2)
)AS unpvt

In this case COALESCE(col1,777) replaces NULL values in the column col1 by 777, while the NULLIF(col,777) function makes a reverse transformation.

The last solution provides us with the necessary result, however it is faulty for the following reason - the value 777 may sooner or later come up in the data, and it will lead to false results. In order to eliminate this error, one can use the value of another type, which cannot be found in the integer column, i.e., 'x' symbol. Naturally to employ such a method, one should convert integer columns into character string ones for the sake of compatibility, when needed making the reverse transformation of the final result:


SELECT  CAST(NULLIF(col,'x') AS INT)
FROM
   (SELECT COALESCE(CAST(col1 AS VARCHAR),'x') col1,
           COALESCE(CAST(col2 AS VARCHAR),'x') col2
   FROM T) p
UNPIVOT
   (col FOR xxx IN
      (col1, col2)
)AS unpvt

Let me say a few words about efficiency of the presented solutions. According to the query execution plan, the major costs are spent on reading of data (Table scan operation). For the first two solutions (with use of UNION and FULL JOIN) the scanning procedure is carried out twice, while for the last one (UNPIVOT) only once, that accounts for double advantage in terms of performance.


DROP TABLE T


Comments

N.Petrov (aka sql chuvak) 08-28-2008

There is one more variant which I use:


SELECT
CASE a WHEN 1 THEN col1 ELSE col2 END col
FROM T, (SELECT 1 a UNION ALL SELECT 2) B

The Cartesian product of the T table with derived table of 2 rows gives "doubling" (each row of T is presented 2 times, for =1 and =2). For the first case we take value from col1, and for the second - from col2.

Here, certainly, there is both union and join but, in my opinion, the title's question means single scanning the table.





Previous | Index | Next


Home SELECT exercises (rating stages) DML exercises Developers