Home » SQL » Membandingkan Dua Sumber Data Pada SQL (JOIN)

Dalam mengolah data bersifat kompleks diperlukan sebuah analisa terhadap data yang sekarang dengan data yang terdahulu biasanya data ini diperlukan oleh market analis, data ini digunakan untuk membandingkan peningkatan atau penurunan data.

Sebagai contoh ada dua sumber data dari tabel, dengan syarat harus ada data pembanding yang dijadikan keyword. Untuk lebih jelasnya saya ambil dari artikel IBM sebagai berikut:

Whereas the most overwhelming attribute of a WHERE clause predicate applied to the preserved-row table is that DB2 can apply the predicate before or after the join; a WHERE clause predicate applied to the NULL-supplying table has a very different impact, in that it causes DB2 to simplify the join if the predicate negates NULLs introduced by the outer join.
To explain what I mean by this, see Figure 13 , which shows an example of a WHERE clause predicate that references the NULL-supplying table.

Left Outer Join -- join simplification

A WHERE clause predicate must evaluate to TRUE for a row to qualify. If there is an unmatched row in the join, the columns from the NULL-supplying table are NULL. When DB2 compares the WHERE clause predicate D.DEPTNAME NOT LIKE '%CENTER%' to a NULL, the result is neither TRUE nor FALSE, but UNKNOWN. This row is not returned because it does not evaluate to TRUE. Thus, NULLs that are supplied by the left outer join are negated by the WHERE clause predicate. This causes DB2 to determine that a left outer join is unnecessary, and it causes the query to be rewritten as an inner join, which may or may not be what you intended when you coded the query.
The advantage to having DB2 rewrite the left outer as an inner join is that performance might improve. The predicate that is applied to table D can now be applied before the join, rather than after, since it now references a non-NULL-supplying table. Preserved-row and NULL-supplying are unrelated terms for inner joins; given that there will never be an unmatched row that is returned in the join, that is, both tables are non-NULL-supplying.
Preserving NULLs
If this outer join simplification does not give you the results you wanted -- that is, you require that the NULL (or optional) rows be returned -- then use OR D.DEPTNAME IS NULL to preserve the NULLs in the answer set.
An example of this is shown in Figure 14 .

Left Outer Join -- NULL preservation

DB2 must apply the WHERE clause predicate after the join, because it is not known until after the join whether the row has been matched (and thus the first part of the predicate -- D.DEPTNAME NOT LIKE '%CENTER%' -- applies) or unmatched (and thus the second part of the predicate -- OR D.DEPTNAME IS NULL -- applies).
Before-join predicates
What happens if you choose to code the WHERE clause predicate to be applied by DB2 upon the NULL-supplying table before the join?
If you do this, there is no WHERE clause predicate to limit rows on the preserved-row table or in the final result. You are only limiting rows from the NULL-supplying table. Figure 15 shows the result of this.

Predicates on NULL-supplying table

Figure 16 shows that if you recode the before-join predicate to an ON clause predicate, the same results are returned.

Predicates on NULL-supplying table -- simplified

Predicates on NULL-supplying table -- simplified

Whether the rows from the NULL-supplying table are filtered before or during the join is a matter for performance only, and DB2 decides this based on the join method used. For producing the correct result, either method is valid. Because the row does not match in the join, removing the row before the join does not affect the output.
DB2 is able (from Version 6 on) to merge any unnecessary nested table expressions such as those on the NULL-supplying table ( Figure 15 ) to be rewritten as a single query block and applied as a during-join or before-join predicate ( Figure 16 ).

Back to top

Full outer join NULL-supplying table predicates

Outer join simplification
The rule for outer join simplification that is applicable to left and right outer joins is also valid for full joins. A WHERE clause predicate applied to the NULL-supplying table that causes the NULLs to be negated causes DB2 to simplify the join. Because both tables supply NULLs, this clause negates NULLs regardless of which table it is applied to.
Figure 17 shows an example of a WHERE clause predicate applied to a full outer join, which causes DB2 to simplify the join.
Full Outer Join -- join simplification

Without join simplification, the predicate is applied as a totally-after-join predicate. Given that the optimizer can determine that the predicate negates NULLs, it can rewrite the query as a left outer join. A full outer join can be rewritten as either a left or right outer join, depending on the table join sequence chosen.
Rewriting the query as a left outer join means that the WHERE predicate can now be applied as a before-join predicate, resulting in fewer rows being joined. If the WHERE clause predicates negate the NULLs from both tables, then simplification causes the query to be rewritten as an inner join. Executing the query as an inner join allows predicates on both tables to be applied before the join.
Join simplification can be identified by the JOIN_TYPE column of the explain (plan table) output. A value of 'L' indicates that the join has been simplified to a left outer join (as there is no runtime right outer join), and a 'blank' indicates an inner join.
Preserving NULLs
If you require that the NULLs from both tables be returned in the result set, alter the WHERE clause predicate to ensure that they are not removed.
Figure 18 shows an example of a WHERE clause predicate that does preserves NULLs. COALESCE returns the first value in the list that is not NULL. Thus, the WHERE comparison is always against an actual value (unless the column is defined to allow NULLs).

Full Outer Join -- NULL preservation

In the example shown in Figure 18 , the WHERE clause predicate is applied totally after the join, because the WHERE predicate depends on columns from both tables.
Is there a better alternative?
Before-join predicates
You can get better performance by coding the query to apply these predicates before the join (as shown in Figure 19 ).

Full Outer Join -- NULL preservation alternative

Full Outer Join -- NULL preservation alternative

Back to top

Be careful when coding multiple outer joins

Missing rows
In Part one , I mentioned that obtaining the correct result is the major focus when writing any SQL statement. And with multiple joins involving outer joins, it is very easy to accidentally lose rows based upon the source of the join predicates.
Of course, I have already explained that columns from a NULL-supplied table will be NULL for an unmatched row. If a column from a NULL-supplied table is referenced as a join predicate in a subsequent join, then NULL will never equate, and thus a further match will not be made. Without interrogating the data, there may be no indication that these rows are missing, because it is possible that outer join simplification was not necessary by DB2. Outer join simplification is at least identifiable in the plan table (based on JOIN_TYPE column).
Figure 20 shows an example of a NULL-supplied column that is used in a subsequent join.

Multiple Outer Joins -- Missing Rows

Step 1 applies the before-join predicate. Step 2 performs the left outer join. The left outer join from the department and employee tables (step 2) does not produce a matching row in this example. Thus, when step 3 is performed (subsequent left outer join to the project table), the value of the join predicate from the previous join is NULL, because it was from the NULL-supplied table. The row is still preserved, however, because it is a left outer join. For an inner join, the row would not have been preserved.
Finding missing rows
If you ensure that subsequent join predicates always refer to columns from preserved-row tables, the true value will be available for the later join. This is an extremely important point to ensure that the correct rows are returned in the join.
Figure 21 shows an example whereby the second join refers to a join predicate from the preserved-row table.

Multiple Outer Joins -- Finding Missing Rows

Multiple Outer Joins -- Finding Missing Rows

In this example, I have corrected the error from the previous example ( Figure 20 ). The resultant data from step 2 (first left outer join) contains the actual value 'D01' from the preserved row table. This value is used in step 3 (subsequent left outer join) to match against the project table, rather than against NULL.
The dependence on join sequence for this query requires that the department table be accessed first. The DB2 for z/OS ^TM optimizer (from Version 6 on) is able to determine the remaining join sequences based on the lowest cost, rather than the coding sequence. Each of the joins highlighted as step 2 and 3 depend on the department table only, and not on each other.
For full outer joins, it is more difficult to specify a join column from the preserved-row (or non-NULL-supplying) table, because both tables can supply NULLs. You must always specify the COALESCE clause to ensure that a preserved value is always specified by the join. For example:
ON P.DEPTNO = COALESCE (D.DEPTNO, E.WORKDEPT)

Back to top

Nesting outer joins

Outer join simplification
I have described the situation in which a WHERE clause causes DB2 to simplify the outer join. But there is also a situation in which the ON clause can cause this simplification to occur, and that is when an outer join is nested within another outer join. This type of simplification performed by DB2 is only briefly explained in the DB2 Application Programming and SQL Guide .
Nesting joins appear to defy the following rule for coding an ON clause: the ON clause must immediately succeed the join for which it relates. This is shown in Figure 22 .

Outer Join Nesting -- join simplification

Outer Join Nesting -- join simplification

In Figure 22 , table D is"left joined" to the result of the full outer join of table P and E. Thus, the ON clause dictates that the join between P and E occurs first. This becomes more logical if you strategically place parentheses and indent the query for readability, such as:
SELECT D.DEPTNO, D.DEPTNAME, P.DEPTNO, P.PROJNAME, E.EMPNO

FROMDEPARTMENT D LEFT OUTER JOIN

(PROJECT P FULL OUTER JOIN EMPLOYEE EM

ONP.DEPTNO = E.WORKDEPT)

OND.DEPTNO = P.DEPTNO

ORDER BY D.DEPTNO
The parentheses more easily show that the full outer join is performed first, and that the resultant table is the right table of a left outer join.
After the full outer join (step 1), the result contains NULLs that are supplied from either the left or right table for any unmatched rows. Next, the Department table is left joined to that result set (step 2). For step 2, there are no rows in D that can equal the supplied NULLs, so these rows are not joined. The consequence is that the NULLs are negated by the final ON clause. Therefore, DB2 realizes that there is no requirement for a full join to be coded initially.
The full outer join from the previous example can be rewritten by the DB2 optimizer as a left outer join (as in Figure 23 ).

Outer Join Nesting -- query rewrite

Now there are no NULL-supplied rows negated by the ON clause.
Preserving NULLs
To avoid DB2 performing this simplification, or more importantly, to maintain all rows introduced by the full outer join from Figure 22 , use COALESCE to ensure that the resultant join column is not NULL.

Outer Join Nesting -- NULL preservation

The presence of COALESCE ensures that the subsequent join column cannot be NULL. COALESCE dictates that when the column is NULL-supplied take the value from the preserved row table. For example, when the left table is preserved, the right is NULL-supplied, and vice versa.
Now, because of COALESCE, there aren't any NULLs introduced on the subsequent join column, and thus no rows are inadvertently negated by the ON clause.
Important: As I mentioned earlier, any subsequent table join after a full join must contain the COALESCE in the ON clause to ensure that NULL-supplied rows are not negated by the join.

Membandingkan Dua Sumber Data Pada SQL (JOIN)

Post a Comment

Popular

Blog Archive

Membandingkan Dua Sumber Data Pada SQL (JOIN)

Next

Newer Post

Previous

Older Post

Post a Comment