An interesting topic about CDA is the ability apply join and union constructs to our incoming dataset using a
CompoundDataAccess element. This definition is a sort of extension of the
DataAccess element and through a
type attribute gives us the ability to
join or
append two given
DataAccess datasets outputs forming a new one.
How does CompoundDataAccess works internally
It is really interesting to have a look at the sources and see how the things really works:
- when you want to join the two DataAccess outputs CDA builds a little Kettle transformations and uses the Merge Join step two do the required join operation
- if you want to append the two DataAccess outputs CDA very easily appends by itself the two datasets following the order you specified in configuration file.
Remember that in either cases you can apply this operations considering only two datasets at the time for each
CompoundDataAccess element definition.
CompoundDataAccess basics
To define a new
CompoundDataAccess element you have to mandatory specify
- an id attribute, to externally identify this DataAccess elemement
- a type attribute that accepts two values union or join following the type of composition you're looking for.
Because
CompoundDataAccess is only an extension of the
DataAccess element, we can apply the
Parameter,
Columns,
CalculatedColumns and
Output elements following the same rules specified for
DataAccess in my previous post
here
Moreover, because the
CompundDataAccess element works on resulting
DataAccess datasets we have to preliminarily define the two needed
DataAccess. It is not possible (but would be a nice to have and I can think about implementing it) use
DataAccess elements defined in external CDA files.
Joining datasets output
To join two datasets we have to define a
CompoundDataAccess element with the type attribute valued to the
join value. The join type lets you execute a
FULL OUTER JOIN upon the two input datasets. Required elements are:
- Left element definition, to specify a dataset for the left part of the join,
- Right element definition, to specify a dataset for the right part of the join.
Anyone of the two elements accepts the following attributes:
- id, is valued to the id of the DataAccess element definition you are going to consider as the left/right dataset
- key, is a comma separated list of elements to be considered as the join key. Remember that the columns keys are specified giving their position in the DataAccess query and not the name.
Below I give an example to fully understand this concept. Suppose we defined the following two
DataAccess definitions in our CDA file
<DataAccess id="1" connection="1" type="sql" access="private" cache="true" cacheDuration="300">
<Name>Sql Query on SampleData - Jdbc</Name>
<Query>
select o.YEAR_ID, o.STATUS as status, sum(o.TOTALPRICE) as totalprice from orderfact o
group by o.YEAR_ID, o.STATUS
</Query>
.
.
.
.
</DataAccess>
<DataAccess id="2" connection="1" type="sql" access="public" cache="true" cacheDuration="5">
<Name>Sql Query on SampleData</Name>
<Query>
select o.YEAR_ID, o.status, sum(o.TOTALPRICE * 3) as tripleprice from orderfact o
where o.STATUS = ${status} and o.ORDERDATE > ${orderDate}
group by o.YEAR_ID, o.STATUS
order by o.YEAR_ID DESC, o.STATUS
</Query>
.
.
.
</DataAccess>
and suppose in the same cda file we define this CompoundDataAccess element.
<CompoundDataAccess id="3" type="join">
<Left id="1" keys="0,1"/>
<Right id="2" keys="0,1"/>
<Columns>
<CalculatedColumn>
<Name>PriceDiff</Name>
<Formula>=[TRIPLEPRICE]-[TOTALPRICE]</Formula>
</CalculatedColumn>
</Columns>
<Parameters>
<Parameter name="status" type="String" default="Shipped"/>
<Parameter name="orderDate" type="Date" pattern="yyyy-MM-dd" default="2003-03-01"/>
</Parameters>
<Output indexes="0,1,2,5,6"/>
</CompoundDataAccess>
As you can see the
CompoundDataAccess element has the two keys,
Left and
Right, two indicate the two sides of the join. The left one uses the
DataAccess id 1 (the first one above) and the second the
DataAcccess id 2. Then you see the comma separated list that defines the positions of the keys columns in the related DataAccess query. In this case you can see, looking at the related DataAcces queries, that either the elements uses
YEAR_ID and the
STATUS columns
Appending datasets ouputs
To append two datasets we have to define a
CompoundDataAccess element with the type attribute valued to the
union value.The union type lets you execute append two
DataAcces output datasets giving their order using two configuration elements:
- Top, to specify the first dataset,
- Bottom, to specify the second dataset.
Anyone of the two elements accepts the following attributes:
- id, is valued to the id of the DataAccess element definition you are going to consider as the Top/Bottom dataset
Below a sample of configuration using a compound union. The
DataAccess elements considered for this sample are the same defined in the previous sample for the join type.
<CompoundDataAccess id="3" type="union">
<Top id="2"/>
<Bottom id="1"/>
<Parameters>
<Parameter name="year" type="Numeric" default="2004"/>
</Parameters>
</CompoundDataAccess>