How does CompoundDataAccess works internally
It is really interesting to have a look at the sources and see how the things really works:
- when you want to join the two DataAccess outputs CDA builds a little Kettle transformations and uses the Merge Join step two do the required join operation
- if you want to append the two DataAccess outputs CDA very easily appends by itself the two datasets following the order you specified in configuration file.
Remember that in either cases you can apply this operations considering only two datasets at the time for each CompoundDataAccess element definition.
CompoundDataAccess basics
To define a new CompoundDataAccess element you have to mandatory specify
- an id attribute, to externally identify this DataAccess elemement
- a type attribute that accepts two values union or join following the type of composition you're looking for.
Moreover, because the CompundDataAccess element works on resulting DataAccess datasets we have to preliminarily define the two needed DataAccess. It is not possible (but would be a nice to have and I can think about implementing it) use DataAccess elements defined in external CDA files.
Joining datasets output
To join two datasets we have to define a CompoundDataAccess element with the type attribute valued to the join value. The join type lets you execute a FULL OUTER JOIN upon the two input datasets. Required elements are:
- Left element definition, to specify a dataset for the left part of the join,
- Right element definition, to specify a dataset for the right part of the join.
- id, is valued to the id of the DataAccess element definition you are going to consider as the left/right dataset
- key, is a comma separated list of elements to be considered as the join key. Remember that the columns keys are specified giving their position in the DataAccess query and not the name.
Below I give an example to fully understand this concept. Suppose we defined the following two DataAccess definitions in our CDA file
<DataAccess id="1" connection="1" type="sql" access="private" cache="true" cacheDuration="300">
<Name>Sql Query on SampleData - Jdbc</Name>
<Query>
select o.YEAR_ID, o.STATUS as status, sum(o.TOTALPRICE) as totalprice from orderfact o
group by o.YEAR_ID, o.STATUS
</Query>
.
.
.
.
</DataAccess>
<DataAccess id="2" connection="1" type="sql" access="public" cache="true" cacheDuration="5">
<Name>Sql Query on SampleData</Name>
<Query>
select o.YEAR_ID, o.status, sum(o.TOTALPRICE * 3) as tripleprice from orderfact o
where o.STATUS = ${status} and o.ORDERDATE > ${orderDate}
group by o.YEAR_ID, o.STATUS
order by o.YEAR_ID DESC, o.STATUS
</Query>
.
.
.
</DataAccess>
and suppose in the same cda file we define this CompoundDataAccess element.
<CompoundDataAccess id="3" type="join">
<Left id="1" keys="0,1"/>
<Right id="2" keys="0,1"/>
<Columns>
<CalculatedColumn>
<Name>PriceDiff</Name>
<Formula>=[TRIPLEPRICE]-[TOTALPRICE]</Formula>
</CalculatedColumn>
</Columns>
<Parameters>
<Parameter name="status" type="String" default="Shipped"/>
<Parameter name="orderDate" type="Date" pattern="yyyy-MM-dd" default="2003-03-01"/>
</Parameters>
<Output indexes="0,1,2,5,6"/>
</CompoundDataAccess>
As you can see the CompoundDataAccess element has the two keys, Left and Right, two indicate the two sides of the join. The left one uses the DataAccess id 1 (the first one above) and the second the DataAcccess id 2. Then you see the comma separated list that defines the positions of the keys columns in the related DataAccess query. In this case you can see, looking at the related DataAcces queries, that either the elements uses YEAR_ID and the STATUS columns
Appending datasets ouputs
To append two datasets we have to define a CompoundDataAccess element with the type attribute valued to the union value.The union type lets you execute append two DataAcces output datasets giving their order using two configuration elements:
- Top, to specify the first dataset,
- Bottom, to specify the second dataset.
Anyone of the two elements accepts the following attributes:
- id, is valued to the id of the DataAccess element definition you are going to consider as the Top/Bottom dataset
<CompoundDataAccess id="3" type="union">
<Top id="2"/>
<Bottom id="1"/>
<Parameters>
<Parameter name="year" type="Numeric" default="2004"/>
</Parameters>
</CompoundDataAccess>
No comments:
Post a Comment