Friday, March 19, 2010

A first look into Pentaho CDA

These days I decided to have a look into Pentaho CDA the newest idea from Pedro Alves. The project is available here on Google code. He gave announcement about this two months ago in his blog and immediately it appeared interesting.

What is CDA?

CDA is a BI Platform plugin that that allows to fetch data in various formats. The good things are that it allows you to acces data using all of the possible data sources you can use in BI Server (SQL, MDX, Metadata, Kettle etc.) and can be used by any client (internal or external to BI Server) because you can call it through a simple url and it gives you back a data stream.

Let's go and compile the code

So after some very busy days of hard work were I only had the opportunity to keep my local copy of the code updated with the repository and see what's going on, yesterday I had some time available and I was able to compile it and give it try. The framework easily compiles with the ant script you find in the source tree. The only thing you've to consider is to create a file named override.properties containing a set of definitions related to your working environment

  plugin.local.install.solutions.dir = <substitute_with_your_biserver_home>/pentaho-solutions   
  plugin.local.install.user = joe   
  plugin.local.install.pass = password   

After having compiled the code successfully you can install it under your BI Server rel. 3.5.2 with the command

 ant install-plugin  

The basics

Immediately after our successfull compilation and installation we can start biserver and login into the system. Under the bi-developer solution you have a folder named CDA where you can find an extended set of examples to start playing with.

CDA configuration is based on xml files. Every xml file contains a set of data access definitions. Everyone is identified by an id and has a specified data access method. I don't copy and paste a sample here because you can have a look in the samples by yourself. They are very simple and the format is immediately understandable so that you can easily edit them by hand.

Basically you can have your output data stream in multiple formats. At the time of writing this article you can use JSON (the default), XML, Excel and CSV. This gives you the maximum flexibility on what you can do with that data.

If you want to try to execute a query using a data access method defined in your .cda file you can use the following this sintax

http://<biserver_host>:<port>/pentaho/content/cda/doQuery?solution=bi-developers&path=cda/cdafiles&file=sql-jndi.cda&outputType=xml&dataAccessId=1

This calls the method doQuery to execute the query using the dataAccess with id 1 defined in sql-jndi.cda and located under the  solution bi-developers in folder cda/cdafiles giving you the output using XML. Try to execute it changing the output type and see what happens. Remember that if you don't specify an output type it will be JSON by default.

Suppose that we wnat an output typ like xml, the answer you'll see in your browser window is something like this

Shipped20044114929.96Shipped20051513074.46

almost insignificant but, if you get the page source you clearly can see the xml stream sent to you by our CDA plugin

 <CdaExport>  
      <MetaData>  
           <ColumnMetaData index="0" type="String" name="STATUS"/>  
           <ColumnMetaData index="1" type="Numeric" name="Year"/>  
           <ColumnMetaData index="2" type="Numeric" name="price"/>  
           <ColumnMetaData index="3" type="String" name="PriceInK"/>  
      </MetaData>  
      <ResultSet>  
           <Row>  
                <Col>Shipped</Col>  
                <Col>2004</Col>  
                <Col>4114929.96</Col>  
                <Col isNull="true"/>  
           </Row>  
           <Row>  
                <Col>Shipped</Col>  
                <Col>2005</Col>  
                <Col>1513074.46</Col>  
                <Col isNull="true"/>  
           </Row>  
      </ResultSet>  
 </CdaExport>  

Really cool! If you don't want to make your hand dirty in changing jdbc definitions I suggests you to try all the files whose name contains the word jndi that means they use a datasource and not a direct jdbc connection defined in the file (these are the files that contains the word jdbc).

Read the next part of this serie and learn about CDA configuration files basics