Wednesday, April 28, 2010

Easily build BI Server from behind an NTLM proxy

Pentaho projects bases on Apache IVY to manage project's dependencies. This is good because it gives you a consistent and uniform way to manage projects dependencies but gives you some troubles if you need to make a build of one of the many Pentaho projects from behind an NTLM firewall.

So here it is a brief and concise survival guide about how to do that quickly and efficiently.

Install a local proxy in you development computer

First of all download cntlm a local proxy that let you easily authenticate through your company or customer NTLM authentication proxy. You've versions for Windows and Linux. I learned about it two years ago when I was on Linux and I had to manage some Maven projects. Now I'm on Windows and the good for me is that it has a version also for this platform (I tried using also ntlmaps but I wasn't able to have it working properly with IVY)

Installation is really simple but, on WIndows, has a tweak we will talk about later. Download the .zip from sourceforge and unzip it wherever is fine for you. Execute the setup.bat and it will rapidly install the local proxy as a Windows Service.

Go to the installation directory (typically C:\Program Files\cntlm) and edit cntlm.ini. Replace the informations for username, password, domain and your proxies hosts and ports as detailed in the file excerpt given below

#
# Cntlm Authentication Proxy Configuration
#
# NOTE: all values are parsed literally, do NOT escape spaces,
# do not quote. Use 0600 perms if you use plaintext password.
#

Username        <your_username>
Domain          <your_domain>
Password        <your_password>   # Use hashes instead (-H)
#Workstation    netbios_hostname        # Should be auto-guessed

Proxy           <1st_proxy_host>:<1st_proxy_port>
Proxy           <2nd_proxy_host>:<2nd_proxy_port>

#
# This is the port number where Cntlm will listen
#
Listen                3128

Make the changes and save the file. Now, before starting the local proxy service, here is the tweak. In my case, the setup.bat script forget to set a service startup parameter that sets the cntlm configuration parameter file location. So to workaround that, open your registry and go to the following key HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\services\cntlm\Parameters. Look for the AppArgs attribute and modify it adding the parameter -i "C:\Program Files\Cntlm\cntlm.ini" as shown below



After that you're ready to start you service successfully otherwise you'll get an error.

Back to our Pentaho's project build

In my case I was making a build of the newest bi-server 3.6 get from the related branch of Pentaho repository (http://source.pentaho.org/svnroot/bi-platform-v2/branches/3.6). To enable IVY to use your locally configured authentication proxy server you've to set the following environment variable before starting compiling the project

ANT_OPTS=-Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128

It is up to you to decide where to set that variable either as a Windows environment variable or from within a command line session (if you're starting the build manually).

And what about for the Linux users....

If you're on Linux you can install cntlm using the package manager of your favorite distro. I used it on Ubuntu and Fedora withou any problem. The configuration file is the same as before but is located in /etc/cntlm.conf. You've to change the same parameters detailed above for cntlm.ini (the file is exactly the same). Start the daemon, set the ANT_OPTS environment variable and have fun with your build.

Wednesday, April 21, 2010

CDA configuration files basics (part 1)

Last time I give a brief explanation about the basics of CDA compilation and deployment and I showed you how to start executing a simple query on our datamart using CDA getting  back a resultset formatted as you expected. I decided to take you to a fully functional sample going through a set of explanatory steps summarized below:
  • Explain the structure of CDA files and learn how to built a new one
  • Show the basic verbs you can use to interact with CDA
  • Put everything together with a working example
Before start playing with CDA remember to get the latest CDA sources compile it and reinstall the plugin as explained in my first post on this blog.

CDA files has a very basic structure. They are very easy to be learned also by a novice user with a basic knowledge of XML. Basically a CDA file contains definitions for some important elements
  1. datasources
  2. data access
  3. compound data access

The DataSource element

Datasources in CDA have the usual meaning of a named connection to a database server. The name (in our case the connection id) is commonly used when creating a query to the database.

Each DataSource is made up by a set of connections defined as follow:
  • has an id attribute to uniquely identify the connection in this CDA datasource definition. Of course we cannot have two connections with the same id.
  • has a type attribute whose value depends on the connection type. We've different connection types one for each datasource type: sql, mondrian, Pentaho Metadata (MQL), Olap4J
  • a set of other elements that differs depending on the connection type
We can have 1 or more connections in the DataSource element according to your needs.
    Configuring connections to relational sources 


    If we look at the samples that comes together with the latest CDA distribution you have a sample for each of the possible connection types. Open PUC console and go to the folder bi-developer/cda/cdafile and you have the list of cda definitions shown below in the files window. As you can see every CDA file is identified by a specific icon.

    First interesting point: to open a file we've a CDA files editor integrated in Pentaho console. Cool! So to open a CDA file select it, right click on the file name and select Edit



    Let's have a look at how to connect to a relational datasource. We've two samples for that. One to connect through jdbc and another to connect through a JNDI datasource. Open the sql-jdbc.cda for example (the jdbc one). As you can see, in the right area of PUC console the editor opened and shows you the file content. CDA editor is the standard tool to modify  CDA files in a quick and easy way. You can modify your file, save it and also test it using the preview functionality (we'll talk about it later). Looking at the file immediately after the CDADescriptor element you'll find the DataSource element and, as its child, the connection definitions.


    As you can see you can find a connection with id="1" and type="sql.jdbc" and then a set of elements describing properties that are directly related with the selected connection type (in this case the definition of a jdbc connection so we've elements for driver, url, username and password). Take your time to go through all the other samples and see the differences in connection types.

    The DataAccess element

    The DataAccess contains the definitions for the query that needs to be executed, the parameters the query accepts and some other side elements that we will explore in the next steps.

    A DataAccess element:
    • has an id
    • has a descriptive name (given through a child element), 
    • links a specific connection id defined in the previous DataSource section
    • contains a query element,
    • contains a set of given parameters (not mandatory),
    • contains a set of other not mandatory definitions will go thorugh in the next steps. 
    We can have one or more DataAccess elements in our CDA file.
    So go back to our example (sample-jdbc.cda) and continue our exploration.


    As you can see in the picture above we've some things to be noted:
    • Remember to give an id and to link the connection defined in the DataSources section 
    • Because we're writing text as value of an xml file element, remember the that comparison operators < > has to be written respectively as &lt; and &gt; otherwise you'll get an error.
    • Parameters has given a name. In the query you can indicate a parameters with the usual syntax ${<parameter_name>} where parameter_name is the exact name of the parameter we're going to apply in the where clause.
    • Every parameter has a type that, together with the name, needs to be mandatory given. Supperted types are: String, Date, Numeric and Integer. Any parameters can have a default value specified.
    • If the parameter's type is Date you can specify a pattern you followed for your date format.
    Test the query

    To test our DataAccess definition before moving foward with our implementaion click on the Preview button. When the Preview form appears you can select the DataAccess definition you want to test and see the results below on a nice looking table.

    Some things to note:
    • the previewer sets the default values for the query (if any) but you can change them accordingly to your needs, 
    • the date format for the orderDate parameter follows the pattern specified in the CDA file definition.

    Now I'll give you the time to play with the samples and experiment a bit. Next time we'll talk about the second part of the DataAccess elements and CompoundDataAccess definitions.