Useful links related to SSIS

October 12, 2011

Here are some assorted topics related to SSIS that might be useful at times.

The link below explains how to implement batch processing in SSIS.

Scheduling the SSIS Package as a Job

OLE DB Connection in SSIS Package does not remember password

This can happen after a password change. Logging off or restarting the machine might resolve it.

“Object reference not set to an instance of an object”

Copying a package from another machine may raise this error. Creating a new package from scratch might solve the issue.
As discussed in the link below, it is a per-user problem.

BizTalk vs SSIS


Package configurations, logging, error handling in SSIS

October 12, 2011

The following links explain the different package configuration methods provided in SSIS.

Logging, error handling

Logging and error handling help a lot with general auditing and with debugging, when a problem appears out of nowhere.

How to call a WCF service from SSIS 2005

October 12, 2011

The following link explains how to call a WCF service using a Web Service task.

A point to note is that scripting in SSIS 2005 targets the .NET 2.0 Framework. XmlSerializer is quite an old technology, used in classic web services, and Windows Communication Foundation supports it for backwards compatibility.
The DataContractSerializer does not support the programming model used by the XmlSerializer and ASP.NET Web services. In particular, it does not support attributes like XmlElementAttribute and XmlAttributeAttribute. To enable support for this programming model, WCF must be switched to use the XmlSerializer instead of the DataContractSerializer.
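To illustrate the switch described above, here is a minimal sketch (the contract and type names are hypothetical) that applies WCF's XmlSerializerFormat attribute so the XmlSerializer is used instead of the DataContractSerializer:

```csharp
using System.ServiceModel;
using System.Xml.Serialization;

// Hypothetical contract: [XmlSerializerFormat] switches WCF from the
// DataContractSerializer to the XmlSerializer for this contract, so
// attributes like [XmlElement] and [XmlAttribute] take effect.
[ServiceContract]
[XmlSerializerFormat]
public interface IProductService
{
    [OperationContract]
    Product GetProduct(int id);
}

public class Product
{
    [XmlAttribute]               // serialized as an XML attribute
    public int Id;

    [XmlElement("ProductName")]  // serialized as a <ProductName> element
    public string Name;
}
```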

Calling a WCF service from SSIS produces an issue with setting variable values

Sometimes there may be connection problems. You have to overcome this in client code (in the service that calls the other service). Use the following during the initialization of your service application to increase the connection limit:

// Raise the per-endpoint HTTP connection limit (the client default is 2).
System.Net.ServicePointManager.DefaultConnectionLimit = 10;

Script Component in SSIS

October 12, 2011

In the following link, the script component is described with some examples.

The above link notes the two types of transformation in the script component:

Synchronous Transformation

The output is synchronized with the input, and the input data is processed on a row-by-row basis.

Asynchronous Transformation

The output is not synchronized with the input. All the input rows are fetched first; only after they have all been read is the output generated.
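As a rough sketch of the difference (assuming the SSIS 2008 script component base classes and hypothetical column names; the two variants below belong to two separate components):

```csharp
// --- Synchronous variant: the output shares the input's buffer,
// so each row is transformed and passed on as it arrives.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    Row.Name = Row.Name.Trim().ToUpper();
}

// --- Asynchronous variant (output's SynchronousInputID set to None):
// all input rows are cached first, and output rows are added to a
// separate buffer once the end of the input rowset is reached.
private List<string> names = new List<string>();

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    names.Add(Row.Name);                  // buffer the input
}

public override void Input0_ProcessInput(Input0Buffer Buffer)
{
    base.Input0_ProcessInput(Buffer);
    if (Buffer.EndOfRowset())
    {
        foreach (string name in names)
        {
            Output0Buffer.AddRow();       // write to the async output
            Output0Buffer.Name = name.Trim().ToUpper();
        }
        Output0Buffer.SetEndOfRowset();
    }
}
```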

An example of using an asynchronous Script transformation:

Cleaning Address Data with SSIS Using a Web Service

In the above link, we have the following:
To call a web service from within an SSIS script component you have two options:

  1. Create a proxy class, in any .NET language, using the command prompt utility wsdl.exe and compile this into a strong-named assembly which is then registered in the GAC.
  2. Alternatively create a Visual Basic proxy class using wsdl.exe and import the new class directly into your VSA project.

To implement the first option, we need to sign the assembly with a strong name and register it in the GAC, which may not be the preferred way. The need for a strong-name key pair and GAC registration is emphasised in the following link.

In general, to generate a proxy class, locate wsdl.exe on the system and, at the Visual Studio command prompt, type the following:

C:\Program Files\Microsoft Visual Studio 10.0\VC>
wsdl /l:vb /out:proxyfilename.vb http://serviceaddress/service.svc

The difference between synchronous and asynchronous script components

The following links explain how to make a script component asynchronous, and also the differences between the two.

Let me take this opportunity to paste some more general info about

Asynchronous Method Overloads

There are potentially two overloads for the asynchronous operations: single-invocation and multiple-invocation. You can distinguish these two forms by their method signatures: the multiple-invocation form has an extra parameter called userState. This form makes it possible for your code to call Method1Async(string param, object userState) multiple times without waiting for any pending asynchronous operations to finish. If, on the other hand, you try to call Method1Async(string param) before a previous invocation has completed, the method raises an InvalidOperationException.

The userState parameter for the multiple-invocation overloads allows you to distinguish among asynchronous operations. You provide a unique value (for example, a GUID or hash code) for each call to Method1Async(string param, object userState), and when each operation is completed, your event handler can determine which instance of the operation raised the completion event.
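As a sketch of the multiple-invocation form (assuming a generated proxy class, here called MyServiceClient; anonymous delegates are used since SSIS 2005 scripting is limited to .NET 2.0):

```csharp
// Assumes a generated proxy exposing the event-based asynchronous
// pattern described above; type and member names are hypothetical.
MyServiceClient client = new MyServiceClient();

client.Method1Completed += delegate(object sender, Method1CompletedEventArgs e)
{
    // e.UserState carries the token passed to Method1Async, so the
    // handler can tell which of the overlapping calls completed.
    Guid token = (Guid)e.UserState;
    Console.WriteLine("Call {0} finished: {1}", token, e.Result);
};

// Multiple-invocation form: each call gets a unique userState, so
// several operations can be pending at once without raising
// InvalidOperationException.
client.Method1Async("first", Guid.NewGuid());
client.Method1Async("second", Guid.NewGuid());
```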

See under the heading Asynchronous Method Overloads in the link:

Problems with the async version:

Async script components work well as long as the pipeline is not closed prematurely.

So they fail when package debugging is stopped right after a successful execution, or when the package is run through a SQL Server Agent job.
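One common workaround (a sketch of my own, not from the linked post; the proxy name is hypothetical) is to make the call effectively synchronous inside the script: block on a ManualResetEvent until the completion callback fires, so the pipeline cannot close while a call is outstanding:

```csharp
// The wait handle keeps the script from returning (and the pipeline
// from shutting down) before the async callback has run.
ManualResetEvent done = new ManualResetEvent(false);

MyServiceClient client = new MyServiceClient();
client.Method1Completed += delegate(object sender, Method1CompletedEventArgs e)
{
    // ... consume e.Result here ...
    done.Set();                   // signal completion
};

client.Method1Async("some input");
done.WaitOne();                   // block until the callback runs
```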

Sometimes you have some pretty complex ETLs going in SSIS, with multiple projects/solutions that need to call other SSIS packages or SQL Agent jobs, and you have a pretty big production going on. You might have an ETL solution that needs to kick off other packages, and you can either import those into your solution or call them where they live on the file system, SQL Server, etc. You might have to call some SQL Agent jobs, and most often these are async calls (you don't need to wait for them to come back). This works nicely, and I do it all the time. The Execute SQL Server Agent Job Task in SSIS works well, or you can just issue the SQL statement to execute a job. Either way, it kicks off the job, comes back successful right away, and doesn't care whether the job actually succeeds.

SSIS – SQL task and Script Component

October 12, 2011

The link below explains in detail how to create an SSIS package in VS 2008, the project folder structure, and the designer. Good for a beginner.

Tasks in the Control flow toolbox:

The Execute SQL Task:
This task is widely used in packages.

A word of caution:
“MSDN recognize as a SSIS bug that a bigint output parameter returned by a sql stored procedure cannot be assigned to a Long SSIS variable .”
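A hedged workaround for the quoted quirk: map the bigint OUTPUT parameter to a String package variable in the Execute SQL Task, then convert it in a following Script Task (the variable names here are hypothetical):

```csharp
// In a Script Task after the Execute SQL Task: the stored procedure's
// bigint OUTPUT parameter was mapped, as a string, to the SSIS
// variable User::RowCountText to dodge the bigint-to-Long bug.
long rowCount = long.Parse(
    Dts.Variables["User::RowCountText"].Value.ToString());

// Hand the parsed value to an Int64 package variable for later tasks.
Dts.Variables["User::RowCount"].Value = rowCount;
Dts.TaskResult = (int)ScriptResults.Success;
```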

Webservice task:
The link below has a document clearly explaining the usage of the Web Service task.

Here are some useful tools in the Data flow.

Data viewer:

Data Viewer is truly a unique feature, which provides the ability to view rows of data as Integration Services processes them in the data-flow pipeline.

Script component:

SSIS has script objects both for the control flow (the Script Task) and for the data flow (the Script Component). These allow users to write .NET scripts. SQL Server 2008 includes Visual Studio Tools for Applications, which provides a scripting environment in which you can use Visual Basic .NET or C# to implement scripts. In SQL Server 2005, the only choice is Visual Basic .NET.

Data Integration Services

October 12, 2011

Data, Data, every where,

And all the boards did shrink;

Data, Data, every where,

Nor any drop to drink.[1]

Of what use is the ocean of data, if it cannot quench our thirst for decision-making information?

Introduction [2]

Data can help us to understand the past better and get direction for the future. So we need to transform data into meaningful and actionable information.

A Real-World Scenario

Consider a department store that has details of its products and prices stored in a database. It would like to display them on its website in real time. There may be various promotions or offers on the products, which affect the prices regularly, so all the products need to be validated for the right prices.

Sometimes, data conversion might be required. Also, we would love to see the products sorted according to certain categories. Different store owners will ask for the updated data. All these tasks, and many more, need to be done repeatedly as a job. OK, let us hire a leader who can integrate all these tasks.

Challenges of Data Integration

At one level, the problem of data integration in our real-world scenario is extraordinarily simple. Get data from multiple sources, cleanse and transform the data, and load the data into appropriate data stores for analysis and reporting.

The most pertinent challenges are:

•      Multiple sources with different formats.

•      Structured, semi-structured, and unstructured data.

•      Data feeds from source systems arriving at different times.

•      Huge data volumes.

Even if you somehow manage to get all the data you need in one place, new challenges start to surface, including:

•      Data quality.

•      Making sense of different data formats.

•      Transforming the data into a format that is meaningful to business analysts.

Not only do you need to achieve all of these results, but you also need to achieve them as fast as possible. In extreme cases, such as online businesses, you must integrate data on a continuous basis.

Thus different (especially nonstandard) data sources need to be included in the Extract, Transform, and Load (ETL) process and more complex operations (such as data and text mining) need to be performed on the data.

SQL Server Integration Services (SSIS) is an effective toolset for both the traditional demands of ETL operations, as well as for the evolving needs of general-purpose data integration.

SSIS is part of the SQL Server Business Intelligence (BI)[3] platform that enables the development of end-to-end BI applications.

SSIS is used not only for large datasets, but also for complex data flows. As the data flows from source(s) to destination(s), you can split, merge, and combine the stream of data with other data streams, and also manipulate it on the fly.

[1] Adapted from “The Rime of the Ancient Mariner” by Samuel Taylor Coleridge.

[2] Most of the content is from

[3] SQL Server Integration Services, Analysis Services, and Reporting Services all use a common Microsoft Visual Studio® based development environment called the SQL Server Business Intelligence (BI) Development Studio. BI Development Studio provides an integrated development environment (IDE) for BI application development.

Categories: SSIS, Technical

What is Google base data API

February 2, 2011

In the last blog we saw that our product data is uploaded into the Merchant Center and the non-product data is uploaded into the Google Base website; internally, all of this data is stored in Google’s repository, which is also called Google Base. Limiting our discussion to product data, instead of uploading our raw feeds via FTP, we can also send them to the Merchant Center using Google’s API (an interface provided by the application so we can do the job programmatically).

The API Google provided in the beginning was the Google Base Data API (referred to as the Base API). This API could be used both for feeding products into Google Base and for searching them. The details for getting started with the Base API are provided here, and the developer guide is here. Visiting these links, you would have noticed that the Base API is now deprecated. Yes, Google has decided to separate the tasks of uploading and querying into two different APIs, namely the Content API and the Shopping API, with many more features added to the new APIs. The Base API was doing the work of two men, and it had stable client libraries in many programming languages. But now that Google has announced it will retire the Base API by June 1, 2011, we need to look at the two new guys to continue our uploads and queries.

The following figure shows uploading and querying data using the older Base API. Customers upload their product catalog data feed files using the Merchant Center, or use the GData libraries to transmit the catalog programmatically. Once the data is parsed, validated, and inserted into Google Base, it is available for search in Google Product Search and can be queried in Commerce Search.


Older Architecture

Upload and Query using older Base API

After Google Base is retired in June, item types such as jobs, real estate, events, and activities will no longer be supported. For more info read <>

Now coming to the newer APIs: the Content API allows retailers to upload product data to Google. It can be used to add, delete, query, or modify any item individually or in a batch. The data can be used for Google Product Search, Google Commerce Search (GCS), and Product Ads. The Search API is aimed at helping Google Affiliate Network publishers and Google Commerce Search customers. Those who upload feeds as XML or CSV files directly can continue to use the Merchant Center as before.

The figure below shows uploading and querying data using the Content and Search APIs for Shopping. Customers can continue to upload their product catalog using the Merchant Center, but if they do the upload programmatically, they will use the Content API for Shopping to transmit items. These items get validated, parsed, and indexed per use case, depending on where an item should appear (GCS or Product Search). Once the data is available for search, the Search API for Shopping is used to retrieve both items and facets in one call. You can look into for more details.



New Architecture

Uploading and Querying using the new APIs


We will look at the Content API in subsequent blogs. Meanwhile, you can get started with the Content API using the Getting Started Guide.



Categories: Google, Technical