
Many of my colleagues keep asking me things like this:

I am currently a Java programmer working on production support for client X offshore. I see no growth in my current job.

I was thinking of maybe getting into SAP, since my company is hiring a lot of SAP people, and I heard that you can make a lot of money in SAP and that demand for SAP professionals is going through the roof...

I am sure that many of you have come across such statements quite often.

I am not sure what to advise this very good friend of mine!

First of all, I start thinking:

If I were a fresher (fresh out of college), I would get placed in a company which trains me in Java / .NET and so on.

Then I would be put on a project and expected to grow within the organization...

As for SAP...?

If I were a fresher, I would be hired out of campus and trained in basic production support for SAP modules...

Modules like BASIS and BW are treated as the easiest entry points, since the requirements are less rigorous than for other modules.

Example:

An L1 for SAP BI would typically be involved in monitoring data loads, and an L1 for BASIS might be involved in transports and, at best, taking backups / monitoring jobs.

Now coming to the intriguing part ....

As for the non-SAP space, the career ladder would be production support --> application development team member...

As for SAP, the requirements for implementation roles typically ask for 2 years of implementation experience. This means that the production support person never has a chance to get into implementation unless the experience is faked (for want of a better word). Does this mean that we are effectively encouraging integrity breaches by putting such conditions in place?

This leads me to ask the same questions again:

1. What does it take for a person with, say, 2 years of experience in production support to get into the SAP world and grow his/her career?

2. All this refers to Indian companies. Also, if a person is a fresher who has missed the bus on campus recruitment and gets trained in some SAP technology at a local institute, what should he/she do to get a job?

How does a person with no experience (read: fresher) get into SAP? Where can a fresher expect to land a job, particularly in terms of responsibilities? How does one grow in SAP projects - irrespective of modules - or is it module specific?

Again, voicing my own opinion: have we built walls around us that are too high to scale, so that we end up facing a shortage of people for projects and settling for sub-optimal resources that might work for some time and then blow up in our faces?

I was reading Is SAP Certification Worth the Money?

and his column on ERPGenie.com,

and Santosh V's blog on Certification versus Experience.

All these blogs talk about experience in the market - what do they hold for a fresher? The offshoring model works on having a margin between the cost of a resource and the billing for that resource, but the blogs seem to indicate that there is no room for the fresher, or that a fresher essentially has to be content with production support - nothing more, nothing less?

This is more of a rant than a complaint... I have a lot of friends who work in technologies like Java / .NET and have put in a lot of work there, but are mostly clueless about where to move to and how to get into SAP.

Of course these technologies can be used in the SAP world too, but then the job profile requires them to have knowledge of the SAP world as well, which sort of spoils the whole experience.

These views are just my personal opinions and are not to be taken as representing my organization or as being against any organization in particular.

I just wanted some feedback which I could point people to, so that they can develop their skills in that direction.

What is creative destruction?

http://en.wikipedia.org/wiki/Creative_destruction

The term was summed up in an economic sense by Joseph A. Schumpeter, who uses it in his work to describe capitalism.

But then this would mean a destruction of established goals and objectives, and a paradigm shift brought about by innovation and unforeseen situations (the black swan theory, etc.).

In the world of software...

1. We are never stationary - we are often running hard just to stand still, and in most cases the dreaded word "migration" means rework. But then the question is...

Is rework a form of creative destruction? I have not mentioned the specific case of bug fixes, but what about enhancements and migrations? Do they imply this form of destruction, where innovations - either in the way we think (new scenarios) or a paradigm shift (ABAP OO, or a process-oriented platform) - make us shift the way we think and operate?

 

But then the process of creative destruction in the economic sense is brought about more as a result of time and unforeseen circumstances. In the case of software, I wonder what these forces could be...

 

Pretty confused... I remain...

 

Arun

Storing compressed data - Compression Schemes :

The compression schemes used by databases vary. The choice of compression mechanism is based on:
1. The amount of CPU power available
2. The amount of disk space available
3. The number of successive reads and writes

The decision depends on:
1. The time taken to compress and decompress
2. The time taken to write additional data into the data store - this would mean decompressing and compressing again
3. How this affects the performance of queries on the compressed / decompressed column store

Ideally the compression scheme should be:
1. Lossless - meaning that there is no loss of data while compressing and decompressing
2. Fast to compress and decompress - meaning that the compressed data can be read back easily and quickly
3. Preferably readable without decompressing
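As a purely illustrative aid (not the scheme any particular database uses), here is a minimal sketch in Python of run-length encoding, a simple lossless scheme with the first two properties above; the third property is picked up again further down.

# Minimal run-length encoding sketch: lossless and cheap to compress and
# decompress. Illustrative only; real column stores use richer schemes
# (dictionary encoding, bit-packing, delta encoding, ...).

def rle_compress(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def rle_decompress(runs):
    """Expand (value, run_length) pairs back to the original sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

column = ["ABC", "ABC", "ABC", "XYZ", "XYZ", "ABC"]
runs = rle_compress(column)            # [('ABC', 3), ('XYZ', 2), ('ABC', 1)]
assert rle_decompress(runs) == column  # lossless round trip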

Most studies in this area have found that lightweight compression schemes are preferable on lower-powered systems: on a system that does not have enough CPU, heavier compression takes so long that it outweighs the benefit of the lower I/O time, so lightweight schemes such as row-level encoding, which are easier on the CPU, are used instead. This means the compression ratios are lower, and at some point the compress/decompress cycle becomes a wasteful exercise.

However, on a powerful system - more CPU and faster disks (preferably RAID) - compression is much faster, and compression ratios can be pushed higher by using more expensive compression schemes. Thus both objectives are achieved:

  • Higher compression ratios
  • Better compress / decompress times


Also, the query processor / OLAP processor has built-in logic to read compressed data directly, to some extent as if it were uncompressed, thus avoiding many decompression steps. This is also called lazy decompression.


Summarizing all the talk on compression and column stores:
The first part was to understand what OLAP is all about.
Then came the concept of the row store and the column store, where data is viewed in columnar format. This was then followed up by compression.

Now to explain the sequence and why the BIA is being quoted regularly here:
The BIA stores data in a column format and compresses it. It uses a search engine, TREX, to read the compressed data and answer queries. The BIA also runs on a RAID setup with some heavy-duty quad-core processors which can compress and uncompress the data easily.

The only difference is that the BIA stores the data in files rather than tables. There is an index file for each index, and depending on the query, the index file is first searched to see whether the result can be obtained without decompressing the file; otherwise the file is decompressed into main memory and the results are fetched from there. The architecture relies on some very fast disk drives, also known as near-real-time storage, and significant CPU as well.

The combination of the two contributes to the fast performance of the BIA. The disk drives are part of a separate SAN (Storage Area Network) that is installed with the BIA and is totally separate from the SAP BI SAN. The BIA SAN is based on a RAID architecture that ensures proper failover and high availability.
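To make the "check the index first, decompress only if needed" idea above concrete, here is a generic sketch of that pattern. All names and data structures are invented for illustration; this is not how TREX or the BIA actually lays out its files - it assumes only that each compressed block carries a small readable summary (here a min/max pair).

# Generic check-then-decompress sketch, assuming per-block min/max metadata.
import zlib, pickle

def make_block(rows):
    payload = zlib.compress(pickle.dumps(rows))
    summary = (min(rows), max(rows))           # lightweight readable metadata
    return summary, payload

def scan_blocks(blocks, low, high):
    """Return matching values, decompressing only blocks that can match."""
    hits = []
    for (lo, hi), payload in blocks:
        if hi < low or lo > high:
            continue                            # answered from metadata alone
        for value in pickle.loads(zlib.decompress(payload)):
            if low <= value <= high:
                hits.append(value)
    return hits

blocks = [make_block([1, 2, 3]), make_block([40, 41, 42]), make_block([90, 95])]
print(scan_blocks(blocks, 35, 50))   # only the middle block is decompressed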

Cheers!

Why do we need compression?

The nature of tables storing row-level data makes them candidates for compression because:

1. Less space is consumed
2. CPU advances permit us to compress and uncompress the tables in memory and do the necessary analysis while reducing disk space
3. I/O time is improved: because less data has to be fetched, seek and transfer times are reduced
4. A larger portion of the table will fit in memory because of the reduced data volume

Compression in a row store :

Compression in a row store scenario would mean removing rows of zeroes and also looking at repeated keys, which lend themselves to compression.
Because an attribute is stored as part of an entire record, combining the same attribute from different records into one value would require some way to “mix” tuples.

Compression in a column store

Storing data in columns presents a number of opportunities for improved performance from compression algorithms when compared to row-oriented architectures. In a column-oriented database, compression schemes that encode multiple values at once are natural.

How else can data be compressed? This goes into the compression techniques employed by file compression, which involve lossless compression. I do not want to go into details and digress, but the net outcome is this:

You have a table with N lines stored in the database.
Each row of this two-dimensional table is mapped to a single location on disk, and this data can be compressed and stored. Why should it be compressed? It saves space, and moreover all the data is not needed all the time. The system compresses the data and uncompresses it when required. This way...

1. Disk space is optimized.
2. Data is uncompressed and fetched only when required.

Things to consider here:
The decompression takes a fair amount of CPU, while in most cases disk space is inexpensive.

In a row store the number of records to be compressed / uncompressed remains large, i.e. the compression ratio cannot be improved much further.
However, in a column store the data can be compressed further due to the nature of repeating values within a column. Also, indices on the columns add up to less than the indices on a row store.

Thus it becomes apparent that column stores naturally lend themselves to compression, and the compressed table can be quite small. Also, column stores compressed with some common compression mechanisms do not need to be decompressed to be read - they can be read directly.

For example, if a compressed column says the value “42” appears 1000 times consecutively in a particular column for which we are computing a SUM aggregate, the operator can simply take the product of the value and run-length as the SUM, without having to decompress.
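The same arithmetic, written out as a minimal sketch: the SUM is computed straight from the (value, run_length) pairs of the run-length-encoded column, with no expansion of the underlying rows.

def rle_sum(runs):
    # Each run contributes value * run_length, exactly as described above.
    return sum(value * length for value, length in runs)

runs = [(42, 1000), (7, 3)]   # "42" repeated 1000 times, then three 7s
print(rle_sum(runs))          # 42021 -- without decompressing 1003 rows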

Will discuss compression techniques next...

 Cheers!

Approaches to building a column store:

There are multiple ways in which the column store described in the previous blog can be built, each better than the others in terms of performance but having its own complexities and disadvantages:

Approach 1: Keep the storage the same, change the schema

Vertically partition a row-store database.

All the rows in the table are broken up into multiple tables, each consisting of the table key and one attribute. This means that there is a table for each attribute in the original table. Depending on the query, only the necessary tables are accessed.

This is somewhat similar to the star schema, but this design demands a separate dimension for each attribute on top of the existing design.

There is one more way to achieve this: it can also be done with aggregates to a certain extent, but aggregates tend to be query specific, and for a true implementation of this approach a table for every column needs to be built.

Here also, the number of attributes accessed determines the performance: the fewer the attributes, the better the performance; the more attributes accessed, the closer it comes to (or even exceeds) the original row-based query time.


The pros and cons of this approach (a small sketch follows below):
1. Heavy data redundancy - for very large databases this is unacceptable.

2. The existing architecture need not be redefined - only the star schema needs to be thought through; no change to SQL / DBMS is required.
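As a minimal sketch (toy in-memory data structures, not any particular DBMS), this is what the vertical partitioning of Approach 1 amounts to: one (key, attribute) mapping per column, with queries touching only the partitions they need and joining on the table key.

rows = [
    {"key": 1, "customer": "ABC", "month": "01.2008", "sales": 20000},
    {"key": 2, "customer": "ABC", "month": "02.2008", "sales": 30000},
    {"key": 3, "customer": "XYZ", "month": "01.2008", "sales": 15000},
]

def vertical_partition(rows, key_field="key"):
    # One {table key -> value} partition per attribute.
    partitions = {}
    for row in rows:
        for field, value in row.items():
            if field == key_field:
                continue
            partitions.setdefault(field, {})[row[key_field]] = value
    return partitions

parts = vertical_partition(rows)
total_sales = sum(parts["sales"].values())              # customer/month never read
jan_keys = [k for k, m in parts["month"].items() if m == "01.2008"]
jan_sales = sum(parts["sales"][k] for k in jan_keys)    # join on the table key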


Approach 2:  Keep the schema same - change the storage

Modify the storage layer of the DBMS to store data in columns rather than rows. At the logical level the schema looks no different.

However, at the storage (disk) level, instead of storing the table row by row, the table is stored column by column. This avoids storing the table keys again for each column, thereby reducing redundancy as well as the amount of data stored.

As with the previous approach, only those columns that are relevant for a particular query need to be accessed and merged together. Once this merging has taken place, the normal (row-store) query executor can process the query as usual.

Redundancy at the column level would still exist, but the whole table is partitioned by columns. For example, if there are 10 columns in a table with 1000 rows, we would effectively have 10 column partitions with 1000 entries each. No change is required to the existing schema or code - the application accesses the columns it needs and fetches the data. Imagine partitioning the cube on each characteristic in the cube: that is what is being suggested here, and it is a physical partition, not a logical one.

Pros and cons (see the sketch below):
1. Extremely friendly for reading data - it will show a definite improvement in read performance for random queries, though this degrades as the number of columns accessed increases.
2. Poor write performance - imagine writing one row of data across so many columns and updating indices.
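A minimal sketch of Approach 2, again with invented toy structures: the data lives column by column with a shared row position (so keys are not repeated per column), and the columns needed by a query are zipped back into row tuples before a normal row-oriented executor sees them.

columns = {
    "customer": ["ABC", "ABC", "XYZ", "XYZ"],
    "month":    ["01.2008", "02.2008", "01.2008", "01.2008"],
    "sales":    [20000, 30000, 15000, 30000],
}

def project(columns, fields):
    """Merge only the requested columns back into row tuples."""
    return list(zip(*(columns[f] for f in fields)))

# The query needs customer and sales only; the month column is never read.
for customer, sales in project(columns, ["customer", "sales"]):
    pass  # hand each reconstructed row to the row-store query executor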

Approach 3: Modify both storage and schema

This is approaches 1 and 2 put together, basically doing away with both sets of disadvantages:
1. Keeping the storage simple, and
2. Avoiding the reconstruction of row-level data by the query processor - basically keeping the data in one format and changing the query executor to read data in that same fashion.

Pros and cons:
If a predicate like MAX or DISTINCT is applied on a column, then that column alone can be processed.
Selection and aggregation on a column-based data store result in fewer rows getting merged, leading to performance improvements.

Now that the approaches to building a column store have been detailed, the next step in performance improvement is compressing the storage.

Turning the concept of ROLAP on its head – column based approaches :

To recollect :

Current ROLAP approach: row-based indexing and retrieval.
Every row is stored separately and indexed according to needs / application design. This means that to fetch data, the row indices are read.
Typical indices used: F table indices etc.

Advantages:

Random access to the table is quite fast, which matters since access patterns cannot be predicted. This is fine for current data warehouses, since there is a row-based index on the cube which is used for random access, and we create smaller cubes with smaller indexes in the form of aggregates and query them as required.

Disadvantages:

The size of the table cannot be optimized - the index tends to grow pretty large and storage is not efficient (not a very big concern considering the cost of storage!), but this becomes a hugely limiting factor in very large databases (the 100 TB sort) - I am not sure what the defining threshold for a VLDB is, but this is my own interpretation.

Column based approach :

Column-oriented databases follow a different approach: they index the columns rather than the rows. Because the number of unique values in a column is smaller, the indexes are smaller.

What if, instead of looking at the rows, I were to index each column separately? This would mean that:


1. Repeated values are not stored
2. Queries on the database are much easier to process, since far fewer records are fetched
3. It lends itself to sparse queries - queries where the number of columns accessed is small compared to the number of columns in the actual table

However …


1. Loading data into these tables is an extremely processing-intensive affair
2. As the number of parameters in the query increases, query performance starts to degrade, and in some cases can fall below that of a row-based approach

Example

Customer | Sales Type | Month   | Value
ABC      | C          | 01.2008 | 20000
ABC      | I          | 02.2008 | 30000
XYZ      | C          | 01.2008 | 15000
XYZ      | I          | 01.2008 | 30000

Here 4 records are scanned for a list of customers. Even if we add more filter conditions - customers for 01.2008 - this will still scan 3 records.

However, if we consider a column-based approach, there will be three column tables:

Customer:
ABC
XYZ

Sales Type:
C
I

Month:
01.2008
02.2008


Joining these three tables together and querying based on the user requirement will be faster because fewer values are examined. However, if all three columns are being accessed, then in some cases the query might end up being slower than the row-based query.
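A minimal sketch of the example above (toy lists only, no real DBMS): each column keeps its small list of distinct values plus a per-row code, so the filter is resolved against the handful of distinct values before the matching row positions are picked up.

def encode(values):
    distinct = sorted(set(values))
    codes = [distinct.index(v) for v in values]
    return distinct, codes

cust_d, cust_c = encode(["ABC", "ABC", "XYZ", "XYZ"])
type_d, type_c = encode(["C", "I", "C", "I"])
mon_d,  mon_c  = encode(["01.2008", "02.2008", "01.2008", "01.2008"])
value          = [20000, 30000, 15000, 30000]

# "Customers for 01.2008": the string filter is checked against 2 distinct
# months, then the matching row positions are collected via integer codes.
target = mon_d.index("01.2008")
rows = [i for i, code in enumerate(mon_c) if code == target]
result = {(cust_d[cust_c[i]], value[i]) for i in rows}
print(result)   # {('ABC', 20000), ('XYZ', 15000), ('XYZ', 30000)}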

The next thing to talk about is storing this data, which also involves compressing it. I will talk about that in a separate blog.

Meanwhile :
References :


http://www.databasecolumn.com/2007/09/data-compression.html

http://en.wikipedia.org/wiki/Column-oriented_DBMS

There is also a lot of work done in this area by Daniel J. Abadi, who teaches at MIT - a lot of published material that is worth looking into.

And so we continue onwards towards the goal of explaining the concept behind the BIA...

Cheers for the weekend,
Arun Varadarajan

Some time ago I came across an interesting term when reading about the BI Accelerator and what makes it so fast. The term was “column-based compression”.

I have basically started at the root of all things - something I felt was required to start off this topic, something that is known to all but serves as a nice introduction to set the stage.


On delving further into this, I came across a whole lot of published material which seemed very interesting and questioned the basics of how queries should be executed from an analytical perspective.

This basically led to questions such as:

  • What sets Analytic Applications apart from traditional OLAP systems?
  • What is the importance of performance / the DB in an analytical application?
  • What is the difference between OLAP and OLTP?


What is the difference between a query on an OLTP system versus an OLAP system?
OLTP :

Queries are more record based.
Example: what is the invoice value for invoice number XXXXX for customer ABC?
This is a very granular query, and in most cases the query patterns are fixed - meaning that user patterns are known in an OLTP scenario.

In other words:
The focus is on automating daily business tasks, and queries on an OLTP database tend to be closely associated with a specific process / screen action - for example VA02, where the sales order is analyzed. To provide for this kind of access, the query structure has these parameters coded in advance, with all optimization possible by way of indices, and the necessary variables filled in at run-time.

OLAP - Analytic Applications:
Example: what are the sales for year 2008 to customer ABC, compared to the sales in year 2007?
This query has the following characteristics:

Less Predictable.
It cannot be predicted what the user will query against, and it would defeat the purpose of a data warehouse / analytical application if we were to do so. Here the analysis can be done along any dimension, in any combination - more specifically, analytical applications lend themselves to random queries, and it is difficult / improper to predict user patterns.

Longer Lasting.
The analysis is always done at a much higher level of aggregation than the level at which OLTP databases are consumed. Also, the data set tends to be compared with historical data, not only present data. OLTP databases are more about finding the current version of the data than about doing analysis.

More Read-Oriented Than Write-Oriented.
With huge data volumes and historical data, analysis is done on a read-oriented system, and the tables are meant more for reading data than for inserting new records. Typically data is written to the data warehouse in batches (for example, data collected during the day can be sent to the data warehouse from the enterprise transactional databases and batch-written overnight), followed by many read-only queries. Real-time data is also supported, but its utility is more operational than analytical.

Of High Granularity.
Analytical queries are not meant to run at the lowest level of detail - an analytical query is not expected to run at invoice level, but rather at customer / state / country level, i.e. at a much more aggregated level.
Further, they tend to focus on only a few attributes at a time - typical queries will not fetch all the attributes of a customer; it is more user specific, based on what the user needs at that point in time.

Will post more about row stores and column stores in the next blog. 

Thanks,

Arun Varadarajan

Data Storage on disk

Database tables are mapped to a one-dimensional interface. Ultimately the rows and columns that form the database table have to be mapped to a one-dimensional reference on disk for the information to be stored, and this data is retrieved from the disk when it is accessed.


The most common approach is to map the two-dimensional data row by row onto this one-dimensional layout. This way all the information is stored in its entirety.
This approach also lends itself to better inserts, meaning it is easier to write data into the table since each row is referenced separately; the same is true for updates.


This is the case with almost all relational databases. By this I mean that:

  • Data is stored in a row-by-row fashion
  • Data is retrieved by reading these rows back in the same fashion


Now this design lends itself to situations where :

  • A lot of data writes happen
  • Querying of the table happens at the most granular level
  • Data volumes are not very high ( near future data exists )


The above symptoms / scenarios are typical of an OLTP system. Now the big question: how does this lend itself to an OLAP scenario where:

  • A lot of data reads happen
  • Querying of the table is uncontrolled and almost always ad-hoc
  • Volume of data is very high


In this scenario it can be seen that retrieving information is quite expensive, since for every query each row has to be scanned. Even if a row contains unrelated information (additional data fields) which is not required for the query, it is nevertheless fetched.
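To put a toy illustration on that point (invented data, no real storage engine): in a row layout the whole tuple is materialised just to read one field, whereas in a columnar layout only that field's values are read.

row_store = [
    ("INV001", "ABC", "01.2008", "sprocket", 100.0),
    ("INV002", "XYZ", "01.2008", "widget",   250.0),
    # ... millions more rows, each carrying every attribute ...
]

# Row store: every field of every row is pulled just to total the amount.
total_row_layout = sum(row[4] for row in row_store)

# Column store: the amounts exist as their own contiguous sequence.
amounts = [100.0, 250.0]
total_column_layout = sum(amounts)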

Well, I have done the difficult part. Now some questions:

  • What am I trying to tell you ?
  • Where is this leading to ?
  • Does this mean that my data warehouse is dead ?


A lot of heretical questions... but then...

1. Databases have been designed this way all the way from the 1970s; not much has changed from the time Codd's rules were promulgated. But those were the times when 1 MB used to be a lot of space, disk was extremely expensive, and business process automation was key - the intent was to store data in a relational way for easy retrieval for operational day-to-day needs.

Now the scenario is quite different...

  • CPUs have become much faster, more capable, and more scalable
  • Data volumes have increased manifold

By the 80s people started looking for analytical information from their databases - here this would mean data across months / years, which in turn means that a lot of data has to be parsed. For example, on a table of 1 million rows, if I issue the following query...

Select distinct(customer) from table.

This would typically scan the entire table to give me the distinct values. But I might have only about 300 customers for whom I have data.
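A minimal sketch of how a dictionary-encoded customer column side-steps that scan (toy lists only): the distinct customers are already the dictionary, so the answer comes from roughly 300 entries instead of a million stored rows.

customer_dictionary = ["CUST0001", "CUST0002", "CUST0300"]   # ~300 entries
customer_codes = [0, 1, 0, 2, 1]   # one small integer per fact row (millions of these)

distinct_customers = customer_dictionary     # answered from the dictionary alone
distinct_count = len(customer_dictionary)    # 300, not 1,000,000 row reads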

Where is this leading to ?

Ultimately the aim of this blog series is to arrive at what happens inside the BIA and how it works, but this requires explaining a lot of new terms - I am trying to cover all the prerequisites before going on to the main topic.

Is my data warehouse dead?
Not really - only that it can be used much more efficiently. Your investments are safe!

Coming back to the discussion….


Going back to the query... is there any easier way to get this result? I would ideally like to scan only 300 records instead of 1 million.
The question is how we do this, and whether it can be achieved with the existing database architecture or whether something radical is required.

One particular approach is to have a separate database (read: data warehouse) for analytical queries and let the transactional systems go on as expected. That is where we are today. But then the underlying approach of running relational queries is itself being challenged, which leads to the question: are relational databases past their time?

Cheers,
Arun Varadarajan

You can start by identifying the major components of your query time.
This is similar to the query statistics that you see in RSRT, but I have decided to analyze the same thing using the statistics information available as part of the standard content.

I have installed the BI content as prescribed in the BI technical content installation, and I am on BI 7.0 - I have not attempted the same in 3.x for want of a system.


BI Content Cubes


The OLAP statistics detailed cube has all the necessary information (a lot of it, in fact - execute your queries on it with care; it might lead to too much data to analyze!).

Data Display


This exercise can be done in Excel, or you can build a query for it - I did this in Excel since I was not sure whether any business content query existed for it.

Here the fields of interest right now are:

  • Used InfoProvider
  • Query Runtime Object
  • Calendar Day (do not query for all days - too much data might lead to an overflow on the key figures!)


Here it depends on how you want to do the analysis - you can go either by query or by InfoProvider (a MultiProvider also constitutes an InfoProvider; this is the InfoProvider on which the query is built).


Enter the query technical name in the Query Runtime Object field and set the necessary Calendar Day filters as indicated below. I have taken only 3 days of data, but I have not found any issues even when taking 3 to 6 months of data - only that the result set generation is a little slow on my system.

Filters

Fields for Selection
Make sure you check “Output Number of Hits” - I will come to why this is required later in this blog.

Under fields for selection, select whichever is applicable - InfoProvider / query - and also check the following.

 Fields for Selection


The statistics event is key for this analysis.

I have found overflows to occur in OLAP Counter and Step when selecting a larger slice of data in terms of days, and hence have deselected them. I have used the number of hits as a counter instead.

The output will be something like this - I have hidden the actual results, but moving on to what we can do with them...

Resultset



We now have a long list of statistics events - but what do we do with them?

Go to table RSDDSTATEVENTS - it has the descriptions of the statistics events displayed above.

RSDDSTATEVENTS
Now go back to the base data you have and build the following:

Statistics Event | Rowcount | OLAP Time | Average Time

Here the average time is OLAP Time / Rowcount. This is debatable - compress the cube fully to get accurate rowcounts, or use the OLAP Counter if you do not get an overflow message.

Now we need to find the percentage contribution to the total.
This is again a simple Excel formula: take each event's percentage contribution to the total, and you have the events that contribute the major portion of the query time; you can then look at resolving those.
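The same arithmetic as the Excel sheet, as a small sketch: per statistics event, average time = OLAP time / row count, and the share = event time / total time. The event names and numbers below are invented stand-ins for the query result columns.

events = [
    {"event": "OLAP: Data Selection", "rowcount": 120, "olap_time": 340.0},
    {"event": "OLAP: Read Texts",     "rowcount": 120, "olap_time":  12.5},
    {"event": "Frontend Rendering",   "rowcount": 120, "olap_time":  95.0},
]

total = sum(e["olap_time"] for e in events)
for e in events:
    avg = e["olap_time"] / e["rowcount"]          # average time per hit
    pct = 100.0 * e["olap_time"] / total          # percentage contribution
    print(f'{e["event"]:<25} avg {avg:8.3f}s  share {pct:5.1f}%')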

We used this to determine queries / cubes that were candidates for the BI Accelerator, but the same analysis can be used for other purposes as well.

 

The cube contains a lot more detail and a lot more analysis can be done - this blog is just meant to initiate this kind of query analysis so you can take it further.

The same can be analyzed using RSRT as well, but historical analysis can be done with the exercise mentioned above.

 

Will post a blog on using RSRT and a query run schedule analysis soon...

Arun Varadarajan


BIA Index rollup

Posted by Arun Varadarajan Aug 20, 2008

As a follow-up to my previous blog on the BIA index build, I decided to look into the index rollup.

 

Index rollup:

After the BIA index is built, subsequent data loads have to be rolled up into the BIA index; otherwise the new data will not be available for reporting. This process is the same as rolling up a request into an aggregate. There is no "compress after rollup" option here - the data is compressed by default, and the BIA performs what is called vertical compression / column compression instead of the standard row-level compression that we are all familiar with.

Caution: even if you switch off the BIA indices for reporting, the rollup is still required.

First, get the list of requests that are to be rolled up. Here there is a BIA index but no aggregate.

RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_INSTANCE_FOR_RNR    633666 LINE 43                       
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_TSTATE_FOR_RNR 2 LINE 243                                
Status transition 2 / 2 to 7 / 7 completed successfully                                   
RSS2_DTP_RNR_SUBSEQ_PROC_SET SET_TSTATE_FURTHER_START_OK LINE 261                         

Set the status of reporting on the cubes before rolling up the BIA index:

<CUBE> ACT
<CUBE> ACT
<CUBE> ACT
Statistics UID of indexing job: '4AVGUJ241MF68T94VA6S7GTU5' (RSDDSTATTREX/RSDDSTATTREXSERV)

Statistics are generated for the cube - this is similar to indexing; I have found the same job getting triggered when index creation happens. The S tables are indexed first:

<CUBE> ACT
<CUBE> ACT
Loading data to index BIW_BIC:SYBSPACK (records '0000000002', job '0' )
YF2IG_C07 ACT
Caution: Key figure '<XXXXX>' is of type "FLOAT"
(the FLOAT caution is repeated for each such key figure)
<CUBE1> ACT

Here it also checks where else these characteristics / the index on the S table are used in the BIA and indicates whether those cubes are active or not.

Loading data to index BIW_BIC:XYBSPACK (records '0000000002', job '0' )
Total number of indexed reocrds in index 'BIW_BIC:SYBSPACK' after COMMIT: '2'
(the misspelled "reocrds" looks like a typo in SAP itself)
Loading data to index BIW_BIC:SYMATERL (records '0000000007', job '0' )
<CUBE> ACT
<CUBE> ACT
Loading data to index BIW_BIC:<CHARACTERISTIC> (records '0000000007', job '0' )
Total number of indexed reocrds in index 'BIW_BIC:<CHARACTERISTIC>' after COMMIT: '7'
Loading data to index BIW_BIC:<CHARACTERISTIC> (records '0000000002', job '0' )
Total number of indexed reocrds in index 'BIW_BIC:<CHARACTERISTIC>' after COMMIT: '2'
Loading data to index BIW_BIC:F<CUBE> (records '0000043030', job '0' )
Total number of indexed reocrds in index 'BIW_BIC:F<CUBE>' after COMMIT: '43,030'
Loading data to index BIW_BIC:D<CUBE>P (records '0000000001', job '0' )
Roll up BIA index
Statistics UID of indexing job: '<JOB ID>' (RSDDSTATTREX/RSDDSTATTREXSERV)
Index for table '<CHARACTERISTIC>' is being processed
Index of master data table(s) for InfoObject '<CHARACTERISTIC>'

Here all the characteristics listed as navigational attributes of the characteristic above are indexed. This includes the navigational attributes and also other characteristics which are part of the master data and part of the data load.

Index for table '<NAV ATTR>' is being processed
Index for table '<CHARACTERISTIC>' is being processed
(repeated for each characteristic and navigational attribute)
Index of master data table(s) for InfoObject '<NAV ATTR>'

Next a check is made on which index to load - the main index or the delta index; this is for the SID tables:

Delta mode for index '<NAV ATTR>' set to 'Off' (duration: '0.035387')
Read-/fill mode: 'D' (Restriction by JOIN with 'RSDDTREXNEWSID')
Index of master data table(s) for InfoObject 'YSKU7'
Index for table '<NAV ATTR>' is being processed
Delta mode for index '<NAV ATTR>' set to 'Off' (duration: '0.052726')
Read-/fill mode: 'D' (Restriction by JOIN with 'RSDDTREXNEWSID')

The restriction by JOIN appears because a join index is being created between the S table and the X table - a consequence of the navigational attribute being indexed (more information on join indexes is available separately).

No new data for index of table '<NAV ATTR>'

Other characteristics that are not part of any master data table are indexed next:

Index for table '<CHARACTERISTIC>' is being processed
Index for table '<CHARACTERISTIC (line item)>' is being processed
Delta mode for index 'SYBILLDOCU' set to 'Off' (duration: '2.243841')
Read-/fill mode: 'D' (Restriction by JOIN with 'RSDDTREXNEWSID')
Index 'BIW_BIC:<CHARACTERISTIC>' for BIA index filled (written records '25'):
Prepare optimize for BIA subindex 'BIW_BIC:<CHARACTERISTIC>':
Commit optimize for BIA subindex 'BIW_BIC:<CHARACTERISTIC>':
(this prepare / commit optimize cycle repeats for each remaining characteristic and master data InfoObject, e.g. 'YCUSTMER', with written record counts of 2 or 7 in this load)

Commit optimize:

The previously optimized data is made visible. If you perform a rollback for an index, the system rolls back the data to the last commit optimize. (Source: http://help.sap.com/saphelp_nw2004s/helpdata/en/43/5391420f87a970e10000000a155106/content.htm)

Basically, since this is a master data index, I guess the rollback of master data is being handled here: if you roll back a request in the master data, the commit optimize takes over.

Now the fact table is indexed:

Index for table 'F<CUBE>' is being processed
Delta mode for index 'F<CUBE>' set to 'Off' (duration: '2.155832')
Read from 'F-' fact table
Index 'BIW_BIC:F<CUBE>' for BIA index filled (written records '43030'):
Prepare optimize for BIA subindex 'BIW_BIC:F<CUBE>':
Commit optimize for BIA subindex 'BIW_BIC:F<CUBE>':

Only 43,030 records, so it finishes quite quickly - about 1-2 minutes. I have removed the timestamps for easier presentation.

Now the dimension tables get indexed, starting with the request (package) dimension:

Index 'BIW_BIC:D<CUBE>P' for BIA index filled (written records '1'):
Prepare optimize for BIA subindex 'BIW_BIC:D<CUBE>P':
Commit optimize for BIA subindex 'BIW_BIC:D<CUBE>P':
Index for table 'D<CUBE>1' is being processed
Index for table 'D<CUBE>3' is being processed
Index for table 'D<CUBE>2' is being processed
Index for table 'D<CUBE>6' is being processed
Index for table 'D<CUBE>4' is being processed
BIA index for InfoCube '<CUBE>' rolled up successfully

This is also request based - depending on the number of requests to be rolled up, the set of activities above will repeat. Finally, a check is made for any aggregates on the cube that also have to be rolled up. In this case there aren't any:

No rollup necessary
Rollup is finished: Data target <CUBE>, from 633,526 to 633,666
Job finished

Net net, the BIA index rollup works fine. The only issues observed so far are:

1. Rollup after adding navigational attributes to the cube - sometimes warnings are shown.

2. Deletion of data - it deletes from the BIA index as well, but sometimes gets locked, and in most cases a full index rebuild is required. There is a checkbox for "Delete from BIA Indices" in the selective deletion window, which I haven't tried out yet.

Will see if I can put together something on maintenance activities for the BIA after this. I also have some presentations on BIA which should come in handy for publishing some more material.

Arun Varadarajan

 


What is compression?
For this we need to go into the specifics of how data gets updated into the cube.
A cube has two fact tables.
The F fact table, or uncompressed fact table, is partitioned by request ID.
In other words, the F fact table has a partition for each data load (request). The partitioning index for the cube is the 900 index, or P index.

Then there is the E fact table, or compressed fact table. The E fact table may or may not be partitioned, depending on the cube design: if the cube is partitioned, then the E fact table is partitioned.

When compression is done:

The selected requests in the F table are summarized and then inserted into the E fact table as one request - in other words, the requests are merged together by summarizing them.

Advantages of compression:
Query access is faster.
The cube size comes down - this helps with tasks like rebuilding aggregates.
Indexes are better maintained.
Data loads are more efficient, because the F table has fewer requests and less data.

Disadvantages of compression:
Request-based deletion is no longer possible after compression; only selective deletion is possible.

Let us look at a basic compression job - this job log was taken from the SM37 monitor. References to actual cubes have been removed, and key figures and characteristics are masked.

Compression starts here - parameters for compression: zero elimination.

What is zero elimination?
If a record has been nullified in the cube, it can currently be stored as:

Customer | Product  | Sales
ABC      | sprocket | 100
ABC      | sprocket | -100

The cube will effectively read zero for this record. If you have zero suppression in the query, then in all likelihood this record is never reported, so it can be removed from the cube to reduce the number of records. Zero elimination is basically for records in the fact table that do not have any facts - in other words, records for which all key figures are zero.

Job sequence:
This also shows the clock symbol for the requests in the Manage tab, where the request is shown as being compressed / summarized.

Leave RSM1_CHECK_DM_GOT_REQUEST in row 70; Req_State ''                             
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_INSTANCE_FOR_RNR    469143 LINE 43                 
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_TSTATE_FOR_RNR 7 LINE 243                          
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_INSTANCE_FOR_RNR    468424 LINE 43                 
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_TSTATE_FOR_RNR 7 LINE 243                          
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_INSTANCE_FOR_RNR    467673 LINE 43                 
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_TSTATE_FOR_RNR 7 LINE 243                          
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_INSTANCE_FOR_RNR    466617 LINE 43                 
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_TSTATE_FOR_RNR 7 LINE 243                          
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_INSTANCE_FOR_RNR    465550 LINE 43                 
RSS2_DTP_RNR_SUBSEQ_PROC_SET GET_TSTATE_FOR_RNR 7 LINE 243                          

Now the summarization query is fired. It is rather long, but it contains a lot of information. Cleaned up and abbreviated (cube names and key figures masked, and the job log truncated at the end), it is essentially a MERGE of the summarized F table data into the E table:

MERGE INTO "/BIC/E<CUBE>" E
USING ( SELECT /*+ PARALLEL ( FACT , 3 ) */
        0 AS "PDIMID", "KEY_<CUBE>T", "KEY_<CUBE>U",
        "KEY_<CUBE>1", "KEY_<CUBE>2", "KEY_<CUBE>3", "KEY_<CUBE>4", "KEY_<CUBE>5",
        "SID_0CALMONTH",
        SUM( "KF1" ) AS "KF1", SUM( "KF2" ) AS "KF2", ... , SUM( "KF29" ) AS "KF29"
        FROM "/BIC/F<CUBE>" FACT
        WHERE "KEY_<CUBE>P" = 74
          AND "KEY_<CUBE>T" IN ( 4751, 4755, 4756, 4757, 4759, 4777, ... << request numbers selected >> ... , 4946, 4947 )
        GROUP BY "KEY_<CUBE>T", "KEY_<CUBE>U", "KEY_<CUBE>1", "KEY_<CUBE>2",
                 "KEY_<CUBE>3", "KEY_<CUBE>4", "KEY_<CUBE>5", "SID_0CALMONTH"
        HAVING ( SUM( "KF1" ) <> 0 ) OR ( SUM( "KF2" ) <> 0 ) OR ... OR ( SUM( "KF29" ) <> 0 )
      ) F
ON ( E."KEY_<CUBE>P" = "PDIMID"
 AND E."KEY_<CUBE>T" = F."KEY_<CUBE>T"
 AND E."KEY_<CUBE>U" = F."KEY_<CUBE>U"
 AND E."KEY_<CUBE>1" = F."KEY_<CUBE>1"
 AND E."KEY_<CUBE>2" = F."KEY_<CUBE>2"
 AND E."KEY_<CUBE>3" = F."KEY_<CUBE>3"
 AND E."KEY_<CUBE>4" = F."KEY_<CUBE>4"
 AND E."KEY_<CUBE>5" = F."KEY_<CUBE>5"
 AND E."SID_0CALMONTH" = F."SID_0CALMONTH" )
WHEN NOT MATCHED THEN
  INSERT ( E."KEY_<CUBE>P", E."KEY_<CUBE>T", ... , E."SID_0CALMONTH", E."KF1", ... , E."KF29" )
  VALUES ( "PDIMID", F."KEY_<CUBE>T", ... , F."SID_0CALMONTH", F."KF1", ... , F."KF29" )
WHEN MATCHED THEN
  UPDATE /*+ INDEX( "/BIC/E<CUBE>" "/BIC/E<CUBE>P" ) */ SET E."KF1" = E."CL
(the job log is cut off here)

Note the HAVING clause: requiring at least one key figure sum to be non-zero is exactly the zero elimination described above.

 

BI Accelerator Index build

The BI Accelerator (BIA) is built to be used as an appliance - a black box which is not meant to be questioned as to how it works.

This puts some people at unease when they consider questions like:

1. Are we sure that the job is being done?

2. What happens if something goes wrong?

and many more...

To start with, we have plenty of ways to understand how the BIA functions. Let's start with the BIA index initial fill job...

I have used the job log as it is - excuse any formatting gaffes. I have also kept the timestamps to preserve the sequence.

 

*The first step </u></p><p>is to switch off reporting for the cubes - this is necessary since statistics gets generated for the cubes and also the BIA will have unrestricted access to the cube and also the indexes till it is filled </p><p> 16.08.2008 01:00:11 Job started<br />16.08.2008 01:00:11 Step 001 started<br />(program RSDDTREX_AGGREGATES_FILL, variant &0000000000186, user ID )<br />16.08.2008 01:00:11  OFF<br />16.08.2008 01:00:14  OFF<br />16.08.2008 01:00:14  OFF<br />16.08.2008 01:00:14 INFOCUBE: <br />16.08.2008 01:00:14 Statistics UID of indexing job: '<br />4ATXNHLMXZGVUCD2WKYLY7CP9' (RSDDSTATTREX/RSDDSTATTREXSERV)</p><p><u>Build of Master Data Indices</u> </p><p>Now the BIA proceeds to go and build the indices for the master data objects <br />namely the SID tables in the cubes. Here you can notice that the number of jobs <br />spawned for index builds for the SID tables is started. This setting of three parallel processes is set in the BIA configuration and can be changes if required. </p><p> 16.08.2008 01:00:18 Loading data to index BP1_BIC:SXXXXX (records '0000500000', job '1' )<br />16.08.2008 01:00:22 Loading data to index BP1_BIC:SXXXXX (records '0001000000', job '2' )<br />16.08.2008 01:00:22 Loading data to index BP1_BIC:SXXXXX (records '0001181769', job '0' )<br />16.08.2008 01:01:02 Total number of indexed reocrds in index 'BP1_BIC:S<XXXXX>' after COMMIT: '1,181,769'<br />16.08.2008 01:01:03 YUNBL_C04 OFF<br />16.08.2008 01:01:03 YUNBL_C04 OFF<br />After each index build the number of records indexed are indicated.</p><p><u>Key figure check</u> </p><p>16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />16.08.2008 01:01:03 Caution: Key figure '/BIC/<KEY FIGURE>' is of type "FLOAT"<br />Here any FLOAT Key figures in the cube are highlighted. You can ignore this because many times <br />key figures that are not related to your cube might appear.<br />16.08.2008 01:01:04 <CUBE> OFF   </p><p><u>Concurrent jobs that might be running</u> </p><p>16.08.2008 01:01:04 Lock for table '/BIC/SXXXX'. Job will be restarted again later<br />This message appears if any attribute change runs is in progresses and the BIA fill<br />will wait for the Attribute change run to complete. </p><p>16.08.2008 01:01:15 Loading data to index BP1_BIC:S<XXXXX><br />(records '0000833333', job '1' )<br />16.08.2008 01:01:20 Loading data to index BP1_BIC:S<XXXXX> <br />(records '0001666666', job '2' )<br />16.08.2008 01:01:26 Loading data to index BP1_BIC:S<XXXXX><br /> (records '0002499999', job '3' ) <br />16.08.2008 01:01:31 Loading data to index BP1_BIC:S<XXXXX> <br />(records '0003333332', job '4' ) <br />16.08.2008 01:01:37 Loading data to index BP1_BIC:S<XXXXX> <br />(records '0004166665', job '5' )</p><p>Here you can see five parallel jobs for the index build for an SID table.. 
Parallelism

16.08.2008 01:08:20 Total number of indexed reocrds in index 'BP1_BIC:SXXXX' after COMMIT: '68,198,941'
16.08.2008 01:08:21 Parameter BATCHPARA during indexing: '3'

The number of batch processes used is also given in the job log.

16.08.2008 01:08:21 '10' different values in column 'SID_0CALMONTH' of table '/BIC/DYUNBL_C04T'
16.08.2008 01:08:21 WHERE clause: OPT ='BT', LOW = '', HIGH = '0000200806' (field: 'SID_0CALMONTH')
16.08.2008 01:08:21 Program RSBATCH_EXECUTE_PROZESS successfully scheduled as job BITRP_4ATXO9JVXWGQMD4W4ZHAJCOI5 with ID 01082100
16.08.2008 01:08:21 WHERE clause: OPT ='BT', LOW = '', HIGH = '0000200809' (field: 'SID_0CALMONTH')
16.08.2008 01:08:22 Program RSBATCH_EXECUTE_PROZESS successfully scheduled as job BITRP_4ATXO9JVXWGQMD4W4ZHAJCOI5 with ID 01082101

Here the cube is partitioned on 0CALMONTH, and the index fill accordingly reads the data partition by partition (a small illustrative sketch of this partition-wise splitting follows at the end of this walkthrough).

Fact Table Indexing

Now the heavy-duty loading of the fact table starts...

16.08.2008 01:08:25 Loading data to index BP1_BIC:F<XXXXX> (records '0000051020', job '1' )
16.08.2008 01:24:13 Loading data to index BP1_BIC:F<XXXXX> (records '0022856960', job '5' )
16.08.2008 01:24:15 Loading data to index BP1_BIC:F<XXXXX> (records '0022907980', job '5' )

16.08.2008 01:25:04 Background process '2' complete: number of indexed records '32,643'
16.08.2008 01:28:04 Background process '1' complete: number of indexed records '28,209,394'
16.08.2008 01:28:04 Parallel background process for indexing table '/BIC/F<XXXXX>' complete ('37')

Request ID Indexing
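Going back to the Parallelism part of the log above: the fill splits the fact-table read along the partitioning characteristic (here 0CALMONTH) into value ranges, each expressed as a BT (between) WHERE clause and worked off by the pool of batch processes (BATCHPARA = 3 above). The snippet below is purely illustrative - it is not SAP's actual algorithm - and just shows the general idea of carving a sorted list of CALMONTH values into contiguous ranges like the ones printed in the log; the sample month values are made up.

from typing import List, Tuple

def split_into_ranges(sorted_values: List[str], chunks: int) -> List[Tuple[str, str]]:
    """Carve a sorted list of partition values into contiguous (low, high) ranges.
    Illustrative only - not SAP's actual partitioning logic."""
    if not sorted_values or chunks < 1:
        return []
    size = max(1, len(sorted_values) // chunks)
    ranges = []
    for start in range(0, len(sorted_values), size):
        part = sorted_values[start:start + size]
        ranges.append((part[0], part[-1]))
    return ranges

# Ten distinct month values, as in the log ('10' different values in SID_0CALMONTH);
# the resulting ranges are then worked off by the available batch processes.
months = [f"00002008{m:02d}" for m in range(1, 11)]
for low, high in split_into_ranges(months, 3):
    print(f"WHERE clause: OPT = 'BT', LOW = '{low}', HIGH = '{high}'")

In the job log, each such range appears to be handed to a scheduled RSBATCH_EXECUTE_PROZESS job, which matches the pattern of a WHERE clause line followed by a job-scheduling line in the excerpt above.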

Recently there was a post on the BI forums on what constituted functional and technical roles in BI. On further discussion with colleagues in the same field, I got to delve more deeply into the basis for such a question - What is difference between Technical & Functional skills of BW....

Traditional R/3 and other transactional systems like CRM etc. have a strong demarcation between technical and functional roles. This is especially evident in R/3, where you will come across technical consultants working across functions but will find function-specific consultants like the SD functional consultant etc...

  Is such a classification valid for BW as well ?

A good question - but one more reason why this has arisen is that BI is seen as a front-end system where a person may get away with just typing in descriptions and using the three buttons of the mouse...!!! Weird but true...

Now coming to the crux of the discussion … are there technical and functional roles in BI and if so what are they expected to do ?

Attacking the issue from another angle... some of the roles that are typical of an end-to-end BW project would be: BW Architect, BW Functional Consultant and BW Technical Consultant...

This would lead people to ask .. is it for the sake of the project that we come across such roles or is there something that these people are specifically meant to do ??

From what I have seen .. correct me if I am wrong…

BW Architect

Similar roles: Data Architect / Data Modeler / Systems Architect

The BW architect understands the high-level design and would typically be involved right from blueprinting to the end of the design stage, where the primary responsibilities could be:

1.     Crafting the EDW layer

2.     High level data model

3.     Designing the high level data flows

4.     Corporate data model

5.     Well versed with basic DW concepts relating to cubes / ODS / performance / DB

These would mean that the architect:

1. Owns the data model for the BW system

2. Understands the system landscape of the client / source systems

3. Understands and is conversant with the processes currently defined and knows in full the various data points captured at various intervals of the process

4. Is aware as to how the current design fits into the larger scheme of things.

What an architect is not expected to do :

1. Any coding of any sort

2. Actual physicalization of the model

3. Any development, if required

4. Debug / write any code

BW Functional consultant

Synonyms: Data Modeler / Business Analyst / Systems Analyst

The BW functional consultant is one who can understand the high-level design and break it down into smaller functions. The functional consultant may be well versed in the processes of particular modules like PP / MM / SD, so that he/she may look at:

1. Documentation

2.Data sources

3. Data models (detailed, fully attributed ones) to suit the same

4.Estimate the level of customization required

5. Convert the requirements into specifications that someone else could develop / that could be offshored...

6.Understand modeling concepts like cubes / ODS etc…

What a Functional consultant is not expected to do :

1. Do any development of code / code snippets

2. Data loading / monitoring

The functional consultant can, however, be involved in physicalization of the EDW layer, depending on available skill sets and comfort level with development. Here the line dividing the functional consultant and the technical consultant starts to blur... and expectedly so....

BW Technical consultant

Synonyms: ABAP-BW Consultant / Systems Analyst

The BW technical consultant is essentially seen as a person who either leads the technical team or is part of the development team involved in converting the specifications into actual objects. Basic expectations would be:

1.Understand the technical specifications

2.Physicalize the layers

3.Develop ABAP code if required ( Usually done by ABAP consultants )

4.Monitor the data model and support it into UAT

What a technical consultant is not expected to do :

1.Documentation

2.Requirements

3. Architecture

And now coming to the next facet...

What the above roles are not expected to do but would be great if they did!!!

1.In depth understanding of performance tuning

2.Look at an EDW as a storehouse of related data instead of a report bank

3.Look at how an EDW serves the business by way of providing business critical information

4. How the business benefits from the reports delivered

Technical aspects

1. Understand the coding being done - not necessarily do the coding themselves, but even a basic understanding is good; the reason being that further down the line, the rationale for many of the modifications is lost by way of missing documentation etc...

2. Understand the data model from the perspective of why it has been done this way and what particular requirement is driving this particular design

3. Look at requirements from the perspective of effort versus simplicity and achieve a balance between the two - neither delivering a bare-bones system nor a completely complex system with all the bells and whistles..

Most of these are observations from being a Business Analyst / SAP BI consultant / SAP BI architect on various projects, often having to wear multiple hats at the same time, along with having the task of analyzing what has been done already, evaluating others' work and constantly asking the question:

“This could be done in these ways... supposing someone comes along a year later - are we leaving out any option which could be a lifesaver later on??”

Another role which would be very useful to have is the NetWeaver Architect: more often than not, the customizations demanded are outside the purview of the BI system but could be done using the Web Application Server - BSP pages / Web Dynpros etc...

Delivering such interfaces that ease day-to-day business needs to the client would give the view that there is a very good level of understanding of the processes and that consolidation of sources is also happening - which would be very heartening for the client to see as well...

Till then have a nice day and a lot of food for thought....

Arun

A frequent question is - how simple should things be? This is with regard to implementing a solution with the right balance between complexity and functionality. It would of course require the person to wear multiple hats - often leading to a collision course...

A typical scenario: the current flow of events warrants that someone steps in to manually approve certain transactions that are above a certain value. When modeling the same into the system:

1. Do you go about modeling the same As-Is?

2. Do you 'Anticipate' future requirements and build in the intelligence for the process to do an e-mail approval?

3. Where does the business fit into this picture...??

Modeling the process As-Is would mean that someone somewhere later on would definitely mention to the business that something like this is possible (oops...). Option 2 would require a lot of complexity by way of configuration, security, integration etc. behind the scenes. Option 3 is more far-reaching - when you have a technically marvelous solution that the users do not understand!!

Which one of the above do you choose? Essentially: do you promise the moon to the end user and go all out trying to give them the same, resulting (not always, but most of the time) in a complex maze of interfaces that are tough to interpret and debug? Or, simpler still, do you give the user a simple configuration with the perceived message that further refinements in terms of user experience are possible, but would require a lot of technical jugglery and might impact the regular flow of events??

Many a time the hat of a Business Analyst or BPX would prompt you to take option one, whereas the Designer in you is screaming murder in terms of development effort and complexity... which one do you address?

There is an interesting quote from E.F. Codd - the father of the RDBMS concept:

"Attempting to force one technology or tool to satisfy a particular need for which another tool is more effective and efficient is like attempting to drive a screw into a wall with a hammer when a screwdriver is at hand: the screw may eventually enter the wall but at what cost?" - Providing OLAP to User-Analysts - Hyperion - 2006

The same applies to good design concepts as well, especially when we tend to go overboard with simplification or try to force one thought process onto a requirement that does not quite fit the current design.

The question again comes to: what do you do? My 2 cents on the same would be:

1. Simplification is a long-running thing and could be looked at over a longer time frame, as compared to giving the users a working, satisficing solution for now with the promise that further tinkering would be done post delivery. This essentially achieves two things - one is to understand the user psyche as to what they want and what more they want; the other is that the client is aware that we are looking at further tinkering, and the same could be rolled out along with any further requirements / modifications.

2. The question that needs to be addressed is - are the users ready to adopt satisficing solutions that address their immediate needs, or are Business and IT deciding that users should get the best solution possible at the first shot? Here a proactive IT-business relationship goes a long way in smoothening things and also reinforcing the view of business that IT can deliver and that the team is good at it.
Here the BPX would be a necessity by way of:

1. Adopting good design practices by donning the design hat

2. Controlling requirements - by adopting the business analyst hat

All that needs to be done would be to effectively juggle the hats around and make sure things go on time and in full...
Intelligroup SDN team has a splash at exotic Goa, India!!

They say a picture is worth a thousand words. The pictures say it all - be it the world famous beaches, Fort Aguada or the beautiful Bom Jesus Church, we Intellians made a splash, especially with our colorful SDN T-shirts and Intelligroup T-shirts and being there as a large group of 16 people.

Professionally we were as diverse as the range of SAP topics on SDN and had never actually met each other before (at least a few of us), but the one thing common to these 16 people was that along with delivering to perfection on their demanding global SAP projects, they were all active on SDN, learning from it and contributing to it, thus adding value to the SDN community. This had prompted Intelligroup to give us a completely sponsored vacation break at an exotic location!!!

The whole team of SDNers got together at the Bangalore airport (most of them had started from Hyderabad), and had fun putting faces to the names we used to see on SDN and on the bunch of mails that were exchanged before this trip actually started.

We landed at the Dabolim airport at Goa in the late afternoon after an hour's flight from B'lore. An hour and a half's drive from the airport took us to our beach resort. We were by far the largest group at the resort and we immediately made our presence felt by making a big splash in the swimming pool!!! A few of us did not know how to swim, and SDNers being true to their nature started helping them out too!!!

That day passed without much ado, the main focus being getting to know each other and finding out more about everyone else at Intelligroup!! We did go to the beautiful weekly Night Bazaar, where we were treated to some acrobatics shows and rock music. We then got back to the resort and lounged around the pool before retiring for the day, looking forward to the exciting 3 days ahead.

Day 1 began with all of us starting off to do what Goa is famous for - beach combing!!! We first started with a visit to Fort Aguada, an old (15th century) fort of the Portuguese era overlooking the sea, and of course made famous by the movie 'Dil Chahta Hai'. From there we went to Vagator beach, where we kicked it off with an energetic game of beach volleyball and of course a dip in the sea. We found out to our glee that the beach was pretty shallow with gentle waves and not much of a crowd. The brave ones ventured further into the sea and did some swimming, while the others chose to carry on with their game of volleyball. We met some foreign tourists and had a guitar demonstration from a Finnish tourist on the beach.

After Vagator, it was time for lunch with some Goan delicacies on the menu. After a hearty lunch, we were off to one of the most beautiful beaches in Goa - the Baga Beach. As soon as we hit Baga, it was off to the water and some Frisbee in the water. Once we had enough of the sea, we found some space on the beach to play volleyball again, and this time some others who had come to the beach joined the fun. We alternated between Frisbee, volleyball, water scooters and swimming until we caught the picturesque sunset and a lot of photo ops. Done with Baga, we returned to the resort for a good dinner and some interesting discussions on SDN and SDN @ Intelligroup, after which we all went off in groups to the last of the trio of beaches, Calangute, especially with the beach being just 10 minutes away from our resort.
We found some shacks, as they were called, on the beach, picked out some of the beds that had been laid out for sunbathers in the morning, and quietly watched the stars, reminiscing about how SDN came into our professional lives, how SDN has made a difference to the work being done by contributors the world over and, more so, the respect that each person has in the forums with regard to what they post and how people across the world are being helped with their projects. We finally got back to the resort in the wee hours of the morning and called it Day 1!!

Day 2 began with hitting the swimming pool and breakfast, and then we were off to Old Goa to see one of the oldest churches in India and the other things Goa is famous for besides beaches. We visited the very beautiful Basilica of Bom Jesus and got to see the sacred relics of the body of St. Francis Xavier. We were amazed at the ornate chapels. We then checked out the Archeology Museum nearby, which had some stunning collections of sculptures, paintings, coins etc., all ranging from 11th to 15th century Goa.

We then headed to the Dona Paula Jetty, which is famous for its piers and water rides. This was followed by a pleasant boat cruise where we were treated to some Goan folk dance and music. After having a blast at the disco on the boat, we headed back to the resort for the final time, content that we had seen most of Goa and had a rocking time doing it. Hungry but not tired, we walked down to our neighborhood beach, Calangute, to one of the shacks for a midnight dinner, returned to the resort in the wee hours and called it Day 2!!

On Day 3 we realized we had not been to Calangute Beach in daylight at all! We promptly headed there for a swim and some photo-op sessions, then beat it back to the resort with just enough time to stuff our belongings back into our bags and grab lunch before we left for the airport - Al Qaeda contributing to this rush by getting Goa onto the terror alert and extending the security formalities!!!

Finally the team - energized by the trip, with the knowledge of what we had done and thoughts bristling with ideas to add more value to our projects and to SDN - landed at B'lore, said their goodbyes to the SDN mates from Intelligroup Bangalore, and the rest of them continued to Hyderabad, with everyone's thoughts sinking back to deliverables and everyone waiting to tell the others what they had missed but could have been part of!!
