I work on the SAP Web Intelligence (Webi) Support team. I want to highlight one of the new features that will be introduced for WebI in SAP BusinessObjects Business Intelligence Platform 4.2 (BI4.2) called Parallel Queries. This allows for parallel query execution to be performed when refreshing WebI documents with multiple data providers.
Prior to BI4.2: Each data provider (query) in a Webi report was run sequentially (when the first query finished executing, the second would run, and so on).
Starting in BI4.2: Parallel query execution is enabled by default for WebI documents that have multiple data providers.
Parallel queries are supported for the following data provider types:
Supported Web Intelligence clients:
The following screen shot shows what happens before BI4.2 and in BI4.2:
Parallel Query Settings at the WebI level:
The following shows the WebiParamMaxConcurrentRefreshJobs parameter in the Descriptor XML file:
The following describes the behavior at the database connection level relating to parallel queries:
A summary of the limitations in SAP BI4.2 are as follows:
Example of the log file analysis of a parallel query example:
Log File Analysis:
To analyze the log files, I used the FlexiLogReader. This log reader can be used for analysis of BI .glf log files. The FlexiLogReader can be retrieved from the following link:
Analysis Steps:
1. Open the FlexiLogReader by launching the FlexiLogReader64.exe file.
2. Select the Webi Rich Client .glf log file that was created when logging was turned on when refreshing the report with 4 queries running in parallel.
3. Specify the default configuration (PLUGIN: BI 4.x GLF Log Parsing and End to End Analysis (v1.2.0).
4. Specify to load all files that are in the directory of the file that you just selected. In this case, we only have one .glf log file, but if there were more, by selecting this option, all of the files in the current directory would be chosen.
5. When the log files are loaded, notice that one of the ways that the log files are displayed is by “ServerName/PID/TID”. This means that the logs can be broken out by “Server Name = WebRichClient”, Process ID (In this analysis there is only one process ID), and Thread ID (there are many threads). The advantage of this for analyzing refreshes of parallel queries is that there will be one thread that “manages” the creation of all of the refreshes, and a further thread will be spawned for each of the individual refreshes.
6. To find the lines of code that are related to the parallel refresh process, find all of the lines containing the tag [//R]. To do this in the FlexiLogReader, select “Search” – “Search Window” to bring up the search window.
The search term should be added as a tag so that we can filter the logs to only see the entries that include the lines that have the [//R] entry.
Now ensure that you only check the Search Terms [//R] tag and don’t check any of the other ones. After that, select “Show Only Log Entries Linked To Selected Tags”.
Now, show an extra column that refers to the “Thread ID”.
First off, find out what thread ID is associated with the first row that was returned in the [//R] search.
From the above, we can see that the thread ID that is associated with that first row is 9028.
Now, add the “Thread ID” column to the columns that are shown in the trace summary. To do that, select the first row, select “View”, and select “Choose Columns To Show”.
In the “Define Visible Columns” screen, check the “9028” column which stands for the thread ID column and then select “OK”.
The result is that we now have “Column 3” showing which is the “Timestamp”, “Column 12” showing which is the “Thread ID”, and “Column 36” showing which is the “Text”.
7. By looking at the logs that are filtered by [//R], we can see that there are the following 5 different threads:
9028 – This is the thread ID of the “main” thread that manages the refreshes of the 4 different data providers that are refreshing in parallel.
11212 – This is the thread for the refresh of the first data provider (DP0).
11208 – This is the thread for the refresh of the second data provider (DP1).
11776 – This is the thread for the refresh of the third data provider (DP2).
12028 – This is the thread for the refresh of the fourth data provider (DP3).
8. In the main thread (thread ID = 9028 in this case), the following shows the start and finish of each of the queries that are being run in parallel as separate threads.
9. In FlexiLogReader, we can take the thread ID of one of the threads for one of the query refreshes and filter based on that thread ID. So, for example, if we take the thread ID of the first query that is refreshed (thread ID = 11212) and filter based on that thread ID, we see the following (as shown in the below screen shot). We will leave the [//R] search term selected so that in the summary screen, we will see the logs for thread ID = 11212 as well as the logs for the [//R] search term. By filtering on the thread ID, you can see all of the activities that went on during the refresh process. You can see when the thread starts, which query it is referring to, and when the thread end (when the refresh is finished for that query).
You can analyze the other query refreshes in the other threads using the same process as shown above.
Summary:
I hope that this blog has highlighted the Parallel Queries feature that will be available in BI4.2 including the data providers that it is supported on, how to configure it, how it should lead to quicker refresh times, and how the log files can be examined effectively using the FlexiLogReader. It looks like this will be a great feature!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
35 | |
17 | |
17 | |
15 | |
11 | |
9 | |
8 | |
8 | |
8 | |
7 |