Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
Former Member

This is part 2 of a blog in 3 parts (at the moment) on how IdM manages queue processing and the audits and logs created during processing

Part 1: Tables, views and processing logic: http://scn.sap.com/community/netweaver-idm/blog/2014/02/05/on-queue-processing-or-the-lack-thereof

Part 2: Viewing the current queues: http://scn.sap.com/community/netweaver-idm/blog/2014/02/10/on-queue-processing-or-the-lack-thereof-p...

Part 3: Post execution audits and logs: http://scn.sap.com/community/netweaver-idm/blog/2014/02/21/on-queue-processing-what-just-happened-pa...


Feedback is most welcome, and additions as well as corrections can be expected.

Edit 20140224: Oracle versions of queries added.

Getting an overview of the queues

One of the most important things to do in case of a productive stand-still or issue is to get an overview of what's in the different queues.

Link evaluations, approvals and workflows have separate queues and processing of them is done by different threads in the dispatcher(s).

Jobs are simply set to state=1 and scheduletime < now in the mc_jobs table.

Jobs and actions

As mentioned above, jobs do not really have a queue. They are scheduled to run by having scheduletime set and state set to 1. The dispatcher will start runtime(s) to process jobs if the mc_dispatcher_check procedure returns 1 or more standard jobs to run. The java runtime will use the procedure mc_getx_job to reserve a job from the available jobs. Once running the state in mc_jobs changes to 2.

Just to clarify, a job sits outside the Identity Store(s) in a job folder and usually works with bulk processing and contains 1 or many passes. Actions are inside the workflow of an Identity Store and can only contain 1 pass and process 1 entry at a time. To slightly confuse the matter, the configuration of an action task is a job, in the mc_jobs table, and the logs it creates are stored in the mc_logs table. There's a link between the task in mxp_tasks to mc_jobs on mxp_tasks.jobguid = mc_jobs.jobgui.

With this knowledge a query listing jobs and provisioning actions that are running can look like this:

SQL Server:


select name,case when provision=1 then 'Action' else 'Job' end type, CurrentEntry, Current_Machine from mc_jobs with(nolock) where state = 2

Oracle:


select name,case when provision=1 then 'Action' else 'Job' end type, CurrentEntry, Current_Machine from mc_jobs where state = 2;

This produces output like this:

Note that the CurrentEntry column in mc_jobs is updated every 100 entry, or every 30 seconds by the runtimes.

The provisioning queue & semaphores

The provisioning queue is based on the table mxp_provision. To process parts of the queue a dispatcher must first set a semaphore that indicates that other dispatchers should keep away from processing the same type of task. This is done by setting a semaphore (basically its own Id as owner along with a timestamp) in the mc_semaphore table. The timestamp is updated as the thread is processing entries, and a semaphore whose timestamp is older than 300 seconds is considered dead. This means that if you have conditional statements taking a very long time to run so that the dispatcher thread is not able to update the timestamp within 300 seconds the semaphore is released and another dispatcher will start processing conditional statements as well. That means trouble because the two threads risk running the same conditional mskey,action,audit combination!

The provisioning queue is divided into views according to the threads in the dispatcher: mxpv_grouptasks_ordered/mxpv_grouptasks_unordered/mxpv_grouptasks_conditional/mxpv_grouptasks_switch/mxpv_grouptasks_approval/

mxpv_grouptasks_attestation

These views will at most contain 1000 entries from a Top 1000 limiter. As mentioned in part #1, actions that are to be processed by runtime engines are picked up by a procedure and has no view.

The link evaluation queue

This queue contains assignments that need to be evaluated. Any mxi_link entry with mcCheckLink < now is in this queue. This includes role/privilege assignments and entry references such as manager.

The dispatcher processes this from the view mxpv_links. This view will contain 0 entries in normal situations, up to 1000 under load. To get the real number of links that need evaluations you can run:

SQL Server:


SELECT count(mcUniqueId) FROM mxi_link with(NOLOCK) WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1))


Oracle:


SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink < sysdate) AND (mcLinkState IN (0,1))

To see if a specific user has privileges that are queued for evaluation, or if a privilege has entries where it's state is still to be evaluated:

SQL Server:


-- Assignments to evaluate for 'User Tony Zarlenga'
SELECT count(mcUniqueId) FROM mxi_link with(NOLOCK) WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1)) and
mcThisMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'User Tony Zarlenga')
-- User assignments to evaluate for privilege 'PRIV.WITH.APPROVAL'
SELECT count(mcUniqueId) FROM mxi_link with(NOLOCK) WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1)) and
mcOtherMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'PRIV.WITH.APPROVAL')


Oracle:


-- Assignments to evaluate for 'User Tony Zarlenga'
SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink <sysdate) AND (mcLinkState IN (0,1)) and
mcThisMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'User Tony Zarlenga')
-- User assignments to evaluate for privilege 'PRIV.WITH.APPROVAL'
SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink < sysdate) AND (mcLinkState IN (0,1)) and
mcOtherMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'PRIV.WITH.APPROVAL')

Listing actions ready to be run by runtime engines

Runtime actions are listed in the provisioning queue with actiontype=0. Combined with state=2 (ready to run) and exectime < now the entry is ready to be processed by a runtime. A very basic query listing number of entries ready for processing by different actions is:

SQL Server:


select count(P.mskey) numEntries,P.actionid, t.taskname from mxp_provision P with(NOLOCK), mxp_tasks T with(NOLOCK)
where P.ActionType=0 and T.taskid = P.ActionID
group by p.ActionID,t.taskname



Oracle:


select count(P.mskey) numEntries,P.actionid, t.taskname from mxp_provision P , mxp_tasks T
where P.ActionType=0 and T.taskid = P.ActionID
group by p.ActionID,t.taskname

Unless you have a lot of actions with delay before start configured actions will usually have an exectime in the past. This query will create a simple result showing the entries that can be processed by runtimes:

Listing actions ready to be run by runtime engines and the state of the job

In most cases this is only part of the full picture. You really want to know if a runtime is actually working on those entries as well. Let's add mc_jobs and mc_job_state to the query to get a bit more detail:

SQL Server:


select count(P.mskey) numEntries,P.actionid, t.taskname,js.name as jobState
from mxp_provision P with(NOLOCK)
inner join mxp_tasks T with(NOLOCK) on T.taskid = P.ActionID
left outer join mc_jobs J with(NOLOCK) on J.JobGuid = T.JobGuid
left outer join mc_job_state JS with(NOLOCK) on j.State = JS.State
where P.ActionType=0 and P.state=2 and T.taskid = P.ActionID
group by p.ActionID,t.taskname,js.name



Oracle:


select count(P.mskey) numEntries,P.actionid, t.taskname,js.name as jobState
from mxp_provision P
inner join mxp_tasks T on T.taskid = P.ActionID
left outer join mc_jobs J on J.JobGuid = T.JobGuid
left outer join mc_job_state JS on j.State = JS.State
where P.ActionType=0 and P.state=2 and T.taskid = P.ActionID
group by p.ActionID,t.taskname,js.name

The current reason for my system not processing anything is getting clearer:

No actions are running so something is blocking or stopping the runtimes from starting and I know to look at the dispatcher. Since I've manually stopped it it's no big surprise and troubleshooting is simple.

Just a few of the actions/jobs are running

If you think that not enough runtimes are being started and see situations like this:

You should look at item 5 in the checklist below and also have a look at the properties and policy of the dispatcher

Dispatcher properties and policies

Max rt engines to start determines how many runtimes a dispatcher starts when it finds X actions ready to run in the queue. In this case, even if 100 are ready to run it will only start 1 in this check interval (see the picture to the right).

Max concurrent rt engines controls how many runtimes a dispatcher will have active at the same time. 1 is always reserved for the Windows Runtime though. So my system is now limited to a single active java runtime at any time.

Max loops for rt engine is also a very useful setting. Starting a java runtime processes and loading all the classes can often take a second or three and in a low-load scenario this can be the slowest operation in the system. This setting tells the runtime that once it's done with an action/job it should enter a small loop with a 1 second delay to check for additional actions that are available. This also increases performance as it is independent of the check interval(see two pictures below)

Also notice the global setting for max concurrent rt engines. If you have 3 dispatchers that can run 50 simultaneous runtimes each you can still limit the total active runtime count to 100 for instance.

The check interval controls how frequently the dispatcher connects to the database to check for available tasks, actions and jobs.

A general recommendation is to increase this in systems with multiple dispatchers so that the average interval is around 5 seconds. So when running 2 dispatcher, check interval is 10 on both, with 4 dispatcher interval is 20 and so on.

By increasing the Max concurrent rt engines setting to 10, and max rt engines to start to 3 the situation is quickly changed to the point where it's difficult to create a screenshot:

Troubleshooting actions/jobs not starting

A quick checklist for troubleshooting:

  1. Check that the dispatcher process (mxservice.exe on windows) is running
  2. Check how many java processes you have in task manager (or using ps, 'ps -ef | grep java', or similar on Unix. Also see Andreas Trinks suggestions in the comments section)
    • If there are no or just a few java processes then the runtimes are most likely not started by the dispatcher
      • Check prelog.log for startup issues
    • If you have lots of java processes but no actions running, then the runtime is probably having problems connecting to the db or reserving a job
      • Check prelog.log
  3. Check that the dispatchers are allowed to run the jobs that are queued
  4. A job listed with state ERROR will not run, and has to be forced to restart. Check it's logs for errors though, they end up error state for a reason (most of the time)
  5. Check the database activity monitor, reports or using queries from IDM SQL Basics #2: Locating problem queries to check if
    • If the procedure mc_dispatcher_check is running for a long time the dispatcher is unable to perform the check on the queue to see how many actions are ready for processing and java runtimes will not be started.
    • If the procedure mxp_getx_provision is running for a long time the result is many java processes in the system but they are unable to allocate jobs

Listing the number of items and their state in the queue

I would usually start off with the following query that lists the number of entries per task, per state and including the state of the linked Job for action tasks.

SQL Server:elect


     count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName, ISNULL(JS.Name,'not an action task') JobState
from
  mxp_provision P with(nolock)
  inner join mxp_Tasks T with(nolock) on T.taskid = P.actionid
  inner join mxp_state S with(nolock) on S.StatID = P.state
  inner join MXP_ActionType A with(nolock) on A.ActType=P.ActionType
  left outer join mc_jobs J with(nolock) on J.JobGuid = T.JobGuid
  left outer join mc_job_state JS with(nolock) on j.State = JS.State
group by
     t.taskid,T.taskname, A.name, S.name, JS.Name
order by
     A.name, S.name





Oracle:


select
     count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName, NVL(JS.Name,'not an action task') JobState
from
  mxp_provision P
  inner join mxp_Tasks T on T.taskid = P.actionid
  inner join mxp_state S on S.StatID = P.state
  inner join MXP_ActionType A on A.ActType=P.ActionType
  left outer join mc_jobs J on J.JobGuid = T.JobGuid
  left outer join mc_job_state JS on j.State = JS.State
group by
     t.taskid,T.taskname, A.name, S.name, JS.Name
order by
     A.name, S.name

I've started my dispatcher test task described in Part #1 for 1000 entries The query above gives me a result like this during the processing

(click to enlarge)

A quick explanation of some of the type/state combinations and what would process them

Action Task/Ready To Run: Action that is ready to be processed by a runtime

+ JobStatus: The state of the job linked to the Action Task. If it's Idle it means a runtime has not picked this up yet.

Conditional, Switch and (un)Ordered Tasks are processed by dispatchers that have a policy that allows Handle Tasks.

Ready to run for conditional or switch task means i'ts ready for evaluation

Ready to run for Ordered/Unorderd task means the workflow can be expanded into the queue

Expanded OK means the workflow at this level is expanded

Waiting generally means that its waiting for a sub-process or child event to finish

The final view of the provisioning queue, with current entry count for actions

Since the mc_jobs table contains a column named CurrentEntry we can also see how many entries running actions have processed using:


SQL Server:


select
  count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName,
  case when js.name = 'Running' then 'Running, processed:'+cast(ISNULL(J.CurrentEntry,0) as varchar) else js.name end state
from
     mxp_provision P with(nolock)
  inner join mxp_Tasks T with(nolock) on T.taskid = P.actionid
  inner join mxp_state S with(nolock) on S.StatID = P.state
  inner join MXP_ActionType A with(nolock) on A.ActType=P.ActionType
  left outer join mc_jobs J with(nolock) on J.JobGuid = T.JobGuid
  left outer join mc_job_state JS with(nolock) on j.State = JS.State
group by
     t.taskid,T.taskname, A.name, S.name, case when js.name = 'Running' then 'Running, processed:'+cast(ISNULL(J.CurrentEntry,0) as varchar) else js.name end
order by
     A.name, S.name





Oracle:


select
  count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName,
  case when js.name = 'Running' then 'Running, processed:'||to_char(NVL(J.CurrentEntry,0)) else js.name end as Jstate
from
     mxp_provision P
  inner join mxp_Tasks T on T.taskid = P.actionid
  inner join mxp_state S on S.StatID = P.state
  inner join MXP_ActionType A on A.ActType=P.ActionType
  left outer join mc_jobs J on J.JobGuid = T.JobGuid
  left outer join mc_job_state JS on j.State = JS.State
group by
     t.taskid,T.taskname, A.name, S.name,case when js.name = 'Running' then 'Running, processed:'||to_char(NVL(J.CurrentEntry,0)) else js.name end
order by
     A.name, S.name    

The result is quite useful as it's now possible to see how many entries the actions that are running have processed so far (click to see):

This will have to do for revision one. If I get time to add more this week I will, but there is patches and SPs to work on as well.

7 Comments
Labels in this area