
Bcp tuning

Former Member
0 Kudos

Hi

(ASE 15.7 SP60 running on Windows 2008)

We are in the process of aggregating many similar databases into one big database on a new server.

Because this is time consuming, we are planning to do it in steps like this:

1. Create the new database, tables etc (but not index, triggers, constraints)

2. Bcp in some of the databases (using fast bcp)

3. create the indexes, triggers, constraints

4. Let the users in and use the new database for some days

5. Drop indexes, triggers and constraints

6. Bcp in more databases (using fast bcp)

7. create the indexes, triggers, constraints

8. Let the users in and use the new database for some days

(continue until all databases have been bcp-ed into the new database)
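A minimal sketch of one load cycle (steps 5-7) for a single table; the names (newdb, big_table, ix_big_clu) and the bcp command line are hypothetical placeholders:

```sql
-- Step 5: drop the clustered index before the next load
-- (fast bcp requires the table to have no indexes and the
-- "select into/bulkcopy/pllsort" database option enabled).
drop index big_table.ix_big_clu
go

-- Step 6: run fast (minimally logged) bcp from the OS shell:
-- bcp newdb..big_table in big_table.dat -c -Usa -Ppassword -Sserver

-- Step 7: rebuild the index afterwards
create clustered index ix_big_clu on big_table (id)
go
```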

Now, the problem.

When we test this, we get good speed on the first bcp jobs (step 2).

Typically we see 1000+ rows/second.

BUT after creating the indexes etc. and dropping them again, we get much slower speeds for the rest of the bcp jobs (step 6).

On the same table, we now get 80 rows/second.

What are we doing wrong?

How can I find out more about what is taking so long?

Regards

Pål

Accepted Solutions (0)

Answers (4)


Former Member
0 Kudos

Hi again

Thanks Andrew, Mark and Brett for trying to help.

We have now received help from Ruairi Prendiville. He found that if we increased the network packet size from 2048 to 65024 and the number of pre-allocated extents from 2 to 32, we got very good speeds again.  We also changed our device usage from one 700 GB device (located on a SAN with extreme-IO flash disks) to chunks of 50 GB device files (located on the same SAN).  I don't know if the device change made much difference, but we are now happy with the speed of bcp in again.
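For reference, the two server settings mentioned can be changed with sp_configure, and the larger packet must also be requested by the client; a sketch, with placeholder server and login values:

```sql
-- Allow larger network packets server-wide (bytes); the client request
-- below must not exceed this value.
sp_configure "max network packet size", 65024
go
-- Pre-allocate more extents per allocation request during bulk copy.
sp_configure "number of pre-allocated extents", 32
go

-- The client must ask for the bigger packet explicitly, e.g. bcp's -A flag:
-- bcp mydb..mytable in mytable.dat -c -Usa -Ppassword -Sserver -A 65024
```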

Regards

Pål

Former Member
0 Kudos

Today we have two more findings.

1. The decrease in speed seems to occur only if we create and drop clustered indexes, and then bcp in to a table on the same device. If we bcp in to a table on another device, speed does not suffer as much.

2. A collection of monSysWaits data.

SELECT msw.WaitTime, mwei.Description
FROM monSysWaits msw
INNER JOIN monWaitEventInfo mwei ON msw.WaitEventID = mwei.WaitEventID
ORDER BY msw.WaitTime DESC

After starting some threads with bcp in, I ran this command, and after a few minutes I ran it again. Below are the differences between the two runs.

Description                                          WaitTime
xact coord: pause during idle loop                        720
hk: pause for some time                                   276
waiting for incoming network data                        1028
until an engine has been offlined wait                    180
distributed xact co-ordinator idle loop                   180
checkpoint process idle loop                              162
Wait until heartbeat or check interval expires             12
waiting for client connection request                     236
waiting on run queue after sleep                          417
waiting on run queue after yield                          188
waiting for last i/o on MASS to complete                   59
waiting for network send to complete                       41
wait for i/o to finish after writing last log page         20
wait to acquire latch                                      19
waiting for buf write to complete before writing           12
waiting for a lock                                         15
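A difference like the one above can be computed directly on the server by snapshotting monSysWaits into temp tables before and after the interval and joining them; a sketch (requires mon_role and MDA monitoring enabled):

```sql
-- Snapshot 1 (before the bcp interval)
select WaitEventID, WaitTime into #w1 from monSysWaits
go
-- ... let the bcp jobs run for a few minutes ...
-- Snapshot 2, then report the per-event delta, largest first
select WaitEventID, WaitTime into #w2 from monSysWaits
go
select w2.WaitTime - w1.WaitTime as WaitTimeDelta, i.Description
from #w2 w2
join #w1 w1 on w1.WaitEventID = w2.WaitEventID
join monWaitEventInfo i on i.WaitEventID = w2.WaitEventID
where w2.WaitTime > w1.WaitTime
order by WaitTimeDelta desc
go
```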
former_member188958
Active Contributor
0 Kudos

I suspect the decrease in performance is due to the way ASE allocates new extents to objects when doing fast bcp or creating indexes.

The allocation process uses the Global Allocation Map (GAM) structure that is stored in the sysgams system table.  Sysgams isn’t a normal table.  It is essentially a large bitmap with one bit per allocation unit (block of 256 pages, further broken down into 32 extents).  If the bit is on, all 32 extents on the allocation unit are allocated, if the bit is off, at least one extent is free.

There are two types of extents; I call them “short” and “long”.  Each allocation unit contains 32 extents.  The first page of each allocation unit contains the allocation page itself, which holds information on which objects each of the 32 extents on the allocation unit is allocated to, which pages are actually in use, and which OAM page is tracking each extent.

Because the first page is used for the allocation page, the first extent is “short”, containing only 7 pages for table data.  The remaining 31 extents are all “long”, containing 8 pages for table data.

Because of internal optimizations, “Fast” bcp and Create Index can only allocate long extents.  In a database that has recently experienced a lot of BCP and/or Index creation, the result can be a large number of allocation units that have all 31 long extents in use but the short extent free.

As you bcp into ASE and fill the current extent, ASE goes looking for a new free extent.  It looks first on the same allocation unit the current extent is on.  Then it scans the OAM looking for an allocation unit with a free extent (an OAM hint causes this search to start where the previous search ended).  If the OAM search comes up empty, the GAM is used to find an allocation unit with a free extent.  In the state described above there are many allocation units that the GAM shows as having at least one free extent; however, upon reading in the allocation page, ASE finds that only the unusable short extent is free, so the next not-full allocation unit is read in, and the process repeats until it finally reaches an allocation unit that has a free long extent.  The GAM search always starts at the beginning of the database, and big bcp jobs need to allocate a lot of new extents, so the same all-but-full allocation pages keep getting read over and over as more extents are needed.

Typically this problem is not seen because other types of DML activity (normal inserts, expanding updates, etc.) are perfectly happy to use the short extent and as soon as they do these all-but-full allocation units become full, the GAM bit is cleared, and allocation doesn’t look at that allocation unit again (until something deallocates an extent).  But the problem is seen at times when there are heavy consumers of the long extents.

A workaround for this issue is to temporarily drop the default segment from the device fragments early in the database that have already become full – the GAM search is limited to allocation units that are part of the object’s segment.
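Dropping and later restoring the default segment on a full device fragment looks roughly like this (the database and device names are hypothetical):

```sql
-- Take the full device out of the default segment so the
-- GAM/OAM search for new extents skips its allocation units.
sp_dropsegment "default", mydb, data_dev1
go
-- ... run the remaining bcp loads ...
-- Put the device back on the default segment afterwards.
sp_extendsegment "default", mydb, data_dev1
go
```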

Alternatively, create a new segment on an unused device and use sp_placeobject to cause new allocations for that object to be made on the new segment.  However, this is more likely to run out of space than the previous method of dropping the default segment on full devices.
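The segment-based alternative might look like this (names — mydb, new_dev, bcp_seg, big_table — are placeholders):

```sql
use mydb
go
-- Create a segment on a fresh, unused device ...
sp_addsegment bcp_seg, mydb, new_dev
go
-- ... and direct future allocations for the table there.
-- Existing pages stay where they are; only new extents go to bcp_seg.
sp_placeobject bcp_seg, "big_table"
go
```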

You can test whether this is the issue by issuing a number of DBCC STACKTRACE commands on the spids doing the bcp in; if most of the stacks show functions like pggam_getfree_lowhigh() and pg__allocate_extents(), this is probably the issue.
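Sampling a session's stack from isql could be sketched as follows (123 is a placeholder for the spid of a bcp-in session):

```sql
-- Route dbcc output to the client session instead of the errorlog.
dbcc traceon(3604)
go
dbcc stacktrace(123)
go
-- Repeat several times; if pggam_getfree_lowhigh() /
-- pg__allocate_extents() dominate the sampled stacks,
-- the GAM scan is the likely bottleneck.
```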

-bret

Former Member
0 Kudos

Hm.  That's an interesting note.  There are a couple of things that puzzle me in it. 

  1. If the first extent in each allocation unit is "short" as you labelled it, and as such "invisible" to the allocation process that create index (clustered or any?) / fast bcp uses, does it mean that roughly 3% of the DB is a no-man's-land for index creation / the bcp process?  Consequently one may encounter a situation where there is ample space in the DB for a particular object but index creation fails (gets stuck indefinitely?), since the only free space in the DB is that of short extents, useless for this type of operation (3 GB of each 100 GB of database space is in fact "invisible" to index creation)?  As a consequence of this, I'd (also) expect index creation performance to deteriorate to the same extent as fast bcp for the same object - do you see that too, Pål? 
  2. If the first extent in each allocation unit is "useless" from the index creation / fast bcp perspective, why is the algorithm for locating the "useful" free extents not made to ignore the first extent when searching for free extents for an index creation / fast bcp operation?  I guess the reason is that there is only one bit in the GAM, which simply states free/full for its allocation unit and cannot be turned on once all of the "long" extents are exhausted, as that would make those same 3% wasted globally. 
  3. We are talking about a performance drop from over 1K to below 100 - a 90% drop due to sequentially reading all free/full allocation units in the DB - that sounds very bad.  There should definitely be a faster way to locate free "long" extents than physically browsing through all allocation units in the DB?  Another bit? 
  4. If index creation / fast bcp at some point hits a sequential read from the beginning of the DB - then fast bcp / index creation should deteriorate over time in general?  When one fills a large table (if I understand the process you outlined correctly), the process first tries to locate a free extent on the same allocation unit as the extent it currently works on; failing to find one, it moves on to locating a free extent on any other allocation unit where the same object resides (via the OAM); failing there, it moves to the GAM and starts browsing sequentially through all allocation units, without reference to the current object (find a free/full page - test it - nope, no long extents there - find the next, etc.).  Shouldn't this always happen?  Filling a table with fast bcp should leave gaps (short extents) as it progresses, gradually reaching a point where the OAM lists no more free extents for the object, pushing the search to the GAM.  From that point on performance should sink consistently?  What am I missing?
Former Member
0 Kudos

Hi

This looks like something we are experiencing. 

But when running 9 bcp-in commands, with CPU almost at 100% the whole time, I find only two instances of the function calls you mention. 

See attached file.

Regards

Pål

Former Member
0 Kudos

Hi

We see perhaps a small decrease in speed when creating indexes, but not at all the same decrease as with bcp in.

Also, if we never create the clustered indexes, speed is fine all the way, with all databases.  It is only when clustered indexes are created and dropped between bcp-in commands that we get the slow speeds.

Former Member
0 Kudos

We have done some testing with sp_placeobject.

We find that when we place the table (and the text/image index for that table!) on another segment on another device, we get much better performance.

But the performance is still not as good as before we created and dropped the clustered indexes.

Why is this? 

Former Member
0 Kudos

That shouldn't be happening.  You can collect monSysWaits for both and compare.  I do a lot of bcp of late (fast and slow); the speed should be consistent.

Former Member
0 Kudos

Hi again

I have been away, so sorry for late reply.

What we have found since last time is that it looks like this has something to do with clustered indexes.  After we drop and create the clustered indexes, bcp in starts slowing down.

So my question is: what kind of side effect does it have to create and drop clustered indexes?

It looks like ASE is still trying to put the rows in "right" order even though the clustered index is dropped...

Regards

Pål