
Tuesday, January 15, 2019

Deleting old records in large table in SQL Server

Let's face it: the DELETE command is really slow, and deleting a large number of records is often quite a challenge. If there is no WHERE clause it is much better to use TRUNCATE, but if you have to use WHERE you are back to square one :). This is my approach to running DELETE on a large set of data:


DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- delete in batches of 5000 to keep each transaction (and the log growth) small
    DELETE TOP (5000) [dbo].[MyLog]
    WHERE CreationDateTime <= DATEADD(MONTH, -6, GETDATE());

    -- @@ROWCOUNT drops to 0 once no rows older than 6 months are left, which ends the loop
    SET @rows = @@ROWCOUNT;
END
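One assumption worth adding (mine, not part of the original script): each TOP (5000) batch has to locate the rows matching the WHERE clause, so the loop stays fast only if [dbo].[MyLog] has an index on the filter column. A minimal sketch, with a placeholder index name:

-- hypothetical index name; supports the CreationDateTime filter used by the batched DELETE
CREATE NONCLUSTERED INDEX IX_MyLog_CreationDateTime
    ON [dbo].[MyLog] (CreationDateTime);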

Tuesday, April 3, 2018

ORA-01795: maximum number of expressions in a list is 1000

ORA-01795: maximum number of expressions in a list is 1000
01795. 00000 -  "maximum number of expressions in a list is 1000"
*Cause:    Number of expressions in the query exceeded than 1000.
           Note that unused column/expressions are also counted
           Maximum number of expressions that are allowed is 1000.
*Action:   Reduce the number of expressions in the list and resubmit.
Error at Line: 1,002 Column: 1

This is one of the differences between SQL Server (Management Studio) and Oracle (Oracle SQL Developer). It is especially problematic when I have a long list of values in the WHERE statement. Use SQL Server whenever possible.
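If switching databases is not an option, a common workaround is to split the list into chunks of at most 1000 values, or to move the values into a table and use a subquery. A sketch, where my_table and my_id_list are hypothetical names:

-- option 1: split the IN list into chunks of at most 1000 expressions each
SELECT *
FROM my_table
WHERE id IN (1, 2, 3 /* ... up to 1000 values ... */)
   OR id IN (1001, 1002 /* ... next chunk ... */);

-- option 2: load the values into a helper table and use a subquery instead of a literal list
SELECT *
FROM my_table
WHERE id IN (SELECT id FROM my_id_list);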

Wednesday, April 20, 2016

Oracle PL/SQL: alter table add not null column with default value throws exception

I was trying to run the following script:
alter table qrtz_blob_triggers add sched_name varchar(120) not null DEFAULT 'TestScheduler';

I kept receiving the following exception:
alter table qrtz_blob_triggers add sched_name varchar(120) not null DEFAULT 'TestScheduler'
Error report -
SQL Error: ORA-30649: missing DIRECTORY keyword
30649.0000 -  "missing DIRECTORY keyword"
*Cause:    DEFAULT DIRECTORY clause missing or incorrect.
*Action:   Provide the DEFAULT DIRECTORY.
I had to change the order within the query (DEFAULT before NOT NULL):
alter table qrtz_blob_triggers add sched_name varchar(120) DEFAULT 'TestScheduler' not null;
I will keep this as an example for people who keep telling me that SQL is a declarative/functional language, and not something that Scarily Qualifies as a Language.

Wednesday, July 17, 2013

NHibernate cascade save and how many queries are produced

The simplest configuration for a cascade save in NHibernate looks like this:
public class CUrlMap : ClassMap<CUrl>
{
    public CUrlMap()
    {
        Table("CommonUrl");
        Id(x => x.Id);
        Map(x => x.CommonUrl);

        HasMany(x => x.CMarketUrls)
            .Inverse()
            .Cascade.All()
            .KeyColumns.Add("CommonUrlId");
    }
}

public class CMarketUrlMap : ClassMap<CMarketUrl>
{
    public CMarketUrlMap()
    {
        Table("CommonMarketUrl");
        Id(x => x.Id);
        Map(x => x.MarketUrl);
        Map(x => x.MarketId);

        References(x => x.CUrl).Column("CommonUrlId");
    }
}

One CUrl has many CMarketUrls. The goal is to create/update/delete everything in the lowest possible number of steps. Unfortunately, it is not really beautiful from either a code or a SQL point of view.
The simple code to save a CUrl:
using (var trans = _repository.BeginTransaction())
{
    _repository.SaveOrUpdate(curl);
    trans.Commit();
}

The first thing is that the CommonUrlId column (a foreign key) must allow NULLs in SQL. This is because, when creating a CUrl, NHibernate first inserts the CUrl itself and then inserts all the CMarketUrl objects it keeps in the CMarketUrls collection. It means that the total number of queries is:
  • 1 x insert into CUrl
  • K x inserts into CMarketUrls (where K is the number of CMarketUrl objects kept in CMarketUrls)

But we are not done yet: the code listed above will save everything, but it will not assign the Id of the freshly created CUrl to the freshly created CMarketUrls. The programmer needs to map this reference manually during creation. It complicates the code, which now looks like this:
using (var trans = _repository.BeginTransaction())
{
    _repository.SaveOrUpdate(curl);
                    
    foreach (var cMarketUrl in curl.CMarketUrls)
    {
        cMarketUrl.CUrl = curl;
        _repository.SaveOrUpdate(cMarketUrl);
    }
    trans.Commit();
}

What NHibernate will do is, after creating the new records, update the CMarketUrls and set the newly created CUrl Id on them.

It all means that each time we create a new CUrl with some CMarketUrls, the total number of queries is the following:
  • 1 x insert into CUrl
  • K x inserts into CMarketUrls (where K is the number of CMarketUrl objects kept in CMarketUrls)
  • 1 x batch update (in this one batch all CMarketUrls are updated and the CUrl Id is set on them)

As always, there is plenty of room to improve NHibernate.

Thursday, January 24, 2013

SQL Server installation Unknown error (0xfffffff)

I’ve seen this problem so many times. The hard part is that the error is not meaningful, and there is no additional info in the Event Viewer or in the SQL installation logs. The problem is a GPO policy: basically, someone set an AD policy that does not allow users to run applications called
setup.exe
You need to disable it with the regedit tool; go to:
\HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer\DisallowRun
and rename the setup.exe entry there to something like setupzzz.exe.

Thursday, January 3, 2013

Reading from transaction log file - SQL Server

Sometimes I need to figure out when a certain operation happened - for example, when a row was deleted or added. In order to investigate, I use two tools:
DBCC LOG(databasename, typeofoutput)

where typeofoutput:
0: Return only the minimum of information for each operation -- the operation, its context and the transaction ID. (Default)
1: As 0, but also retrieve any flags and the log record length.
2: As 1, but also retrieve the object name, index name, page ID and slot ID.
3: Full informational dump of each operation.
4: As 3 but includes a hex dump of the current transaction log row.
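For example, to dump the log of a database with object-level detail, the call looks roughly like this (DBCC LOG is undocumented, and MyDb is a placeholder name):

-- undocumented command; 'MyDb' is a placeholder database name
-- output level 2 adds the object name, index name, page ID and slot ID
DBCC LOG ('MyDb', 2)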
The second tool is fn_dblog. It runs in the context of a DB, so be sure to select the one you are interested in. Results are ordered; the oldest ones are displayed first.
SELECT
SPID,                         -- The process ID for the transaction
[Begin Time],                 -- The start time for the transaction
[Current LSN],                -- The lsn of this record           
[Previous LSN],               -- Previous lsn for this record
Operation,                    -- The operation performed by the log record
Context,                      -- The resource affected by the operation
[Log Reserve],                -- The space reserved for rollback in bytes
AllocUnitName,                -- The name of the affected allocation unit
[Page ID],                    -- The page ID of the affected page
[Number of Locks],            -- The number of locks held by the transaction
[Lock Information]            -- Information about the locks held and the resources locked
FROM fn_dblog(NULL, NULL)
The number of rows displayed depends on the recovery model: if it is set to SIMPLE, SQL Server decides how much information to keep in the transaction log file; if it is set to FULL, this query displays all operations since the moment the recovery model was set to FULL.
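Coming back to the original question - when a row was deleted or added - I usually narrow the output down to the relevant operations. A sketch, where dbo.MyLog is a placeholder table name:

SELECT [Current LSN], [Begin Time], [Transaction ID], Operation, Context, AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE Operation IN ('LOP_DELETE_ROWS', 'LOP_INSERT_ROWS')   -- row deletes / inserts
  AND AllocUnitName LIKE 'dbo.MyLog%'                        -- placeholder table name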

Tuesday, December 11, 2012

Dealing with a hanged SQL Backup

The backup process is triggered by a SQL Server Job. One can see which SQL Server Jobs are currently running by executing the following query:
exec msdb..sp_help_job @execution_status = 1
In order to see all the queries that are being executed we can use sp_who; in the cmd column we should see a BACKUP string. The sp_who query also tells us what SPID the backup process has:
exec sp_who
And the query below shows the status of a backup process - it displays the estimated completion percentage and the estimated completion time:
SELECT r.session_id,r.command,CONVERT(NUMERIC(6,2),r.percent_complete)
AS [Percent Complete],CONVERT(VARCHAR(20),DATEADD(ms,r.estimated_completion_time,GetDate()),20) AS [ETA Completion Time],
CONVERT(NUMERIC(10,2),r.total_elapsed_time/1000.0/60.0) AS [Elapsed Min],
CONVERT(NUMERIC(10,2),r.estimated_completion_time/1000.0/60.0) AS [ETA Min],
CONVERT(NUMERIC(10,2),r.estimated_completion_time/1000.0/60.0/60.0) AS [ETA Hours],
CONVERT(VARCHAR(1000),(SELECT SUBSTRING(text,r.statement_start_offset/2,
CASE WHEN r.statement_end_offset = -1 THEN 1000 ELSE (r.statement_end_offset-r.statement_start_offset)/2 END)
FROM sys.dm_exec_sql_text(sql_handle)))
FROM sys.dm_exec_requests r WHERE command IN ('RESTORE DATABASE','BACKUP DATABASE')
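If the backup really is hanged and has to be stopped, the SPID found above can be killed. Use this with care; 53 below is just an example SPID:

-- 53 is a placeholder SPID taken from sp_who / sys.dm_exec_requests
KILL 53
-- check how far the rollback of the killed session has progressed
KILL 53 WITH STATUSONLY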

Monday, December 3, 2012

Measuring SQL Query execution time

It is not recommended to measure SQL execution time on the DB side; wise guys believe it is much more meaningful to run performance tests from the application, so that the response time also includes the network delay and the SQL provider computation time. In my scenario I do not have access to the application, and there are many entry points for a single SQL query - that is why it is optimized on the DB side. The query below displays performance metrics of queries; in my example I am interested in each query that includes the 'VersionedFields' string.
SELECT TOP 40 *
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE qt.text LIKE '%VersionedFields%'
ORDER BY qs.last_execution_time DESC
'Elapsed time' is the most important metric for me. A much more low-level way to measure a query response time is to use STATISTICS:
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON 
GO
SET STATISTICS TIME ON
GO

-- SQL Query goes here like, SELECT * FROM VersionedFields

DBCC DROPCLEANBUFFERS
SET STATISTICS IO OFF
GO
SET STATISTICS TIME OFF
GO
A note regarding DROPCLEANBUFFERS: I only run it on a test bed, never in production.
Use DBCC DROPCLEANBUFFERS to test queries with a cold buffer cache without shutting down and restarting the server.
To drop clean buffers from the buffer pool, first use CHECKPOINT to produce a cold buffer cache. This forces all dirty pages for the current database to be written to disk and cleans the buffers. After you do this, you can issue DBCC DROPCLEANBUFFERS command to remove all buffers from the buffer pool.
If one really wants to clear the entire cache, a CHECKPOINT should also be used.
[CHECKPOINT] Writes all dirty pages for the current database to disk. Dirty pages are data pages that have been entered into the buffer cache and modified, but not yet written to disk. Checkpoints save time during a later recovery by creating a point at which all dirty pages are guaranteed to have been written to disk.
I do not run CHECKPOINT that often.
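When I do need a truly cold cache, the sequence suggested by the two quotes above looks roughly like this (a sketch for a test bed only):

-- write all dirty pages of the current database to disk
CHECKPOINT
GO
-- then remove the (now clean) buffers from the buffer pool, producing a cold cache
DBCC DROPCLEANBUFFERS
GO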

Shrinking Log file in SQL Server 2008

The procedure for shrinking a DB log file (transaction log file) in SQL Server Management Studio:
Right-click on the DB, Properties -> Options -> Recovery model -> change from 'Full' to 'Simple' -> OK
Right-click on the DB -> Tasks -> Shrink -> Files -> File type -> Log -> OK
The shrinking procedure should not take more than 3 s. It is not possible to change the DB recovery model this way if mirroring is set up. A T-SQL sketch of the same steps follows the list below. Now, some rules of shrinking and maintaining the log file:
  • By default Recovery model is set to FULL
  • If you store in a DB crucial/important information, then recovery model should be set to FULL
  • If recovery model is set to FULL, it means that there should be a backup in place, a backup that also includes a transaction log file
  • When a transaction log backup runs, the transaction log file is truncated. The truncation occurs after a checkpoint process
  • So if your transaction log is big, say 7 GB, it means that:
    • You don't have a backup that includes the transaction log file, which also means you don't need the FULL recovery model
    • Your backup is not working
    • You have a big and heavily used database
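For completeness, the T-SQL equivalent of the two GUI steps above is roughly the following; MyDb and the logical log file name MyDb_log are placeholders (the real logical name can be checked with sp_helpfile):

USE MyDb;
-- switch to the SIMPLE recovery model (again, not possible when mirroring is set up)
ALTER DATABASE MyDb SET RECOVERY SIMPLE;
-- 'MyDb_log' is the logical name of the log file; the target size is given in MB
DBCC SHRINKFILE (MyDb_log, 1);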

Thursday, November 22, 2012

A great query to identify missing indexes and print them as CREATE INDEX statements. They are ordered by Total Cost:
SELECT  TOP 10 
        [Total Cost]  = ROUND(avg_total_user_cost * avg_user_impact * (user_seeks + user_scans),0) 
        , avg_user_impact
        , TableName = statement
        , [EqualityUsage] = equality_columns 
        , [InequalityUsage] = inequality_columns
        , [Include Columns] = included_columns
FROM        sys.dm_db_missing_index_groups g 
INNER JOIN    sys.dm_db_missing_index_group_stats s 
       ON s.group_handle = g.index_group_handle 
INNER JOIN    sys.dm_db_missing_index_details d 
       ON d.index_handle = g.index_handle
ORDER BY [Total Cost] DESC;
Where Total Cost stands for:
total cost = (avg_total_user_cost * avg_user_impact * (user_seeks + user_scans)) / 1,000,000
  • avg_total_user_cost – Average cost of the user queries that could be reduced by the index in the group
  • avg_user_impact – Average percentage benefit that user queries could experience if this missing index group was implemented. The value means that the query cost would on average drop by this percentage if this missing index group was implemented
  • user_seeks – Number of seeks caused by user queries that the recommended index in the group could have been used for
  • user_scans – Number of scans caused by user queries that the recommended index in the group could have been used for
Below is a nice query that generates the actual CREATE INDEX commands:
PRINT 'Missing Indexes: '
PRINT 'The "improvement_measure" column is an indicator of the (estimated) improvement that might '
PRINT 'be seen if the index was created. This is a unitless number, and has meaning only relative '
PRINT 'the same number for other indexes. The measure is a combination of the avg_total_user_cost, '
PRINT 'avg_user_impact, user_seeks, and user_scans columns in sys.dm_db_missing_index_group_stats.'
PRINT ''
PRINT '-- Missing Indexes --'
SELECT CONVERT (varchar, getdate(), 126) AS runtime,
  mig.index_group_handle, mid.index_handle,
  CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS improvement_measure,
  'CREATE INDEX missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
  + ' ON ' + mid.statement
  + ' (' + ISNULL (mid.equality_columns,'')
    + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL (mid.inequality_columns, '')
  + ')'
  + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
  migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
PRINT ''
GO 

Monday, November 12, 2012

Profiling SQL Server 2005 and 2008

I used to use the SQL Server Dashboard for profiling SQL Server 2005. By profiling I mean: something nasty is going on between the application and SQL Server and no one has any clue what the reason is. This is why Microsoft wrote a series of useful reports that gather information, or use data stored in the master database, to help you understand what the reason may be. Unfortunately, SQL Server 2008 does not support the Dashboard; it was replaced by SQL Server Performance Studio. I recently installed Performance Studio hoping to see something similar to the Dashboard; unfortunately, Performance Studio is a full-blown warehouse - it gathers plenty of additional counters while the application is running, which hugely increases CPU usage. While running it on my test bed I received plenty of timeouts on requests to the DB server, so I don't recommend installing it on a machine that is already facing CPU utilization or memory problems :) Because of that, I had to use good old SQL queries to figure out what is going on. There is a nice article explaining how to do it. To see which DB is hit the most I use:
SELECT TOP 10 
        [Total Reads] = SUM(total_logical_reads)
        ,[Execution count] = SUM(qs.execution_count)
        ,DatabaseName = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
GROUP BY DB_NAME(qt.dbid)
ORDER BY [Total Reads] DESC;

SELECT TOP 10 
        [Total Writes] = SUM(total_logical_writes)
        ,[Execution count] = SUM(qs.execution_count)
        ,DatabaseName = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
GROUP BY DB_NAME(qt.dbid)
ORDER BY [Total Writes] DESC;
NULL means a temporary table not assigned to any DB. To see the most costly queries:
SELECT TOP 10 
 [Average IO] = (total_logical_reads + total_logical_writes) / qs.execution_count
,[Total IO] = (total_logical_reads + total_logical_writes)
,[Execution count] = qs.execution_count
,[Individual Query] = SUBSTRING (qt.text,qs.statement_start_offset/2, 
         (CASE WHEN qs.statement_end_offset = -1 
            THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2 
          ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) 
        ,[Parent Query] = qt.text
,DatabaseName = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
ORDER BY [Average IO] DESC;
Queries executed most often:
SELECT TOP 10 
 [Execution count] = execution_count
,[Individual Query] = SUBSTRING (qt.text,qs.statement_start_offset/2, 
         (CASE WHEN qs.statement_end_offset = -1 
            THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2 
          ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)
,[Parent Query] = qt.text
,DatabaseName = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
ORDER BY [Execution count] DESC;

Sunday, June 20, 2010

SQL tests and various DBMS

A long, long time ago I was writing a Maven plug-in to manage different DBMS systems from an application. As you probably know, there are differences between the various DBMS systems, and it is good for your application to take advantage of DBMS-specific behaviors. In my Maven plug-in it was done for performance reasons, but one should be aware that an approach that works perfectly in SQL Server will sometimes not work on an Oracle database. Some mistakes are easy to catch - for example, the application refuses to work :) Others are really difficult to catch. I remember that one of my colleagues was not aware of the different transaction approaches in Oracle and SQL Server (optimistic vs. pessimistic), and it caused some problems to track down, because everything seemed to work fine and then, from time to time, everything crashed - because the commit was refused. Anyway, I always wanted an easy database, available out of the box, that can be installed on the most popular DBMS systems, so I can run my tests and check whether everything works fine. ChinookDatabase is the tool I have been using recently. Enjoy.