About Timothy A Wiseman

I am a SQL Server DBA with an interest in optimization as well as recent graduate of Boyd School of Law, Python Programmer, and occasional Go Player and student of Jiu Jitsu.

SQLite and the Performance Implications of Indexes

Indexes make an enormous difference in the read speed of a databases. I have previously mentioned that adding proper indexes is a significant part of improving overall database performance. We can use SQLite and Python to demonstrate how much of a difference indexes make in certain cases.

For this purpose, we create a database with 5000 rows. Each row will have two columns, one with a random number and the other with a random string. Then we will test the speed with a query that finds rows where col1 is identical but col2 is different using the Python timeit library. Then we will see how long it takes to both create an index and execute the same query. Just to ensure that the results are consistent, we will drop the index and then repeat the test. All together, the code to do this looks like:

#imports
import sqlite3
import random
import timeit
from collections import OrderedDict

#Constants used as settings
alpha = 'abcdefghijklmnopqrstuvwxyz'
numRows = 5000

def createRandomDB():
    conn = sqlite3.connect('TestingIndex.py')
    curs = conn.cursor()
    curs.execute('drop table if exists IndexTestTbl')
    conn.commit()
    curs.execute('create table IndexTestTbl (col1, col2)')
    conn.commit()
    for i in range(numRows):
        col1 = random.randint(1, numRows)
        col2 = ''.join(random.choice(alpha) for x in range(25))
        curs.execute('insert into IndexTestTbl values (?, ?)', (col1, col2))
    conn.commit() #commit is expensive, to execute only after the loop

def readTable():
    #Does not actively destroy indexes. Assumes that the table lacks indexes
    conn = sqlite3.connect('TestingIndex.py')
    curs = conn.cursor()
    curs.execute('''select *
                    from IndexTestTbl t1
                    where exists (select * from IndexTestTbl t2
                                  where t1.col1 = t2.col1
                                  and t1.col2 != t2.col2)
                        ''')
    a = curs.fetchall()

def createIdxReadTable():
    conn = sqlite3.connect('TestingIndex.py')
    curs = conn.cursor()
    curs.execute('''create index if not exists
                     TestIdx on IndexTestTbl (col1)''')
    conn.commit()
    readTable()

###########################################
if __name__ == '__main__':
    resultsDict = OrderedDict() #Using OrderedDict to keep the plots in order
    createRandomDB()
    resultsDict['ReadNoIdx'] = timeit.timeit('readTable()',
            number = 1, setup='from __main__ import readTable')

    resultsDict['CreateIdxRead'] = timeit.timeit('createIdxReadTable()',
            number = 1, setup = 'from __main__ import createIdxReadTable')

    #drop the index
    conn = sqlite3.connect('TestingIndex.py')
    conn.execute('drop index TestIdx')
    conn.commit()

    resultsDict['ReadAfterDropIdx'] = timeit.timeit('readTable()',
            number = 1, setup='from __main__ import readTable')
    resultsDict['CreateIdxRead2'] = timeit.timeit('createIdxReadTable()',
            number = 1, setup = 'from __main__ import createIdxReadTable')

    #print the results
    print('Take one with no index required {} seconds'.format(resultsDict['ReadNoIdx']))
    print('Create Index and then read required {} seconds'.format(resultsDict['CreateIdxRead']))
    print('Read the table after dropping the Index {}'.format(resultsDict['ReadAfterDropIdx']))
    print('Create the index again and read takes {}'.format(resultsDict['CreateIdxRead2']))

    #graph the data
    import matplotlib.pyplot as plt
    width = .8
    fig = plt.figure()
    ax = plt.subplot(111)
    ax.bar(range(len(resultsDict)), resultsDict.values(), align = 'center', width = width)
    ax.set_xticks(range(len(resultsDict)))
    ax.set_xticklabels(resultsDict.keys(), rotation = 30, size = 'small')
    plt.tight_layout()
    plt.savefig('SqliteIndexTest.png')

The results:

Take one with no index required 1.088418062616674 seconds
Create Index and then read required 0.2158297379552767 seconds
Read the table after dropping the Index 1.0706404903284745
Create the index again and read takes 0.1789995581284507

 

Results Graph

Results Graph

In this case, it took longer to run the query without the index than it did to create the index and run the query, and the difference was quite substantial. Of course, this was deliberately set up as an extreme example. The rows in the heap were inserted in a random order and the query involved a correlated subquery. I also made sure that I ran the test from the traditional harddrive instead of the SSD. Still, I have seen similar cases in production environments where it took less time to create an index and run the query it was made to support than it did to run the query without bothering to create the index.

It is possible to become overly reliant on indexes, and when there are performance issues in practice I find that there is often more room for improvement and optimization in the queries and code than in the indexes. Still, under some circumstances the right indexes can make a dramatic difference in execution time.

MS Access – Store a Different Value than Displayed in a Combo Box

It is occasionally usefully to display a different values in an Access combo box than what is actually stored in the table.  This can be done if you want to store the primary key in another table to make a reference easier, but display some other data.  It can also simply help save storage space when that is a concern.  One way to do this is to bind the table to one column, but tell it to display two columns with the bound column having a space of 0.

Naturally, this is easier to see with an example, so let us make a totally artificial and painfully simple database to store book ratings.  We will have a table that lists the rating options with both a number and a text description of the rating.

ratingsoptions1
Then we’ll make a table with three columns: ID, BookName, and BookRating.  We will use this to store our ratings, and we will make BookRating be a number.  We will store the value of the rating rather than the text description.  In a large database, this could save some space.  Then we can make a form based on BookRatings, with the actual BookRating being a combo box.  Since this is just an example, we will keep the form almost painfully simple.
bookratingsdesignview1
We will make certain that the BookRating field is bound to the first column, the primary key, of the RatingsOptions.  But then, we will set the display to display two columns.  The first will be effectively invisible with a width of zero and the second will be displayed with a width of 1”.

propertysheets1

And this will give us the text descriptions listed as the options, but it will only store the numeric primary key.
formandtable1

Python Distributions

Python Distributions

Python is free and open source software.  A programmer could download the executables for Python directly from the official site or even download the sourcecode and compile it themselves.  But for someone wanting to get straight to programming in Python it is generally better to get a Python distribution.  The distributions will generally include a selection of third party libraries that are commonly used but not included with the core of Python, a solid IDE, and perhaps other development tools as well.  I take a look at three excellent contenders for Python Distributions here.

WinPython

 

Spyder IDE from Win Python

Spyder IDE from Win Python

WinPython is the version of Python I use at home and the one I personally recommend to anyone that does not want to spend a lot of money on a distribution.  It comes with a wide array of third party libraries including SciPy, making it ready to do many types of analytic work immediately.

WinPython is also designed so it can be run as portable software.  This makes it convenient for those that need more than one installation of Python, such as a 3.x and a 2.x installation running side by side.  Conveniently, WinPython offers both 3.x and 2.x installations.  Being portals also means there is no need for administrator access to the machine to use Python on it.

WinPython comes with Spyder, my favorite light weight IDE.  I use other options when working on large applications requiring numerous components, but for small Python scripts or small programs I think Spyder is highly conducive to my work flow.  WinPython also includes other nice development tools like Qt Designer and has Cython for performance.

 

Enthought Canopy Python

 

Enthought Canopy is a commercially supported version of Python.  It does offer a free version, but that is severely limited compared to the full version.  Enthought is also installed in the user folder and generally does not need local Administrator privileges.

Enthought Canopy comes with a large host of included libraries, including SciPy.  It also includes its own IDE.  While the IDE works quite well, I personally prefer Spyder.  The debugger is excellent, but is only included with the paid version.  The paid version also includes other additional features such as online training courses.  Enthought is currently only available in a 3.x version.

Overall, I think Enthought is an excellent distribution if you will use and are willing pay for its premium features such as its debugger and its training courses.  If you are looking for a free distribution then I think WinPython is generally superior, but a large part of that is that I am partial to Spyder.

 

Python (X, Y)

Python (X, Y) describes itself as “the scientific Python distribution” and is maintained by Gabi Davar.  This used to be my preferred distribution and it remains a well-made Python distribution.  It has a slower update speed than some others.  As I write this, the last update was posted in June, 2015.  It is also available only for Python 2.X.  Much like WinPython, it uses Spyder as its default IDE and comes with a variety of scientific libraries and additional tools.  Given the slow update speed, I viewed WinPython as a graceful upgrade path from Python(X, Y).

Conclusions and Other Distributions

Of course, those are far from the only distributions available.  I only discussed the ones that I had personally used.  I have also heard good things about the Anaconda distribution though I will not personally comment on it since I have not tried it.  From what I have personally experienced, I am happy to recommend Enthought for anyone that wants and is willing to pay for the premium version.  If you want a free version or if you will not use Enthought’s premium features, then WinPython has served me well.

 

 

 

 

Pronouns in technical writing.

When doing technical writing, or for that matter most forms of writing , we need to be able to refer to people without always identifying their gender.  The most common reason, at least in my writing, is that I am often speaking of some indeterminate person whose gender simply does not matter.  In my technical writing, I often speak of some generic DBA and what the best approach they could take to solving a problem is.  In my legal writing, I often speak of some generic person whose gender is irrelevant.  Often, the best approach is to recast the sentence in the plural, but when that is awkward or undesirably, I prefer the singular “they”.

While still somewhat controversial, and possibly not an ideal solution, the singular “they” seems the best answer to English’s lack of a good singular pronoun for people that is not gender specific.  It has been in use for quite a while, and its usage is becoming more accepted.  The Washington post recently allowed the singular they, although in somewhat limited circumstances, in a memo.  Merriam-Webster.com hosts a video discussing using “their” as a singular possessive pronoun.

There are several other options that could be used instead of the singular “they”.  “He or she” is commonly used in many places.  In one article I recently published, the article editor and I discussed whether to use “he or she” or “they” before we finally settled on “they.”  While somewhat a matter of taste, I find “he or she” to be unnecessarily long and wordy.  It becomes tiresome, especially when used several times in an article.  I think things like “s/he” and “he/she” are worse.  They are awkward when reading and not directly pronounceable.

An author could simply use one of “he” or “she” as a pronoun referring to a generic person without any implications for their gender.  There is in fact some indications, according to Wikipedia, that it is a long tradition for “he” to be used generically without specific reference to males.  But that has its own complications.  Aside from not being inclusive on its face, it could lead the reader to think the author is specifically referring to one gender or the other.

Perhaps the best solution would be to introduce a new word in English.  Attempts have been made to do this with proposals for “zhe”, “thon”, and “co”, but none of those have been widely adopted or accepted.  Unless English standardizes on a new addition, I think the singular “they” is the best choice for a gender neutral pronoun when the sentence cannot be gracefully rewritten to avoid the pronoun entirely.

New Publications and a Video

The people at Webucator were kind enough to make a video based on my article on dealing with SQL Server Lock Resource Errors. While I am certainly biased, I think the video came out quite well.  They also have a series of courses on SQL Server available here and their YouTube page with other videos is here.

Additionally, MSSQLTips has published another piece of mine dealing with Programmatically Changing SQL Server Linked Tables in Microsoft Access.

I also have a piece of microfiction up at SpeckLit.  It is titled Food Chain.  They have another piece of mine forthcoming that will be available here when it is published.

[Edit 25 Nov. 2015: Fixed a broken link, and put a title in proper title case]

Making Changes to Many Forms In Access

Although generally I prefer to create custom interfaces for my databases using fully developed programming languages like Python or C#. I prefer to have full access to their computational capabilities and in particular their graphing even when the initial plan does not need these. Projects have a tendency of expanding over time. But when I need to quickly create a simple front end for a database I sometimes turn to MS Access.

One issue I have faced occasionally is being asked to make the same change to all of the forms or all of the reports. Fortunately, the VBA built into Access makes this relatively simple for certain types of changes. You can iterate through all forms, make the desired change, and then save that change so it is permanent.

This is best described with an example, though I have only tested this with Access 2010 and 2013. For this example, I am just going to create a database with one table and two forms. I will put the VBA code itself into a module. The code will look like:

Public Function ChangeBkColor()
    Dim obj As AccessObject 'Entities in Allforms are AccessObjects
    Dim CurForm As Form
    
    'We want everything to be closed to avoid any problems
    'with reopening later
    For Each CurForm In Application.Forms
        DoCmd.Close acForm, CurForm.Name
    Next CurForm
    
    For Each obj In Application.CurrentProject.AllForms
        'If not opened as design, it may not save the change
        DoCmd.OpenForm obj.Name, acDesign
        'There should only be one open form now
        Set CurForm = Application.Forms(0)
        'Change the color to red, but could change others
        CurForm.Section(acDetail).BackColor = RGB(100, 0, 0)
        'Setting acSaveYes here is significant
        DoCmd.Close acForm, CurForm.Name, acSaveYes
    Next obj

End Function

I normally execute by highlighting the code in the VBA editor and then pressing F5, but it could also be tied to a macro if that is more convenient. Executing it then opens each form, changes the property in question, and then closes it again, saving the change. Using techniques like this has saved me a fair bit of time in trying to standardize certain things across numerous forms in a database.

SampleResults

SampleChange

Keep it short

I have been thinking about resumes a great deal lately. Since I recently passed the Bar Exam, I have been revising and sending out my resume in search of a legal job. At my current job, I have also fairly recently reviewed several resumes from new applicants. I cannot claim to be a resume expert, and I will not try to give any detailed advice. But there is one thing that many people that have given me advice have all agreed on, and that I wish the people sending in resumes that I have to review had remembered.

Keep it short.

When I looked into resumes before while looking for technical jobs and more recently when I was looking into them for legal jobs, virtually everyone recommended keeping them short. When I review resumes, I distinctly prefer them short. Exactly what short means varies somewhat by who is giving the advice, but some of them recommend strictly keeping it down to one page and a few others say that two is acceptable. When I have reviewed resumes at different times, a second page has never bothered me. But I have been rather annoyed when I have reviewed resumes as long as sixteen pages.

Generally, keeping it short will help ensure that the important information is easy to find and review. The unimportant information does not need to be there at all. There are occasions when a potential employer wants something longer and more detailed. However, those employers will generally make that clear by asking for a C.V. or even a portfolio rather than a resume. With a bit of help from a skilled counselor at my law school, I ruthlessly trimmed my resume down to one page, and I think it is stronger now that it is shorter.