Python Distributions

Python Distributions

Python is free and open source software.  A programmer could download the executables for Python directly from the official site or even download the sourcecode and compile it themselves.  But for someone wanting to get straight to programming in Python it is generally better to get a Python distribution.  The distributions will generally include a selection of third party libraries that are commonly used but not included with the core of Python, a solid IDE, and perhaps other development tools as well.  I take a look at three excellent contenders for Python Distributions here.

WinPython

 

Spyder IDE from Win Python

Spyder IDE from Win Python

WinPython is the version of Python I use at home and the one I personally recommend to anyone that does not want to spend a lot of money on a distribution.  It comes with a wide array of third party libraries including SciPy, making it ready to do many types of analytic work immediately.

WinPython is also designed so it can be run as portable software.  This makes it convenient for those that need more than one installation of Python, such as a 3.x and a 2.x installation running side by side.  Conveniently, WinPython offers both 3.x and 2.x installations.  Being portals also means there is no need for administrator access to the machine to use Python on it.

WinPython comes with Spyder, my favorite light weight IDE.  I use other options when working on large applications requiring numerous components, but for small Python scripts or small programs I think Spyder is highly conducive to my work flow.  WinPython also includes other nice development tools like Qt Designer and has Cython for performance.

 

Enthought Canopy Python

 

Enthought Canopy is a commercially supported version of Python.  It does offer a free version, but that is severely limited compared to the full version.  Enthought is also installed in the user folder and generally does not need local Administrator privileges.

Enthought Canopy comes with a large host of included libraries, including SciPy.  It also includes its own IDE.  While the IDE works quite well, I personally prefer Spyder.  The debugger is excellent, but is only included with the paid version.  The paid version also includes other additional features such as online training courses.  Enthought is currently only available in a 3.x version.

Overall, I think Enthought is an excellent distribution if you will use and are willing pay for its premium features such as its debugger and its training courses.  If you are looking for a free distribution then I think WinPython is generally superior, but a large part of that is that I am partial to Spyder.

 

Python (X, Y)

Python (X, Y) describes itself as “the scientific Python distribution” and is maintained by Gabi Davar.  This used to be my preferred distribution and it remains a well-made Python distribution.  It has a slower update speed than some others.  As I write this, the last update was posted in June, 2015.  It is also available only for Python 2.X.  Much like WinPython, it uses Spyder as its default IDE and comes with a variety of scientific libraries and additional tools.  Given the slow update speed, I viewed WinPython as a graceful upgrade path from Python(X, Y).

Conclusions and Other Distributions

Of course, those are far from the only distributions available.  I only discussed the ones that I had personally used.  I have also heard good things about the Anaconda distribution though I will not personally comment on it since I have not tried it.  From what I have personally experienced, I am happy to recommend Enthought for anyone that wants and is willing to pay for the premium version.  If you want a free version or if you will not use Enthought’s premium features, then WinPython has served me well.

 

 

 

 

Pronouns in technical writing.

When doing technical writing, or for that matter most forms of writing , we need to be able to refer to people without always identifying their gender.  The most common reason, at least in my writing, is that I am often speaking of some indeterminate person whose gender simply does not matter.  In my technical writing, I often speak of some generic DBA and what the best approach they could take to solving a problem is.  In my legal writing, I often speak of some generic person whose gender is irrelevant.  Often, the best approach is to recast the sentence in the plural, but when that is awkward or undesirably, I prefer the singular “they”.

While still somewhat controversial, and possibly not an ideal solution, the singular “they” seems the best answer to English’s lack of a good singular pronoun for people that is not gender specific.  It has been in use for quite a while, and its usage is becoming more accepted.  The Washington post recently allowed the singular they, although in somewhat limited circumstances, in a memo.  Merriam-Webster.com hosts a video discussing using “their” as a singular possessive pronoun.

There are several other options that could be used instead of the singular “they”.  “He or she” is commonly used in many places.  In one article I recently published, the article editor and I discussed whether to use “he or she” or “they” before we finally settled on “they.”  While somewhat a matter of taste, I find “he or she” to be unnecessarily long and wordy.  It becomes tiresome, especially when used several times in an article.  I think things like “s/he” and “he/she” are worse.  They are awkward when reading and not directly pronounceable.

An author could simply use one of “he” or “she” as a pronoun referring to a generic person without any implications for their gender.  There is in fact some indications, according to Wikipedia, that it is a long tradition for “he” to be used generically without specific reference to males.  But that has its own complications.  Aside from not being inclusive on its face, it could lead the reader to think the author is specifically referring to one gender or the other.

Perhaps the best solution would be to introduce a new word in English.  Attempts have been made to do this with proposals for “zhe”, “thon”, and “co”, but none of those have been widely adopted or accepted.  Unless English standardizes on a new addition, I think the singular “they” is the best choice for a gender neutral pronoun when the sentence cannot be gracefully rewritten to avoid the pronoun entirely.

New Publications and a Video

The people at Webucator were kind enough to make a video based on my article on dealing with SQL Server Lock Resource Errors. While I am certainly biased, I think the video came out quite well.  They also have a series of courses on SQL Server available here and their YouTube page with other videos is here.

Additionally, MSSQLTips has published another piece of mine dealing with Programmatically Changing SQL Server Linked Tables in Microsoft Access.

I also have a piece of microfiction up at SpeckLit.  It is titled Food Chain.  They have another piece of mine forthcoming that will be available here when it is published.

[Edit 25 Nov. 2015: Fixed a broken link, and put a title in proper title case]

Making Changes to Many Forms In Access

Although generally I prefer to create custom interfaces for my databases using fully developed programming languages like Python or C#. I prefer to have full access to their computational capabilities and in particular their graphing even when the initial plan does not need these. Projects have a tendency of expanding over time. But when I need to quickly create a simple front end for a database I sometimes turn to MS Access.

One issue I have faced occasionally is being asked to make the same change to all of the forms or all of the reports. Fortunately, the VBA built into Access makes this relatively simple for certain types of changes. You can iterate through all forms, make the desired change, and then save that change so it is permanent.

This is best described with an example, though I have only tested this with Access 2010 and 2013. For this example, I am just going to create a database with one table and two forms. I will put the VBA code itself into a module. The code will look like:

Public Function ChangeBkColor()
    Dim obj As AccessObject 'Entities in Allforms are AccessObjects
    Dim CurForm As Form
    
    'We want everything to be closed to avoid any problems
    'with reopening later
    For Each CurForm In Application.Forms
        DoCmd.Close acForm, CurForm.Name
    Next CurForm
    
    For Each obj In Application.CurrentProject.AllForms
        'If not opened as design, it may not save the change
        DoCmd.OpenForm obj.Name, acDesign
        'There should only be one open form now
        Set CurForm = Application.Forms(0)
        'Change the color to red, but could change others
        CurForm.Section(acDetail).BackColor = RGB(100, 0, 0)
        'Setting acSaveYes here is significant
        DoCmd.Close acForm, CurForm.Name, acSaveYes
    Next obj

End Function

I normally execute by highlighting the code in the VBA editor and then pressing F5, but it could also be tied to a macro if that is more convenient. Executing it then opens each form, changes the property in question, and then closes it again, saving the change. Using techniques like this has saved me a fair bit of time in trying to standardize certain things across numerous forms in a database.

SampleResults

SampleChange

Keep it short

I have been thinking about resumes a great deal lately. Since I recently passed the Bar Exam, I have been revising and sending out my resume in search of a legal job. At my current job, I have also fairly recently reviewed several resumes from new applicants. I cannot claim to be a resume expert, and I will not try to give any detailed advice. But there is one thing that many people that have given me advice have all agreed on, and that I wish the people sending in resumes that I have to review had remembered.

Keep it short.

When I looked into resumes before while looking for technical jobs and more recently when I was looking into them for legal jobs, virtually everyone recommended keeping them short. When I review resumes, I distinctly prefer them short. Exactly what short means varies somewhat by who is giving the advice, but some of them recommend strictly keeping it down to one page and a few others say that two is acceptable. When I have reviewed resumes at different times, a second page has never bothered me. But I have been rather annoyed when I have reviewed resumes as long as sixteen pages.

Generally, keeping it short will help ensure that the important information is easy to find and review. The unimportant information does not need to be there at all. There are occasions when a potential employer wants something longer and more detailed. However, those employers will generally make that clear by asking for a C.V. or even a portfolio rather than a resume. With a bit of help from a skilled counselor at my law school, I ruthlessly trimmed my resume down to one page, and I think it is stronger now that it is shorter.

Dealing with SQL Server LOCK resource errors.

I was recently doing some testing that required a fairly large table. I created the test table, and set it up to insert about 30 million rows. I got an error that read:

The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.

Naturally, having not encountered that before, I started by Googling the error and looking up Error 1204 and Microsoft’s Support Article. The support article helpfully suggested setting locks to 0 using sp_configure. But that is the default value and mine was already set that way.

So, I created a simplified setup that would reliably produce the error based on Jeff Moden’s Tally Table Article.

if OBJECT_ID('BigTallyTable', 'U') is not NULL

drop table dbo.BigTallyTable

create table BigTallyTable
(n BigInt primary key)


insert into dbo.BigTallyTable(n)
select top (30000000)
row_number() over (order by s1.name) as n
from master.dbo.syscolumns s1,
master.dbo.syscolumns s2

And I started playing with it. Since this all tied to locks, one thing I did was break it up into batches:

declare @maxValue int, @maxperround int, @last int
set @maxValue = 30000000 --30 Million
set @maxPerRound = 500000


set @last = 0
while @last < @maxValue
begin

if @maxPerRound > @maxValue - @last
set @maxPerRound = @maxValue - @last

insert into dbo.BigTallyTable(n)
select top (@maxPerRound)
row_number() over (order by s1.name) + @last as n
from master.dbo.syscolumns s1,
master.dbo.syscolumns s2

Print 'Finished a round with @last = ' + cast(@last as varchar) + ' and @maxPerround = ' + cast(@maxPerRound as varchar)

set @last = @last + @maxPerRound
end

This avoided the error and generated the results I was looking for. On a multi-user system (I was doing this on my test instance of SQL which is on my personal machine) it would have also allowed other queries that were waiting for locks to be able to execute more freely, though whether that is necessarily good or not depends on the circumstances.

I also got it to lock by directing SQL Server to take a table lock right from the beginning.

insert into dbo.BigTallyTable with (TABLOCK) (n)

select top (30000000)
row_number() over (order by s1.name) as n
from master.dbo.syscolumns s1,
master.dbo.syscolumns s2

This ran substantially faster than splitting it into batches did and is clearly a simpler query. It does take a more aggressive table lock which, as the article on lock escalation points out, can cause more problems with blocking other queries if that is a concern.

Order of a Result Set in SQL

I have seen people build procedures that rely on a result set being returned in a certain order.  This is fine if an order by clause is used, but SQL Server will not guarantee the order of the results without one.

It is interesting to look at the way SQL Server normally returns a result set though.  So, lets make a small sample table:

create table orderTest (n varchar(15))

insert into orderTest (n)
values  (2),
(1),
(3),
(1)

If we run a simple “select n from orderTest” the results I get back are 2, 1, 3, 1.  In other words, it returns the results in the order they were inserted.  This is relatively consistent.  I have tried similar tests on multiple versions of sql on multiple machines.  As I said at the beginning, I have seen some programs rely on the data being returned in the order it was inserted and those programs have gone for extended periods of time without any problems.  But I want to emphasize that SQL Server will not guarantee the order without an order by clause.  If you want to be able to guarantee that you can get the data back in the order that it was inserted in then you can add an identity column and order by that.

The tendency to return things in the order they were inserted seems fairly stable with simple where clauses, but asking SQL Server to return distinct results will change the order as it checks for duplicates.  For instance “select distinct n from orderTest” gives me 1, 2, 3.

And SQL Server seems to return the results of a union all by just concatenating the results of the two queries.  So

select n
from orderTest
UNION ALL
select n
from orderTest

Gives 2, 1, 3, 1, 2, 1, 3, 1.  Essentially, the contents of the table repeated twice.  But union makes the results distinct and so it changes the order.  Redoing that with union  instead of union all gives 1, 2, 3.

Indexes will also affect the order.  You may notice the tests were all done on a heap with no indexes.  But if I add an index (either clustered or nonclustered) it makes the order for “select n from orderTest” become 1, 1, 2, 3.  Of course, you cannot always rely on the data being returned in index order either, unless there is an order by clause.  Similarly, as Conor Cunningham pointed out, the size of the dataset can affect it and break apparent order as the optimizer seeks to increase parallelization.

In short, SQL Server has certain ways it will normally return data and it can be helpful to be aware of those.  Even small changes to the query or changes in the size of the data can have a major impact on the order the results are returned.  Without an order by clause, SQL is not guaranteed to return the data in the way it normally does and you cannot rely on that order.  If the order actually matters, use an order by clause.