foobuzz

by Valentin, November 26 2015, in tech

Datetime in Python

Last Friday I spent two hours fighting with the datetime class from Python's datetime module. I originally thought that a datetime object always represented a single point in time, which is not the case. Here is what I learned about this module, summed up in a few bullet points:

  • In its simplest form, a datetime object is just a date and a time as you would write it down on a piece of paper. It's a year, a month, a day, and, optionally, an hour, a minute, a second and some microseconds (the 'time' part of 'datetime').

  • So, in this simplest form, a datetime object does not represent a given point in time: the instant it denotes depends on the timezone in which you read the date and time. It's up to interpretation. Such datetime objects are called naive. You can make a datetime object aware by giving it the timezone that should be used to interpret this date and time. An aware datetime object does represent a given point in time.

  • The optional timezone carried by a datetime object is given by the tzinfo attribute, whose value is of type tzinfo. The tzinfo class is an abstract base class, meaning you shouldn't instantiate a tzinfo object directly. You must instantiate a subclass of tzinfo, such as timezone. But to construct a timezone object, you must give it a timedelta object. The timedelta class represents a difference between two datetimes. When used to construct a timezone, it indicates the offset in relation to UTC. For example, timezone(timedelta(hours=2)) is a valid tzinfo object which represents the timezone UTC+2. timezone.utc is a shortcut for timezone(timedelta(0)).

  • Most methods which return a datetime object return a naive object. The method now returns the local and naive date and time; the method utcnow returns the UTC naive date and time. Since Python 3.2, the strptime method can produce an aware datetime object from a string. The string must contain the timezone in the +HHMM or -HHMM format (for example +0200 for UTC+2), and the placeholder to use is %z to capture the timezone part.
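The bullets above can be tried out directly. Here is a minimal sketch; the variable names and the sample date string are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# Naive: no tzinfo, so the instant depends on who reads it.
naive = datetime(2015, 11, 23, 20, 6, 13)
print(naive.tzinfo)

# Aware: the same wall-clock reading, pinned to UTC+2.
utc_plus_2 = timezone(timedelta(hours=2))
aware = naive.replace(tzinfo=utc_plus_2)
print(aware.isoformat())

# timezone.utc compares equal to a zero-offset timezone.
print(timezone.utc == timezone(timedelta(0)))

# strptime with %z yields an aware object directly.
parsed = datetime.strptime('2015-11-23 20:06:13 +0200', '%Y-%m-%d %H:%M:%S %z')
print(parsed == aware)
```

Note that replace(tzinfo=...) does not shift the clock reading; it only attaches the timezone used to interpret it.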

Practical example

Let's say you've acquired a string representing a date and a time, such as: Mon Nov 23 20:06:13 CET 2015. It's actually the default output format of the date command in bash. Let's say you want to write a Python script that tells you the exact time difference between the moment the script is executed and the date and time represented by this string.

We're out of luck here because Python won't be able to tell us what timezone CET corresponds to. Remember, it can only parse timezones if they're in the form +HHMM. So we need to do some basic research. It turns out that CET stands for Central European Time and is UTC+1.

Now that we know this, a solution is to replace CET with +0100 in the string, then let Python produce an aware datetime object.

from datetime import datetime, timedelta, timezone

DATETIME_STRING = 'Mon Nov 23 20:06:13 CET 2015'
string = DATETIME_STRING.replace('CET', '+0100')
dt = datetime.strptime(string, '%a %b %d %H:%M:%S %z %Y')

Now, we need the current time. There are two functions to get the current time: now and utcnow. The problem with now is that it returns the current local time, so we would need to figure out what the local timezone is, which isn't straightforward (more on that later). utcnow returns the UTC time, so we know the timezone by definition. Note that even though we know the timezone, utcnow still returns a naive object, so we'll have to manually set the timezone to UTC:

now = datetime.utcnow()
now = now.replace(tzinfo=timezone.utc) # Getting an aware datetime object
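As an alternative to the two-step version above, now() also accepts an optional tz argument and then returns an aware object directly. This is a short sketch rather than what the original script does; it may be worth knowing because utcnow has been deprecated since Python 3.12:

```python
from datetime import datetime, timezone

# Passing a tzinfo to now() returns an aware object in that timezone.
now = datetime.now(timezone.utc)
print(now.tzinfo)
```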

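We can now subtract the two datetime objects to obtain a timedelta object, which has the handy total_seconds method (its seconds attribute only holds the part of the span left over after whole days are counted):

delta = now - dt
print(delta.total_seconds()) # Prints the timespan in seconds between dt and now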
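One detail worth checking when reading a timedelta: it stores days, seconds and microseconds separately. A small sketch with two made-up instants, a little over two days apart, shows how the fields relate:

```python
from datetime import datetime, timezone

# Two fixed aware instants: 2 days, 12 hours and 30 seconds apart.
earlier = datetime(2015, 11, 23, 19, 6, 13, tzinfo=timezone.utc)
later = datetime(2015, 11, 26, 7, 6, 43, tzinfo=timezone.utc)

delta = later - earlier
print(delta.days)             # whole days in the span
print(delta.seconds)          # leftover seconds within the last partial day
print(delta.total_seconds())  # the full span, in seconds
```

Here days is 2, seconds is 43230 and total_seconds() is 216030.0, so only the last one covers the whole span.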

Getting the local timezone

In the last part, another strategy would have been to use now to obtain the local time and then attach the local timezone to the resulting datetime object. But getting the local timezone isn't easy. I haven't found any function in the documentation that does it, and posts on the subject on StackOverflow advise using the tzlocal module, which isn't in the standard library.

It's still possible with the vanilla datetime module.

First, a solution that would work sometimes:

diff = datetime.now() - datetime.utcnow()
minutes = round(diff.total_seconds() / 60) # total_seconds() keeps the sign for timezones behind UTC
local_timezone = timezone(timedelta(minutes=minutes))

Here we get the timezone as the difference between now and utcnow. But since the two functions are not executed at the exact same time, the difference isn't a timedelta with a whole number of minutes, which timezone required of its offset before Python 3.7. We get a whole number of minutes by rounding the number of seconds divided by 60. Then we construct a timezone from a new timedelta.

Now imagine that this code runs on a very slow computer and more than half a minute elapses between the two calls: the rounding lands on the wrong minute and you get a corrupted timezone. So, it doesn't really work.

To make it work we can use a timestamp. Python can turn a POSIX timestamp into a datetime object. We can use fromtimestamp to get the local datetime from a timestamp and utcfromtimestamp to get the UTC datetime from the same timestamp, then subtract the two, which, this time, represent the exact same instant:

TIMESTAMP = 42
dt_utc = datetime.utcfromtimestamp(TIMESTAMP)
dt_local = datetime.fromtimestamp(TIMESTAMP)
diff = dt_local - dt_utc
local_timezone = timezone(diff)
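The same idea can be wrapped in a small helper. This is only a sketch: local_timezone_at is a made-up name, and it assumes Python 3.7 or later so that offsets which aren't whole minutes are accepted. It swaps utcfromtimestamp (deprecated since Python 3.12) for an equivalent call:

```python
import time
from datetime import datetime, timezone

def local_timezone_at(timestamp):
    """Return the fixed-offset timezone in effect locally at `timestamp`."""
    # The same instant, once as naive local time and once as naive UTC time.
    dt_local = datetime.fromtimestamp(timestamp)
    # Same value as utcfromtimestamp(timestamp), without the deprecated call.
    dt_utc = datetime.fromtimestamp(timestamp, timezone.utc).replace(tzinfo=None)
    return timezone(dt_local - dt_utc)

tz_now = local_timezone_at(time.time())
tz_1970 = local_timezone_at(42)
print(tz_now, tz_1970)
```

The offset you get is the one in effect at the given timestamp. With 42 that means early January 1970, which can differ from today's offset if the local rules have changed since or if daylight saving time is in effect now, so passing time.time() is probably closer to what you want.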

Well, to be perfectly honest, I'm still not absolutely convinced that this would work 100% of the time. Depending on the implementation of the fromtimestamp methods, there may be edge cases that cause trouble sometimes. I don't know. At the end of the day, the true way to get the timezone is to look at the operating system's configuration file containing this information, which is set by the user when they install the operating system. This is what the tzlocal module does, by the way. Python probably does it too when it does anything UTC.
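For what it's worth, the standard library does seem to cover this case: since Python 3.3, calling astimezone with no argument on an aware datetime converts it to the system local timezone, using the zone name and offset reported by the OS. A minimal sketch, with made-up variable names:

```python
from datetime import datetime, timezone

# Start from an aware UTC "now", then convert it to the system local timezone.
now_local = datetime.now(timezone.utc).astimezone()
local_tz = now_local.tzinfo
print(now_local.isoformat(), local_tz)
```

The resulting tzinfo is a fixed-offset timezone object valid for that instant; it doesn't know about daylight saving transitions at other dates.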