Timestamps and time zones in Python

2021-12-19

In programming, "timestamp" usually means Unix time. It is an absolute value: the total number of seconds elapsed since 00:00:00 UTC on January 1, 1970, not counting leap seconds.

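In other words, at any given instant, a program that reads the timestamp returns the same value on computers anywhere on Earth and on computers configured with different time zones, because the timestamp is defined relative to UTC and does not depend on the machine's time zone.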
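As a small illustration of this point, the sketch below renders one timestamp as wall-clock time in two fixed-offset zones. The timestamp value and the two offsets are arbitrary choices for the example and are not taken from the text above.

```python
import datetime

ts = 1639872000  # arbitrary example timestamp

utc = datetime.timezone.utc
utc8 = datetime.timezone(datetime.timedelta(hours=8))

# The same instant, expressed as wall-clock time in two zones.
in_utc = datetime.datetime.fromtimestamp(ts, tz=utc)
in_utc8 = datetime.datetime.fromtimestamp(ts, tz=utc8)

print(in_utc)   # 2021-12-19 00:00:00+00:00
print(in_utc8)  # 2021-12-19 08:00:00+08:00

# Aware datetimes compare by instant, so the two are equal,
# and both convert back to the same timestamp.
print(in_utc == in_utc8)                   # True
print(in_utc8.timestamp() == float(ts))    # True
```

Only the wall-clock rendering changes between the two zones; the underlying timestamp does not.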

In Python, the current timestamp can be obtained with the following code:

import time
print(int(time.time()))

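Since timestamps do not depend on time zones, it might seem that most problems go away if you use timestamps wherever possible for storage and comparison and ignore the date, time, and datetime types offered by various languages and databases. In real programs, however, conversions to and from other time-related types are unavoidable, so a few more things are worth understanding.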

MongoDB ObjectId

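For example, the _id that MongoDB generates by default contains timestamp information. To filter documents from a certain time range, you can build an _id with bson.ObjectId.from_datetime and pass it into the query. Without reading the from_datetime documentation carefully, though, it is easy to get this wrong. Consider the following example: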

import bson
import datetime

a = datetime.datetime.now()
b = bson.ObjectId.from_datetime(a).generation_time

print(a, b)
print(a.tzinfo, b.tzinfo)
print(a.timestamp(), b.timestamp())

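Comparing the output (here on a machine set to UTC+8), the timestamps of a and b differ by 8 hours. a has no tzinfo (it is naive), while b has its tzinfo set to UTC. The from_datetime documentation says: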

generation_time will be converted to UTC. Naive datetime instances will be treated as though they already contain UTC.

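So what is a naive datetime?

In Python, a datetime comes in two kinds:

naive: a datetime without tzinfo
timezone-aware: a datetime with tzinfo

Attaching a tzinfo amounts to attaching time zone information. In Python, every tzinfo object must be an instance of a subclass of datetime.tzinfo.

datetime.tzinfo itself is an abstract base class; subclasses must implement the tzname, utcoffset, and dst methods. They are not covered in detail here.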
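To make the naive/aware distinction and the three tzinfo methods concrete, here is a minimal sketch of a tzinfo subclass. FixedOffset is a hypothetical class written only for illustration; in practice the standard library's datetime.timezone already covers fixed offsets.

```python
import datetime


class FixedOffset(datetime.tzinfo):
    """Hypothetical fixed-offset zone, for illustration only."""

    def __init__(self, hours, name):
        self._offset = datetime.timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        # Offset of local time from UTC.
        return self._offset

    def tzname(self, dt):
        # Human-readable name of the zone.
        return self._name

    def dst(self, dt):
        # This zone never observes daylight saving time.
        return datetime.timedelta(0)


naive = datetime.datetime(2021, 1, 1)
aware = datetime.datetime(2021, 1, 1, tzinfo=FixedOffset(8, "UTC+8"))

print(naive.tzinfo, naive.utcoffset())    # None None
print(aware.tzname(), aware.utcoffset())  # UTC+8 8:00:00
```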

Look directly at the source of from_datetime:

@classmethod
def from_datetime(cls, generation_time):
    if generation_time.utcoffset() is not None:
        generation_time = generation_time - generation_time.utcoffset()
    timestamp = calendar.timegm(generation_time.timetuple())
    oid = struct.pack(
        ">I", int(timestamp)) + b"\x00\x00\x00\x00\x00\x00\x00\x00"
    return cls(oid)

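The source makes the behavior clear: if there is a utcoffset, it is subtracted; then timetuple extracts the year, month, day, hour, minute, and second, which are passed to calendar.timegm. Reading the code of calendar.timegm in turn shows that it treats those fields as GMT (that is, UTC+0).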

Therefore, the argument passed to bson.ObjectId.from_datetime must be:

either a timezone-aware datetime
or a naive datetime (whose year, month, day, hour, minute, and second are already in UTC+0)
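The two cases can be checked without a MongoDB installation. The sketch below uses only the standard library; embedded_timestamp is a hypothetical helper that repeats the first steps of the source quoted above and is not part of bson. The dates and the UTC+8 offset are assumptions for the example.

```python
import calendar
import datetime


def embedded_timestamp(generation_time):
    """Hypothetical helper: the timestamp that the quoted logic would embed."""
    if generation_time.utcoffset() is not None:
        generation_time = generation_time - generation_time.utcoffset()
    return calendar.timegm(generation_time.timetuple())


utc8 = datetime.timezone(datetime.timedelta(hours=8))

aware = datetime.datetime(2021, 1, 1, 8, 0, 0, tzinfo=utc8)  # 08:00 in UTC+8
naive_utc = datetime.datetime(2021, 1, 1, 0, 0, 0)           # same instant, UTC wall clock
naive_local = datetime.datetime(2021, 1, 1, 8, 0, 0)         # UTC+8 wall clock, no tzinfo

true_ts = int(aware.timestamp())

print(embedded_timestamp(aware) == true_ts)        # True
print(embedded_timestamp(naive_utc) == true_ts)    # True
print(embedded_timestamp(naive_local) - true_ts)   # 28800 (8 hours off)
```

One way to stay on the safe side is to pass an aware value such as datetime.datetime.now(datetime.timezone.utc) instead of datetime.datetime.now().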

Using pytz correctly

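As for the abstract class datetime.tzinfo mentioned above: before Python 3.9 (zoneinfo.ZoneInfo), the standard library had no subclass covering the various named time zones (datetime.timezone only provides fixed offsets), so many projects use the time zone objects provided by pytz. The design of pytz, however, differs from that of the standard tzinfo, which leads to a lot of surprising behavior.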

First, two common incorrect usages:

import datetime
import pytz

tz = pytz.timezone('Asia/Shanghai')
dt1 = datetime.datetime(year=2021, month=1, day=1, tzinfo=tz)
dt2 = datetime.datetime(year=2021, month=1, day=1).replace(tzinfo=tz)
print(dt1)  # 2021-01-01 00:00:00+08:06
print(dt2)  # 2021-01-01 00:00:00+08:06

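The output shows that dt1 and dt2 both carry an extra 6 minutes on top of +8. The 6 minutes come from Shanghai's local mean time (LMT, +08:06), which is the offset pytz applies when the zone object is attached directly. According to the pytz documentation, pytz only supports localize and astimezone; passing a pytz time zone as the tzinfo argument is not supported (except for UTC).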

This library only supports two ways of building a localized time. The first is to use the localize() method provided by the pytz library. This is used to localize a naive datetime (datetime with no timezone information):
The second way of building a localized time is by converting an existing localized time using the standard astimezone() method:

So the following usages are the correct ones:

dt3 = tz.localize(datetime.datetime(year=2021, month=1, day=1))
dt4 = pytz.utc.localize(datetime.datetime(year=2021, month=1, day=1)).astimezone(tz)
print(dt3)  # 2021-01-01 00:00:00+08:00
print(dt4)  # 2021-01-01 08:00:00+08:00

In addition, the library notes that after any arithmetic, such as adding a timedelta, you must call normalize again to correct the time zone.

tz = pytz.timezone('America/New_York')
dt1 = tz.localize(datetime.datetime(year=2021, month=1, day=1))

dt2 = dt1 + datetime.timedelta(days=90)
dt3 = tz.normalize(dt2)
print(dt2)  # 2021-04-01 00:00:00-05:00
print(dt3)  # 2021-04-01 01:00:00-04:00

As shown, without normalize the offset can be wrong: New York observes daylight saving time for part of the year, during which clocks are moved forward by 1 hour.

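Other useful functions

The following function takes a timestamp and returns the timestamp of midnight (00:00) of the day that contains it, in the UTC+8 time zone.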

import time

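def get_day_midnight_timestamp(ts=None):
    # Default to the current time; `is None` keeps ts=0 a valid input.
    if ts is None:
        ts = int(time.time())
    # Seconds elapsed since midnight of the current day in UTC+8.
    offset = (ts + 8 * 60 * 60) % (24 * 60 * 60)
    return int(ts - offset)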

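The basic idea: by the definition of a timestamp, when the timestamp is 0, the time in UTC+0 is 00:00 on January 1, 1970, and the time in UTC+8 is 08:00 on January 1, 1970.

At that instant, the number of seconds elapsed in UTC+8 since 00:00 on January 1, 1970 (UTC+8) is 0 + 8 * 60 * 60.

By the same reasoning, when the timestamp is ts, the number of seconds elapsed in UTC+8 since 00:00 on January 1, 1970 (UTC+8) is ts + 8 * 60 * 60.

Since each day has 24 * 60 * 60 seconds, the remainder of ts + 8 * 60 * 60 divided by 24 * 60 * 60 is the number of seconds past midnight of that day. Subtracting that remainder from ts gives the timestamp of that day's midnight.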
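The arithmetic can be cross-checked against the datetime module. In the sketch below, day_midnight_timestamp is a hypothetical variant that takes the offset as a parameter, and the sample timestamp is arbitrary. It assumes a fixed UTC offset, so it does not apply to zones that observe daylight saving time.

```python
import datetime

DAY = 24 * 60 * 60


def day_midnight_timestamp(ts, offset_hours=8):
    """Hypothetical variant: midnight of ts's day at a fixed UTC offset."""
    offset = offset_hours * 60 * 60
    return ts - (ts + offset) % DAY


ts = 1639900000  # arbitrary example timestamp

# Compute the same midnight by truncating the wall-clock time in UTC+8.
tz = datetime.timezone(datetime.timedelta(hours=8))
local = datetime.datetime.fromtimestamp(ts, tz=tz)
expected = local.replace(hour=0, minute=0, second=0, microsecond=0)

print(local)                                                    # 2021-12-19 15:46:40+08:00
print(day_midnight_timestamp(ts))                               # 1639843200
print(day_midnight_timestamp(ts) == int(expected.timestamp()))  # True
```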