一道编程题——lucky triples

最近在使用google的过程中弹出一个编程闯关的页面，后来一查得知这是google的神秘招聘系统foobar。我并不觉得把这些题都做完了会得到什么结果，只是有空的时候做一做练练手而已。我以前参与过google code jam，早已习惯了被虐的感觉。

google foobar一共有5个level，level1有1道题，level2有2道题...每道题的时间都相当充裕，至少2天。每次答题前需要request获取题目，同时开始计时，submit之前可以使用verify命令验证是否通过所有的测试用例，提交后就可以去休息了。

下面的这道题是level3的第一题，本身没什么特别的，只是因为我被它折磨了整整两天，所以记录一下我的解题历程。

题目

完整的题目如下：

Find the Access Codes

In order to destroy Commander Lambda's LAMBCHOP doomsday device, you'll need access to it. But the only door leading to the LAMBCHOP chamber is secured with a unique lock system whose number of passcodes changes daily. Commander Lambda gets a report every day that includes the locks' access codes, but only she knows how to figure out which of several lists contains the access codes. You need to find a way to determine which list contains the access codes once you're ready to go in.

Fortunately, now that you're Commander Lambda's personal assistant, she's confided to you that she made all the access codes "lucky triples" in order to help her better find them in the lists. A "lucky triple" is a tuple (x, y, z) where x divides y and y divides z, such as (1, 2, 4). With that information, you can figure out which list contains the number of access codes that matches the number of locks on the door when you're ready to go in (for example, if there's 5 passcodes, you'd need to find a list with 5 "lucky triple" access codes).

Write a function answer(l) that takes a list of positive integers l and counts the number of "lucky triples" of (lst[i], lst[j], lst[k]) where i < j < k. The length of l is between 2 and 2000 inclusive. The elements of l are between 1 and 999999 inclusive. The answer fits within a signed 32-bit integer. Some of the lists are purposely generated without any access codes to throw off spies, so if no triples are found, return 0.

For example, [1, 2, 3, 4, 5, 6] has the triples: [1, 2, 4], [1, 2, 6], [1, 3, 6], making the answer 3 total.

Languages

To provide a Python solution, edit solution.py To provide a Java solution, edit solution.java

Test cases

Inputs: (int list) l = [1, 1, 1] Output: (int) 1

Inputs: (int list) l = [1, 2, 3, 4, 5, 6] Output: (int) 3

Use verify [file] to test your solution and see how it does. When you are finished editing your code, use submit [file] to submit your answer. If your solution passes the test cases, it will be removed from your home folder.

解法

要从数组l中找三个数x,y,z，使得x整除y，y整除z，也就是lucky triple，输出lucky triples的个数。

暴力解

l的长度不超过2000，一种简单暴力的解法是遍历所有的三元组，判断是否满足整除的条件。

def answer(l):
    count = 0
    length = len(l)
    if length < 3:
        return 0
    l = sorted(l)
    for i in range(length-2):
        for j in range(i+1, length-1):
            if l[j] % l[i] == 0:
                for k in range(j+1, length):
                    if l[k] % l[j] == 0:
                        count = count + 1
    return count

时间复杂度为c(n,3)=n(n-1)(n-2)/6,是n的三次方。当n为2000的时候显然会超时，但是它的输出是正确的，我利用它来计算一些测试用例的答案。

解法二（failed）

下面逐步对时间效率进行优化。 l可能包含大量重复的数，如果去掉重复的数计算量能明显减少。

首先去重并统计每个数出现的次数，然后再遍历所有三元组，此时所有的三元组都是不同的。那么该如何计算所有满足条件的三元组的个数？

如果i,j,k三个数是lucky triples，它们出现的次数分别为c[i],c[j],c[k]，因为i,j,k满足条件：j%i==0, k%j==0，所以从c[i]+c[j]+c[k]个i,j,k中任意取出3个数都能组成lucky triples，比如3个i，2个i和1个j，2个i和1个k...这是一个组合数问题

我一向认为数学里的排列组合是比较难的，可能是因为我的数学水平一直停留在高中吧（自嘲）。回忆一下高中知识，C(n,k)表示从n个数中取k个，有多少种组合，这k个数是没有顺序的，比如从1~5中去2个数，2，3和3，2是同一组。组合数的计算需要用到排列数的知识。A(n,k)表示从n个数中去k个数做排列，有多少种排列方式，还是上面的例子，2，3和3，2是不一样的排列。排列数的公式比较容易理解，第一次从n个数中取一个有n种，第二次从剩余的n-1个数中取一个有n-1种，取的顺序也就是排列的顺序，因此A(n,k)就是n(n-1)...*(n-k+1)，即A(n,k)=n!/(n-k)!。如果k=n，也就是全排列结果为n!，因此相同的k个数的排列方式有k!种，用排列数除以k!就得到了从n个数中取k个数的组合数C(n,k)=n!/((n-k)!k!)。

我使用下面的方法计算组合数：

# the number of combinations
import operator  
def combin(n,k):
    if n<k:
        return 0
    if k==n:
        return 1
    if k > n/2:
        k=n-k
    return  reduce(operator.mul, range(n - k + 1, n + 1)) /reduce(operator.mul, range(1, k +1))

这和我们人笔算的流程是一样的。注意不能使用factorial函数，那样会重复计算。实际上根本用不到这个函数，因为n为3，完全可以根据公式直接写计算过程。

继续上面的分析，不能简单的从i,j,k中任意取出3个数，因为这样会重复计算，比如从i,j,k1和从i,j,k2中任意取出3个数都包含了3个i的情况（如果i至少出现3次）。

为了做到不重不漏，结果看起来会比较麻烦：

对于不同的i，如果c[i]>=3，则计算c(c[i],3)
对于不同的j(j>i)，如果j%i==0并且c[i]+c[j]>=3，则计算c(c[i]+c[j],3)-c(c[i],3)-c(c[j],3)，why?类似从2个盒子里取3个球，要求每个盒子至少取一个球的问题
对于不同的k(k>j>i)，如果k%j==0,j%i==0，则有c[i]*c[j]*c[k]种

代码如下：

def answer(l):
    count = 0
    length = len(l)
    if length < 3:
        return 0
    c = {}
    for i in l:
        c[i] = 0
    for i in l:
        c[i] = c[i]+1

    l = sorted(l)
    L=[l[0]]
    for i in range(1,length):
        if l[i]==l[i-1]:
            continue
        L.append(l[i])
    length = len(L)
    # pdb.set_trace()
    for i in range(length):
        ci=c[L[i]]
        if ci>=3:
            count = count + ci*(ci-1)*(ci-2)/6
        for j in range(i+1,length):
            if L[j]%L[i]:
                continue
            cj=c[L[j]]
            s=ci+cj
            if s>=3:
                # count = count + ci*cj+(ci-1)*(cj-1)
                count = count + s*(s-1)*(s-2)/6
                if ci>=3:
                    count = count - ci*(ci-1)*(ci-2)/6
                if cj>=3:
                    count = count - cj*(cj-1)*(cj-2)/6
            for k in range(j+1, length):
                if L[k]%L[j]:
                    continue
                ck=c[L[k]]
                count=count+ci*cj*ck
    return count

时间效率的问题解决了，但5个测试用例只过了前两个。

怎么想也想不通，也许是重复的三个数不算，此时不再计算组合数，而是变成下面的问题：

#pick 3 balls from 3 box, how many combinations？
#balls in the same box are exactly the same
#two balls from different box are different from each other
def combine2(a,b=0,c=0):
    count = 0
    if a+b+c<3:
        return 0
    for i in range(4):
        if i>a:
            continue
        for j in range(4-i):
            if j>b or 3-i-j>c:
                continue
            count=count+1
    return count

直接使用该函数会有重复计算的问题，需要稍加变化，假设有1，2，3个盒子，每次只计算3个球来自以下盒子的情况：

1 1,2 1,2,3

于是我又改成下面的版本：

# only count different tuple
def answer(l):
    count = 0
    length = len(l)
    if length < 3:
        return 0
    c = {}
    for i in l:
        c[i] = 0
    for i in l:
        c[i] = c[i]+1

    l = sorted(l)
    L=[l[0]]
    for i in range(1,length):
        if l[i]==l[i-1]:
            continue
        L.append(l[i])
    length = len(L)
    # pdb.set_trace()
    for i in range(length):
        ci=c[L[i]]
        for j in range(i+1,length):
            if L[j]%L[i]:
                continue
            cj=c[L[j]]
            for k in range(j+1, length):
                if L[k]%L[j]:
                    continue
                ck=c[L[k]]
                count=count+1
            if ci>1:
                count = count + 1
            if cj>1:
                count = count + 1
        if ci>=3:
            count = count + 1
    return count

可还是只过了前两个用例。

当天晚上我在床上左思右想，一直到凌晨4点多才睡去，那个时候我特别绝望，发誓等这道题accept了就不想再挑战后面的题了。

第二天我反复看了题目和我的代码，实在找不出还有什么问题。我试图不再想这道题，可这一天心情一直很郁闷。

解法三（time exceeds）

又过了一天，我再次看了一下题目，原来是我忽视了下面这句话里的i < j < k

counts the number of "lucky triples" of (lst[i], lst[j], lst[k]) where i < j < k

oh,my god!原来排序是我噩梦的开始。原来一切都是因为我读题不认真。比如输入是[2, 1, 4]，上面的程序会得到1，而正确答案是0。

又改了一版，不再排序，合并相邻的重复数：

def answer(l):
    count = 0
    length = len(l)
    if length < 3:
        return 0
    d = []
    i = 0
    while i < length:
        c = 1
        while i+c < length and l[i+c] == l[i]:
            c = c + 1
        d.append((l[i], c))
        i = i + c
    # pdb.set_trace()
    length = len(d)
    for i in range(length):
        iv,ic = d[i]
        if ic>=3:
            count = count + ic*(ic-1)*(ic-2)/6
        for j in range(i+1,length):
            jv,jc=d[j]
            if jv%iv:
                continue
            s=ic+jc
            if s>=3:
                count = count + s*(s-1)*(s-2)/6
                if ic>=3:
                    count = count - ic*(ic-1)*(ic-2)/6
                if jc>=3:
                    count = count - jc*(jc-1)*(jc-2)/6
            for k in range(j+1, length):
                kv,kc = d[k]
                if kv%jv:
                    continue
                count=count+ic*jc*kc
    return count

还是超时！naniiiiiii

考虑下面的输入:

1,2,1,2,1,2....

没有相邻的重复数，导致遍历了所有的三元组，此时该方法的时间复杂度比最简单的暴力解法还高。

如果提前计算出每个数能整除的所有数，只遍历满足条件的三元组，时间复杂度会下降很多。

# improve answer above
def answer(l):
    count = 0
    length = len(l)
    if length < 3:
        return 0
    d = []
    i = 0
    while i < length:
        c = 1
        while i+c < length and l[i+c] == l[i]:
            c = c + 1
        d.append((l[i], c))
        i = i + c

    m = {}
    length = len(d)
    for i in range(length):
        m[i] = []
        iv, ic = d[i]
        for j in range(i+1,length):
            jv,jc=d[j]
            if jv%iv==0:
                m[i].append(j)

    for i in range(length):
        iv,ic = d[i]
        if ic>=3:
            count = count + ic*(ic-1)*(ic-2)/6
        for j in m[i]:
            jv,jc=d[j]
            s=ic+jc
            if s>=3:
                count = count + s*(s-1)*(s-2)/6
                if ic>=3:
                    count = count - ic*(ic-1)*(ic-2)/6
                if jc>=3:
                    count = count - jc*(jc-1)*(jc-2)/6
            for k in m[j]:
                kv,kc = d[k]
                count=count+ic*jc*kc
    return count

依旧超时，仔细一想对于上面的输入时间只是减半了而已，其实此时我已经无限接近标准答案了。

最终解

最终的答案很简单，完全不考虑组合数的计算。

对于l中的每个数i，先找出它能整除的所有数j1,j2...，j在i的后面。这样，以i开头的lucky triples的个数就是j1,j2...能整除的数的个数。

此时的时间复杂度是n2，时间效率应该能接受。当n为2000时，时间变为原来的1/2000，如果原来的时间是2000s（30多分钟），那么现在只需要1s，可见将时间复杂度降维的优化效果相当明显。

最终的答案很简单：

def answer(l):
    count = 0
    length = len(l)
    if length < 3:
        return 0
    # m = [[]] * length
    m=[]
    for i in range(length):
        m.append([])
    for i in range(length):
        for j in range(i+1, length):
            if l[j] % l[i] == 0:
                m[i].append(j)
    for divs in m:
        count = count + sum([len(m[div]) for div in divs])
    return count

终于完全pass了。我长叹了一口气！

完整的测试用例如下：

if __name__ == '__main__':
    from test_func import test

    input1=([1]*1999)
    input1.append(999999)
    test(answer2,
        [
          ([],0),
          ([1],0),
          ([1,2],0),
          ([1,2,3],0),

          ([1,2,4],1),
          ([1,1,1],1),

          ([2,1,4],0),

          ([2,2,2,3],1),
          ([1,1,1,2],4),

          ([1,1,2,3],2),
          ([3,2,2,4],1),
          ([1,2,2,2],4),
          ([2,2,4,4,1,1],4),

          ([1,1,1,1,1,1,1,1,1,1],120),
          ([1,2,4,8,16],10),

          ([1,2,3,4,5,6],3),

          ([1 for i in range(1000)],166167000),
          ([1 for i in range(2000)],1331334000),
          ([i+1 for i in range(1000)],16287),
          ([i+1 for i in range(2000)],40888),
          ([i+1000000 for i in range(1000)],0),
          ([1,2]*1000,665667000),# time limit exceeded
          (input1, 1331334000)
        ]
        )

输出

+---------------------------------+---------+------------+--------+--------+
|              func               | time(s) |   result   | expect | status |
+=================================+=========+============+========+========+
| answer([])                      | 0.000   | 0          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1])                     | 0.000   | 0          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2])                  | 0.000   | 0          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2, 3])               | 0.000   | 0          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2, 4])               | 0.000   | 1          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 1, 1])               | 0.000   | 1          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([2, 1, 4])               | 0.000   | 0          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([2, 2, 2, 3])            | 0.000   | 1          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 1, 1, 2])            | 0.000   | 4          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 1, 2, 3])            | 0.000   | 2          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([3, 2, 2, 4])            | 0.000   | 1          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2, 2, 2])            | 0.000   | 4          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([2, 2, 4, 4, 1, 1])      | 0.000   | 4          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 1, 1, ..., 1, 1, 1]) | 0.000   | 120        |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2, 4, 8, 16])        | 0.000   | 10         |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2, 3, 4, 5, 6])      | 0.000   | 3          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 1, 1, ..., 1, 1, 1]) | 0.142   | 166167000  |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 1, 1, ..., 1, 1, 1]) | 0.566   | 1331334000 |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2, 3, ...999, 1000]) | 0.046   | 16287      |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2, 3, ...999, 2000]) | 0.191   | 40888      |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1000000, ..., 1000999]) | 0.047   | 0          |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 2, 1, ..., 2, 1, 2]) | 0.485   | 665667000  |        | √      |
+---------------------------------+---------+------------+--------+--------+
| answer([1, 1, 1, ...1, 999999]) | 0.560   | 1331334000 |        | √      |
+---------------------------------+---------+------------+--------+--------+
all 23 test(s) passed

我这里所使用的测试工具会在下一篇博客中介绍。

总结

我想程序员最可悲的时候就是为了解一道题可以废寝忘食，解出一道题又能怎样，还有千千万万在有限的人生种解不出来的题。不管怎么说，面对道题的时候还是得好好读题。