Reading files in Lua is too slow

Published On June 11, 2017

category lua


I have been disappointed with Lua recently.

lua is not convenient enough for C programmers

Lua is known as a fast scripting language (perhaps one of the fastest) because it is a thin layer over C. Lua is efficient and lightweight, so we use it in many cases where performance and overhead are both critical. We always write Lua code with extreme caution for the sake of performance. For example, caching module functions in local variables is a common practice.

But I often find it inconvenient and inefficient. For example, string.sub always creates a new string for the substring, because strings are immutable and, without a copy, the GC would not know when the memory can be freed. This is inefficient if I just want to discard the first few bytes of a long char array.
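The copy-versus-view distinction is not unique to Lua. As a rough analogy only, here is the same idea in Python 3 (the language used for comparison later in this post); the variable names are made up for illustration. Slicing a byte string copies the data, much like string.sub, while a `memoryview` slice refers to the same memory.

```python
data = bytearray(b"x" * 8 + b"payload")

# Slicing copies the remaining bytes into a new object.
copied = bytes(data[8:])

# A memoryview slice is a window onto the same memory: no copy is made.
view = memoryview(data)[8:]

# A change to the underlying buffer shows up in the view, not in the copy.
data[8] = ord("P")
print(copied)        # b'payload'
print(bytes(view))   # b'Payload'
```

Dropping the first few bytes of a large buffer through a view costs the same regardless of buffer size, which is the behaviour I miss in Lua strings.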

what happened?

Recently I needed to read a lot of files in Lua. Read speed is very important here, but I found that it is very slow compared with C.

benchmark

Below is a simple benchmark:

Each program reads a 200K file one hundred thousand times, which is about 20GB in total.

a.txt is generated by redirecting the output of print(string.rep('1', 204800)) to a file.

> ls -hs a.txt
204K a.txt
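For anyone reproducing this without a Lua interpreter at hand, the same file could be generated along these lines. This is only a sketch in Python 3, and the helper name `make_test_file` is hypothetical. Note that Lua's print appends a newline, so the one-liner above produces 204801 bytes, of which the benchmarks read the first 204800.

```python
import os

def make_test_file(path="a.txt", count=204800):
    """Write `count` copies of '1' plus a trailing newline, like Lua's print."""
    with open(path, "wb") as f:
        f.write(b"1" * count + b"\n")
    # Return the resulting size in bytes so the caller can check it.
    return os.path.getsize(path)
```

Calling `make_test_file()` with the defaults writes a.txt into the current directory and returns its size.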

Don't be surprised by the speed. Reads are extremely fast because the system caches the file in RAM, but that doesn't matter for this example.

lua

  • luajit version: LuaJIT 2.0.4
  • lua version: Lua 5.1.4

It takes 19 seconds in LuaJIT and 48 seconds in Lua.

local f = io.open('a.txt', 'rb')

s = os.time()
for i=1,100000 do
    c=f:read(200*1024)
    f:seek("set")
end
print(os.time()-s)

LuaJIT's ffi library lets us call C functions directly from Lua. It is much easier to use than writing Lua C bindings.

I know there is a pread system call in C, which is often used to read from a specified position in a file and so saves the seek() calls. I used ffi to call pread from Lua.

Unfortunately, the result is even worse than the LuaJIT io version: it takes 27 seconds when calling pread through ffi.

local ffi=require "ffi"

ffi.cdef[[
typedef long int off_t;
typedef long int ssize_t;
int fileno(struct FILE* stream);
ssize_t pread(int fd, void *buf, size_t count, off_t offset);
]]

local function pread(fd, count, offset)
    -- allocates a byte buffer of this size
    local buf = ffi.new("uint8_t[?]", count)
    ffi.C.pread(fd, buf, count, offset)
    -- copy to a lua string
    -- how to avoid copying?
    return ffi.string(buf, count)
end

local count, offset = 200*1024, 0
local f = io.open('a.txt', 'rb')

local fd = ffi.C.fileno(f)

s = os.time()
for i=1,100000 do
    c = pread(fd, count, offset)
end
print(os.time()-s)

c

It takes only 1.73 seconds in C.

#include <stdio.h>
#include <time.h>
#include <stdlib.h>  // malloc and free (portable replacement for <malloc.h>)
#include <assert.h>

#define COUNT (200*1024)

int main()
{

    int n,i,fd,tmp;
    FILE *ptr;
    clock_t t;

    ptr = fopen("a.txt", "rb");  // r for read, b for binary
    fd = fileno(ptr);
    t = clock();
    for(i=0;i < 100000; i++)
    {
        unsigned char* buffer = malloc(sizeof(char)*COUNT);
        n=fread(buffer,1,COUNT,ptr);
        assert(n==COUNT);
        // it takes 11s without free
        free(buffer);
        fseek (ptr , 0 , SEEK_SET);
    }
    t = clock() - t;
    printf("%f", ((float)t)/CLOCKS_PER_SEC);
}

I also noticed some interesting things:

  1. If I define the buffer outside the loop, so that every iteration shares the same buffer, the time is the same. This means malloc does not take much time.
  2. It takes 11s if the buffer is not freed. Doing this will eat up to 20GB of memory in about 10 seconds, so be cautious if you want to reproduce the result yourself.

I checked the source code of LuaJIT's io library. It first calls lj_str_needbuf to get a buffer from the lua_State, then calls fread to read data into it. It then calls lj_str_new to allocate a new string and copy the file content into it before returning to Lua. The C code below mimics this: it first reads the file into a buffer, then mallocs a new buffer and copies the data into it, as the io library does.

But the C rewrite takes only 4.09 seconds.

#include <stdio.h>
#include <time.h>
#include <stdlib.h>  // malloc and free (portable replacement for <malloc.h>)
#include <string.h>
#include <assert.h>

#define COUNT (200*1024)

int main()
{

    int n,i,fd,tmp;
    FILE *ptr;
    clock_t t;
    unsigned char* buffer = malloc(sizeof(char)*COUNT);

    ptr = fopen("a.txt", "rb");  // r for read, b for binary
    fd = fileno(ptr);
    t = clock();
    for(i=0;i < 100000; i++)
    {
        n=fread(buffer,1,COUNT,ptr);
        fseek (ptr , 0 , SEEK_SET);

        // this looks more efficient (it needs #include <unistd.h>), but unfortunately I cannot see any improvement because the file is cached
        // n=pread(fd, buffer, COUNT, 0);

        assert(n==COUNT);

        unsigned char* new_buffer = malloc(sizeof(char)*COUNT);
        memcpy(new_buffer, buffer, COUNT);
        free(new_buffer);
    }
    t = clock() - t;
    printf("%f", ((float)t)/CLOCKS_PER_SEC);
}

The extra memcpy roughly doubles the time (4.09s versus 1.73s). This suggests that reading a file is about as fast as a memcpy when the file is cached in memory, so one more memcpy takes about twice as long.
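One way to check this claim independently is to time a cached read with and without one extra copy of the data. The sketch below does that in Python 3 rather than C. The function name `time_reads` and its parameters are hypothetical, and absolute numbers depend on the machine, so no expected output is given.

```python
import time

def time_reads(path, size, rounds, extra_copy):
    """Read `size` bytes from the start of `path` `rounds` times into one
    reused buffer; optionally copy the buffer each round, as memcpy would."""
    buf = bytearray(size)
    # buffering=0 gives a raw file object, so readinto goes straight to the OS.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(rounds):
            f.seek(0)
            n = f.readinto(buf)
            assert n == size
            if extra_copy:
                copy = bytes(buf)  # one extra pass over the data
        return time.perf_counter() - start
```

Comparing `time_reads("a.txt", 200*1024, 100000, False)` with the same call using `True` shows how much of the total time the extra copy accounts for.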

python

What about python?

  • python version: 2.7.3

It takes 1.84 seconds →_→ ⊙ˍ⊙

import time

size = 200*1024

f = open('a.txt')

start = time.time()
for i in range(100000):
    content = f.read(size)
    assert(len(content)==size)
    f.seek(0, 0) # f.seek(offset, from_what)
end = time.time()
elapsed = end - start
print(elapsed)

Python does a wonderful job in this case.
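The pread idea from the Lua section can also be tried from Python. This is a minimal sketch, assuming Python 3.3 or later on a Unix-like system, where `os.pread` is available (the 2.7 interpreter used above does not have it). The helper name `read_repeatedly` is hypothetical and no timing is claimed for it.

```python
import os

def read_repeatedly(path, size, rounds):
    """Read the first `size` bytes of `path` `rounds` times with pread,
    which takes an explicit offset, so no seek is needed between reads."""
    fd = os.open(path, os.O_RDONLY)
    try:
        content = b""
        for _ in range(rounds):
            # Offset 0 every time; the file position itself is never moved.
            content = os.pread(fd, size, 0)
            assert len(content) == size
        return content
    finally:
        os.close(fd)
```

Each call still allocates a new bytes object for the result, so this only removes the seek, not the allocation.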

Final remarks

I know the demo above is memory-hungry and puts heavy pressure on the garbage collector. Still, I have not found a way in Lua to avoid allocating so much memory by reusing a buffer, or a way to tune the collector to improve performance in this particular case.

I am not familiar with Lua's internals. Can someone point out why Lua is so slow in the example above, and what I can do to improve the performance of the program?


© 2018 - Xurui Yan. All rights reserved