THttpServer: send request response's binary content to client, but don't close request, continue

Hi,

This is a continuation of the ROOT Forum topic "THttpServer serve big binary files (bigger than RAM) on the fly (or CGI?)" (which is closed and cannot be replied to).

When handling an HTTP request with THttpCallArg (for example by overriding MissedRequest), can we set the response's binary content with SetBinaryContent(std::string) and send it to the client, but then not close the request? We should then be able to do another SetBinaryContent to send another chunk, and so on.

This would allow lots of streamed applications.

How complex would it be to allow this? Would it require a lot of modifications in THttpServer, or just a few lines to authorize it?
Maybe just one small function that would do the flush() (i.e. send the current response body to the client but not close the connection), which is surely already available in CivetWeb? If so, could we expose this function publicly in THttpServer?

Maybe @linev knows about this?

I am very interested as it would be useful for a current work-in-progress.

Have a good day!

PS: pseudo code:

void MyServer::MissedRequest(THttpCallArg *arg) {

    // set "Transfer-Encoding: chunked" (see
    // https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Transfer-Encoding#directives)
    // or any other means

    arg->SetBinaryContent(part1);
    arg->Flush_Send_Response();  // the data is sent to the client, but the connection is not closed

    arg->SetBinaryContent(chunk2);
    arg->Flush_Send_Response();

    ...
}

Hi,

Plain HTTP requests do not work this way.

When one replies to an HTTP request, the HTTP header is sent first.
And one of the fields in that header is Content-Length.
See the THttpCallArg::FillHttpHeader() method.

Changing the API here would make it too complicated for little use.
I propose to try web sockets - there you can send data in arbitrary chunks.
But you should be aware of the client-side implementation as well.

Regards,
Sergey

Thanks for your answer @linev.

Actually, HTTP supports the Transfer-Encoding: chunked mode, for when Content-Length is not known in advance.
Here is a low-level example of serving a 10 GB file in 10 MB chunks, with write and flush. (I used Python as pseudo-code, but we could do this in any language.)

from http.server import HTTPServer, BaseHTTPRequestHandler
import os
class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer requires HTTP/1.1; the default here is HTTP/1.0
    def do_GET(self):
        if self.path == "/get_data":
            self.send_response(200)
            self.send_header('Content-Type', 'application/octet-stream')
            self.send_header('Content-Disposition', 'attachment; filename="random_data.bin"')
            self.send_header('Transfer-Encoding', 'chunked')
            self.end_headers()
            for _ in range(1000):
                chunk = os.urandom(10 * 1024 * 1024)
                chunk_len = f"{len(chunk):X}\r\n".encode('ascii')
                self.wfile.write(chunk_len)
                self.wfile.write(chunk)
                self.wfile.write(b"\r\n")
                self.wfile.flush()
            self.wfile.write(b"0\r\n\r\n")
            self.wfile.flush()
httpd = HTTPServer(('', 80), Handler)
httpd.serve_forever()
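As a self-contained check of this pattern (my own sketch, shrunk to three 5-byte chunks so it runs instantly), the same handler can be exercised with http.client, which transparently reassembles the chunked response:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class ChunkedHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'application/octet-stream')
        self.send_header('Transfer-Encoding', 'chunked')
        self.end_headers()
        for i in range(3):
            chunk = bytes([i]) * 5              # tiny stand-in for a 10 MB chunk
            self.wfile.write(f"{len(chunk):X}\r\n".encode('ascii'))
            self.wfile.write(chunk)
            self.wfile.write(b"\r\n")
            self.wfile.flush()
        self.wfile.write(b"0\r\n\r\n")          # terminating zero-length chunk

    def log_message(self, *args):
        pass                                    # silence request logging

httpd = HTTPServer(('127.0.0.1', 0), ChunkedHandler)   # port 0: pick a free port
threading.Thread(target=httpd.handle_request, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', httpd.server_port)
conn.request('GET', '/get_data')
resp = conn.getresponse()
body = resp.read()                              # http.client reassembles the chunks
print(resp.getheader('Transfer-Encoding'))      # chunked
print(len(body))                                # 15
conn.close()
httpd.server_close()
```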

Do you think we could try something similar by modifying net/http/src/TCivetweb.cxx
with mg_send_chunk (from net/http/civetweb/civetweb.c)?

I think the use-case of having multi gigabytes of data is common in scientific applications, so this would be a great addition.


TL;DR:
Would you see a quick fix / hack so that we could directly use mg_write or mg_send_chunk inside THttpServer?
Do you see a way to call these internal CivetWeb functions from THttpServer without recompiling it?

From net/http/civetweb/civetweb.c, line 7081:

/* Send a chunk, if "Transfer-Encoding: chunked" is used */
CIVETWEB_API int
mg_send_chunk(struct mg_connection *conn,
              const char *chunk,
              unsigned int chunk_len)
{
   char lenbuf[16];
   size_t lenbuf_len;
   int ret;
   int t;
   /* First store the length information in a text buffer. */
   sprintf(lenbuf, "%x\r\n", chunk_len);
   lenbuf_len = strlen(lenbuf);
   /* Then send length information, chunk and terminating \r\n. */
   ret = mg_write(conn, lenbuf, lenbuf_len);
   if (ret != (int)lenbuf_len) {
      return -1;
   }
   t = ret;
   ret = mg_write(conn, chunk, chunk_len);
   if (ret != (int)chunk_len) {
      return -1;
   }
   t += ret;
   ret = mg_write(conn, "\r\n", 2);
   if (ret != 2) {
      return -1;
   }
   t += ret;
   return t;
}

Maybe something like this in TCivetWeb.cxx?

   void WriteBuf(const void *buf, int len) override
   {
      if (fWSconn)
         mg_write(fWSconn, (const char *)buf, len);
   }

And then I would be totally free to write any custom request handler using WriteBuf()?
With this, could I write the (low-level) header myself and then write the chunks?

I was not aware of such chunked data transfer - I have never used it.

The proposed approach with:

    arg->SetBinaryContent(part1);
    arg->Flush_Send_Response();  // the data is sent to the client, but the connection is not closed

    arg->SetBinaryContent(chunk2);
    arg->Flush_Send_Response();

cannot be implemented, because THttpCallArg has no access to civetweb functionality.
THttpServer processes requests, and then the engine sends the result of the processing.
These two parts of the code are fully decoupled and normally run in different threads.

I can imagine a different approach.
One can mark THttpCallArg as a chunked data transfer.
Then THttpServer::ProcessRequests will be invoked many times with the same THttpCallArg instance - until chunked_end is returned.

Would it fit to your needs?

Thanks again @linev for your answer.

One can mark THttpCallArg as a chunked data transfer.
Then THttpServer::ProcessRequests will be invoked many times with the same THttpCallArg instance - until chunked_end is returned.

This would be great!

Just to be sure: would this send the headers only once, or for each chunk?
We need the headers to be sent only once, at the beginning (not for each chunk).

Here is what is needed as the full request response. The important point is that everything is written on the fly, not buffered in one big 10 GB buffer.

HTTP/1.1 200 OK\r\n
Content-Type: application/octet-stream\r\n
Transfer-Encoding: chunked\r\n
\r\n
A00000\r\n
<first chunk of 10 MB>   (A00000 is the hex encoding of 10*1024*1024)
\r\n
A00000\r\n
<second chunk data>
\r\n
A00000\r\n
<third chunk data>
\r\n
0\r\n
\r\n
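For reference, that framing can be decoded in a few lines (my own sketch; it assumes the raw body bytes after the blank line ending the headers are available as `raw`, and it ignores chunk extensions and trailers):

```python
def decode_chunked(raw: bytes) -> bytes:
    """Reassemble the payload from a chunked-encoded body."""
    out = b""
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol], 16)      # hex chunk length line, e.g. A00000
        pos = eol + 2
        if size == 0:                     # terminating zero-length chunk
            break
        out += raw[pos:pos + size]        # chunk payload
        pos += size + 2                   # skip payload and its trailing CRLF
    return out

raw = b"5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n"
print(decode_chunked(raw))  # b'Hello World'
```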

Would this be possible with your proposed method?

PS:

  • I propose to try web sockets - there you can send data in any chunks
    But you should be aware of client side implementation as well.

    Unfortunately, for our application, WS is not suitable.

  • For testing, can I just duplicate TCivetweb.cxx into TCivetwebCustom.cxx (and the same for the .h), modify a few constants, and use this as a new custom engine? Then, for example, we would use:
    serv = new THttpServer("http_custom_engine:8080");
    and we would avoid having to recompile the whole of ROOT - is that correct?

Changes are required in several places - not only in TCivetweb.cxx.
At least also in THttpCallArg.

So the only way to use the new code (once it is there) is to use the master branch, or to apply a patch to some previous version.

Yes, I managed to duplicate TCivetweb.cxx into a second engine, TCivetwebCustom.cxx, which I can call from THttpServer with server->CreateEngine("http_custom:80");. It works.

What do you think I should change in begin_request_handler to allow this chunked mode?

One can mark THttpCallArg as a chunked data transfer.
Then THttpServer::ProcessRequests will be invoked many times with the same THttpCallArg instance - until chunked_end is returned.

see [http] implement chunked requests by linev · Pull Request #19823 · root-project/root · GitHub


Thanks a lot @linev, I will experiment around this in the next few days!

PS: I see

CAUTION: Example is not able to handle multiple requests at the same time

Does this mean it is blocking: if a 10 GB file is currently being downloaded by a client, can we no longer access /index.html after the download has started?

If so, is this also true if the server is started with http:8080?thrds=5?

Thanks again.

It means: if several clients want to access "/chunked.txt", one has to implement bookkeeping to distinguish these clients by their THttpCallArg instance. All other requests will be handled completely independently.
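That bookkeeping could be sketched like this (a Python stand-in of my own; in the C++ handler the dictionary key would be the THttpCallArg instance, and `make_chunks` stands for opening that client's data source):

```python
# Per-request bookkeeping for concurrent chunked downloads of the same URL:
# each request id maps to that client's own open stream and position in it.
streams = {}  # request id -> iterator over that client's chunks

def process_request(req_id, make_chunks):
    """Called repeatedly with the same req_id until the stream is exhausted."""
    if req_id not in streams:             # first call: open this client's stream
        streams[req_id] = make_chunks()
    try:
        return next(streams[req_id])      # next chunk for *this* client only
    except StopIteration:
        del streams[req_id]               # end of stream: drop the state
        return b""                        # empty chunk signals "done"

# two clients interleaved on the same URL, each keeping its own position
a = [process_request("client-A", lambda: iter([b"A1", b"A2"]))]
b = [process_request("client-B", lambda: iter([b"B1"]))]
a.append(process_request("client-A", lambda: iter([])))
print(a, b)  # [b'A1', b'A2'] [b'B1']
```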


Did you try code from PR? Does it work for you?


Thanks for your message @linev.

On Ubuntu, I cloned your branch and built ROOT from source as described in Building ROOT from source - ROOT. I did all the steps, including step 5 (source <installdir>/bin/thisroot.sh), and it worked.

But then, when trying to compile your tutorial (cd root_src/tutorials), cmake . -Wno-dev fails with: CMake Error: File /root/root_src/tutorials/test/Event.h does not exist. CMake Error at CMakeLists.txt:441 (configure_file): configure_file Problem configuring file
Indeed, these files are not present.

With which command do you build the tutorials file httpchunked.C?

No need to build anything.

Just run root tutorials/http/httpchunked.C.
If you want, you can run it in compiled mode: root tutorials/http/httpchunked.C+.

And then try wget http://localhost:8080/chunked.txt.


It works well, thanks @linev!
I can even stream the result of a tar compression "on the fly" directly to the client! See the following working code.
The only little problem I have is that, when reading chunks of 16 KB into a binary buffer buffer, then when doing

arg->SetBinaryContent(buffer);

or

arg->SetBinaryContent(std::string(buffer));

it does not work until the end of the pipe reading (it stops before bytesRead = 0).

If instead I do arg->SetBinaryContent("aaa");, then it works until bytesRead = 0, as expected. So the problem is probably about putting raw binary content in the request response. Should we add something for binary content? Do you know what?
Something with application/octet-stream?
Something else?

Any idea for this small issue?


Full code reproducible example

  • Runnable with root httpchunked.C, then wget http://localhost:8080/test.zip.
  • It uses a dummy 1GB file /root/testfile that you can easily create with dd if=/dev/urandom of=/root/testfile bs=1024 count=1000000.
#include "THttpServer.h"
#include "THttpCallArg.h"
#include <cstring>
#include <thread>
class TChunkedHttpServer : public THttpServer {
  protected:
    int fCounter = 0;
    FILE* pipe;
    char buffer[16*1024];
    std::size_t bytesRead;
    void MissedRequest(THttpCallArg *arg) override {
        std::cout << "foo" << fCounter << "\n";
        if (strcmp(arg->GetFileName(), "test.zip"))
            return;
        arg->SetChunked();
        if (fCounter == 0) {  // beginning of the request
            pipe = popen("tar -czf - /root/testfile", "r");  
                   // 1GB testfile created with: dd if=/dev/urandom of=/root/testfile bs=1024 count=1000000
            if (!pipe) {
                std::cout << "popen() failed!" << std::endl;
                return;
            }
        }
        fCounter++;
        bytesRead = fread(buffer, 1, 16*1024, pipe);
        if (bytesRead == 0) {
            std::cout << "bytesRead = 0" << "\n";
            fCounter = 0;
            arg->SetChunked(kFALSE);
        }
        arg->SetBinaryContent("aaa");   // with this, it works **until the end** of the "tar" pipe, i.e. until bytesRead == 0. It works successfully but outputs "aaa" instead of the real content
        // arg->SetBinaryContent(std::string(buffer));    // here sadly it stops *before* bytesRead == 0. Why
    }
  public:
    TChunkedHttpServer(const char *engine) : THttpServer(engine) {}
    ClassDefOverride(TChunkedHttpServer, 0)
};

void httpchunked() {
    auto serv = new TChunkedHttpServer("http:8080");
    serv->SetTimer(1);  // reduce to minimum timeout for async requests processing
}
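As an aside, the popen()/fread() part of the handler above is the standard "read a pipe in fixed-size chunks until a zero-length read" pattern. A Python mirror of just that loop (my own illustration, with tar replaced by a trivial byte producer so it is self-contained):

```python
import subprocess
import sys

# Child process standing in for `tar -czf - /root/testfile`: emits 40000 bytes.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'x' * 40000)"],
    stdout=subprocess.PIPE,
)

chunks = []
while True:
    chunk = proc.stdout.read(16 * 1024)   # like fread(buffer, 1, 16*1024, pipe)
    if not chunk:                         # EOF, i.e. bytesRead == 0
        break
    chunks.append(chunk)                  # here: hand the chunk to the response
proc.wait()

print(len(chunks), sum(len(c) for c in chunks))  # 3 40000
```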

Hi,

You can try to use THttpCallArg::SetContent() and THttpCallArg::SetContentType() separately.

THttpCallArg::SetBinaryContent() sets "application/x-binary" as the content type.

Regards,
Sergey

Thanks @linev, I had already tried this, but it still stops before the end. Would you have a few seconds to test root httpchunked.C, then wget http://localhost:8080/test.zip, with the following?

(the dummy 1GB file /root/testfile is created with dd if=/dev/urandom of=/root/testfile bs=1024 count=1000000)

// httpchunked.C
#include "THttpServer.h"
#include "THttpCallArg.h"
#include <cstring>
class TChunkedHttpServer : public THttpServer {
  protected:
    int fCounter = 0;
    FILE* pipe;
    char buffer[16*1024];
    std::size_t bytesRead;
    void MissedRequest(THttpCallArg *arg) override {
        std::cout << "foo" << fCounter << "\n";
        if (strcmp(arg->GetFileName(), "test.zip"))
            return;
        arg->SetContentType("application/octet-stream");        
        arg->SetChunked();
        if (fCounter == 0) {
            pipe = popen("tar -czf - /root/testfile", "r");  // 1GB testfile created with: dd if=/dev/urandom of=/root/testfile bs=1024 count=1000000
            if (!pipe) {
                std::cout << "popen() failed!" << std::endl;
                return;
            }
        }
        fCounter++;
        bytesRead = fread(buffer, 1, 16*1024, pipe);
        if (bytesRead == 0) {
            std::cout << "bytesRead = 0" << "\n";
            fCounter = 0;
            arg->SetChunked(kFALSE);
        }
        // arg->SetBinaryContent("aaa");   // works **until the end** of the "tar" pipe, it works successfully  (but outputs aaa instead of the real content)
        arg->SetContent(buffer);    // here it stops *before* bytesRead == 0
    }
  public:
    TChunkedHttpServer(const char *engine) : THttpServer(engine) {}
    ClassDefOverride(TChunkedHttpServer, 0)
};
void httpchunked() {
    auto serv = new TChunkedHttpServer("http:8080");
     serv->SetTimer(1);  // reduce to minimum timeout for async requests processing
}

I think 99.9% of the work is done ;-), but there is a very little something left (the problem is not the pipe / tar process, because when outputting with SetContent("aaa"), we read everything until the end of the tar output).

Would you have an idea?

The failure happens when you assign binary content as a const char *. It fails when there is a 0 byte inside the buffer. One should do:

        if (bytesRead == 0) {
            std::cout << "bytesRead = 0" << "\n";
            fCounter = 0;
            arg->SetChunked(kFALSE);
            arg->SetContent("");
        } else {
            std::string sbuf(buffer, bytesRead);
            arg->SetContent(std::move(sbuf));  
        }
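The root cause is easy to reproduce: when a buffer travels through a C-string API, its length is computed with strlen(), which stops at the first 0 byte. The ctypes demonstration below is my own illustration of that effect, not part of the ROOT API:

```python
import ctypes

data = b"ab\x00cd"                  # binary payload with an embedded NUL byte
print(len(data))                    # 5: Python tracks the real length
print(ctypes.c_char_p(data).value)  # b'ab': the C side stops at the first NUL
```

This is exactly why constructing the std::string with an explicit length, as in the snippet above, preserves the full payload.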

Oh yes, that's right.

Now it works perfectly. I think your branch can be merged - do you agree, @linev?


BTW, if you merge it, do you think THttpCallArg could expose a unique identifier (maybe one already present in CivetWeb?), such that, in the case of different "concurrent" calls of void MissedRequest(THttpCallArg *arg), we can distinguish calls coming from two different clients for the same URL?
Is there access to a UUID, like arg->requestUniqueID? If so, I'll know how to implement all the rest.

All the best

For the moment, you can use the pointer to the THttpCallArg as an id.
