submitted 11 months ago by cuenca@lemm.ee to c/programming@lemmy.ml

One needs to send 1 million HTTP requests concurrently, in batches, and read the responses, with no more than 100 requests in flight at a time.

Which way is better, recommended, idiomatic?

  • Send 100 requests, wait for all of them to finish, send another 100, wait for them to finish… and so on

  • Send 100 requests. As a request among the 100 finishes, add a new one into the pool: "done, add a new one; done, add a new one". As a stream.
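The second option above can be sketched with a semaphore in Python's asyncio (a minimal sketch: the URL list and the `fetch` body are placeholders, with `asyncio.sleep` standing in for real network I/O):

```python
import asyncio

CONCURRENCY = 100

async def fetch(url, sem):
    # Placeholder for a real HTTP call (e.g. via an HTTP client library).
    async with sem:
        await asyncio.sleep(0)  # stand-in for the network round trip
        return f"response for {url}"

async def run_all(urls):
    sem = asyncio.Semaphore(CONCURRENCY)
    # gather() schedules every task up front, but the semaphore ensures no
    # more than CONCURRENCY requests run at once; as one finishes, the next
    # waiting task proceeds -- the "stream" model, not the batch model.
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

responses = asyncio.run(run_all([f"https://example.com/{i}" for i in range(1000)]))
```

Note that creating all one million tasks up front costs memory; for the full workload you would feed URLs through a bounded queue instead of passing the whole list to `gather`.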

[-] deegeese@sopuli.xyz 6 points 11 months ago

LOL your last post got helpful answers until you were super rude and an admin deleted it.

[-] GammaGames@beehaw.org 4 points 11 months ago
[-] cuenca@lemm.ee 2 points 11 months ago

I knew that you'd like it.

[-] cuenca@lemm.ee 1 points 11 months ago* (last edited 11 months ago)

ZOG yes. Until you came in it.

[-] peter@feddit.uk 5 points 11 months ago

Try asking this question 1 million times

[-] cuenca@lemm.ee 1 points 11 months ago

That's what I'm doing.

[-] vmaziman@lemm.ee 2 points 11 months ago* (last edited 11 months ago)

Maybe producer consumer?

Producer spits out all the messages to send onto a message queue, FIFO or whatever suits you.

Parallelizable consumers (think deployed containers) listen to the queue, execute the request, get the response and save it.

Scale the consumer count up or down as needed to deal with rate limits.
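The producer/consumer shape described above can be sketched in-process with `asyncio.Queue` (a hedged sketch: the worker count, queue bound, and the simulated request are all made up for illustration):

```python
import asyncio

NUM_CONSUMERS = 100  # tune this up or down to respect rate limits

async def consumer(queue, results):
    while True:
        url = await queue.get()
        try:
            # Stand-in for executing the HTTP request and saving the response.
            await asyncio.sleep(0)
            results.append(f"ok:{url}")
        finally:
            queue.task_done()

async def main(urls):
    queue = asyncio.Queue(maxsize=1000)  # bounded: producer blocks when full
    results = []
    workers = [asyncio.create_task(consumer(queue, results))
               for _ in range(NUM_CONSUMERS)]
    for url in urls:       # producer side
        await queue.put(url)
    await queue.join()     # wait until every queued item has been processed
    for w in workers:
        w.cancel()
    return results

results = asyncio.run(main([str(i) for i in range(500)]))
```

The same shape scales out to real deployed consumers by replacing `asyncio.Queue` with an external broker.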

[-] cuenca@lemm.ee 1 points 11 months ago

What question have you answered?

[-] paysrenttobirds@sh.itjust.works 2 points 11 months ago

Yes, I think the second. You have a pool of 100 http clients and a queue of one million requests and a queue to accept the responses as the clients complete, and a little machine that waits for capacity in the client queue to send the next request until there are no more requests. If the response is important to this process, your machine is also pulling from the response queue as available and computing whatever it needs from that, for example to decide whether to abort the rest of the requests. Any other use of the responses can be handled outside this loop.

The other way would work fine, but I think it's actually slightly more complicated and slower because you now have a queue of 10000 batches of 100 requests each and the machine has to watch for all one hundred clients to complete before sending off the next batch. Otherwise, it's the same situation.
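The head-of-line blocking described above can be made concrete with a toy arithmetic model (the per-request durations are hypothetical): each batch takes as long as its slowest request, while the streaming pool refills a slot the moment it frees up.

```python
import heapq

def batched_time(durations, batch_size=100):
    # Each batch finishes only when its slowest request does.
    return sum(max(durations[i:i + batch_size])
               for i in range(0, len(durations), batch_size))

def streaming_time(durations, pool_size=100):
    # Each finished slot immediately picks up the next request.
    slots = [0.0] * min(pool_size, len(durations))
    heapq.heapify(slots)
    for d in durations:
        free_at = heapq.heappop(slots)      # earliest-free slot
        heapq.heappush(slots, free_at + d)  # it busies itself with the next request
    return max(slots)

durations = [1.0, 5.0] * 500  # 1000 requests, alternating fast and slow
batched = batched_time(durations)      # every batch is bounded by a 5s request
streaming = streaming_time(durations)  # fast slots keep draining the queue
```

With these made-up durations the batch model takes 10 × 5 s = 50 s, while the streaming pool finishes in roughly total-work / pool-size time, well under that.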

[-] cuenca@lemm.ee 1 points 11 months ago
this post was submitted on 03 Dec 2023