Ruby on Rails - Puma or Unicorn vs WEBrick load test benchmark shows no improvement
The setup
I am running a Rails application on Heroku (free tier).
I have 2 separate application releases; let's call them staging and fake-production.
In staging, I am using the WEBrick server. The Procfile is:
web: rails s -p $PORT
In fake-production, I am using the Puma server. The Procfile is:
web: bundle exec puma -C config/puma.rb
I have configured Puma to run 2 workers with 1 thread per worker. config/puma.rb is defined below (taken from Heroku's guide on setting up the Puma web server):
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 1)
threads threads_count, threads_count

preload_app!

rackup      DefaultRackup
port        ENV['PORT']     || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  ActiveRecord::Base.establish_connection
end
My database.yml is configured to have a connection pool of 20.
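On the pool size: ActiveRecord keeps one connection pool per process, so under Puma's clustered mode each worker has its own pool, and the pool only has to cover the threads of a single worker. A minimal sketch of that arithmetic follows. `pool_check` is a made-up helper, and the connection count it reports is a rough upper bound that assumes each thread holds at most one connection at a time.

```ruby
# Hypothetical helper (not part of Puma or Rails): compares a Puma layout
# against an ActiveRecord pool size.
def pool_check(workers:, threads:, pool:)
  {
    needed_per_worker: threads,                        # one connection per busy thread
    pool_sufficient: pool >= threads,                  # the pool is per process (per worker)
    max_db_connections: workers * [threads, pool].min  # rough upper bound for the whole dyno
  }
end

# The two layouts mentioned in this post, with the pool of 20 from database.yml.
[[2, 1], [4, 5]].each do |workers, threads|
  result = pool_check(workers: workers, threads: threads, pool: 20)
  puts "#{workers} workers x #{threads} threads: pool sufficient? #{result[:pool_sufficient]}, " \
       "up to #{result[:max_db_connections]} connections"
end
```

Under these assumptions a pool of 20 is larger than either layout needs, so the pool size by itself is unlikely to be what limits throughput here. The database plan's own connection limit is a separate figure worth checking.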
The test
To do the load testing, I used the ApacheBench (ab) tool from my laptop to hit an API endpoint. The API is a simple database query that returns a fixed number of records (it does not change).
I hit both deployments with the following command:
ab -n 1000 -c 100 https://<some heroku endpoint>?access_token=f73f50514c
The results
The results here are surprising. I was expecting the Puma deployment to completely trash the WEBrick deployment, but in reality they perform about the same. I have tried hitting different API endpoints with different combinations of Puma workers and threads (at one point, 4 workers and 5 threads), and yet there weren't any visible improvements.
WEBrick results
Server Software:        WEBrick/1.3.1
Server Hostname:        webbrick-build.herokuapp.com
Server Port:            443
SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128

Document Path:          /api/v1/packages?access_token=f73f50514c6
Document Length:        488 bytes

Concurrency Level:      100
Time taken for tests:   21.484 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      995000 bytes
HTML transferred:       488000 bytes
Requests per second:    46.55 [#/sec] (mean)
Time per request:       2148.360 [ms] (mean)
Time per request:       21.484 [ms] (mean, across all concurrent requests)
Transfer rate:          45.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      714 1242 278.1   1214    2012
Processing:   248  842 493.6    699    2883
Waiting:      247  809 492.3    677    2876
Total:       1072 2085 643.5   1929    4845

Percentage of the requests served within a certain time (ms)
  50%   1929
  66%   2039
  75%   2109
  80%   2168
  90%   2622
  95%   3821
  98%   4473
  99%   4646
 100%   4845 (longest request)
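For reading the report above, the three headline figures are all derived from the same two inputs: the number of completed requests and the wall-clock time of the run. The sketch below recomputes them from the WEBrick numbers; the method names are made up for illustration.

```ruby
# Figures copied from the WEBrick run above.
REQUESTS    = 1000
CONCURRENCY = 100
TIME_TAKEN  = 21.484 # seconds

# Throughput: completed requests divided by wall-clock time.
def requests_per_second(requests, seconds)
  requests / seconds
end

# Mean latency seen by one of the concurrent clients, in ms.
def time_per_request_ms(requests, seconds, concurrency)
  concurrency * seconds * 1000.0 / requests
end

# Mean wall-clock time "per request" across all concurrent clients, in ms.
def time_across_concurrent_ms(requests, seconds)
  seconds * 1000.0 / requests
end

puts requests_per_second(REQUESTS, TIME_TAKEN).round(2)
puts time_per_request_ms(REQUESTS, TIME_TAKEN, CONCURRENCY).round(1)
puts time_across_concurrent_ms(REQUESTS, TIME_TAKEN).round(3)
```

The recomputed latency comes out at about 2148.4 ms rather than the reported 2148.360 ms, presumably because ab works from the unrounded elapsed time. The point is that at a fixed concurrency of 100, throughput can only rise if the per-request time of roughly 2 seconds falls.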
Memory impact
source=web.1 dyno=heroku.1234567899 sample#memory_total=198.41mb sample#memory_rss=197.60mb sample#memory_cache=0.30mb sample#memory_swap=0.51mb sample#memory_pgpgin=103879pages sample#memory_pgpgout=53216pages
Puma results (more or less the same regardless of worker/thread count)
Server Software:        Cowboy
Server Hostname:        puma-build.herokuapp.com
Server Port:            443
SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128

Document Path:          /api/v1/packages?access_token=fb7168c147adc2ccd83b2
Document Length:        489 bytes

Concurrency Level:      100
Time taken for tests:   23.299 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      943000 bytes
HTML transferred:       489000 bytes
Requests per second:    42.92 [#/sec] (mean)
Time per request:       2329.949 [ms] (mean)
Time per request:       23.299 [ms] (mean, across all concurrent requests)
Transfer rate:          39.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      743 1304 283.9   1287    2092
Processing:   253  951 740.3    684    5353
Waiting:      253  898 729.0    627    5196
Total:       1198 2255 888.0   1995    7426

Percentage of the requests served within a certain time (ms)
  50%   1995
  66%   2085
  75%   2213
  80%   2444
  90%   3755
  95%   4238
  98%   5119
  99%   5437
 100%   7426 (longest request)
Memory impact (4 workers, 5 threads)
source=web.1 dyno=heroku.1234567890 sample#memory_total=406.75mb sample#memory_rss=406.74mb sample#memory_cache=0.00mb sample#memory_swap=0.00mb sample#memory_pgpgin=151515pages sample#memory_pgpgout=47388pages
Based on the snippets above, the Puma deployment is sometimes faster than WEBrick, while at other times it can be slower (as shown in the snippet). Even when it is faster, the difference is not significant: an increase of only 1-5 requests/sec.
My question is: what am I doing wrong? Is the database pool somehow at fault? Am I benchmarking wrongly? Am I using Puma wrongly?
Edit:
Highest CPU load for Puma (5 workers, 5 threads each):
source=web.1 dyno=heroku.123456789 sample#load_avg_1m=2.98
Most of the time, however, it is either 0.00 or smaller than 0.1.
On top of that, the only code called in the controller is:
@package = Package.all
Immediately after, this is followed by the rendering of a JSON response declared in HAML.
By the way, Package.all returns only 5 records.
Edit 2:
Unicorn results
I implemented Unicorn according to the linked guide and am running 3 Unicorn workers.
Server Software:        Cowboy
Server Hostname:        unicorn-build.herokuapp.com
Server Port:            443
SSL/TLS Protocol:       TLSv1,DHE-RSA-AES128-SHA,2048,128

Document Path:          /api/v1/packages?access_token=f73f50514c6b8a3ea
Document Length:        488 bytes

Concurrency Level:      100
Time taken for tests:   22.311 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      942000 bytes
HTML transferred:       488000 bytes
Requests per second:    44.82 [#/sec] (mean)
Time per request:       2231.135 [ms] (mean)
Time per request:       22.311 [ms] (mean, across all concurrent requests)
Transfer rate:          41.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      846 1326 294.5   1304    2720
Processing:   245  627 342.8    540    3061
Waiting:      244  532 313.6    470    3057
Total:       1232 1954 463.0   1874    4875

Percentage of the requests served within a certain time (ms)
  50%   1874
  66%   2016
  75%   2161
  80%   2250
  90%   2466
  95%   2799
  98%   3137
  99%   3901
 100%   4875 (longest request)
One thing I've noticed is that running the same ab load test several times returns different "Requests per second" values. This applies to both Unicorn and Puma. For both, the best result is 48-50 requests per second, while the worst is 25-33.
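With that much run-to-run variation, a single ab run per server says little, and comparing a summary over several runs is more informative. The sketch below is one way to do that. Both helper names are made up, and the sample lines are simply the three "Requests per second" lines reported in this post (one per server), used only as stand-in input for repeated runs of a single server.

```ruby
# Hypothetical helper: pulls the "Requests per second" figure out of ab's text output.
def requests_per_second_from(ab_output)
  match = ab_output.match(/requests per second:\s+([\d.]+)/i)
  match && match[1].to_f
end

# Median of a list of numbers; less sensitive to one unusually slow run than the mean.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

SAMPLE_LINES = [
  'Requests per second:    46.55 [#/sec] (mean)',
  'Requests per second:    42.92 [#/sec] (mean)',
  'Requests per second:    44.82 [#/sec] (mean)'
].freeze

RATES = SAMPLE_LINES.map { |line| requests_per_second_from(line) }
puts "median: #{median(RATES)}, spread: #{(RATES.max - RATES.min).round(2)}"
```

If the spread between repeated runs of one server is larger than the gap between servers, the benchmark cannot distinguish them.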
Either way, it still does not make sense. Why aren't either Puma or Unicorn crushing WEBrick?
I trust you have followed Heroku's "Deploying Rails Applications with the Puma Web Server" guide thoroughly.
My guess is that your test environment minimizes the advantages of multi-threading, or that the HTTP server is bottlenecked by the SQL database.
Your API calls, if they cache the database results, can be CPU intensive. Having 10 threads is of no advantage when the CPU is already used 100% by one. Managing the threads can even hinder performance in that case.
Multi-threading is only useful when the worker threads spend a long time waiting for resources (databases, files, etc.) instead of using the CPU.
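The waiting case can be illustrated with a small sketch in which `sleep` stands in for a slow database or file call. The method names are made up, and the timings depend on the machine, so no exact output is claimed.

```ruby
require 'benchmark'

# Simulated I/O wait (for example a slow database call); sleeping does not use the CPU.
def io_bound_task
  sleep 0.1
end

# Runs the tasks one after another and returns the elapsed seconds.
def run_sequential(count)
  Benchmark.realtime { count.times { io_bound_task } }
end

# Runs each task in its own thread and returns the elapsed seconds.
def run_threaded(count)
  Benchmark.realtime do
    Array.new(count) { Thread.new { io_bound_task } }.each(&:join)
  end
end

puts format('sequential: %.2fs, threaded: %.2fs', run_sequential(4), run_threaded(4))
```

Because the threads wait concurrently, the threaded run should take roughly as long as a single task. Replacing the `sleep` with a pure Ruby computation would typically show no such gain on MRI, where the global interpreter lock lets only one thread execute Ruby code at a time.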
The second possibility is that your HTTP server is constrained by the database. It may be that WEBrick is already moving as fast as the database allows, leaving no room for improvement by switching to a better performing HTTP server.
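One way to check how much room any server change has at all is to compare the mean Connect time with the mean Total time in the three ab reports from the question. The Connect row is measured before the request reaches the application server and presumably includes the TLS handshake from the laptop to Heroku, which is an assumption about ab's accounting rather than something shown in the post. A minimal sketch of that comparison, with a made-up helper name:

```ruby
# Mean "Connect" and "Total" times in ms, copied from the three ab reports in the question.
AB_MEANS = {
  'WEBrick' => { connect: 1242, total: 2085 },
  'Puma'    => { connect: 1304, total: 2255 },
  'Unicorn' => { connect: 1326, total: 1954 }
}.freeze

# Percentage of the mean total request time that is spent connecting.
def connect_share(times)
  (100.0 * times[:connect] / times[:total]).round(1)
end

AB_MEANS.each do |server, times|
  puts "#{server}: #{connect_share(times)}% of mean total time spent connecting"
end
```

In all three runs more than half of the mean request time is spent connecting, which no choice of Rails server affects. That would be consistent with the test setup, rather than the server, dominating the results.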
You should give this comprehensive benchmark report a read.
You will notice that Puma isn't one of the speediest Rails HTTP servers. If you care about speed, try Unicorn, or TorqueBox 4 if you are using JRuby.
Here's a guide on how to set up Unicorn on Heroku.