Back in December, I did a detailed analysis for getting data into Vertica from MySQL using Tungsten Replicator, all within the Kodiak MemCloud.
I got some good numbers towards the end – 1.9 million rows/minute into Vertica. I did this using a standard replicator deployment, plus some tweaks to the Vertica environment. In particular:
- An integer-hash partition on both the staging and base tables (a rough sketch follows below)
- Query tweaks to ensure the partitions were used as efficiently as possible
- Batching within the applier optimized to hit the right transaction counts
That last one is a bit of a cheat, because in a real-world situation it’s much harder to identify those transaction sizes and row counts, but for testing we’re trying to get the best performance!
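To give a rough idea of what that first tweak looks like, here is a minimal sketch of the kind of partition clause involved. The table, column names and bucket count are purely illustrative rather than the actual test schema, and the exact syntax available depends on your Vertica version:

```python
# Illustrative only: both the base table and its staging counterpart carry a
# partition expression derived from the integer key, so each batch merge only
# has to touch the partitions that batch actually hits. Names and the bucket
# count (16) are made up for the example.
PARTITIONED_BASE_DDL = """
CREATE TABLE test.msgs (
    id  INT NOT NULL,
    msg VARCHAR(255)
)
PARTITION BY (id % 16);
"""
# The staging table used by the applier gets the same PARTITION BY clause.
```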
Next, I wanted to set up some bare metal and AWS servers of an equivalent configuration, repeat the tests, and see what comparable performance I could get.
How I Load Masses of Data
Before I dip into that, however, I thought it would be worth seeing how I generate the information in the first place. With big data testing (mainly when trying to simulate the data that ultimately gets written into your analytics target) the primary concern is one of reproducing the quantity as well as the variety of the data.
It’s application-dependent, but for some analytics tasks the inserts are quite high and the updates/deletes relatively low. So I’ve written a test script that generates up to a million rows of data, split roughly 65% inserts, 25% updates and 10% deletes.
I can tweak that of course, but I’ve found it gives a good spread of data. I can also configure whether that happens in one transaction or each row is a transaction of its own. That all gets dumped into an SQL file. A separate wrapper script and tool then load that information into MySQL, either using redirection within the MySQL command line tool or through a different lightweight C++ client I wrote.
The data itself is light: two columns, an auto-incrementing integer ID and a random string. I’m measuring row insert rates here, not data sizes.
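As a rough illustration (not the actual script), a generator along these lines produces the same sort of SQL file; the table name, string length and defaults here are assumptions for the example:

```python
#!/usr/bin/env python3
# Minimal sketch of the load generator: writes an SQL file containing roughly
# 65% inserts, 25% updates and 10% deletes against a two-column table
# (auto-incrementing id, random string). Names and sizes are illustrative.
import random
import string

ROWS = 1_000_000            # up to a million rows per run (configurable)
SPLIT = (0.65, 0.25, 0.10)  # insert / update / delete mix
SINGLE_TRANSACTION = True   # False lets each statement autocommit on its own

def rand_str(n=32):
    return "".join(random.choices(string.ascii_letters + string.digits, k=n))

with open("load.sql", "w") as out:
    if SINGLE_TRANSACTION:
        out.write("BEGIN;\n")
    max_id = 0
    for _ in range(ROWS):
        r = random.random()
        if r < SPLIT[0] or max_id == 0:
            max_id += 1           # id is assigned by AUTO_INCREMENT in MySQL
            out.write(f"INSERT INTO test.msgs (msg) VALUES ('{rand_str()}');\n")
        elif r < SPLIT[0] + SPLIT[1]:
            out.write(f"UPDATE test.msgs SET msg = '{rand_str()}' "
                      f"WHERE id = {random.randint(1, max_id)};\n")
        else:
            out.write(f"DELETE FROM test.msgs WHERE id = {random.randint(1, max_id)};\n")
    if SINGLE_TRANSACTION:
        out.write("COMMIT;\n")
```

The resulting file can then be pushed into MySQL with simple redirection (`mysql test < load.sql`), which is the command-line route mentioned above.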
So, to summarise:
- Up to 1 million rows (although this is configurable)
- Single or multiple transactions
- Single schema/table or numerous schemas/tables
- Concurrent, multi-threaded inserts
The fundamental point here is that I can predict the number of transactions and rows, which is really important when you are trying to measure rows per time period as a replication benchmark, because I can also start and stop replication on transaction-count boundaries to get precise figures.
For the main performance testing, I run a multi-threaded, simultaneous insert into 20 schemas/tables and repeat it 40 times with a transaction/row count of 10,000. That results in 8,000,000 rows of data, first inserted/updated/deleted in MySQL, then extracted, replicated, and applied to (in this case) Vertica.
For the testing, I then use the replicator’s sequence-number start/stop controls and time the run between those points.
This gives me stats within about 2 seconds of the true figure, and over a run of 15-20 minutes that’s tolerable.
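The arithmetic behind the headline figure is straightforward; something like the following, where the row counts are the real test parameters but the timestamps are invented for the example:

```python
# Sketch of the throughput calculation: the row count is known in advance
# (schemas x repeats x rows per transaction), and the elapsed time comes from
# when the replicator goes online/offline at the chosen sequence numbers.
from datetime import datetime

schemas, repeats, rows_per_txn = 20, 40, 10_000
total_rows = schemas * repeats * rows_per_txn      # 8,000,000 rows

# Illustrative timestamps taken at the start and stop sequence numbers.
start = datetime(2018, 3, 1, 10, 0, 0)
stop  = datetime(2018, 3, 1, 10, 4, 13)

elapsed_min = (stop - start).total_seconds() / 60
print(f"{total_rows / elapsed_min:,.0f} rows/minute")
```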
Working from the sequence numbers also means I can do the testing in two ways:
- Start the load test into MySQL and test for completion into Vertica
or
- Load the data into THL, and test just the target applier (from network transfer to target DB)
For the real-world performance figures, I use the full end-to-end (MySQL insert and target apply) testing.
Test Environments
I tested three separate environments: the original MemCloud-hosted servers, some bare metal hosts, and AWS EC2 hosts.
| | MemCloud | Bare Metal | AWS |
| --- | --- | --- | --- |
| Cores | 4 | 12 | 16 |
| Threads | 4 | 12 | 16 |
| RAM (GB) | 64 | 192 | 122 |
| Disk | SSD | SSD | SSD |
| Networking | 10 Gbit | 10 Gbit | 25 Gbit |
It’s always difficult to perfectly match the environments across virtual and bare metal, particularly in AWS, but I did my best.
Results
I could go into all sorts of detailed results here, but I think it’s better to simply look at the final numbers because that is what really matters:
| | Rows Per Minute |
| --- | --- |
| MemCloud | 1,900,000 |
| Bare Metal | 678,222 |
| AWS | 492,893 |
Now what’s interesting here is that MemCloud is significantly faster, even though it has fewer CPUs and less RAM. It’s perhaps even more surprising to note that MemCloud is more than 4.5x faster than AWS, even on I/O-optimized hosts (I/O probably being the limiting factor for the Vertica applies).
Even against fairly hefty bare metal hosts, MemCloud is almost 3x faster!
I’ve checked in with the engineers about the bare metal results, which seem striking considering these are really beefy hosts, but it may simply be that the SSD interface and I/O become the limiting factor. When writing data into Vertica with the replicator, a few things happen: we write THL to disk, write CSV to disk, read the CSV from disk into a staging table, and then merge the staging and base tables, which involves shuffling a lot of blocks around in memory (and ultimately on disk). It may simply be that the high-memory-focused environment of MemCloud allows for much faster performance all round.
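For what it’s worth, a much-simplified sketch of that staging/merge cycle looks like the following. The table and column names, the opcode values, and the SQL itself are illustrative rather than the applier’s actual statements, and it assumes the vertica-python client:

```python
# Much-simplified sketch of the batch-apply cycle described above: a CSV batch
# written to disk is bulk-loaded into a staging table, deletes/updates are
# materialised against the base table, new rows are inserted, and the staging
# table is cleared. Names, opcodes and SQL are illustrative only.
import vertica_python  # assumes the vertica-python client is available

BATCH_SQL = [
    # Bulk-load the CSV batch from disk into the staging table.
    "COPY test.stage_msgs FROM LOCAL '/tmp/batch.csv' DELIMITER ',' DIRECT",
    # Remove base rows touched by this batch (deletes plus the old image of updates).
    """DELETE FROM test.msgs
        WHERE id IN (SELECT id FROM test.stage_msgs WHERE opcode IN ('D', 'U'))""",
    # Insert new rows and the new image of updated rows.
    """INSERT INTO test.msgs (id, msg)
        SELECT id, msg FROM test.stage_msgs WHERE opcode IN ('I', 'U')""",
    # Clear the staging table ready for the next batch.
    "TRUNCATE TABLE test.stage_msgs",
]

with vertica_python.connect(host="vertica-host", port=5433, user="dbadmin",
                            password="", database="testdb") as conn:
    cur = conn.cursor()
    for stmt in BATCH_SQL:
        cur.execute(stmt)
    conn.commit()
```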
I also looked at the performance as I increased the number of MySQL sources feeding into the systems; these feed into separate schemas, rather than the single, unified schema/table within Vertica.
| Sources | 1 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- | --- |
| Target Schemas | 20 | 40 | 40 | 60 | 80 | 100 |
| Rows Written | 8,000,000 | 8,000,000 | 16,000,000 | 24,000,000 | 32,000,000 | 40,000,000 |
| MemCloud (rows/min) | 1,900,057 | 1,972,000 | 3,617,042 | 5,531,460 | 7,353,982 | 9,056,410 |
| Bare Metal (rows/min) | 678,222 | 635,753 | 1,051,790 | 1,874,454 | 2,309,055 | 3,168,275 |
| AWS (rows/min) | 492,893 | 402,047 | 615,856 | – | – | – |
What is significant here is that with MemCloud I noticed a much more linear ramp-up in performance than I saw with either bare metal or AWS. In fact, with AWS I couldn’t even remotely achieve the same levels, and by the time I got to three simultaneous sources I was getting such wildly random results between executions that I gave up trying to test. From experience, I suspect this is down to the networking and IOPS environment, even on a storage-optimized host.
The graph version shows the differences more clearly:
Bottom line, MemCloud seems really quick, and the statement I made in the original testing still seems to be valid:
The whole thing went so quick I thought it hadn’t executed at all!