Tag Archives: Commentary

Comparing MySQL to Vertica Replication under MemCloud, AWS and Bare Metal

Back in December, I did a detailed analysis for getting data into Vertica from MySQL using Tungsten Replicator, all within the Kodiak MemCloud.

I got some good numbers towards the end – 1.9 million rows/minute into Vertica. I did this using a standard replicator deployment, plus some tweaks to the Vertica environment. In particular:

  • An integer hash as the partition key for both the staging and base tables
  • Some tweaks to the queries to ensure that we used the partitions in the most efficient manner
  • Optimized batching within the applier to hit the right numbers for the transaction counts

That last one is a bit of a cheat because in a real-world situation it’s much harder to be able to identify those transaction sizes and row counts, but for testing, we’re trying to get the best performance!

Next, I wanted to set up some bare metal and AWS servers of an equivalent configuration, repeat the tests, and see what comparable performance we could get.

How I Load Masses of Data

Before I dip into that, however, I thought it would be worth seeing how I generate the information in the first place. With big data testing (mainly when trying to simulate the data that ultimately gets written into your analytics target) the primary concern is one of reproducing the quantity as well as the variety of the data.

It’s application dependent, but for some analytics tasks the inserts are quite high and the updates/deletes relatively low. So I’ve written a test script that generates up to a million rows of data, split to be around 65% inserts, 25% updates and 10% deletes.

I can tweak that of course, but I’ve found it gives a good spread of data. I can also configure whether that happens in one transaction or each row is a transaction of its own. That all gets dumped into an SQL file. A separate wrapper script and tool then load that information into MySQL, either using redirection within the MySQL command line tool or through a different lightweight C++ client I wrote.

The data itself is light, two columns, an auto-incrementing integer ID and a random string. I’m checking for row inserts here, not data sizes.

So, to summarise:

  • Up to 1 million rows (although this is configurable)
  • Single or multiple transactions
  • Single schema/table or numerous schemas/tables
  • Concurrent, multi-threaded inserts
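
As a rough illustration of that kind of generator (not the actual script), here is a minimal Python sketch. The table name `loadtest.t1`, the `data` column name, the string length, and the seeded random generator are all assumptions made for the example; the 65/25/10 split and the single-versus-per-row transaction switch come from the description above.

```python
import random
import string


def random_string(rng, length=16):
    """Return a random lowercase string for the payload column."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_statements(total, table="loadtest.t1", seed=1, single_transaction=True):
    """Yield SQL text: roughly 65% inserts, 25% updates and 10% deletes."""
    rng = random.Random(seed)
    live_ids = []  # ids inserted so far and not yet deleted
    next_id = 1    # assumes an empty table with an auto-incrementing id
    if single_transaction:
        yield "BEGIN;"
    for _ in range(total):
        roll = rng.random()
        if roll < 0.65 or not live_ids:
            # Fall back to an insert when there is nothing to update or delete.
            stmt = f"INSERT INTO {table} (data) VALUES ('{random_string(rng)}');"
            live_ids.append(next_id)
            next_id += 1
        elif roll < 0.90:
            target = rng.choice(live_ids)
            stmt = f"UPDATE {table} SET data = '{random_string(rng)}' WHERE id = {target};"
        else:
            victim = live_ids.pop(rng.randrange(len(live_ids)))
            stmt = f"DELETE FROM {table} WHERE id = {victim};"
        # Either one big transaction, or one transaction per row.
        yield stmt if single_transaction else f"BEGIN; {stmt} COMMIT;"
    if single_transaction:
        yield "COMMIT;"
```

Writing the output of `generate_statements(1_000_000)` to a file, one statement per line, gives an SQL file that can be redirected into the MySQL command line tool. Because the seed is fixed, the row and transaction counts are the same on every run.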

The fundamental result here is that I can predict the number of transactions and rows. That is really important when you are trying to measure rows per time period as a replication benchmark, because I can also start and stop replication on the transaction count boundaries to get precise performance figures.

For the main testing that I use for the performance results, what I do is run a multi-threaded, simultaneous insert into 20 schemas/tables and repeat it 40 times with a transaction/row count size of 10,000. That results in 8,000,000 rows of data, first being inserted/updated/deleted into MySQL, then extracted, replicated, and applied to (in this case) Vertica.

For the testing, I use the start/stop sequence number controls in the replicator and record the times at which replication starts and stops on those sequence numbers.

This gives me timings within about 2 seconds of the true result, and over a period of 15-20 minutes that’s tolerable.
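
To put that 2 second margin in context, here is a small Python sketch of the arithmetic. The 15 minute elapsed time is only an illustrative value taken from the range mentioned above, not a measured result, and the function names are made up for the example.

```python
def rows_per_minute(rows, elapsed_seconds):
    """Convert a row count and an elapsed time into rows per minute."""
    return rows * 60.0 / elapsed_seconds


def throughput_bounds(rows, elapsed_seconds, timing_error_seconds=2.0):
    """Return (low, nominal, high) rows/minute for a +/- timing error."""
    low = rows_per_minute(rows, elapsed_seconds + timing_error_seconds)
    nominal = rows_per_minute(rows, elapsed_seconds)
    high = rows_per_minute(rows, elapsed_seconds - timing_error_seconds)
    return low, nominal, high


# 20 schemas/tables x 40 repeats x 10,000 rows per run = 8,000,000 rows
rows = 20 * 40 * 10_000

# Hypothetical 15 minute run, purely to show the size of the error band.
low, nominal, high = throughput_bounds(rows, elapsed_seconds=15 * 60)
print(f"{low:,.0f} .. {nominal:,.0f} .. {high:,.0f} rows/minute")
```

Over a run of that length, a 2 second timing error moves the rows-per-minute figure by well under half a percent in either direction.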

It also means I can do testing in two ways:

  • Start the load test into MySQL and test for completion into Vertica

or

  • Load the data into THL, and test just the target applier (from network transfer to target DB)

For the real-world performance figures, I use the full end-to-end (MySQL insert and target apply) testing.

Test Environments

I tested three separate environments: the original MemCloud hosted servers, some bare metal hosts and AWS EC2 hosts.

             MemCloud   Bare Metal   AWS
Cores        4          12           16
Threads      4          12           16
RAM (GB)     64         192          122
Disk         SSD        SSD          SSD
Networking   10GB       10GB         25GB

It’s always difficult to perfectly match the environments across virtual and bare metal, particularly in AWS, but I did my best.

Results

I could go into all sorts of detailed results here, but I think it’s better to simply look at the final numbers because that is what really matters:

Platform     Rows Per Minute
MemCloud     1,900,000
Bare Metal   678,222
AWS          492,893

Now what’s interesting here is that MemCloud is significantly faster, even though it has fewer CPUs and less RAM. It’s perhaps even more surprising to note that, on these figures, MemCloud is almost 4x faster than AWS, even on I/O optimized hosts (I/O is probably the limiting factor in Vertica applies).

[Graph 1: rows per minute for MemCloud, Bare Metal and AWS]

Even against fairly hefty bare metal hosts, MemCloud is almost 3x faster!
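
The ratios can be worked out directly from the rows-per-minute figures in the results table. This is just a quick Python sketch of that division; the dictionary and function names are my own for the example.

```python
# Rows per minute, taken from the results table above.
rows_per_minute = {
    "MemCloud": 1_900_000,
    "Bare Metal": 678_222,
    "AWS": 492_893,
}


def speedup(fast, slow):
    """How many times faster `fast` is than `slow`."""
    return rows_per_minute[fast] / rows_per_minute[slow]


for other in ("Bare Metal", "AWS"):
    print(f"MemCloud vs {other}: {speedup('MemCloud', other):.2f}x")
# MemCloud vs Bare Metal: 2.80x
# MemCloud vs AWS: 3.85x
```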

I’ve checked in with the engineers about the Bare Metal results, which seem striking, especially considering these are really beefy hosts, but it may simply be that the SSD interface and I/O become the limiting factor. When writing data into Vertica with the replicator, a few things are happening: we write THL to disk, write CSV to disk, read the CSV from disk into a staging table, then merge the base and staging tables, which involves shuffling a lot of blocks around in memory (and ultimately on disk). It may simply be that the high-memory focused environment of MemCloud allows for much faster performance all round.

I also looked at the performance as I started to increase the number of MySQL sources feeding into the systems. Each source writes to separate schemas, rather than to a single, unified schema/table within Vertica.

Sources                 1           1           2            3            4            5
Target Schemas          20          40          40           60           80           100
Rows Written            8,000,000   8,000,000   16,000,000   24,000,000   32,000,000   40,000,000
MemCloud (rows/min)     1,900,057   1,972,000   3,617,042    5,531,460    7,353,982    9,056,410
Bare Metal (rows/min)   678,222     635,753     1,051,790    1,874,454    2,309,055    3,168,275
AWS (rows/min)          492,893     402,047     615,856      -            -            -

What is significant here is that with MemCloud I noticed a much more linear ramp up in performance that I didn’t see to the same degree with the Bare Metal or AWS hosts. In fact, with AWS I couldn’t even remotely achieve the same levels, and by the time I got to three simultaneous sources I got such wildly random results between executions that I gave up trying to test. From experience, I suspect this is due to the networking and IOPS environment, even on a storage optimized host.
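
One way to look at the ramp up is to divide each total by the number of sources feeding it. The Python sketch below does only that, using the figures from the table above. It assumes the three AWS results belong to the first three columns (one, one and two sources), which is how I read the table, and the variable names are made up for the example.

```python
sources = [1, 1, 2, 3, 4, 5]

# Rows per minute for each test, in the same column order as the table.
results = {
    "MemCloud": [1_900_057, 1_972_000, 3_617_042, 5_531_460, 7_353_982, 9_056_410],
    "Bare Metal": [678_222, 635_753, 1_051_790, 1_874_454, 2_309_055, 3_168_275],
    "AWS": [492_893, 402_047, 615_856],  # no usable results from three sources on
}


def per_source(platform):
    """Rows per minute contributed by each source, test by test."""
    # zip() stops at the shorter list, so AWS yields only three values.
    return [total / n for total, n in zip(results[platform], sources)]


for platform in results:
    print(platform, [round(value) for value in per_source(platform)])
```

A flat line of per-source values means throughput is growing in step with the number of sources; a falling line means each extra source is adding less.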

The graph version shows the differences more clearly:

[Graph 2: rows per minute against number of sources for MemCloud, Bare Metal and AWS]

Bottom line, MemCloud seems really quick, and the statement I made in the original testing still seems to be valid:

The whole thing went so quick I thought it hadn’t executed at all!

MariaDB to Hadoop in Spanish

Nicolas Tobias has written an awesome guide to setting up replication from MariaDB to Hadoop/HDFS using Tungsten Replicator, in Spanish! He’s planning more of these so if you like what you see, please let him know!

Holy Week, and here I am with new battles to tell.
I was at work, thinking about how I was going to spend the calm that comes with those holiday days on which we can freely choose to work, and I thought: wouldn’t it be good to finish that synchronisation between the MariaDB servers and Hive?

I had already looked for some information on this in January, and I even had a PoC set up with some VMs, which I switched back on, but everything was rotten: it wouldn’t start, it didn’t work, I couldn’t even remember how I had done it, and the shell history was gibberish. I decided that if I redid everything from scratch I would be able to leave it written down in a playbook and, on top of that, learn it and automate it to the point of being able to deploy it automatically with Ansible.

via De MariaDB a HDFS: Usando Continuent Tungsten. Parte 1 | run.levelcin.co

2015 is that way, 2016 is this way

The last year has been something of a change in direction in my life. Not only was it a year of a large number of ‘firsts’ for me, in all sorts of ways, I also changed a lot of what I was doing to better suit me. Actually that’s really important.

2015 turned out to be a really significant year for me, not because of any huge life changes, but because so many different and interesting things happened to me.

What did I change?

‘Official’ Studying – I have for many years been doing a degree in Psychology with the Open University. I was actually on my last year – well, 20 months as it was part time. I had my final two modules to go, and although I was hugely enjoying the course, it was a major sap on my personal time; what little I have of it after work and other obligations (see below). I also reached a crunch point; due to the way the course worked, changes in the rules, and the duration of the work (I started studying back in 2007), I had to finish the course by June 2016, and that meant there were no opportunities for retakes or doing the entire course all over again. I either had to get it right, first time, for each remaining course, or I would have to start again. That kept the pressure to get good marks massive while I had a very busy day job, and it got harder to dedicate the required time. In the end I decided that having the piece of paper was less important than having the personal interest in the topic. And that was the other problem. I’d already stopped reading, I stopped playing games, I stopped going out, all to complete a course. I realised that my interest in Psychology won’t disappear just because I stop studying. I can still read the books, magazines, articles that interest me without feeling pressured to do so.

Book/Article Writing – Given the above, and the lack of activity on here, it won’t surprise you that writing books and articles was something else I stopped. I deliberately changed my focus to the Psychology degree. But I also stopped doing anything outside work in any of the areas I’m interested in, despite some offers. I was working on a book, actually two books, but ultimately dropped them due to other pressures. Hopefully I’ll be converting some of that material into posts here over the course of the year.

Working Hours – I have very strange sleep patterns; I sleep very little, and have done since the day I was born. As such, I normally get up very early (2am is not unusual) having gone to bed at 10 or 11pm the previous night. However, last year I spent even more time up late on the phone with meetings and phone calls to people in California. That would make for a long day, so I switched my day entirely so that I now start working later and finish later, doing most of my personal stuff in the early morning. It’s nice and quiet then too.

2015 Firsts

  • First time staying in a B&B – I know, this seems like an odd one, but I have honestly never stayed in a B&B before. But I did, three times, while on a wonderful touring holiday of the North of Scotland, taking in Inverness, Skye, Loch Ness and many other places.
  • First touring holiday (road trip) – See above. For the first time ever, I didn’t go to one place, stay there, and travel around the area. We drove miles. In fact, I did about 2,800 over the course of a week.
  • First time to the very north of Scotland – Part of the same road trip. I’ve done Dunbar, North Berwick, the borders, Edinburgh.
  • First music concert (in ages) – I went to two, in fact. One in Malaga and one in San Francisco about two weeks later. Enjoyed both. Want to do more.
  • First time driving in the US – I’ve been regularly going to the US since 2003, when I first started working with Microsoft, and even when working for companies in Silicon Valley, I’ve always taken rides from friends, or taxis. In April, I hired a car and drove around. A lot. I did about 600 miles over the course of two weeks.
  • First Spanish train journey – I flew to Madrid on business, and then took the train from there down to see a friend in Malaga. The AVE train is lovely, and a beautiful way to travel, especially at 302km/h.
  • First Cruise – I’ve wanted to go on a cruise to see the Fjords of Norway since I was a teenager. I love the cold, I love the idea of being relatively isolated on a boat with lots of time to myself. In the end, I spent way more time interacting with other people than I expected, and did so little on my own, but I wouldn’t have changed it for the world. I went from Bergen to Kirkenes in the Arctic circle and back on the Hurtigruten and it was one of the most amazing trips of my life.
  • First time travelling on my own not for business – I travel so much for work (I did 16 journeys in 2015, most to California) it made a nice, if weird, change to do a full trip on my own. I enjoyed it immensely and recommend it to everybody.

What’s planned for 2016?

I’m starting to publish my fictional work on Patreon with the express intention of getting book content that I’ve been working on for many many years out there in front of other people. I’ve got detailed notes and outlines on about nine different fictional titles, crossing a range of different genres. I’ve started with two of my larger ‘worlds’ – NAPE and Kings Courier and will be following up with regular chapters and content over the coming months.

I’ve also created a new blog to capture all of my travel. Not the work stuff, but things like the Scotland tour and the Norwegian Cruise, plus whatever else comes up this year and beyond. Current thoughts are Antarctica, Alaska or Iceland, work and personal commitments permitting. Plus I’m in Spain in August with my family and friends.

Converting my unfinished technical books to blog posts. I’ve worked on a number of books, some of which contain fresh, brand new material I’d like to share with other people, including the book content I was working on last year. I’m still trying to reformat it for the blog so that it looks good, but I will get there.


Office 365 Activation Won’t Accept Password

So today I signed up for Office 365, since it seemed to be the easiest way to get hold of Office; although I have a license and subscription, I also have more machines.

To say I was frustrated when I tried to activate Office 365 would be an understatement. Each time I went through the process, it would reject the password, saying there was a problem with my account.

I could log in with my email and password online, but through the activation, no dice. Some internet searches, including with the ludicrously bad Windows support search, didn’t elicit anything useful.

Then it hit me. Office 2011 for Mac through an Office 365 subscription probably doesn’t know about secondary authentication.

Sure enough, I created an application-specific password, logged in with that, and yay, I now have a running Office 365 subscription.

If you are experiencing the same problem, using an application-specific password might just help you out.

Replicating Oracle Webinar Question Follow-up

We had a really great webinar on Replicating to/from Oracle earlier this month, and you can view the recording of that webinar here.

A good sign of how great a webinar was is the questions that come afterwards, and we didn’t get through them all, so here are all the questions and answers for the entire webinar.

Q: What is the overhead of Replicator on source database with asynchronous CDC?

A: With asynchronous operation there is no substantial CPU overhead (as with synchronous), but the amount of generated redo logs becomes bigger requiring more disk space and better log management to ensure that the space is used effectively.

Q: Do you support migration from Solaris/Oracle to Linux/Oracle?

A: The replication is not certified for use on Solaris, however, it is possible to configure a replicator to operate remotely and extract from a remote Oracle instance. This is achieved by installing Tungsten Replicator on Linux and then extracting from the remote Oracle instance.

Q: Are there issues in supporting tables without Primary Keys on Oracle to Oracle replication?

A: Non-primary key tables will work, but it is not recommended for production as it implies significant overhead when applying to a target database.

Q: On Oracle->Oracle replication, if there are triggers on source tables, how is this handled?

A: Tungsten Replicator does not automatically disable triggers. The best solution is to remove triggers on slaves, or rewrite triggers to identify whether a trigger is being executed on the master or slave and skip it accordingly, although this requires rewriting the triggers in question.

Q: How is your offering different/better than Oracle Streams replication?

A: We like to think of ourselves as GoldenGate without the price tag. The main difference is the way we extract the information from Oracle, otherwise, the products offer similar functionality. For Tungsten Replicator in particular, one advantage is the open and flexible nature, since Tungsten Replicator is open source, released under a GPL V2 license, and available at https://code.google.com/p/tungsten-replicator/.

Q: How is the integrity of the replica maintained/verified?

A: Replicator has built-in real-time consistency checks: if an UPDATE or DELETE doesn’t update any rows, Replicator will go OFFLINE:ERROR, as this indicates an inconsistent dataset.
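
To illustrate the idea behind that check (this is not Tungsten Replicator code), here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in target. The `ConsistencyError` class and the `apply_row_change` helper are hypothetical names invented for the example.

```python
import sqlite3


class ConsistencyError(Exception):
    """Raised when a replicated row change matches no rows on the target."""


def apply_row_change(conn, sql, params):
    """Apply one UPDATE or DELETE and fail if it touched zero rows."""
    cursor = conn.execute(sql, params)
    if cursor.rowcount == 0:
        # The source changed a row that the target does not have.
        raise ConsistencyError(f"no rows affected by: {sql} {params!r}")
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")

# The row exists on the target, so this applies cleanly.
apply_row_change(conn, "UPDATE t SET data = ? WHERE id = ?", ("b", 1))

try:
    # The row is missing on the target, so the datasets have diverged.
    apply_row_change(conn, "DELETE FROM t WHERE id = ?", (2,))
except ConsistencyError as err:
    print("target is inconsistent:", err)
```

In this sketch the exception plays the role of the replicator going offline: the apply stops at the first change that cannot be matched, rather than silently carrying on.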

Q: Can configuration file based passwords be specified using some form of encrypted value for security purposes to keep them out of the clear?

A: We support an INI file format so that you do not have to use the command-line installation process. There is currently no supported option for an encrypted version of these values, but the INI file can be secured so it is only readable by the Tungsten user.

Q: Our source DB is Oracle RAC with ~10 instances. Is coherency maintained in the replication from activity in the various instances?

A: We do not monitor the information that has been replicated; but CDC replicates row-based data, not statements, so typical sequence insertion issues that might occur with statement based replication should not apply.

Q: Is there any maintenance of Oracle sequence values between Oracle and replicas?

A: Sequence values are recorded into the row data as extracted by Tungsten Replicator. Because the inserted values, not the sequence itself, are replicated, there is no need to maintain sequences between hosts.

Q: How timely is the replication? Particularly for hot source tables receiving millions of rows per day?

A: CDC is based on extracting the data at an interval, but the interval can be configured. In practice, assuming there are regular inserts and updates on the Oracle side, the data is replicated in real-time. See https://docs.continuent.com/tungsten-replicator-3.0/deployment-oracle-cdctuning.html for more information on how this figure can be tuned.

Q: Can parallel extractor instances be spread across servers rather than through threads on the same server (which would be constrained by network or HBA)?

A: Yes. We can install multiple replicators and tune the extraction of the parallel extractor accordingly. That selection would need to be manual, but it is certainly possible.

Q: Do you need the CSV file (to select individual tables with the setupCDC.sh configuration) on the master setup if you want all tables?

A: No.

Q: If you lose your slave down the road, do you need to re-provision from the initial SCN number or is there a way to start from a later point?

A: This is the reason for the THL Sequence Number introduced in the extractor. If you lose your slave, you can install a new slave and have it start at the transaction number where the failed slave stopped if you know it, since the information will be in the THL. If not, you can usually determine this by examining the THL directly. There should be no need to re-provision – just to restart from the transaction in the THL on the master.

Q: Regarding a failed slave, what if it failed such that we don’t have a backup or wanted to provision a second slave such that it had no initial data.

A: If you had no backups or data, yes, you would need to re-provision with the parallel extractor in order to seed the target database.

Q: Would you do that with the original SCN? If it had been a month or two, is there a way to start at a more recent SCN (e.g. you have to re-run the setupCDC process)?

A: The best case is to have two MySQL slaves and when one fails, you re-provision it from the healthy one. This avoids the setupCDC stage.

However, the replication can always be started from a specific event (SCN) provided that SCN is available in the Oracle undo log space.

Q: How does Tungsten handle Oracle’s CLOB and BLOB data types?

A: Providing you are using asynchronous CDC these types are supported; for synchronous CDC these types are not supported by Oracle.

Q: Can different schemas in Oracle be replicated at different times?

A: Each schema is extracted by a separate service in Replicator, so they are independent.

Q: What is the size limit for BLOB or CLOB column data types?

A: This depends on the CDC capabilities in Oracle, and is not limited within Tungsten Replicator. You may want to refer to the Oracle Docs for more information on CDC: http://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm

Q: Would different editions of Oracle, e.g. Enterprise Edition and Standard Edition, be considered heterogeneous environments?

A: Essentially yes, although the nomenclature is really only a categorization, it does not affect the operation, deployment or functionality of the replicator. All these features are part of the open source product.

Q: Can a 10g database (master) send the data to a 11g database (slave) for use in an upgrade?

A: Yes.

Q: Does the Oracle replicator require the Oracle database to be in archive mode?

A: Yes. This is a requirement for Oracle’s CDC implementation.

Q: How will we be able to revisit this recorded webinar?

A: Slides and a recording from today’s webinar will be available at http://www.slideshare.net/Continuent_Tungsten

 

A New Home for Tungsten in the UK

I was suitably heartened to hear about the new mine opening up in Devon, here in the UK, to mine the element Tungsten.

I commented on this to my associates at Continuent, and Csaba picked out some appropriate quotes from the article:

“Tungsten is an extraordinary metal.”

“It’s almost as hard as a diamond and has one of the highest melting points of any mineral.”

“Adding a small amount to steel makes it far harder, far more resistant to stress and heat. The benefits to industry are obvious.”

Leading him to suggest: “Adding a small amount of Tungsten to MySQL makes it far harder, far more resistant to stress and failures. The benefits to industry are obvious.”

I couldn’t possibly agree more!

 

Revisiting ZFS and MySQL

While at Percona Live this year I was reminded about ZFS and running MySQL on top of a ZFS-based storage platform.

Now I’m a big fan of ZFS (although sadly I don’t get to use it as much as I used to after I shut down my home server farm), and I did a lot of different testing back while at MySQL to ensure that MySQL, InnoDB and ZFS worked correctly together.

Of course today we have a completely new range of ZFS compatible environments, not least of which are FreeBSD and ZFS on Linux, so I think it’s time to revisit some of my original advice on using this combination.

Unfortunately the presentations and MySQL University sessions back then have all been taken down. But that doesn’t mean the advice is any less valid.

Some of the core advice for using InnoDB on ZFS:

  • Configure a single InnoDB tablespace, rather than configuring multiple tablespaces across different disks, and then let ZFS manage the underlying disk using stripes or mirrors or whatever configuration you want. This avoids you having to restart or reconfigure your tablespaces as your data grows, and moves that out to ZFS which can do it much more easily and while the filesystem and database remain online. That means we can do:
innodb_data_file_path = /zpool/data/ibdatafile:10G:autoextend
  • While we’re talking about the InnoDB data files, the best optimisation you can do is to set the ZFS record size to match the InnoDB page size (16K by default). You should do this *before* you start writing data. That means creating the filesystem and then setting the record size:
zfs set recordsize=16K zpool/data
  • What you can also do is configure a separate filesystem for the InnoDB logs that has a ZFS record size of 128K. That’s less relevant in later versions of ZFS, but it does no harm.
  • Switch on compression. Within ZFS this improves overall I/O time, because less data is physically read from and written to disk. The compression is lightweight enough to handle the load while still reducing the overall time.
  • Disable the doublewrite buffer. The transactional nature of ZFS helps to ensure the validity of data written down to disk, so we don’t need two copies of the data to be written to ensure valid recovery from the failures that are normally caused by partial writes of the record data. The performance gain is small, but worth it.
  • Using direct I/O (innodb_flush_method = O_DIRECT in your my.cnf) also improves performance for similar reasons. We can be sure with direct writes in ZFS that the information is written down to the right place. EDIT: Thanks to Yves, this is not currently supported on Linux/ZFS.
  • Limit the Adaptive Replacement Cache (ARC); without doing this you can end up with ZFS using a lot of cache memory that would be better used at the database level for caching record information. We don’t need the block data cache as well.
  • Configure a separate ZFS Intent Log (ZIL), really a Separate Intent Log (SLOG) – if you are not using SSD throughout, this is a great place to use SSD to speed up your overall disk I/O performance. Using SLOG stores immediate writes out to SSD, enabling ZFS to manage the more effective block writes of information out to slower spinning disks. The real difference is that this lowers disk writes, lowers latency, and lowers overall spinning disk activity, meaning the disks will probably last longer, not to mention making your system quieter in the process. For the sake of $200 of SSD, you could double your performance and get an extra year or so out of the disks.
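
The filesystem side of the advice above can be pulled together into a short list of commands. Here is a small Python sketch that only builds the command strings, so nothing is executed. The `zfs create`, `zfs set recordsize`, `zfs set compression` and `zpool add ... log` commands are standard ZFS commands, but the dataset name `zpool/log` and the device name are placeholders for the example, and the ARC limit is left out because how you set it depends on the platform.

```python
def zfs_innodb_commands(pool, data_fs, log_fs, innodb_page_size_kb, slog_device=None):
    """Return shell commands (as strings) for an InnoDB-on-ZFS layout."""
    commands = [
        f"zfs create {data_fs}",
        # Set the record size before loading data: files that already
        # exist keep the block size they were written with.
        f"zfs set recordsize={innodb_page_size_kb}K {data_fs}",
        f"zfs set compression=on {data_fs}",
        # Separate filesystem for the InnoDB logs with a 128K record size.
        f"zfs create {log_fs}",
        f"zfs set recordsize=128K {log_fs}",
    ]
    if slog_device:
        # Optional separate intent log (SLOG) on a fast device.
        commands.append(f"zpool add {pool} log {slog_device}")
    return commands


# Placeholder names; pass whatever page size your InnoDB instance uses.
for command in zfs_innodb_commands(
    "zpool", "zpool/data", "zpool/log", innodb_page_size_kb=16, slog_device="/dev/sdX"
):
    print(command)
```

On the MySQL side, the matching my.cnf settings would be the innodb_data_file_path line shown earlier plus innodb_doublewrite = 0 to disable the doublewrite buffer.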

Surprisingly, not much has changed in these key rules; perhaps the biggest difference is the change in the price of SSD between when I wrote the original rules and today. SSD is cheap(er) today, so many people can afford SSD as their main disk, rather than their storage format, especially if you are building serious machines.


Percona Live 2013, MySQL, Continuent and an ever-healthy Ecosystem

I’m sitting here in the lounge at SFO thinking back on the last week, the majority of which has been spent meeting my new workmates and attending the Percona MySQL conference.

For me it has been as much of a family reunion as it has been about seeing the wonderful things going on in MySQL.

Having joined Continuent last month after an ‘absence’ in NoSQL land of almost 2.5 years, joining the MySQL community again just felt like coming home after a long absence. And that’s no bad thing. On a very personal level it was great to see so many of my old friends, many of whom were not only pleased to see me, but pleased to see me working back in the MySQL fold. Evidently many people think this is where I belong.

What was great to see is that the MySQL community is alive and well. Percona may be the drivers behind the annual MySQL conference that we have come to know, but behind the name on the passes and over the doors, nothing has changed in terms of the passion behind the core of the project.

Additionally, it’s great to see that despite all of the potential issues and tragedies that were predicted when Oracle took over the reins of MySQL, as Baron says, they are in fact driving and pushing the project forward. The features in 5.6 are impressive and useful, rather than just a blanket cycling of the numbers. I haven’t had the time to look at 5.7, but I doubt it is just an annual increment either. When I left Oracle, people were predicting MySQL would be dead in two years as an active project at Oracle, but in fact what seems to have happened is that the community has rallied round it and Oracle have seen the value and expertly steered it forward.

It’s also interesting to me – as someone who moved outside the MySQL fold – to note that other databases haven’t really supplanted the core of the MySQL foothold. Robert Hodges’ keynote discussed that in more depth, and I see no reason to disagree with him.

I’m pleased to see that my good friend Giuseppe had his MySQL Sandbox win application of the year 2013 – not soon enough in my eyes, given that as a solution for running MySQL it has been out there for more years than I care to remember.

I’m also delighted of course that Continuent won Corporate contributor of the year. One of the reasons I joined the company is because I liked what they were doing. Replication in MySQL is unnecessarily hard, particularly when you get more than one master, or want to do clever things with topologies beyond the standard master/slave. I used Federated tables to do it years ago, but Tungsten makes the whole process easier. What Continuent does is provide an alternative to MySQL native replication which is not only more flexible, but also fundamentally very simple. In my experience, simple ideas are always very powerful, because their simplicity makes them easy to adapt, adopt and add to.

Of course, Continuent aren’t the only company producing alternatives for clustering solutions with MySQL, but to me that shows there is a healthy ecosystem willing to build solutions around the powerful product at the centre. It’s one thing to choose an entirely different database product, but another to use the same product with one or more tools that produce an overall better solution to the problem. *AMP solutions haven’t gone away, we’ve just extended and expanded on top of them. That must mean that MySQL is just as powerful and healthy a core product as it ever was.

That’s my key takeaway from this conference – MySQL is alive and well, and the ecosystem it produced is as organic and thriving as it ever has been, and I’m happy to be back in the middle of that.


Pert & Simple

I’m a huge fan of Kickstarter and back last year I was working on trying to expand my range a little from the rather narrow range of tech-related projects I’d been funding. One particular art project caught my eye. Firstly, it was local – well, within 50 miles – which is a surprise when so many are either US or frequently London based. The other was the name, and you cannot help but be caught by the expression Pert & Simple. See http://www.kickstarter.com/projects/hollyjames/exhibition-pert-and-simple?ref=live.

Unfortunately for Holly and Anna, the Kickstarter process didn’t come up with the goods, but I was happy enough to support a local project and contributed some money to them directly so that they could put on the exhibition.

Holly James’ and Anna Salmane’s work fits into the category of what I refer to as ‘thought provoking art’. That is not to say that all art is not thought provoking, but that some art is specifically developed because of the emotions and thoughts it is designed to instil.

To best illustrate that point I’ll start by talking about Holly’s work, because, from a psychological perspective (which is one of my personal interests) this had the most impact on me. Holly has been working with patterns and in particular weaving. Originally starting with white thread, Holly moved on to using fluorescent so that the weaving would stand out against your typical white wall. Despite all of the work that went into the weaving, the chosen presentation mechanism was simply hanging it on nails on the wall. This results in a dramatic, unconventional, if not elegant, demonstration of the work.

For Holly though the most critical part occurred when people started to view the work. People would come in and touch the fabric. Of course, this is a perfectly normal experience. How many of us walk into any clothing shop and start touching the fabric? We don’t look at these items from afar and judge how well they will fit. It’s a textile, we are going to touch it. But this is art. You wouldn’t go up and touch a painting, would you? Why touch a textile that is a piece of art? Because that is human behaviour.

So what did Holly do next? At the exhibition, she was working on “Making a Fluorescent Doormat”. Why a doormat? Because if people are going to touch your art, why not let people just walk all over it? Of course, this leads to an interesting paradox – the chances are that humans will refuse to walk on something that (a) they are told to walk on, (b) they are explicitly allowed to walk on, and (c) would normally be walked on.

That habit of human behaviour, to do the opposite of what you should do, or even what you have been told to do, is such a core part of the human condition it is hard to get away from. To force people into a situation where they are allowed to do something that would be perfectly natural if it wasn’t for the fact that you had been told you are allowed to do it serves up an interesting position on the human condition. I would love to be able to watch people and count how many actually walk on the doormat.

There is also a separate aspect, that of the artist being disrespected for their work – people walking all over Holly’s doormat directly reflects how she felt when people touched her “Fluorescent Weaving” piece. Does her giving them permission lessen that effect, or give people the right to walk over her?

Holly’s work is beautiful, but also has a lot of meaning, both to her, and to those who understand how these items were produced.

And that leads me on to Anna’s artwork. Anna’s work is also very personal. Sadly, there are no cat-related artworks from Anna yet (although she has promised them!), but they are no less intriguing. “Sorry” for example represents a count of how many times Anna said ‘Sorry’ over the course of the year, and the counter used to tally up the results. Sorry has itself become such a throw-away statement. We say it when we bump into someone, when we don’t respond quickly enough, just for not doing something, and yet really sorry has a very specific, empathic, meaning of sympathy for somebody else’s feelings. Yet we will say it to strangers as much as we do to friends and family.

At the launch of the exhibition, one person’s first question was ‘what happened in March for you to say sorry so much?’ – an apt demonstration of how much importance we attach to the word, despite how frequently it is used.

Anna’s “Starcounter” is a personal view of the inability to count the stars in the sky – as explained to her by her father. Anna chose to represent this with a simulation of a night sky and a recording of Anna counting to 6000. It takes Anna hours to read through the numbers, and it took her even longer to actually record the material. Does this represent the futility of counting so many objects, or the endless-ness of the stars in the sky? Or both?

The other piece that resonated with me, but forgive me I have forgotten the name, is that of a slideshow of headstones at a war memorial. Each headstone is essentially identical – same shape, name, lifespan, and yet each one represents a unique person, with families, loved ones, and a whole life of experiences. Yet they are so simply, and consistently, represented, so as to almost make their lives meaningless.

These are personal, and yet hugely descriptive examples that to me speak of the dichotomy between meaningless and meaningful, represented by the same objects.

From this description you would imagine that the entire exhibition is incredibly sombre and serious. But it’s not. Behind the works by both Holly and Anna is actually a layer of humility, if not outright satire, as all of the pieces often have a simpler, and, as the title suggests, impertinent edge. Holly’s desire for people to walk on the doormat is a tongue-in-cheek poke at the way people treat art. Go ahead, she’s daring you to walk on that doormat with a broad smile on her face. Go on. Anna, meanwhile, is remembering her fun summer evenings of counting the stars as a child, and the inanity of saying sorry.

It’s only complicated art, with additional levels, meanings, and complexities, if you add them, and that, just as they are hugely personal to Anna and Holly, makes them exclusively personal to the viewer. You can get caught up in the meanings if it suits you. Or you can have a wry smile to yourself as the artist introduces the works.

Meeting Holly and Anna and talking to them clearly shows their passion for their work, but also the humour and relaxed attitude to the meaning of the pieces. I spent a few hours with them both at their launch event in Nottingham. Anna is working on her BA at Goldsmiths, and Holly has completed hers and is, hopefully, working towards her MA with the Royal College. I certainly wish them all the success that they deserve from the work they displayed at the exhibition. It is no word of a lie to say that I feel honoured to have attended and more importantly understood the work of two exceptionally talented artists. Both from their perspectives, and my own.

You can see pictures of the exhibition at http://www.hollyjames.org/pert-and-simple

More work by Holly is available here: http://www.hollyjames.org

Anna’s work can be viewed on http://www.salmane.co.uk