Category Archives: Coalface

Getting Data into Hadoop in real-time

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but most frequently between them. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would now call a NoSQL database), into the more structured Oracle.

Today I spend some of my time working in Big Data, more often than not migrating information from existing data stores into Big Data platforms so that it can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop3/

The problem with the current techniques, Sqoop included, is that they rely on a relatively manual, even basic, transfer process. Dump your data out, reload it back again into Hadoop.

Even with Sqoop, although it automates much of the process, it is not entirely reliable, especially if you want to do more than simple dump and load. Serial loading, or incrementally transferring data from MySQL or Oracle, is fraught with problems, not least of which is that it requires adding a timestamp to your data structure to get the best results out of it.

Perhaps worse, though, is that Sqoop is an intermittent, periodic transfer system. Incremental loading works by copying all the changed records since a specific point in time. Running it too frequently is counterproductive, which means you end up using a 15-minute or every-couple-of-hours interval, depending on your database activity.

Most databases have some kind of stream of changes that enables you to see everything that has happened on the database. With MySQL, that’s the binary log. And with the Open Source Tungsten Replicator tool we take advantage of that so that we can replicate into MySQL, and indeed into Oracle, MongoDB and Vertica, among others.


Reading the data out of MySQL is lightweight, since the master just reads the contents of the binary log; this is especially true compared to Sqoop, which uses read locks and SELECT * with and without LIMIT clauses.

Right now we’re working on an applier that writes that data from MySQL into Hadoop in real time. Unlike Sqoop, we provide a continuous stream of changes from MySQL into the immutable store of Hadoop.

But the loading process and the nature of Hadoop present some interesting issues, not least of which (if you’ve been following my other articles) is the fact that data written into Hadoop is immutable. For data that is constantly changing, an immutable store is not the obvious destination.

We get round that by using the batch loading system to create CSV files that contain the data, changes and sequence numbers, and then loading that information into Hadoop. In fact, Robert has updated the batch loader to use a new JavaScript based system (of which more in a future blog post) that simplifies the entire process, without requiring a direct connection or interface to Hadoop (although we can write directly into HDFS).

For example, the MySQL row:


| 3 | #1 Single | 2006 | Cats and Dogs (#1.4) |

is represented within the generated staging files as follows, where ^A is the Ctrl-A character used as the field separator, and the leading I marks the operation (an insert), followed by transaction metadata and then the row data itself:


I^A1318^A3^A3^A#1 Single^A2006^ACats and Dogs (#1.4)

That’s directly accessible by Hive. In fact, using our ddlscan tool, we can even create the Hive table definitions for you:


ddlscan -user tungsten -url 'jdbc:mysql://host1:13306/test' -pass password \
-template ddl-mysql-hive-0.10.vm -db test
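
To give you an idea of the output, the generated definition will be along these lines. This is a hand-written sketch rather than actual ddlscan output – the stage_xxx_movies table and tungsten_* column names are illustrative – but note the '\001' delimiter, which is the ^A character used in the staging files:

CREATE EXTERNAL TABLE stage_xxx_movies (
  tungsten_opcode STRING,
  tungsten_seqno INT,
  tungsten_row_id INT,
  id INT,
  title STRING,
  year INT,
  episode STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;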

Then we can use that record of changes to create a live version of the data, using a straightforward query within Hive. In fact, Hive provides the final crucial stage of the loading process by giving us that live view of the change data, and we simplify that element by providing the core data and ensuring that the CSV data is in the right format for Hive to use the files without modification.
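
To give a flavour of that query, here is a minimal sketch against the illustrative staging table above. It keeps only the most recent change for each primary key and then drops any row whose latest change is a delete; the opcode values and column names are assumptions rather than the exact generated schema:

-- Latest change per primary key wins; rows whose latest change is a
-- delete are filtered out, leaving a "live" view of the table
SELECT s.id, s.title, s.year, s.episode
FROM stage_xxx_movies s
JOIN (
    SELECT id, MAX(tungsten_seqno) AS max_seqno
    FROM stage_xxx_movies
    GROUP BY id
) latest ON s.id = latest.id AND s.tungsten_seqno = latest.max_seqno
WHERE s.tungsten_opcode <> 'D';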

The process is quite remarkable; speed-wise for direct dumps, Tungsten Replicator is comparable to Sqoop, but when it comes to change data, the difference is that we have the information in real time. You don’t have to wait for the next Sqoop load, or for Sqoop’s incremental loading and row selection; instead, we just apply the changes written into the binary log.

Of course, we can fine-tune the intervals at which the CSV change data is written into Hadoop using the block commit properties (see http://docs.continuent.com/tungsten-replicator-2.2/performance-block.html). By default we commit into Hadoop every 10s or 10,000 rows, but we can change that to commit every 5s or 1,000 rows if your data is critical and busy.
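
For illustration, the tuned values would look something like this in the replicator properties file; the q-to-dbms stage name is an assumption here, so check the linked documentation for the stage used in your configuration:

replicator.stage.q-to-dbms.blockCommitRowCount=1000
replicator.stage.q-to-dbms.blockCommitInterval=5s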

We’re still optimising and improving the system, but I can tell you that in my own tests we can handle gigabytes of change data in a live fashion, both across single-table and multi-table/multi-schema datasets. What’s particularly cool is that if you are using Hadoop as a concentrator for all of your MySQL data for analysis, we can transfer from multiple MySQL servers into Hadoop simultaneously and take advantage of the multi-node Hadoop environment to cope with the load.


Anonymizing Data During Replication

If you happen to work with personal data, chances are you are subject to SOX (Sarbanes-Oxley) whether you like it or not.

One of the worst aspects of this is that if you want to be able to analyse your data and you replicate out to another host, you have to find a way of anonymizing the information. There are of course lots of ways of doing this, but if you are replicating the data, why not anonymize it during the replication?

Of the many cool features in Tungsten Replicator, one of my favourites is filtering. This allows you to process the stream of changes extracted from the master and perform operations on it. We use it a lot in the replicator for ignoring tables, schemas and columns, and for ensuring that we have the correct information within the THL.

Given this, let’s use it to anonymize the data as it is being replicated so that we don’t need to post-process it for analysis, and we’re going to use JavaScript to do that.

For the actual anonymization, we’re going to use a simple function that hashes the content. For this, I’m going to use the JavaScript md5 implementation provided by Paul Johnston here: http://pajhome.org.uk/crypt/md5, although others are available. The main benefit of using md5 is that the same string of text is always hashed to the same value. This is important because it means that we can still run queries with joins between the data, knowing that, for example, the postcode ‘XQ23 1LD’ will be hashed to the same digest in every table. Joins still work, and data analysis remains entirely valid.
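
As a quick sketch of why that consistency matters: a join on a hashed column behaves exactly as it would on the raw values, because both sides contain the same digests. The deliveries table and its columns here are invented for the example:

-- Both postcode columns hold the same md5 digests, so the join
-- matches exactly as it would have done on the raw postcodes
SELECT a.id, d.delivered_on
FROM customers.address a
JOIN customers.deliveries d ON a.postcode = d.postcode;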

Better still, because it’s happening during replication, I always have a machine with anonymized information that I can use to query my data without worrying about SOX tripping me up.

Within the JS filter environment there are two key functions we need to use. One is prepare(), which is called when the replicator goes online, and the other is filter(), which processes each event within the THL.

In the prepare() function, I’m going to identify from the configuration file which fields we are going to perform the actual hashing operation on. We do that by creating a hash structure within JavaScript that maps the schema, table name, and field. For example, to anonymize the field ‘postcode’ in the table ‘address’ in the schema ‘customers’:

stcspec=customers.address.postcode

For this to work, we must have the colnames filter enabled.

The function itself just splits the stcspec parameter from the configuration file into a hash of that combo:

var stcmatch = {};
function prepare()
{
  logger.info("anonymizer: Initializing...");

  // Read the comma-separated schema.table.field list from the
  // configuration file and build a lookup hash for quick matching
  stcspec = filterProperties.getString("stcspec");
  stcarray = stcspec.split(",");
  for (i = 0; i < stcarray.length; i++)
  {
    stcmatch[stcarray[i]] = 1;
  }
}

The filter() function is provided one value, the event object from the THL. We operate only on ROW-based data (to save us parsing the SQL statement), and then supply the event to an anonymize() function for the actual processing:

function filter(event)
{
  data = event.getData();
  if(data != null)
  {
    for (i = 0; i < data.size(); i++)
    {
      d = data.get(i);

      if (d != null && d instanceof com.continuent.tungsten.replicator.dbms.StatementData)
      {
          // Ignore statements
      }
      else if (d != null && d instanceof com.continuent.tungsten.replicator.dbms.RowChangeData)
      {
          anonymize(event, d);
      }
    }
  }
}

Within the anonymize() function, we extract the schema and table name, and then look at each column; if it exists in our earlier stcspec hash, we change the content of the THL on the way past to be the hashed value, in place of the original field value. To do this we iterate over the rowChanges, then over the columns, then over the individual rows:

function anonymize(event, d)
{
  rowChanges = d.getRowChanges();

  for (j = 0; j < rowChanges.size(); j++)
  {
    oneRowChange = rowChanges.get(j);
    var schema = oneRowChange.getSchemaName();
    var table = oneRowChange.getTableName();
    var columns = oneRowChange.getColumnSpec();

    columnValues = oneRowChange.getColumnValues();
    for (c = 0; c < columns.size(); c++)
    {
      columnSpec = columns.get(c);
      columnname = columnSpec.getName();

      // Build the schema.table.field key and compare it against the
      // list loaded from the configuration in prepare()
      rowchangestc = schema + '.' + table + '.' + columnname;

      if (rowchangestc in stcmatch)
      {
        // Replace the value in every affected row with its md5 hash
        for (row = 0; row < columnValues.size(); row++)
        {
          values = columnValues.get(row);
          value = values.get(c);
          value.setValue(hex_md5(value.getValue()));
        }
      }
    }
  }
}

Finally, append the md5() script from Paul Johnston (or indeed whichever md5/hashing implementation you want to use) to the end of the script text, so that hex_md5() is available to the filter.


Hint

Depending on your configuration, datatypes and version, the return from getValue() may be a byte array rather than a character string; in that case, add this function:

function byteArrayToString(byteArray)
{
  // Build a string from the byte array, one character per byte
  str = "";
  for (i = 0; i < byteArray.length; i++)
  {
    str += String.fromCharCode(byteArray[i]);
  }
  return str;
}

And change:

value.setValue(hex_md5(value.getValue()));

to:

value.setValue(hex_md5(byteArrayToString(value.getValue())));

That will correctly convert the value into a string.

If the error does occur, Tungsten Replicator will simply stop on that event rather than apply bad data. Putting the replicator ONLINE again after changing the script will re-read the event, re-process it through the filter, and then apply the data.


Now we need to manually update the configuration. On a Tungsten Replicator slave, open the static-SERVICENAME.properties file in /opt/continuent/tungsten/tungsten-replicator/conf and then add the following lines within the filter specification area (about 90% of the way through):

replicator.filter.anonymize=com.continuent.tungsten.replicator.filter.JavaScriptFilter 
replicator.filter.anonymize.script=/opt/continuent/share/anonymizer.js 
replicator.filter.anonymize.stcspec=customers.address.postcode

The first line defines a filter called anonymize that uses the JavaScriptFilter engine, the second line specifies the location of the JavaScript file, and the third contains the specification of which fields we will change, separated by a comma.

Now find the line containing “replicator.stage.q-to-dbms.filters” (around line 200) and add ‘anonymize’ to the end of the filter list.

Finally, make sure you copy the anonymizer.js script into the /opt/continuent/share directory (or wherever you want to put it that matches the paths specified above).

Now restart the replicator:

$ replicator restart

On your master, make sure you have the colnames filter enabled. You can do this in the master’s static-SERVICENAME.properties like this:

replicator.stage.binlog-to-q.filters=colnames,pkey

Now restart the master replicator:

$ replicator restart

Double-check that the replicator is online using trepctl; it will fail if the config is wrong or the JavaScript isn’t found. If everything is running, go to your master, make sure you have row-based logging enabled (my.cnf: binlog-format=’ROW’), and then try inserting some data into the table that we are anonymizing:

mysql> insert into address values(0,'QX17 1LG');

Now check the value on the slave:

| 711 | dc889465b382 |

Woohoo!

Anonymized data now exists on the slave without having to manually run the process to clean the data.

If you want to extend the fields this is applied to, add them to the stcspec in the configuration, separating each one by a comma, and make sure you restart the replicator.
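
For example, to also hash a hypothetical email field alongside the postcode:

replicator.filter.anonymize.stcspec=customers.address.postcode,customers.users.email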


Percona Live 2013, MySQL, Continuent and an ever-healthy Ecosystem

I’m sitting here in the lounge at SFO thinking back on the last week, the majority of which has been spent meeting my new workmates and attending the Percona MySQL conference.

For me it has been as much of a family reunion as it has been about seeing the wonderful things going on in MySQL.

Having joined Continuent last month after an ‘absence’ in NoSQL land of almost 2.5 years, rejoining the MySQL community just felt like coming home. And that’s no bad thing. On a very personal level it was great to see so many of my old friends, many of whom were not only pleased to see me, but pleased to see me working back in the MySQL fold. Evidently many people think this is where I belong.

What was great to see is that the MySQL community is alive and well. Percona may be the driving force behind the annual MySQL conference that we have come to know, but behind the name on the passes and over the doors, nothing has changed in terms of the passion behind the core of the project.

Additionally, it’s great to see that despite all of the potential issues and tragedies that were predicted when Oracle took over the reins of MySQL, as Baron says, they are in fact driving and pushing the project forward. The features in 5.6 are impressive and useful, rather than just a blanket cycling of the numbers. I haven’t had the time to look at 5.7, but I doubt it is just an annual increment either. When I left Oracle, people were predicting MySQL would be dead in two years as an active project at Oracle, but in fact what seems to have happened is that the community has rallied round it and Oracle have seen the value and expertly steered it forward.

It’s also interesting to me – as someone who moved outside the MySQL fold – to note that other databases haven’t really supplanted the core of the MySQL foothold. Robert Hodges’ keynote discussed this in more depth, and I see no reason to disagree with him.

I’m pleased to see that my good friend Giuseppe had his MySQL Sandbox win Application of the Year 2013 – not soon enough in my eyes, given that as a solution for running MySQL it has been out there for more years than I care to remember.

I’m also delighted, of course, that Continuent won Corporate Contributor of the Year. One of the reasons I joined the company is because I liked what they were doing. Replication in MySQL is unnecessarily hard, particularly when you get more than one master, or want to do clever things with topologies beyond the standard master/slave. I used Federated tables to do it years ago, but Tungsten makes the whole process easier. What Continuent does is provide an alternative to MySQL native replication which is not only more flexible, but also fundamentally very simple. In my experience, simple ideas are always very powerful, because their simplicity makes them easy to adapt, adopt and add to.

Of course, Continuent aren’t the only company producing alternatives for clustering solutions with MySQL, but to me that shows there is a healthy ecosystem willing to build solutions around the powerful product at the centre. It’s one thing to choose an entirely different database product, but another to use the same product with one or more tools that produce an overall better solution to the problem. *AMP solutions haven’t gone away; we’ve just extended and expanded on top of them. That must mean that MySQL is just as powerful and healthy a core product as it ever was.

That’s my key takeaway from this conference – MySQL is alive and well, and the ecosystem it produced is as organic and thriving as it ever has been, and I’m happy to be back in the middle of that.


Pert & Simple

I’m a huge fan of Kickstarter, and back last year I was working on trying to expand my range a little from the rather narrow set of tech-related projects I’d been funding. One particular art project caught my eye. Firstly, it was local – well, within 50 miles – which is a surprise when so many are either US-based or, frequently, London-based. The other was the name: you cannot help but be caught by the expression Pert & Simple. See http://www.kickstarter.com/projects/hollyjames/exhibition-pert-and-simple?ref=live.

Unfortunately for Holly and Anna, the Kickstarter process didn’t come up with the goods, but I was happy enough to support a local project and contributed some money to them directly so that they could put on the exhibition.

Holly James’ and Anna Salmane’s work fits into the category of what I refer to as ‘thought-provoking art’. That is not to say that all art is not thought provoking, but that some art is specifically developed because of the emotions and thoughts it is designed to instil.

To best illustrate that point I’ll start by talking about Holly’s work, because, from a psychological perspective (which is one of my personal interests), this had the most impact on me. Holly has been working with patterns and in particular weaving. Originally starting with white thread, Holly moved on to using fluorescent thread so that the weaving would stand out against your typical white wall. Despite all of the work that went into the weaving, the chosen presentation mechanism was simply hanging it on nails on the wall. This results in a dramatic, unconventional, if not elegant, demonstration of the work.

For Holly, though, the most critical part occurred when people started to view the work. People would come in and touch the fabric. Of course, this is a perfectly normal experience. How many of us walk into any clothing shop and start touching the fabric? We don’t look at these items from afar and judge how well they will fit. It’s a textile; we are going to touch it. But this is art. You wouldn’t go up and touch a painting, would you? Why touch a textile that is a piece of art? Because that is human behaviour.

So what did Holly do next? At the exhibition, she was working on “Making a Fluorescent Doormat”. Why a doormat? Because if people are going to touch your art, why not let people just walk all over it? Of course, this leads to an interesting paradox – the chances are that humans will refuse to walk on something that (a) they are told to walk on, (b) are explicitly allowed to walk on, and (c) would normally be walked on.

That habit of human behaviour, to do the opposite of what you should do, or even what you have been told to do, is such a core part of the human condition it is hard to get away from. To force people into a situation where they are allowed to do something that would be perfectly natural if it wasn’t for the fact that you had been told you are allowed to do it serves up an interesting position on the human condition. I would love to be able to watch people and count how many actually walk on the doormat.

There is also a separate aspect, that of the artist being disrespected for their work – people walking all over Holly’s doormat directly reflects how she felt when people touched her “Fluorescent Weaving” piece. Does her giving them permission lessen that effect, or give people the right to walk over her?

Holly’s work is beautiful, but also has a lot of meaning, both to her, and to those who understand how these items were produced.

And that leads me on to Anna’s artwork. Anna’s work is also very personal. Sadly, there are no cat-related artworks from Anna yet (although she has promised them!), but they are no less intriguing. “Sorry”, for example, represents a count of how many times Anna said ‘sorry’ over the course of the year, and the counter used to tally up the results. Sorry has itself become such a throw-away statement. We say it when we bump into someone, when we don’t respond quickly enough, just for not doing something; and yet really sorry has a very specific, empathic meaning of sympathy for somebody else’s feelings. Yet we will say it to strangers as much as we do to friends and family.

At the launch of the exhibition, one person’s first question was ‘what happened in March for you to say sorry so much?’ – an apt demonstration of how much importance we attach to the word, despite how frequently it is used.

Anna’s “Starcounter” is a personal view of the inability to count the stars in the sky – as explained to her by her father. Anna chose to represent this with a simulation of a night sky and a recording of Anna counting to 6000. It took her hours to read through the numbers, and even longer to actually record the material. Does this represent the futility of counting so many objects, or the endlessness of the stars in the sky? Or both?

The other piece that resonated with me – but forgive me, I have forgotten the name – is a slideshow of headstones at a war memorial. Each headstone is essentially identical – same shape, name, lifespan – and yet each one represents a unique person, with families, loved ones, and a whole life of experiences. Yet they are so simply, and consistently, represented as to almost make their lives meaningless.

These are personal, and yet hugely descriptive, examples that to me speak of the dichotomy between meaningless and meaningful, represented by the same objects.

From this description you would imagine that the entire exhibition is incredibly sombre and serious. But it’s not. Behind the works by both Holly and Anna is actually a layer of humour, if not outright satire, as all of the pieces often have a simpler and, as the title suggests, impertinent edge. Holly’s desire for people to walk on the doormat is a tongue-in-cheek poke at the way people treat art. Go ahead, she’s daring you to walk on that doormat with a broad smile on her face. Go on. Anna, meanwhile, is remembering her fun summer evenings of counting the stars as a child, and the inanity of saying sorry.

It’s only complicated art, with additional levels, meanings, and complexities, if you add them; and that, just as the works are hugely personal to Anna and Holly, makes them exclusively personal to the viewer. You can get caught up in the meanings if it suits you. Or you can have a wry smile to yourself as the artist introduces the works.

Meeting Holly and Anna and talking to them clearly shows their passion for their work, but also their humour and relaxed attitude to the meaning of the pieces. I spent a few hours with them both at their launch event in Nottingham. Anna is working on her BA at Goldsmiths, and Holly has completed hers and is, hopefully, working towards her MA with the Royal College. I certainly wish them all the success they deserve from the work they displayed at the exhibition. It is no word of a lie to say that I feel honoured to have attended and, more importantly, understood the work of two exceptionally talented artists – both from their perspectives, and my own.

You can see pictures of the exhibition at http://www.hollyjames.org/pert-and-simple

More work by Holly is available here: http://www.hollyjames.org

Anna’s work can be viewed on http://www.salmane.co.uk


Gilmerton Cove

I was fortunate enough recently to visit Gilmerton Cove. Despite the name, the cove is not located near the sea at all, but outside Edinburgh, and it is one of those places that makes you wonder how such a wonderful and intriguing location can have been kept such a fantastic secret. We’ve visited Edinburgh lots of times: been to the castle, taken the wonderful walks provided by Mercat Tours, and seen many other sites and locations. There is a lot of history and background to the city, all of it rich and colourful, and some of it surprising. I never knew, for example, about the extensive underground caves and crypts which used to fuel the lifeblood of the city, producing matches, pins and more.

Gilmerton Cove, though, is different. It is located underneath a crossroads in Gilmerton, a short bus ride from the centre of the city; it takes about 25 minutes from South Bridge (which sits above Waverley station). What you are looking for is the Gilmerton crossroads and Ladbrokes. Yes, you read that right. The cove is next to, and more importantly underneath, Ladbrokes.

Underneath gives you the main clue to what makes Gilmerton Cove such an interesting place. Entry is through a small door further down the road (you can just about see the sign from the crossroads), and once you get inside you are greeted by the wonderful staff. We were lucky to visit out of season, so there were only four of us, which gave the tour a little more of a personal touch; our guide was the amazing Margaretanne.

The key thing about the cove is that it is really a cave, but a man-made one. The mystery is that we don’t really know when it was made, or by whom. There are lots of clues beneath the ground as to what it might have been, or who might have created it, but not enough information to provide a definitive answer.

[Photo: IMG_0021]

The first thing you notice about the cove is that the walls have quite clearly been worked, by person or, more probably, persons unknown. The area itself is quite large – think generous three-bedroom modern house – and, in some ways, it has a similar structure. There are individual rooms, some with quite clear doorways and entrances. There are also communal rooms and areas. One of these is quite clearly arranged as a sort of dining area, with a clear table up the middle of the room and seating either side.

[Photo: IMG_0016]

Note the careful modelling. This is not a convenient natural location; that table is too smooth and deliberate, and the proportions are perfect for a seating area. Also note, at the top of the picture, a gap which leads up, probably to allow light in, smoke out, or both.

Another, larger, area is even more intriguing. Another seating area, with a carefully crafted pillar helping to support the ceiling. Not very visible in the foreground of this shot is a hole, about the size of a large saucepan, quite obviously carved into the rock. The sides and base are smooth, and there’s even a lip around the edge. To me, it was very reminiscent of similar holes in the stonework at Pompeii which were used for keeping food hot or cold. You could easily seat 10 or 12 people here, and while you could hold a banquet, it would be convenient enough to hold a number of people for illicit drinks. And that, they believe, is one of its main uses in the past: a drinking den.

[Photo: IMG_0020]

Further rooms – bedrooms? – lead off the main corridor. Some are large, some have dedicated fireplaces, and there are even alcoves and areas to display items. There’s even what appears to be a forge, although probably not a particularly efficient one. There are also a number of collapsed passageways, some of which lead in the direction of local churches, and even to just across the road. These are yet to be fully explored.

The only known information about Gilmerton Cove concerns George Paterson, a blacksmith, who supposedly created the ‘house’ in 1724 after 5 years of hard labour. That’s a lot of house to have dug out by hand in just 5 years. For one man, even for a group of men, that is quite an achievement. With modern tools, yes; but in 1724? There was no jackhammer or pneumatic drill, and the work is too precise to have been achieved with gunpowder, especially when you consider that even today, the shops above are mere feet and inches above your head. This is not a deep construction – at one point you can actually reach up and touch the underneath of the Ladbrokes next door.

There are carvings and initials around the caves – the sandstone is comparatively soft – but these could have been etched by people 5 years ago or 300 years ago; there’s no way to tell.

And the mystery remains. There are many theories, from the Hellfire Club and the Knights Templar, to its having been created by the Covenanters (http://www.sorbie.net/covenanters.htm), those people who opposed the interference of the Stuart-era kings in the Scottish Presbyterian church.

Gilmerton Cove is beautiful from an architectural perspective, but oh so intriguing because we know so little about how it came about. It’s a substantial place; I’ve certainly been in houses and homes today that are smaller, and they are constructed much more easily, from bricks and mortar. Ignoring how it was constructed: who constructed it, and why? It’s not deep enough to be particularly secret; it doesn’t seem to have the reverence to be a holy place (although it could easily have been the location of a more subversive organisation); and it doesn’t seem smoky or sooty enough for a blacksmith’s workshop, or even for a home, without some serious ventilation. If it had been populated by many people, you would expect more signs of wear and tear. It doesn’t look abused, but carefully looked after, and remarkably clean.

Please, go and see Gilmerton Cove (http://www.gilmertoncove.org.uk/) for yourself and see if you can solve the mystery!


Joining Continuent

I’ve just completed my first month here at Continuent, strangely back in the MySQL ecosystem, which I had been working in for some time before I joined CouchOne, and then Couchbase, two and a half years ago. Making the move back to MySQL is both an experience and, somehow, comfortable…

Continuent produce technology that makes for easier replication between MySQL servers and, more importantly, more flexible solutions when you need to scale out by providing connector and management functionality for your MySQL cluster. That means that you can easily backup, add slaves, and create complex replication scenarios such as multi-master, and even multiple-site, multiple-master topologies. This functionality is split over two products, Continuent Tungsten, which is the cluster management product, and the open source Tungsten Replicator, which provides the basic replication functionality.

Those who know me well will know that I am no fan of the native MySQL replication, and that’s almost entirely because of the complexities of first of all getting it to work, followed by keeping it working, and ultimately because of the variability of the replication in the first place. There’s no reliable way to know if the replication stream has successfully been applied to the slaves or, from a client perspective, how far behind slaves are so that you can make an educated guess about which slave you should be talking to. Let’s not even get into the complexities of having to handle the read/write splitting required to make the master/slave relationship work in the first place.

Continuent solves this problem by using the binary log stream from MySQL, but handling the transfer and application of those binlog entries itself. Using this method enables Continuent to monitor and manage the replication process. Continuent knows when a statement has been applied to a slave, and it can work to ensure that the changes are applied. With Continuent Tungsten, we go a stage further and provide a connector that sits between your application servers and your MySQL servers. Because Continuent Tungsten handles the replication, it knows the cluster topology, and can redirect queries to the master or the slave, and handle failures by directing the client queries to working slaves. Like all good software, it’s simple, but very, very effective, and ergo very powerful.

So what am I doing at Continuent? Building out the documentation and helping users to help themselves, in combination with working with the developers to make sure that the software is as easy, intuitive, and foolproof to use as possible. In the short term, that means ensuring we have the core documentation required to get Continuent working for you.

If there’s more information you need, or something you specifically want in the documentation, let me know.

Moving from MySQL to Couchbase

Before moving to Couchbase and working with NoSQL technology, I had for years been a MySQL user. Making the leap from MySQL to NoSQL requires a number of changes, not least of which is to the way you structure your data and then query it.

I’ve tried to distil the first part of that process down into a simpler form and steps in a new blog post at the Couchbase blog: http://blog.couchbase.com/how-move-mysql-couchbase-server-20-part-1