Hilbert Curve for Displaying Genomic Data

The Hilbert curve is a neat fractal that maps a one-dimensional line onto a two-dimensional square (see the gif above, kindly stolen from Wikipedia). A useful property of Hilbert curves is that locality is preserved: points that are near each other on the line stay near each other in the square, so neighboring features end up clustered together.
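
To make the mapping concrete, here is a minimal sketch of the standard iterative position-to-coordinate conversion for a Hilbert curve. The function name d2xy and its parameters are picked for illustration and are not taken from the linked repository, which may do this differently.

```python
def d2xy(order, d):
    """Map position d along the curve to (x, y) on a 2**order x 2**order grid.

    Illustrative helper (hypothetical name): 'order' is the recursion depth,
    'd' is an integer in the range 0 .. 4**order - 1.
    """
    n = 1 << order          # side length of the square
    x = y = 0
    t = d
    s = 1                   # side length of the sub-square handled this round
    while s < n:
        rx = 1 & (t // 2)   # which half along x
        ry = 1 & (t ^ rx)   # which half along y
        if ry == 0:
            if rx == 1:
                # flip the sub-square
                x = s - 1 - x
                y = s - 1 - y
            # rotate the sub-square by swapping the axes
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Walk an 8 x 8 curve: every step lands on a cell next to the previous one.
path = [d2xy(3, d) for d in range(4 ** 3)]
```

Because each step moves to an adjacent cell, a stretch of consecutive positions on the line always fills a compact patch of the square.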

Hilbert curves are useful for visualizing data that is linear and very, very long, and genomic data is exactly that. Plotted as a simple line, a chromosome would be far too long to view at once; squeezed down to an acceptable width, many interesting events would be too small to see.

In a Hilbert curve plot we can also overlay up to three data tracks, each in its own color (red, green, or blue) and each representing a different “thing” (technical word usage).

In this example we show chromosomes one to four:

  • Regions that are replicated first when the cell copies its DNA before dividing (RepliChip; red)
  • Regions where histone 3 is trimethylated at lysine 27 (H3K27me3; green)
  • Regions where histone 3 is trimethylated at lysine 4 (H3K4me3; blue)

When the three tracks are combined, it becomes possible to see correlations that would otherwise be hard to spot. You can try it yourself: https://github.com/aLahat/hilbert_curve.
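
The three-color overlay can be sketched roughly as follows. This is only an illustration of the idea, assuming each track has already been binned into one number per position along the curve; the function names scale_to_byte and combine_tracks are made up for this example and are not from the linked repository.

```python
def scale_to_byte(values):
    """Linearly rescale a list of numbers to integers in 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat track carries no contrast, so draw it as black.
        return [0] * len(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def combine_tracks(red, green, blue):
    """Merge three equally long tracks into one list of (r, g, b) pixels.

    Hypothetical helper: each argument holds one signal value per bin,
    in genomic order.
    """
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("all three tracks must have the same number of bins")
    return list(zip(scale_to_byte(red), scale_to_byte(green), scale_to_byte(blue)))


# Two bins: the first is strong only in the blue track, the second only in red.
pixels = combine_tracks([0, 10], [3, 3], [8, 2])
# pixels == [(0, 0, 255), (255, 0, 0)]
```

Each pixel in the resulting list would then be drawn at the square coordinate that the Hilbert curve assigns to its position, so bins where two tracks are both strong show up as mixed colors (for example, red plus green appears yellow).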

