Rustopals - Part 1

Having always been interested in information security, I was very excited when I learned about the Cryptopals challenge by Matasano security a few years ago. But as it happens with many things in life, I put it in my queue of projects and it never bubbled up again until now.

Fortunately, being a PhD student has some advantages such as letting me pursue a lot of side projects. Considering that my weapon of choice at the moment is Rust, I decided it would be a good time to solve the Cryptopals challenges using Rust, both as a means of continue to learn Rust as well as finally diving into the mystical world of crypto-analysis.

This series makes the only assumption that you have a working Rust development environment and the desire to learn more about both Rust and Cryptoanalysis. So everyone is welcome!

At the moment of writing this article, there are 64 challenges of varying complexity in the Matasano website. For each article in the series, I’ll group a few of them together and try to cover the most important aspects of each one.

Without further ado, lets get started with the challenges.

Challenge 1: Convert hex to base64

The string:

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

Should produce:

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

This challenge is a warm-up without much difficulty, however, it sets some of the basics we will use throughout all other challenges. In fact, the website gives us a rule-of-thumb for all of the challenges:

Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing.

The work required to solve this in Rust is to figure out how to encode and decode hex and base64 strings. Rust does not support hex/base64 encoding/decoding functionality by default, but one of the strengths of Rust is its strong community and how easy it is to add crates (libraries) to extend the standard library.

A quick search in crates.io (the default location for indexing published crates) for “hex” or “base64” shows that there are many crates that can provide this functionality. In fact, there are exact matches for both search strings, and the website shows in their respective statistics that they are the most downloaded crates from their respective searches, so those are likely good crates to try out.

To get started, we create a new Rust project using cargo:

cargo new set1_cha1

Having picked our crates for hex and base64 manipulation, we add the dependencies to our Cargo.toml file:

[dependencies]
base64 = "0.8.0"
hex = "0.3.1"

Now, the challenge gives us a string that represents hex data and wants us to produce another string with its equivalent base64 representation. This means that input is a hex-encoded string, and the output should be a base64-encoded string. As the rule of thumb given by Cryptopals indicates, we should always operate on raw bytes, so that means we need to transform the hex-encoded string to bytes (hex decoding), and then encode those same bytes as a base64 string (base64 encoding). Easy enough, right?

To tie the whole thing together, we’ll create a new test in our project that checks our work:

extern crate base64;
extern crate hex;

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_challenge1() {
        /* This creates a variable binding of type &str with the input data */
        let input = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d";
        /* This creates a variable binding of type &str with the output data */
        let expected_output = "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t";

        /* The decode function returns a Result with an underlying Vec<u8>. For now, we unwrap the result
           and get the vector of bytes */
        let raw_data = hex::decode(input.as_bytes()).unwrap();
        /* The base64::encode takes a slice of bytes (&[u8]) so we convert the vector to the proper type */
        let output = base64::encode(raw_data.as_slice());

        /* Now compare the output with the expect output. Do note that since base64::encode returns a String
           object, and the expected output is a &str, we pass a reference to the output so we'll have the same type.*/
        assert_eq!(expected_output, &output);
    }
}
                                                                                                                                                                                                         

Now run our code with cargo test and we should see the following:

running 1 test
test tests::test_challenge1 ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests challenge1

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Well, that was easy! We now have 1 point on the scoreboard, so let’s keep the positive inertia and tackle the next problem. Before doing that though, it’s worth mentioning that if the test above fails for you, cargo will tell you the assertion failed and print the left and right values. This gives you an idea of where things might have failed and you can either attach your favourite debugger or start printing out the state of your program to track down the problem.

Challenge 2

Fixed XOR

Write a function that takes two equal-length buffers and produces their XOR combination.

If your function works properly, then when you feed it the string:

1c0111001f010100061a024b53535009181c

… after hex decoding, and when XOR’d against:

686974207468652062756c6c277320657965

… should produce:

746865206b696420646f6e277420706c6179

Alright, we are now getting a bit more into the subject. This next challenge asks us to take a hex-encoded string and xor-it against another hex-encoded string of the same length. Simple enough. For this new challenge, we can either create a new project or simply create a new test in the same project, since we’ll require some of the same dependencies. Regardless of what you choose, let’s create a function that does the work:

fn naive_xor_slices(a : &[u8], b : &[u8]) -> Vec<u8> {
    /* Check the sizes of the slices */
    assert!(a.len() == b.len());

    /* Create the result vector. Remember to make it mutable */
    let mut res = Vec::new();
    
    /* Iterate through the buffers. Since the buffers have the same length, no need to do boundary checking. */
    for i in 0..a.len() {
        /* xor the ith byte of each buffer and push it into the result buffer */
        res.push(a[i] ^ b[i]);
    }

    res
}

The function is quite simple. It takes two slices of byte vectors and iterates over them “c-style”, applying an xor operation on the ith byte of each vector and saving the results. Once again, we create a test to make sure our work is correct:

    #[test]
    fn challenge2() {
        /* Remember, we always work on raw data, so decode the input strings from the getgo*/
        let input = hex::decode("1c0111001f010100061a024b53535009181c").unwrap();
        let xor_string = hex::decode("686974207468652062756c6c277320657965").unwrap();
        let expected_output = hex::decode("746865206b696420646f6e277420706c6179").unwrap();
        let output = naive_xor_slices(input.as_slice(), xor_string.as_slice());

        assert_eq!(expected_output, output);
    }
}

And as expected, it works! Now, I mentioned before that the way we wrote the function was “c-style”, which is a perfectly fine style for writing effective code on Rust. However, Rust implements many features of functional languages that help you write more compact code. One such feature that could us write a more Rust-idiomatic solution would be iterators.

Iterators provide a common interface to collections that allows us to traverse its underlying collection element-by-element and potentially perform some operations on each step. They might sound simple or redundant (since it’s basically a folded-down loop) but when combined together, they can be quite powerful.

Let’s re-write the naive_xor_slices function to take advantage of iterators:

fn xor_slices(a : &[u8], b : &[u8]) -> Vec<u8> {
    a.iter().zip(b.iter()).map(|(x, y)| x ^ y).collect()
}

What? Is that it? Yes. This one-liner function is semantically equivalent to our previous more explicit version. Let’s break it down into its pieces:

a.iter()

This function call gives us an iterator over the first buffer. This is the first step needed to start combining the iterator functionality. It’s worth mentioning that at this point, each of the elements of the iterator is of the same type that each of the elements of the original collection (u8).

.zip(b.iter())

After getting an iterator for a, we call the zip() function on it, which takes as parameter a second iterator and produces a new iterator that steps over each one of its underlying iterators in lock-step. Meaning, on each step, we get the next element for each collection. Neat!

map(|(x, y)| x ^ y)

Afterwards, we use our shiny new zip iterator and call its map() function to create a new iterator that will apply the defined closure (anonymous function) passed to it to each of the elements of the iterator. The state defined between the pipes of the closure are the elements of the iterator we wish to use inside the closure. So in this case, declaring |(x, y)| gives us access each of the elements of the two underlying buffers. Afterwards, we simply apply the xor operator ^ on both elements.

Finally, .collect() takes the resulting operator from the map() call and converts it back to a suitable collection, thus returning a vector of u8 as desired.

Coming up next in Rustopals…

And that’s it for this edition of Rustopals. In the next article of the series we’ll tackle challenge #3 and start to flex our cryptoanalysis muscles. If you have any questions or corrections, please let me know in the comments below.

Thanks for reading and stay tuned for more coming soon!

Written on January 17, 2018