Rick


Sunday, November 10, 2013

Stackoverflow question on posting an HTTP form with Java


A user asks:
RE: "I wish to post a form in java that has both string and binary parameters e.g.
name=sam&photo=<...binary data...>
Unfortunately the available documentation only covers uploading either strings or binary data separately. How can I combine the two?"

My answer:

comment:

java.net.URL is no fun to work with either. I wrote a utility called HTTP that I use to post REST calls without pulling in Apache Commons HTTPRequest (not my thing). HTTP uses java.net.URL. I added a new form poster and my own URL encoder for byte arrays, and I have included both below. You can cut and paste the code or use Boon. (I don't care either way. I am not looking for Boon converts, but I troll Stack Overflow looking for things that people do and add them to Boon for practice.)

Basically you need to send the MIME type application/x-www-form-urlencoded. The fields have to be text.

The field names and values are escaped/encoded: space characters are replaced by "+", and reserved characters are escaped using URL encoding. Oh, and that is not all... Non-alphanumeric characters are replaced by "%HH", as in %20 for space (except that space is "+", so...).

So that is two hexadecimal digits representing the ASCII code of the character. If only Java could somehow do this for you... Oh wait, it can... But it is a new class. It has only been around since Java 1.0. Check out URLEncoder; it is a utility class for HTML form encoding. But it does not work with bytes like you want... :)

URLEncoder, included with the JDK, contains static methods for converting a String to the application/x-www-form-urlencoded MIME format. You can learn more about HTML form encoding by consulting the HTML specification (cited below). Also check out: http://docs.oracle.com/javase/1.4.2/docs/api/java/net/URLEncoder.html

The URLEncoder handles the following: "The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same. The special characters ".", "-", "*", and "_" remain the same. The space character " " is converted into a plus sign "+"."

Here is the kicker for binary conversion...

"All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used."

Specify UTF-8 always.
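
To make the rules quoted above concrete, here is a minimal sketch that uses only the JDK's URLEncoder. The class name UrlEncoderDemo, the field helper, and the sample values are made up for illustration; the only real API used is URLEncoder.encode(String, String).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncoderDemo {

    /** Encodes one form field as name=value, always specifying UTF-8. */
    static String field(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every JVM is required to support UTF-8, so this should never happen.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Space becomes '+'; '&' is not in the safe set, so it becomes %26.
        System.out.println(field("name", "sam & max"));      // name=sam+%26+max
        // U+00E9 is two bytes in UTF-8 (0xC3 0xA9), so it becomes two %xy triplets.
        System.out.println(field("city", "Orl\u00e9ans"));   // city=Orl%C3%A9ans
    }
}
```

Note that this only helps with String values. There is no overload that takes a byte[], which is the gap the rest of this post fills.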

Here is the HTTP specification.


This actually seems challenging enough to try to tackle myself, so let's do it.

I just added this to Boon for you.
    String response = HTTP.postForm ( "http://localhost:9220/test",
            Collections.EMPTY_MAP,
            map("hI", (Object)"hi-mom", "image", new byte[] {1,2,3})
    );
Now you can do it in one method call. :)

Let me break that down for you. (You can grab it here, BTW: http://richardhightower.github.io/site/Boon/Welcome.html)
I added this to Boon:
public static String postForm(final String url, final Map<String, ?> headers,
                                            final Map<String, Object> formData
)
The key here is encoding binary data:
    String response = HTTP.postForm ( "http://localhost:9220/test",
            Collections.EMPTY_MAP,
            map("hI", (Object)"hi-mom", "image", new byte[] {1,2,3})
    );

    boolean ok = true;
    ok |= response.startsWith ("hI=hi-mom&image=%01%02%03\n") ||
            die("encoding did not work");
The above is a test showing it works as I understand the spec.
The key is that it is turning "image", new byte[] {1,2,3} into image=%01%02%03.
BTW map is just a utility method that creates a map (listing at bottom).
The http server is just an echo.
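
For illustration, an echo endpoint along these lines could be built with the JDK's bundled com.sun.net.httpserver package. This is only a sketch and not the server used for the test above: the class name EchoServerSketch is made up, and it assumes the echo sends back the request body followed by a newline, since that is what the assertion checks for.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;

class EchoServerSketch {

    /** Starts a server (port 0 = any free port) that echoes each request body plus a newline. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/test", new HttpHandler() {
            @Override
            public void handle(HttpExchange exchange) throws IOException {
                // Read the whole request body into memory.
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                try (InputStream in = exchange.getRequestBody()) {
                    byte[] chunk = new byte[1024];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        body.write(chunk, 0, n);
                    }
                }
                body.write('\n');

                // Send it straight back with a 200 status.
                byte[] response = body.toByteArray();
                exchange.sendResponseHeaders(200, response.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(response);
                }
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(9220); // same port as the test URL above
    }
}
```

With something like that listening on port 9220, the test has an endpoint to talk to. Back to postForm itself, whose body is: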
    return Exceptions.tryIt(String.class, new Exceptions.TrialWithReturn<String>() {
        @Override
        public String tryIt() throws Exception {
            URLConnection connection;
            connection = doPostFormData(url, headers, formData);
            return extractResponseString(connection);
        }
    });
The magic happens in the doPostFormData:
private static URLConnection doPostFormData(String url, Map<String, ?> headers,
                                    Map<String, Object> formData
) throws IOException {
    HttpURLConnection connection;/* Handle output. */


    connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setConnectTimeout(DEFAULT_TIMEOUT_SECONDS * 1000);

    connection.setDoOutput(true);

    connection.addRequestProperty ( "Content-Type", "application/x-www-form-urlencoded" );

    ByteBuf buf = ByteBuf.create ( 244 );



    final Set<String> keys = formData.keySet ();

    int index = 0;
    for ( String key : keys )  {

        Object value = formData.get ( key );

        if (index > 0) {
            buf.addByte ( '&' );
        }


        buf.addUrlEncoded (  key  );
        buf.addByte ( '=' );

        if ( ! ( value instanceof byte[] ) ) {
            buf.addUrlEncoded ( value.toString () );
        } else {
            buf.addUrlEncodedByteArray((byte[]) value);
        }
        index++;
    }


    manageContentTypeHeaders ( "application/x-www-form-urlencoded",
            StandardCharsets.UTF_8.name (), connection );

    manageHeaders(headers, connection);


    int len = buf.len ();
    IO.write(connection.getOutputStream(),
            new String(buf.readForRecycle (), 0, len, StandardCharsets.UTF_8), IO.DEFAULT_CHARSET);
    return connection;
}
Notice the call to addUrlEncodedByteArray, to which you pass a byte array. Java works fine with URL encoding of strings, but I could not find an easy way to encode a byte array, so I just wrote one.
public void addUrlEncodedByteArray ( byte[] value ) {



    final byte[] encoded = new byte [2];

    for (int index = 0; index < value.length; index++) {
        int i = value[index];

        if ( i >= 'a' && i <= 'z' ) {
            this.addByte ( i );
        } else if ( i >= 'A' && i <= 'Z' ) {
            this.addByte ( i );
        } else if ( i >= '0' && i <= '9' ) {
            this.addByte ( i );
        } else if ( i == '_' || i == '-' || i == '.' || i == '*') {
            this.addByte ( i );
        } else if ( i == ' ') {
            this.addByte ( '+' );
        } else {
            encodeByteIntoTwoAsciiCharBytes(i, encoded);
            this.addByte ( '%' );
            this.addByte ( encoded [0] );
            this.addByte ( encoded [1] );
        }

    }
}
It is not the prettiest. But the unit tests work. I am sure you get the gist. It follows the spec and converts accordingly.
Every byte outside the safe ranges gets encoded as % followed by two hex digits.
Then you just have these two methods to finish up the encoding:
/**
 * Turns a single nibble into an ascii HEX digit.
 *
 * @param nibble the nibble to encode.
 *
 * @return the encoded nibble (1/2 byte).
 */
protected static int encodeNibbleToHexAsciiCharByte( final int nibble ) {

    switch ( nibble ) {
        case 0x00:
        case 0x01:
        case 0x02:
        case 0x03:
        case 0x04:
        case 0x05:
        case 0x06:
        case 0x07:
        case 0x08:
        case 0x09:
            return nibble + 0x30; // 0x30('0') - 0x39('9')
        case 0x0A:
        case 0x0B:
        case 0x0C:
        case 0x0D:
        case 0x0E:
        case 0x0F:
            return nibble + 0x57; // 0x61('a') - 0x66('f')
        default:
            die("illegal nibble: " + nibble);
            return -1;
    }
}


/**
 * Turns a single byte into its two-character hex representation.
 *
 * @param decoded the byte to encode.
 * @param encoded the two-element array that receives the ASCII hex digits, high nibble first.
 */
public static void encodeByteIntoTwoAsciiCharBytes(final int decoded, final byte[] encoded) {

    Objects.requireNonNull ( encoded );

    boolean ok = true;


    ok |= encoded.length == 2 || die("encoded array must be 2");


    encoded[0] = (byte) encodeNibbleToHexAsciiCharByte((decoded >> 4) & 0x0F);
    encoded[1] = (byte) encodeNibbleToHexAsciiCharByte(decoded & 0x0F);
}
Those are the important bits. The rest is just dealing with HTTP request / header gak.
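If you would rather not pull in Boon, the same idea fits in a small JDK-only class. What follows is a sketch, not Boon's code: the names FormBytes, encode, and decode are made up, it builds a String instead of using ByteBuf, and it emits uppercase hex digits the way URLEncoder does (the Boon version above emits lowercase; decoders accept both). The decode method is there only to show that a server can recover the original bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class FormBytes {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Form-encodes raw bytes: safe ASCII stays, space becomes '+', everything else becomes %XY. */
    static String encode(byte[] value) {
        StringBuilder out = new StringBuilder(value.length * 3);
        for (byte raw : value) {
            int b = raw & 0xFF; // treat the signed Java byte as 0..255
            if ((b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
                    || b == '_' || b == '-' || b == '.' || b == '*') {
                out.append((char) b);
            } else if (b == ' ') {
                out.append('+');
            } else {
                out.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    /** Reverses encode(), assuming well-formed input. */
    static byte[] decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '+') {
                out.write(' ');
            } else if (c == '%') {
                // The next two characters are the hex digits of one byte.
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] image = {1, 2, 3, (byte) 0xFF, ' ', 'a'};
        String wire = encode(image);
        System.out.println(wire);                               // %01%02%03%FF+a
        System.out.println(Arrays.equals(image, decode(wire))); // true
    }
}
```

The `raw & 0xFF` mask matters because Java bytes are signed; without it, any byte of 0x80 or above would be negative and could not be used as an index into the hex table.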
Here is manageContentTypeHeaders:
    manageContentTypeHeaders ( "application/x-www-form-urlencoded",
            StandardCharsets.UTF_8.name (), connection );

...

private static void manageContentTypeHeaders(String contentType, String charset, URLConnection connection) {
    connection.setRequestProperty("Accept-Charset", charset == null ? StandardCharsets.UTF_8.displayName() : charset);
    if (contentType!=null && !contentType.isEmpty()) {
        connection.setRequestProperty("Content-Type", contentType);
    }
}
Here is manageHeaders:
    manageHeaders(headers, connection);

...

private static void manageHeaders(Map<String, ?> headers, URLConnection connection) {
    if (headers != null) {
        for (Map.Entry<String, ?> entry : headers.entrySet()) {
            connection.setRequestProperty(entry.getKey(), entry.getValue().toString());
        }
    }
}
Then we encode the body as UTF-8 and write it to the connection's output stream:
    int len = buf.len ();
    IO.write(connection.getOutputStream(),
            new String(buf.readForRecycle (), 0, len, StandardCharsets.UTF_8), IO.DEFAULT_CHARSET);
IO.write just does this:
public static void write ( OutputStream out, String content, Charset charset ) {

    try ( OutputStream o = out ) {
        o.write ( content.getBytes ( charset ) );
    } catch ( Exception ex ) {
        Exceptions.handle ( ex );
    }

}
ByteBuf is just like a ByteBuffer but easier to use and very fast. I have benchmarks. :)
What did I miss?
Let me know if it works for you.
--Rick
The map functions are just utility methods that let me represent a map concisely, as I find I use maps a lot. They only go up to nine or ten entries. Beyond that I have a way to pass a list of entries.
public static <K, V> Map<K, V> map(K k0, V v0) {
    Map<K, V> map = new LinkedHashMap<>(10);
    map.put(k0, v0);
    return map;
}

public static <K, V> Map<K, V> map(K k0, V v0, K k1, V v1) {
    Map<K, V> map = new LinkedHashMap<>(10);
    map.put(k0, v0);
    map.put(k1, v1);
    return map;
}


public static <K, V> Map<K, V> map(K k0, V v0, K k1, V v1, K k2, V v2) {
    Map<K, V> map = new LinkedHashMap<>(10);
    map.put(k0, v0);
    map.put(k1, v1);
    map.put(k2, v2);
    return map;
}

public static <K, V> Map<K, V> map(K k0, V v0, K k1, V v1, K k2, V v2, K k3,
                                   V v3) {
    Map<K, V> map = new LinkedHashMap<>(10);
    map.put(k0, v0);
    map.put(k1, v1);
    map.put(k2, v2);
    map.put(k3, v3);
    return map;
}

public static <K, V> Map<K, V> map(K k0, V v0, K k1, V v1, K k2, V v2, K k3,
                                   V v3, K k4, V v4) {
    Map<K, V> map = new LinkedHashMap<>(10);
    map.put(k0, v0);
    map.put(k1, v1);
    map.put(k2, v2);
    map.put(k3, v3);
    map.put(k4, v4);
    return map;
}

It goes on to ten. I end up using these a lot. Java needs literals for maps, until then....
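
If ten overloads feels like too many, a single varargs method gets you most of the way there. This is a sketch of an alternative, not what Boon does: the names MapSketch and mapOf are made up, and it trades the compile-time type checking of the overloads above for one method that checks its arguments at runtime.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapSketch {

    /** Builds an insertion-ordered map from alternating key, value, key, value... arguments. */
    static Map<String, Object> mapOf(Object... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of arguments");
        }
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            // Keys must be Strings; anything else fails here with a ClassCastException.
            map.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Object> form = mapOf("hI", "hi-mom", "image", new byte[] {1, 2, 3});
        System.out.println(form.keySet()); // [hI, image]
    }
}
```

Because the return type is Map<String, Object>, the result can be passed as the formData argument without the (Object) cast that the generic overloads need.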

See the post here: http://stackoverflow.com/questions/19876933/http-post-both-string-and-binary-parameters-in-java/19886873#19886873
Is the answer useful? How could you possibly pay me back? Up vote me on stackoverflow. I want to get above 1,000. :)


