uriparser  0.7.6
uriparser Documentation

Introduction

Welcome to the short uriparser integration tutorial. It is intended to answer upcoming questions and to shed light where function prototypes alone are not enough. Please drop me a line if you need further assistance and I will see what I can do for you. Good luck with uriparser!

Parsing URIs (from string to object)

Parsing a URI with uriparser looks like this:

        UriParserStateA state;
        UriUriA uri;

        state.uri = &uri;
        if (uriParseUriA(&state, "file:///home/user/song.mp3") != URI_SUCCESS) {
                /* Failure */
                uriFreeUriMembersA(&uri);
                ...
        }
        ...
        uriFreeUriMembersA(&uri);

While the URI object (UriUriA) holds information about the recognized parts of the given URI string, the parser state object (UriParserStateA) keeps the error code and error position. This information does not belong to the URI itself, which is why there are two separate objects.

You can reuse parser state objects for parsing several URIs like this:

        UriParserStateA state;
        UriUriA uriOne;
        UriUriA uriTwo;

        state.uri = &uriOne;
        if (uriParseUriA(&state, "file:///home/user/one") != URI_SUCCESS) {
                /* Failure */
                uriFreeUriMembersA(&uriOne);
                ...
        }
        ...
        state.uri = &uriTwo;
        if (uriParseUriA(&state, "file:///home/user/two") != URI_SUCCESS) {
                /* Failure */
                uriFreeUriMembersA(&uriOne);
                uriFreeUriMembersA(&uriTwo);
                ...
        }
        ...
        uriFreeUriMembersA(&uriOne);
        uriFreeUriMembersA(&uriTwo);

Recomposing URIs (from object back to string)

According to RFC 3986, gluing the parts of a URI together to form a string is called recomposition. Before we can recompose a URI object we have to know how much space the resulting string will take:

        UriUriA uri;
        char * uriString;
        int charsRequired;
        ...
        if (uriToStringCharsRequiredA(&uri, &charsRequired) != URI_SUCCESS) {
                /* Failure */
                ...
        }
        charsRequired++;

Now we can tell uriToStringA() to write the string to a given buffer:

        uriString = malloc(charsRequired * sizeof(char));
        if (uriString == NULL) {
                /* Failure */
                ...
        }
        if (uriToStringA(uriString, &uri, charsRequired, NULL) != URI_SUCCESS) {
                /* Failure */
                ...
        }
Remarks:
Incrementing charsRequired by 1 is required since uriToStringCharsRequiredA() returns the length of the string as strlen() does, but uriToStringA() works with the number of maximum characters to be written including the zero-terminator.
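
To make the off-by-one concrete, here is a minimal sketch that does not use uriparser at all: strlen() stands in for uriToStringCharsRequiredA() and snprintf() stands in for uriToStringA(), since the size limit of snprintf() also counts the zero-terminator. The helper name demo_copy_with_terminator is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of uriparser): copies text into a fresh
 * buffer using the same "measure, add one, write" sequence as above. */
char * demo_copy_with_terminator(const char * text) {
        size_t charsRequired;
        char * buffer;

        assert(text != NULL);
        charsRequired = strlen(text);   /* length without the terminator */
        charsRequired++;                /* room for the zero-terminator */

        buffer = malloc(charsRequired * sizeof(char));
        if (buffer == NULL) {
                return NULL;            /* caller handles the failure */
        }

        /* The limit passed to snprintf() includes the terminator. */
        snprintf(buffer, charsRequired, "%s", text);
        return buffer;                  /* caller must free() the result */
}
```

Passing strlen(text) instead of strlen(text) + 1 as the limit would cut off the last character, which is the same mistake as skipping charsRequired++ above.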

Resolving References

Reference Resolution is the process of turning a (relative) URI reference into an absolute URI by applying a base URI to it. In code it looks like this:

        UriUriA absoluteDest;
        UriUriA relativeSource;
        UriUriA absoluteBase;
        ...
        /* relativeSource holds "../TWO" now */
        /* absoluteBase holds "file:///one/two/three" now */
        if (uriAddBaseUriA(&absoluteDest, &relativeSource, &absoluteBase) != URI_SUCCESS) {
                /* Failure */
                uriFreeUriMembersA(&absoluteDest);
                ...
        }
        /* absoluteDest holds "file:///one/TWO" now */
        ...
        uriFreeUriMembersA(&absoluteDest);
Remarks:
uriAddBaseUriA() does not normalize the resulting URI. You will usually want to pass it through uriNormalizeSyntaxA() afterwards.

Creating References

Reference Creation is the inverse process of Reference Resolution: A common base URI is "subtracted" from an absolute URI to make a (relative) reference. If the base URI is not common, the remaining URI will still be absolute, i.e. it will carry a scheme.

        UriUriA dest;
        UriUriA absoluteSource;
        UriUriA absoluteBase;
        ...
        /* absoluteSource holds "file:///one/TWO" now */
        /* absoluteBase holds "file:///one/two/three" now */
        if (uriRemoveBaseUriA(&dest, &absoluteSource, &absoluteBase, URI_FALSE) != URI_SUCCESS) {
                /* Failure */
                uriFreeUriMembersA(&dest);
                ...
        }
        /* dest holds "../TWO" now */
        ...
        uriFreeUriMembersA(&dest);

The fourth parameter is the domain root mode. With URI_FALSE as above this will produce URIs relative to the base URI. With URI_TRUE the resulting URI will be relative to the domain root instead, e.g. "/one/TWO" in this case.

Filenames and URIs

Converting filenames to and from URIs works on strings directly, i.e. without creating a URI object.

        const char * const absFilename = "E:\\Documents and Settings";
        const int bytesNeeded = 8 + 3 * strlen(absFilename) + 1;
        char * absUri = malloc(bytesNeeded * sizeof(char));
        if (absUri == NULL) {
                /* Failure */
                ...
        }
        if (uriWindowsFilenameToUriStringA(absFilename, absUri) != URI_SUCCESS) {
                /* Failure */
                free(absUri);
                ...
        }
        /* absUri is "file:///E:/Documents%20and%20Settings" now */
        ...
        free(absUri);
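
The bytesNeeded expression in the example above can be read term by term. The sketch below spells out one plausible reading in a small helper: 8 characters for the "file:///" prefix, up to 3 output characters per input character (a percent-encoded character takes the form %XX), and 1 for the zero-terminator. The name demo_windows_uri_chars_needed is hypothetical and not part of uriparser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of uriparser): worst-case buffer size,
 * in characters, for turning an absolute Windows filename into a URI. */
size_t demo_windows_uri_chars_needed(const char * absFilename) {
        assert(absFilename != NULL);
        return 8                          /* strlen("file:///") */
             + 3 * strlen(absFilename)    /* every character may become %XX */
             + 1;                         /* zero-terminator */
}
```

For "E:\\Documents and Settings" (25 characters) this yields 84, comfortably more than the 38 characters that "file:///E:/Documents%20and%20Settings" needs including its terminator. The formula trades a little memory for not having to scan the filename twice.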

Conversion works ..

All you have to do is to choose the right function for the task and allocate the required space (in characters) for the target buffer. Let me present you an overview:

Normalizing URIs

Sometimes we come across unnecessarily long URIs like "http://example.org/one/two/../../one". The algorithm we can use to shorten this URI down to "http://example.org/one" is called Syntax-Based Normalization. Note that normalizing a URI does more than just "stripping dot segments". Please have a look at Section 6.2.2 of RFC 3986 for the full description.
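
To give a feeling for the dot-segment part only, here is a stand-alone sketch of the remove_dot_segments procedure from Section 5.2.4 of RFC 3986, applied to the path component of a URI. It is an illustration under that assumption, not uriparser's own implementation, and it covers none of the other normalization steps. The names demo_remove_dot_segments and demo_drop_last_segment are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Drops the last segment of out[0..len) together with its leading '/'. */
static size_t demo_drop_last_segment(const char * out, size_t len) {
        while ((len > 0) && (out[len - 1] != '/')) {
                len--;
        }
        if (len > 0) {
                len--; /* the '/' that introduced the dropped segment */
        }
        return len;
}

/* Hypothetical helper (not part of uriparser): rewrites path in place.
 * Step letters refer to RFC 3986, Section 5.2.4. */
void demo_remove_dot_segments(char * path) {
        char * in;
        size_t out = 0; /* number of output characters written so far */

        assert(path != NULL);
        in = path;
        while (*in != '\0') {
                if (strncmp(in, "../", 3) == 0) {
                        in += 3;                                 /* A */
                } else if (strncmp(in, "./", 2) == 0) {
                        in += 2;                                 /* A */
                } else if (strncmp(in, "/./", 3) == 0) {
                        in += 2;                                 /* B */
                } else if (strcmp(in, "/.") == 0) {
                        path[out++] = '/';                       /* B, at end */
                        in += 2;
                } else if (strncmp(in, "/../", 4) == 0) {
                        out = demo_drop_last_segment(path, out); /* C */
                        in += 3;
                } else if (strcmp(in, "/..") == 0) {
                        out = demo_drop_last_segment(path, out); /* C, at end */
                        path[out++] = '/';
                        in += 3;
                } else if ((strcmp(in, ".") == 0) || (strcmp(in, "..") == 0)) {
                        in += strlen(in);                        /* D */
                } else {
                        /* E: move one segment, with its leading '/' if any */
                        if (*in == '/') {
                                path[out++] = *in++;
                        }
                        while ((*in != '\0') && (*in != '/')) {
                                path[out++] = *in++;
                        }
                }
        }
        path[out] = '\0';
}
```

Called on a writable copy of "/one/two/../../one", the helper leaves "/one" in the buffer. Working in place is safe here because the output never grows beyond the input that has already been consumed.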

Just as we asked uriToStringCharsRequiredA() for the required space when converting a URI object back to a string, we can ask uriNormalizeSyntaxMaskRequiredA() for the parts of a URI that require normalization and then pass this normalization mask to uriNormalizeSyntaxExA():

        const unsigned int dirtyParts = uriNormalizeSyntaxMaskRequiredA(&uri);
        if (uriNormalizeSyntaxExA(&uri, dirtyParts) != URI_SUCCESS) {
                /* Failure */
                ...
        }

If you don't want to normalize all parts of the URI you can pass a custom mask as well:

        const unsigned int normMask = URI_NORMALIZE_SCHEME | URI_NORMALIZE_USER_INFO;
        if (uriNormalizeSyntaxExA(&uri, normMask) != URI_SUCCESS) {
                /* Failure */
                ...
        }

Please see UriNormalizationMaskEnum for the complete set of flags.

On the other hand, calling plain uriNormalizeSyntaxA() (without the "Ex") saves you from thinking about individual parts, as it queries uriNormalizeSyntaxMaskRequiredA() internally:

        if (uriNormalizeSyntaxA(&uri) != URI_SUCCESS) {
                /* Failure */
                ...
        }

Working with query strings

RFC 3986 itself does not understand the query part of a URI as a list of key/value pairs. But HTML 2.0 does, and it defines a media type application/x-www-form-urlencoded in section 8.2.1 of RFC 1866. uriparser allows you to dissect (or parse) a query string into unescaped key/value pairs and back.

To dissect the query part of a just-parsed URI you could write code like this:

        UriUriA uri;
        UriQueryListA * queryList;
        int itemCount;
        ...
        if (uriDissectQueryMallocA(&queryList, &itemCount, uri.query.first,
                        uri.query.afterLast) != URI_SUCCESS) {
                /* Failure */
                ...
        }
        ...
        uriFreeQueryListA(queryList);
Remarks:
  • NULL in the value member means there was no '=' in the item text as with "?abc&def".
  • An empty string in the value member means there was '=' in the item as with "?abc=&def".
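
The difference between the two cases in the remarks above can be sketched without uriparser. The code below splits a single query item (the text between two '&' characters) at its first '=' and follows the same convention: NULL when there is no '=', an empty string when nothing follows it. The type DemoPair and the functions demo_split_item and demo_free_pair are hypothetical names for this illustration; unlike uriDissectQueryMallocA() the sketch does not unescape anything.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pair type for this sketch only. */
typedef struct {
        char * key;
        char * value; /* NULL: no '=' in the item; "": '=' with nothing after */
} DemoPair;

/* Returns a zero-terminated copy of [first, afterLast), or NULL. */
static char * demo_copy_range(const char * first, const char * afterLast) {
        const size_t len = (size_t)(afterLast - first);
        char * copy = malloc(len + 1);
        if (copy != NULL) {
                memcpy(copy, first, len);
                copy[len] = '\0';
        }
        return copy;
}

/* Splits one item such as "abc", "abc=" or "abc=1". Returns 0 on success. */
int demo_split_item(const char * first, const char * afterLast, DemoPair * pair) {
        const char * equals;

        assert((first != NULL) && (afterLast != NULL) && (pair != NULL));
        equals = memchr(first, '=', (size_t)(afterLast - first));

        pair->key = demo_copy_range(first, (equals != NULL) ? equals : afterLast);
        pair->value = (equals != NULL) ? demo_copy_range(equals + 1, afterLast) : NULL;

        if ((pair->key == NULL) || ((equals != NULL) && (pair->value == NULL))) {
                /* Out of memory: leave the pair empty */
                free(pair->key);
                free(pair->value);
                pair->key = NULL;
                pair->value = NULL;
                return -1;
        }
        return 0;
}

void demo_free_pair(DemoPair * pair) {
        free(pair->key);
        free(pair->value);
        pair->key = NULL;
        pair->value = NULL;
}
```

Passing the range as first/afterLast pointers mirrors the way uri.query.first and uri.query.afterLast are handed to uriDissectQueryMallocA() above.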

To compose a query string from a query list you could write code like this:

        int charsRequired;
        int charsWritten;
        char * queryString;
        ...
        if (uriComposeQueryCharsRequiredA(queryList, &charsRequired) != URI_SUCCESS) {
                /* Failure */
                ...
        }
        queryString = malloc((charsRequired + 1) * sizeof(char));
        if (queryString == NULL) {
                /* Failure */
                ...
        }
        if (uriComposeQueryA(queryString, queryList, charsRequired + 1, &charsWritten) != URI_SUCCESS) {
                /* Failure */
                ...
        }
        ...
        free(queryString);

Ansi and Unicode

uriparser comes with two versions of every structure and function: one handling Ansi text (char *) and one working with Unicode text (wchar_t *). The Ansi names carry the suffix A (as in UriUriA), the Unicode names the suffix W (as in UriUriW).

This tutorial only shows the usage of the Ansi editions but their Unicode counterparts work in the very same way.

Autoconf Check

You can use the code below to make ./configure test for presence of uriparser 0.6.4 or later.

URIPARSER_MISSING="Please install uriparser 0.6.4 or later.
   On a Debian-based system enter 'sudo apt-get install liburiparser-dev'."
AC_CHECK_LIB(uriparser, uriParseUriA,, AC_MSG_ERROR(${URIPARSER_MISSING}))
AC_CHECK_HEADER(uriparser/Uri.h,, AC_MSG_ERROR(${URIPARSER_MISSING}))

URIPARSER_TOO_OLD="uriparser 0.6.4 or later is required, your copy is too old."
AC_COMPILE_IFELSE([
#include <uriparser/Uri.h>
#if (defined(URI_VER_MAJOR) && defined(URI_VER_MINOR) && defined(URI_VER_RELEASE) \
&& ((URI_VER_MAJOR > 0) \
|| ((URI_VER_MAJOR == 0) && (URI_VER_MINOR > 6)) \
|| ((URI_VER_MAJOR == 0) && (URI_VER_MINOR == 6) && (URI_VER_RELEASE >= 4)) \
))
/* FINE */
#else
# error uriparser not recent enough
#endif
],,AC_MSG_ERROR(${URIPARSER_TOO_OLD}))
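
The preprocessor condition in the check above is a plain three-way version comparison. As a sketch, the same logic can be written as an ordinary C function, which makes it easy to try a few version numbers by hand. The function name demo_version_is_at_least_0_6_4 is made up for this example; the real check has to stay in the preprocessor so that ./configure can fail at compile time.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if major.minor.release is 0.6.4 or later,
 * using the same three alternatives as the #if condition above. */
int demo_version_is_at_least_0_6_4(int major, int minor, int release) {
        assert((major >= 0) && (minor >= 0) && (release >= 0));
        return (major > 0)
                || ((major == 0) && (minor > 6))
                || ((major == 0) && (minor == 6) && (release >= 4));
}
```

With this helper, 0.7.6 and 0.6.4 are accepted while 0.6.3 is rejected.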