DNS-based web metrics collection mechanism

For the 30 years that the World Wide Web has existed, people have sought to understand and classify the behavior of end users. From the initial harvesting of web server logs to the invention of beaconing web bugs, the industry has been trying to extract analytics and perform data analysis. The current state of the industry is a plethora of competing collectors, with each part of the enterprise selecting its own tracking mechanism. This leads to substantial overhead and even resource exhaustion. We need a mechanism that will reduce the impact of collecting web analytics, both in terms of page load times and cookie proliferation.

The W3C has added the ping attribute to the <A> tag to instruct the browser to simultaneously load the HREF target and POST to the PING target, which ameliorates the impact of redirect chains. It does, however, depend on the User Agent implementing it, and wide implementation is not a foregone conclusion.

Security-conscious practitioners know very well that the DNS protocol can be used to quietly exfiltrate data from a protected network. The curious reader may consult the DNS Tunneling technique (here).

What if we were to combine these things? What if we were to somehow leverage the DNS Tunneling technique to collect web metrics? We could either replace the current methods or augment them.

The web metric data is encoded within a purposely written DNS query packet (i.e. the mechanism used for DNS Tunneling) for transmission to a collector instantiated as an Authoritative DNS server. We show several methods, and embodiments, of performing this type of white-hat data exfiltration to provide performance and reliability gains over existing methods.

How it would work

  • A script on the client side takes all the metrics data to be sent and creates a string out of them. It also adds a verification string and a nonce string.
  • The data are base64-encoded to produce a string. If the data string is too long, it may first be compressed using common compression techniques such as gzip or bzip2.
  • The client requests a host in a subdomain (foo) of a domain (example.org), where the host name is the generated string. The fully qualified domain name would be a1b2c3d4e5f6g7.foo.example.org in this example.
  • DNS resolution of the name carries through to the Authoritative DNS server for the example.org domain, which decodes the base64 and optionally verifies data validity (looks for the predefined string or compares the checksum). If data validation is not turned on, the data are assumed to be correct.
    • If the data are correct, it responds with “does not exist” or the IP address of the classic collector, depending on the embodiment (replacement vs augmentation).
    • If the data are incorrect, it responds with a bogus IP address (e.g. 172.172.172.172).
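As a rough illustration of the client and server halves of these steps, here is a minimal Python sketch. It is not a definitive implementation: the function names, the nonce format, and the use of %DNS_COLLECTOR_V1% as the verification string (borrowed from the example later in this post) are all assumptions. It also swaps base64 for base32, because hostnames are case-insensitive and browsers lowercase them, which would corrupt base64, and it splits the encoded string into labels of at most 63 characters, the DNS limit for a single label.

```python
import base64
import secrets

MAX_LABEL = 63   # DNS limit on a single label, in octets
MAX_NAME = 253   # practical limit on a full name in text form
MARKER = "%DNS_COLLECTOR_V1%"  # assumed verification string


def encode_metrics(tokens, zone="foo.example.org", nonce=None):
    """Build the hostname a client would resolve to report `tokens`."""
    nonce = nonce or secrets.token_hex(4)
    payload = ";".join(list(tokens) + [f"nonce={nonce}", MARKER])
    # base32 (not base64, as in the post): survives lowercasing, uses only a-z and 2-7
    text = base64.b32encode(payload.encode("utf-8")).decode("ascii")
    text = text.rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    name = ".".join(labels + [zone])
    if len(name) > MAX_NAME:
        raise ValueError("payload too long for a single DNS name")
    return name


def decode_metrics(hostname, zone="foo.example.org"):
    """Inverse of encode_metrics, as the authoritative server might run it."""
    hostname = hostname.rstrip(".")
    text = hostname[: -len(zone) - 1].replace(".", "").upper()
    text += "=" * (-len(text) % 8)  # restore the padding the client stripped
    return base64.b32decode(text).decode("utf-8").split(";")


if __name__ == "__main__":
    name = encode_metrics(["pageData=foobar", "uniqID=userID"])
    print(name)
    print(decode_metrics(name))
```

The server side can then look for the marker in the decoded tokens to decide whether the data are valid.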

In some embodiments the method can behave as a pass-through proxy or as an exploding proxy. In these cases, the response to the client is instant, and the Authoritative DNS server communicates with the pixel collector(s) and/or data processing service(s) after the user receives the response.

  • The client would load the image response from the pixel collectors, receive no IP address, or fail instantly.
  • The Authoritative DNS server, and optionally the pixel collector, would then extract the transmitted data from their logs and pass the data to a service designated to collect and process metrics. An extension to this embodiment may have the ADNS device post metrics to more than one collector, thus behaving as a broker for multiple collecting agencies.
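To make the broker idea a little more concrete, here is a small Python sketch under some loud assumptions: the queried names have already been pulled out of the server's logs into a list (no particular log format is implied), the payload sits in the leftmost label as plain base64, and each "collector" is just a callable. The names extract_payloads and broker are hypothetical.

```python
import base64


def extract_payloads(qnames, zone="example.org"):
    """Yield decoded payloads from queried names that fall under `zone`."""
    suffix = "." + zone
    for qname in qnames:
        name = qname.rstrip(".")
        if not name.lower().endswith(suffix):
            continue  # query for some other zone
        label = name[: -len(suffix)].split(".")[0]
        try:
            yield base64.b64decode(label, validate=True).decode("utf-8")
        except ValueError:
            continue  # not decodable, so not one of our metric names


def broker(qnames, collectors, zone="example.org"):
    """Hand every decoded payload to every collector; return the payload count."""
    count = 0
    for payload in extract_payloads(qnames, zone):
        for deliver in collectors:
            deliver(payload)  # e.g. a function that POSTs to one collector
        count += 1
    return count
```

In a real deployment each callable would presumably wrap an HTTP POST to one collecting agency; here plain functions keep the sketch self-contained.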

Embodiments

Let me walk you through how a few embodiments of this idea might work, using the example of sending the value “uniqID=userID” for the “foobar” page to a pixel server at example.org. In other words, the web client would be loading “http://example.org/uc.gif?pageData=foobar” to transmit the “uniqID=userID” cookie value to the collector.

As a replacement for the existing web metrics collectors

  • DNS resolution of the name carries through to the Authoritative DNS server for the example.org domain, which decodes the base64 and optionally verifies data validity (looks for the predefined string or verifies the checksum). If the optional data validation is turned off, the data are assumed to be correct.
    • If the data are correct, it responds with no IP address, e.g. a “does not exist” message.
    • If the data are incorrect, it responds with a bogus IP address (e.g. 172.172.172.172).
  • The client’s attempt to load the resource would fail in one of two ways: with no IP address if the data validation succeeded, or with a bogus IP address otherwise. This would be an immediate failure, thereby freeing the client to continue processing other directives in the loaded page.
  • The Authoritative DNS server would then process the query logs to extract the transmitted data and pass the data to the analysis component of a web analytics service.
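The server-side decision in the replacement embodiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: answer_for is a hypothetical helper, not part of any DNS server, “does not exist” is represented by the string "NXDOMAIN", and validation means checking for the %DNS_COLLECTOR_V1% string used in the example in this post.

```python
import base64

MARKER = "%DNS_COLLECTOR_V1%"   # predefined verification string (assumed)
BOGUS_IP = "172.172.172.172"    # answer for invalid data
NXDOMAIN = "NXDOMAIN"           # stands in for a "does not exist" answer


def answer_for(label, check_data=True):
    """Decide how to answer a query whose leftmost label is `label`."""
    if not check_data:
        return NXDOMAIN  # validation off: data are assumed correct
    try:
        payload = base64.b64decode(label, validate=True).decode("utf-8")
    except ValueError:
        return BOGUS_IP  # not valid base64 or not valid text
    return NXDOMAIN if payload.endswith(MARKER) else BOGUS_IP
```

Either answer stops the client from fetching anything over HTTP, which is the point of this embodiment: the query itself is the delivery.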

A flow chart of the above might look like this

As an augmentation of the existing web metrics collectors

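  • Combine the “pageData=foobar” and “uniqID=userID” tokens into a single string, e.g. “pageData=foobar;uniqID=userID”.
  • Add the predefined string (or checksum) to the token string, e.g. “pageData=foobar;uniqID=userID;%DNS_COLLECTOR_V1%”.
  • Use base64 encoding to produce a string, e.g. “cGFnZURhdGE9Zm9vYmFyO3VuaXFJRD11c2VySUQ7JUROU19DT0xMRUNUT1JfVjEl”. If the data string is too long, it may be compressed using common compression techniques such as gzip or bzip2.
  • The client would request http://cGFnZURhdGE9Zm9vYmFyO3VuaXFJRD11c2VySUQ7JUROU19DT0xMRUNUT1JfVjEl.example.org/. However, should the data be compressed, the client would request http://cGFnZURhdGE9Zm9vYmFyO3VuaXFJRD11c2VySUQ7JUROU19DT0xMRUNUT1JfVjEl.compressed.example.org/ (the same label is reused here purely for illustration; the real label would encode the compressed data).
    Note that this example label is 64 characters long, one more than the 63-octet limit on a single DNS label, and that hostnames are case-insensitive while base64 is not. A working implementation would need to split the string across several labels and use a case-insensitive encoding such as base32.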
  • DNS resolution of the name carries through to the Authoritative DNS server for the example.org domain, which decodes the base64 and optionally verifies data validity (looks for the predefined string or verifies the checksum). If the optional data validation is turned off, the data are assumed to be correct.
    • If the data are correct, it responds with the IP address of the classic collector.
    • If the data are incorrect, it responds with a bogus IP address (e.g. 172.172.172.172).
  • The client would load the uc.gif resource, thereby transmitting the metrics to the classic collector as well as the DNS-based collector, or fail instantly when incorrect data are sent.
  • The Authoritative DNS server would then process the query logs to extract the transmitted data and pass the data to the analysis component of a web analytics service, in parallel with the classic HTTP-based collector’s metrics analysis.
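The client-side steps of this example (combine the tokens, append the predefined string, base64-encode, optionally compress) can be reproduced with a short Python sketch. The function name build_name is hypothetical, and gzip is just one of the compression options mentioned above. The sketch follows the post in using base64 and a single label, without working around the DNS length and case limits.

```python
import base64
import gzip

MARKER = "%DNS_COLLECTOR_V1%"  # predefined string from the example


def build_name(tokens, zone="example.org", compress=False):
    """Return the hostname the client would request for `tokens`."""
    payload = ";".join(list(tokens) + [MARKER]).encode("utf-8")
    if compress:
        payload = gzip.compress(payload)
        zone = "compressed." + zone  # tells the server to decompress
    label = base64.b64encode(payload).decode("ascii")
    return f"{label}.{zone}"


name = build_name(["pageData=foobar", "uniqID=userID"])
label = name.split(".")[0]
print(name)        # the encoded label followed by .example.org
print(len(label))  # 64
```

The label printed here matches the string in the example, and its length of 64 characters shows why a real implementation would have to split or shorten it to fit the 63-octet DNS label limit. For a payload this short, gzip is unlikely to help, since its header and trailer add fixed overhead; compression only pays off for longer data strings.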

A flow chart of the above might look like this

Further thoughts

Suppose that the program resident on the Authoritative DNS server, which reads the query logs, extracts the data and performs a POST back to a classic HTTP-based pixel collector. There would now be 2 POSTs for each metric, unless something is done to prevent it. However, maybe there’s an additional component that expands the receipt and processing of this one signal to many collectors. By configuration, the same process could notify a plurality of collectors from this one signal sent by the client.
