In my last two blog entries I showed you the result of creating virtual denizens based on actual census data. I feel compelled to give a few more details on the simulation process, as the generation of virtual denizens is only a small cog in the bigger picture of that simulation.
Let us start at the beginning. As the census data is based on people's home addresses, we want to generate random locations in an area of interest where the homes of the virtual denizens will be located. In our case we are generating locations around the greater Montréal area. The first step is easy enough: we generate uniformly distributed latitudes and longitudes in the area of interest. However, people are not evenly distributed, so we need a way to reproduce the actual population density. To achieve that, we first query the Google API to obtain the postal code associated with each random location (this also has the side effect of removing most of the locations that fall inside bodies of water… we will not complain!). We still need a second transformation of the location information to obtain the census tract (the smallest area for which census information is available, corresponding to approximately 5,000 people). That transformation can be achieved via a web page query provided by the Canadian census bureau. We can now look up the census information and keep random locations according to the actual population density, dropping the extra locations. From that process we generated about 11,000 home locations for virtual denizens. As we need to drop a good proportion of the generated random locations, and as our query rate is throttled by the Google API, this process alone can take many days.
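To make the "keep according to density, drop the extras" step more concrete, here is a minimal sketch of one way it could be done. It is only one interpretation, not necessarily the method used in the simulation: each census tract gets a quota proportional to its population, and uniform random locations are kept until the quota of their tract is filled. The bounding box, the tract ids and the `lookup_tract` function (which stands in for the postal code and census tract queries) are all hypothetical.

```python
import random

# Hypothetical bounding box roughly covering the greater Montréal area.
LAT_RANGE = (45.35, 45.75)
LON_RANGE = (-74.00, -73.35)


def random_location(rng):
    """Draw a uniformly distributed (latitude, longitude) pair in the box."""
    return (rng.uniform(*LAT_RANGE), rng.uniform(*LON_RANGE))


def tract_quotas(tract_population, total_homes):
    """Number of homes to keep per tract, proportional to its population.

    Because of rounding, the quotas may not sum exactly to total_homes.
    """
    total_population = sum(tract_population.values())
    return {
        tract: round(total_homes * population / total_population)
        for tract, population in tract_population.items()
    }


def sample_homes(tract_population, total_homes, lookup_tract, rng,
                 max_draws=1_000_000):
    """Keep uniform random locations until every tract quota is filled.

    lookup_tract(lat, lon) is a stand-in for the geocoding and census tract
    queries. It returns a tract id, or None for locations to discard
    (for example, points that fall in water).
    """
    quotas = tract_quotas(tract_population, total_homes)
    homes = {tract: [] for tract in quotas}
    remaining = sum(quotas.values())
    for _ in range(max_draws):
        if remaining == 0:
            break
        lat, lon = random_location(rng)
        tract = lookup_tract(lat, lon)
        if tract not in quotas or len(homes[tract]) >= quotas[tract]:
            continue  # unknown tract or quota already filled: drop the extra
        homes[tract].append((lat, lon))
        remaining -= 1
    return homes
```

With real lookups, every call to `lookup_tract` would be a throttled web query, which is why dropping a large share of the draws makes this step so slow.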
The next step is the one whose result you have already seen. From the census data, we generate a random denizen for each of the retained home locations. The Canadian census data is complemented with age pyramid information from the Québec statistical bureau. The overall picture is decorated with information obtained from a data bank providing the most popular first names and a similar one providing the most popular last names. Again, the result looks like the following:
Maxim Lavoie a Female of 27 year old, born in Quebec.
- Residence is located around coordinates: [45.4872,-73.4226], postal code: J3Y 4Z1.
- Phone number: +15146589535
- Attended No certificate, diploma or degree.
- Is currently working full-time on a rotating or other shift in Health occupations for the Health care and social assistance industry and goes to an usual place for work.
- Usually work/attend school on: Mon Tue Wed Thu Fri from 05h30 to 12h45 (including commute).
- Has an income of $60,000 to $79,999.
- Has a regular activity/hobby performed around coordinates: [45.4925,-73.4375] on: Mon Wed Thu from 15h00 to 16h00 (including commute).
- Usually move around by means of Car, truck or van – as a driver for a commute distance of 10.3 km to a location around coordinates: [45.4760,-73.3308].
- Usually sleep from 20h00 to 04h00.
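As an illustration of how a profile like the one above could be drawn from census shares, here is a minimal sketch using weighted random choices. The attribute names and all the percentages are made up for the example; real values would come from the census tables of the tract in which the home location falls. The sketch also draws each attribute independently, which ignores correlations (for instance between age and occupation) that a fuller generator would have to respect.

```python
import random

# Illustrative, made-up shares (NOT real census figures).
TRACT_PROFILE = {
    "sex": {"Female": 0.51, "Male": 0.49},
    "age_group": {"15-24": 0.15, "25-44": 0.35, "45-64": 0.32, "65+": 0.18},
    "commute_mode": {
        "Car, truck or van": 0.70,
        "Public transit": 0.22,
        "Walked": 0.08,
    },
}


def draw_category(distribution, rng):
    """Pick one category with probability proportional to its share."""
    categories = list(distribution)
    weights = list(distribution.values())
    return rng.choices(categories, weights=weights, k=1)[0]


def generate_denizen(profile, rng):
    """Draw every attribute independently from its distribution."""
    return {
        attribute: draw_category(distribution, rng)
        for attribute, distribution in profile.items()
    }


if __name__ == "__main__":
    print(generate_denizen(TRACT_PROFILE, random.Random()))
```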
Now that we have the basic statistical information defining a denizen, we need to simulate their daily life. The Google API now becomes a major bottleneck, as the throttling of direction queries between two points would make our simulation take too much time. To remove that bottleneck, we created from Google Maps two low-fidelity transit maps: one for cars (mainly highways and major roads) and one for public transit such as subway and trains. These maps, combined with a shortest-path algorithm, enable us to simulate the virtual denizens' locations with sufficient fidelity. For example, our virtual denizens will follow a straight path to the closest entry point on the low-fidelity highway map and will then proceed along the network to the exit point, which is the network point closest to their destination. From there they will complete their travel in a straight line to their final destination. Walking and bicycling trips follow straight lines as an approximation, as they are usually shorter. Finally, the speed on and off the highway and public transit networks depends on the mode of transportation and on the time of day, in order to simulate rush hours.
Fig. 1: Low Fidelity Subway and Train (left) and Highway (right) network.
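The routing approximation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual simulation code: the network is a small undirected graph of named nodes with coordinates, great-circle (haversine) distance is used for both the straight-line legs and the edge lengths, and Dijkstra's algorithm is assumed as the shortest-path method since the text does not name one. All function names are hypothetical.

```python
import heapq
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_node(nodes, point):
    """Name of the network node closest to the given point."""
    return min(nodes, key=lambda name: haversine_km(nodes[name], point))


def shortest_path(nodes, edges, start, goal):
    """Dijkstra over an undirected graph; returns (node names, length in km)."""
    neighbours = {name: [] for name in nodes}
    for a, b in edges:
        length = haversine_km(nodes[a], nodes[b])
        neighbours[a].append((b, length))
        neighbours[b].append((a, length))
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            break
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, length in neighbours[node]:
            candidate = dist + length
            if candidate < best.get(nxt, math.inf):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(queue, (candidate, nxt))
    if goal not in best:
        raise ValueError("no path between %s and %s" % (start, goal))
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


def route(nodes, edges, origin, destination):
    """Straight line to the nearest entry, network path, straight line out."""
    entry = nearest_node(nodes, origin)
    exit_node = nearest_node(nodes, destination)
    path, network_km = shortest_path(nodes, edges, entry, exit_node)
    waypoints = [origin] + [nodes[name] for name in path] + [destination]
    off_network_km = (haversine_km(origin, nodes[entry])
                      + haversine_km(nodes[exit_node], destination))
    return waypoints, network_km, off_network_km
```

Returning the on-network and off-network distances separately makes it easy to apply different speeds to each part, for example a slower highway speed during rush hours.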
Now we have the virtual denizens' daily lives simulated. We know who they are and where they are at every moment, be it at home, at work, at their usual hobby place, at the grocery store or a restaurant, or in transit between those places. This is where I currently stand. My next steps are to use the Transmission Sites data I found on the web site of Innovation, Science and Economic Development Canada to determine which cell site each denizen is using based on their location, to generate mobile usage behavior based on their current activities, and then to simulate the QoS experienced by the virtual denizens, along with other mobile network usage information, in order to get a proper simulation of mobile usage in a metropolitan area.
For us the journey will not stop there. In parallel, we are creating information dashboards and applying machine learning algorithms to digest that simulated data and show compelling insights about our virtual denizens. Eventually we hope to demonstrate it with real mobile network data, but let’s take one step at a time!