Challenges faced in heterogeneous traffic data collection: a comparison of traffic data collection technologies

This article is published under the Creative Commons CC-BY-ND License (http://creativecommons.org/licenses/by-nd/4.0/). This license permits use, distribution and reproduction, commercial and non-commercial, provided that the original work is properly cited and is not changed in any way.

Abstract: Traffic data are the fundamental inputs to traffic flow analysis and simulation studies, which facilitate decision making in the field of traffic engineering. Hence, the accuracy of traffic data is of paramount importance. This study compares new technologies available for traffic data collection considering their accuracy and applicability in the Sri Lankan context. Traffic in Sri Lanka is of heterogeneous nature, as opposed to the homogeneous nature observed in most developed countries. Hence, collection of traffic data poses several challenges that affect its accuracy. Three techniques, the infrared driven TIRTL instrument, the video image processing-based TRAZER application and the traffic data collection method using the Google distance matrix application programming interface (API), are reviewed in this study with respect to their data collection accuracy. The fundamental macroscopic traffic data variables (flow and speed) were evaluated against control surveys. It was found that each technology has its strengths and weaknesses and needs to be used appropriately. The TIRTL instrument fared better on road sections on level terrain when the crossfall did not obstruct the infrared beams. Such occasions provided a rich set of microscopic traffic data. The TRAZER software delivered data with up to 100 % accuracy. However, this required the user to go through a lengthy post-processing routine to extract the final set of traffic data. Google traffic data collection provides highly accurate results when estimating link speeds. This method is ideal for collection of bulk data with spatio-temporal variations and the process can be fully automated to reduce the human resource requirement.


INTRODUCTION
Traffic engineering theories are mainly developed for homogeneous traffic conditions where operating speeds are constant, driver behaviour is uniform, and the sizes of the vehicles do not vary. However, the actual nature of traffic on roads differs from this. Vehicles with different dimensions operate at non-uniform speeds while depicting diverse driver behaviour, resulting in heterogeneous traffic flows. There are variations in headways, lateral spacings, and acceleration/deceleration rates. Furthermore, vehicles possess many other diverse operating characteristics. Therefore, when collecting field data for such heterogeneous traffic flows, the use of accurate and appropriate data collection methods is exceedingly important.
Traffic surveys generally provide the necessary empirical data for traffic analyses used in transport planning, traffic management, safety studies, etc. Depending on their application, several types of traffic data are collected in the field. The fundamental types of macroscopic traffic data are speed (km/h), flow (veh/h), density (veh/km), flow direction, number of turning movements, queue length, vehicle class, occupancy, headway and presence (Versavel, 2007). These data along with the geometric characteristics of a given section provide the necessary information to address typical traffic engineering problems. Microscopic traffic data include information such as the dynamics of individual vehicles and the manner in which they interact with adjacent vehicles in the traffic stream. Some of these microscopic parameters are individual vehicle speed, headway, position and characteristics of vehicles such as class, type, length, height, weight, length of the wheelbase and axle count (Versavel, 2007).
Both manual and automated traffic data collection methods are available at present. Manual traffic data collection is the oldest as well as the most basic method currently in practice. This is usually carried out by employing enumerators to collect the relevant traffic data. This method is still useful at present since automated methods cannot accurately gather some data types such as vehicle occupancy, vehicle classification and pedestrian details (Leduc, 2008). However, the manual data collection method is neither cost effective nor reliable.
With the increase in transport infrastructure development projects in Sri Lanka, the need for accurate traffic data for feasibility studies, traffic forecasting studies, etc., is pertinent. At present, the authorities collect data using conventional manual methods, which are time-consuming and prone to producing errors unless stringent data quality control measures are in place. The use of pneumatic tubes can be observed on some roads maintained by the Road Development Authority of Sri Lanka. Even so, the authorities are still in the process of finding and introducing alternative traffic data collection methods to increase the efficiency of project planning and designing. Although there are several methods available to cater to the above need, the responsible authorities have not yet studied the viability of such methods in the local context (Jayaratne et al., 2016). This study reviews traffic data collection technologies and the challenges faced in heterogeneous traffic data collection.

Modern traffic data collection methods
Automated data collection methods are developed based on various scientific principles and phenomena. Time-lapse photography was used by Chari and Badrinath (1983) to take aerial photographs of traffic streams to estimate space-mean speeds. Several studies have used the simple videography technique where the videos are manually analysed later to collect traffic data (Nagaraj et al., 1990; Kumar, 1994; Singh, 1999; Chandra, 2004). The Inductive Loop Detector (ILD) is another type of traffic sensor, an intrusive one, which is installed underneath the road pavement for traffic data collection. Piezoelectric sensor-based instruments (e.g. pneumatic tubes) convert mechanical energy into electrical energy, measuring the speeds and weights of vehicles (Swann, 2010). The Doppler principle is utilised in microwave radar-based instruments while infrared technology is used in other instruments for vehicle detection. Further, image processing, which is a rapidly developing technology, is used to extract traffic data from videos captured at roadside locations (Leduc, 2008).
All aforementioned automated data collection methods have their advantages and disadvantages in the context of the present level of technological development. These instruments are vulnerable to changes in the environment such as lighting conditions, weather, traffic conditions, obstructions, etc. Therefore, the reliability of these instruments is not 100 %. This can be observed by comparing control manual counts with the automated counts. Videography as a method for data collection is popular as the data can be visually observed later, which is useful for quality control purposes. Further, the cost incurred in this method is low compared to the other methods and it has a low human resource requirement. However, extraction and analysis of traffic data from a video is a time-consuming process. Hence, software programmes such as TRAIS, COUNTcam, TrafficVision, TRAZER, MediaTD, Picomixer STA, etc., have been developed to automate this process. These software programmes primarily use image processing techniques for data analysis with the facility to verify outputs manually if required (Kalaanidhia et al., 2015). Infrared based data collection is another commonly used method in automated traffic counters. The principle behind this system is the interruption of infrared beams. When a vehicle passes by and obstructs the infrared rays, the instrument detects and counts the vehicle. Depending on how the technology is used, this method offers a wide range of capabilities, including the ability to measure the speed, length and lateral placement of a vehicle.
In this research, the TRAZER software, the TIRTL (the infrared traffic logger) instrument and the Google distance matrix application programming interface (API) based technique are tested for their suitability for data collection on Sri Lankan roads. TRAZER is a video-based software that provides traffic flow and speed data. TIRTL is an infrared (IR) based instrument that provides a wide array of traffic data including flow and speed. The Google Distance Matrix API is used in this study to collect traffic stream speed data using a method developed by Kumarage et al. (2017).

Journal of the National Science Foundation of Sri Lanka 48(3)
September 2020

Field study locations
To evaluate the three selected automated data collection methods, traffic surveys were conducted at three locations using each of the respective methods, and in addition a verified manual traffic count was carried out as the control study. The outputs of TRAZER and Google Distance Matrix API do not depend on the road cross-section geometry as TIRTL does. Therefore, those two methods were only used at a single location. The analysis was conducted by comparing the manually collected control dataset with the data collected from the alternative techniques. A summary of the survey locations is shown in Table 1.
Since the most reliable method to obtain an accurate control sample is by analysing visual evidence, videos were recorded at all survey locations (Figure 1) along with the automated methods.

TIRTL instrument
The TIRTL instrument consists of two units, the transmitter and the receiver. Each has to be placed on either side of the road, next to the edge of the road carriageway. IR beams which are transmitted between these two units at tyre level are used for vehicle detection. TIRTL has the ability to classify vehicles into fifteen categories (Kalaanidhia et al., 2015). They are as follows: bicycles, cycle rickshaws, two-wheelers, three-wheelers, tractors, tractors with trailers, SCV (2 axle small commercial vehicles), LMV (2 axle light motor vehicles), LCV (2 axle light commercial vehicles), MCV (medium commercial vehicle; includes 2 axle rigid truck and bus), HCV (heavy commercial vehicle; includes 3 axle rigid truck, articulated truck and bus), MAV (multi axle vehicle; includes rigid truck and articulated truck) and OSV (oversized vehicle).
In a study in 2010, Shou et al. compared the vehicle classification capability of the TIRTL instrument under different weather conditions. It was observed that in clear weather conditions, fog, snow and rain, the TIRTL vehicle counts agreed very well with the actual counts, although during thunderstorms, the TIRTL instrument underestimated the number of vehicles.

Further, it was detected that the accuracy of counts was not equally distributed among different vehicle classes.

Figure 3: Loci of infrared beams - TIRTL instrument
In this study, the TIRTL instrument was tested during sunny conditions at all three test locations P1, P2, and P3. The TIRTL instrument has to be set up in such a manner that the IR beams are located approximately 60 mm above the road surface with a tolerance of -25 mm to +35 mm. This can be observed in Figure 2. Since various types of roads are available in Sri Lanka, the three test locations were selected in such a way that they test the instrument's accuracy over varied road geometries. The road geometries of the locations P1, P2, and P3 are as listed below and illustrated in Figure 3 (not to scale).

TRAZER
The TRAZER software uses image processing techniques on videos of traffic flows to collect speed and flow data. The videos to be processed through the software should be recorded parallel to the road and aligned to the centre of the lane/lanes with the vehicles moving towards the camera. The version of TRAZER software used for this research provides the user with the facility to detect four vehicle categories: namely, light moving vehicles (LMV), heavy moving vehicles (HMV), three-wheelers (3W) and two-wheelers (2W). HMVs can be classified further as buses (BUS) and trucks (TRUCK) manually through the software interface. Extracting flow and speed data using the TRAZER software is a four-step process, which includes a manual component where the user has to review the automatically identified vehicles. Through this process, the final accuracy of flow data can be elevated to 100 %. Mallikarjuna et al. (2009) used the TRAZER software to collect classified traffic volume, average occupancy, and average speeds. They observed that the detection accuracy was dependent upon the placement of the video camera with respect to the road. If the camera position deviates from the central lane, the detection accuracy decreases. Hence, the TRAZER software was tested at location P1 (Figure 1) where setting up the camera at the centre of the road was achievable.

Google distance matrix API
The Google Distance Matrix Application Programming Interfaces (APIs) facilitate traffic data collection such as travel distance and travel time for a matrix of origins and destinations. The API calls return the requested information based on the inputs given such as start and end ...
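As an illustration of how such a call could be turned into a link speed, the following is a minimal sketch and not the procedure of Kumarage et al. (2017), which is not described here. It assumes the JSON form of the Distance Matrix web service, in which each element reports `distance.value` in metres and `duration_in_traffic.value` in seconds when `departure_time` is supplied. The coordinates, the API key placeholder and the sample response are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint of the Distance Matrix web service (JSON output).
BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def build_request_url(origin, destination, api_key):
    """Build a request URL for a single origin-destination pair.

    departure_time=now asks the service to include a traffic-aware
    travel time (duration_in_traffic) in the response.
    """
    params = {
        "origins": origin,
        "destinations": destination,
        "departure_time": "now",
        "key": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def link_speed_kmh(response):
    """Estimate the stream speed (km/h) on a link from a parsed JSON response."""
    element = response["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        raise ValueError("no route returned for this origin-destination pair")
    distance_m = element["distance"]["value"]  # metres
    # Fall back to the plain duration if no traffic-aware estimate is present.
    duration_s = element.get("duration_in_traffic", element["duration"])["value"]
    return (distance_m / 1000.0) / (duration_s / 3600.0)


# Hypothetical response for a 1.2 km link traversed in 90 s under traffic.
sample_response = {
    "rows": [
        {
            "elements": [
                {
                    "status": "OK",
                    "distance": {"value": 1200},
                    "duration": {"value": 80},
                    "duration_in_traffic": {"value": 90},
                }
            ]
        }
    ]
}

if __name__ == "__main__":
    print(build_request_url("6.9000,79.8600", "6.9100,79.8700", "YOUR_API_KEY"))
    print(link_speed_kmh(sample_response))
```

Keeping the URL construction separate from the parsing means the speed calculation can be checked offline against stored responses before any live requests are scheduled.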

RESULTS AND DISCUSSION
The accuracy of classified vehicle flow counts by the TIRTL and TRAZER software is discussed in this section. The errors in the automated methods are calculated using equation 1. ...(1)
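The body of equation 1 is not reproduced in this copy. A signed percentage error relative to the control count is consistent with the values reported in this section (for example, an overcount of 843 vehicles against 3,529 actual vehicles corresponds to about +24 %), so the sketch below assumes that form. The function name and argument names are illustrative only.

```python
def percent_error(x, actual):
    """Signed percentage error of an automated count x against the control count.

    Assumed form: (x - actual) / actual * 100. Positive values indicate
    overestimation and negative values indicate underestimation.
    """
    if actual == 0:
        raise ValueError("control count must be non-zero")
    return (x - actual) / actual * 100.0


if __name__ == "__main__":
    # TRAZER step 1 at location P1: 3,529 actual vehicles, overcounted by 843.
    print(round(percent_error(3529 + 843, 3529), 1))
```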

Flow analysis - TIRTL instrument
As can be observed from Table 3, the errors in lanes 1 and 4 (outer lanes) are higher compared to those of the inner lanes. This is because the gap between the TIRTL instrument's IR beam and the road surface is higher than the recommended range. A similar issue was encountered at location P3 (two-lane road with normal -2.5 % crossfall), but due to the shorter carriageway width (7 m) the vertical rise of the road is smaller. Hence, an error of only -4 % is observed in the results at this location. On the other hand, the error in the vehicle count estimate was minimum (-2 %) at location P2 since there was no crossfall at that section.
The TIRTL counts were plotted against the actual counts for locations P2 (Figure 4) and P3. The R² values of 0.96 and 0.98 obtained for the two respective locations indicate that the TIRTL instrument accurately estimates the flow values of Sri Lankan traffic on two-lane roads and multi-lane roads at super-elevated sections.
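The effect of crossfall on the beam gap can be sketched with simple geometry. The example below assumes a crowned section falling uniformly from the centreline and a straight, horizontal beam whose height is referenced to the crown; both are simplifying assumptions, and the 35 mm starting gap is a hypothetical value. The 60 mm nominal height and the -25 mm to +35 mm tolerance are the set-up values quoted for the instrument.

```python
def beam_gap_mm(gap_at_crown_mm, crossfall_percent, offset_from_crown_m):
    """Gap between a horizontal beam and the road surface at a lateral offset.

    The surface is assumed to drop linearly from the crown at the given
    crossfall, so the gap grows with distance from the crown.
    """
    drop_mm = abs(crossfall_percent) / 100.0 * offset_from_crown_m * 1000.0
    return gap_at_crown_mm + drop_mm


def within_tolerance(gap_mm, nominal_mm=60.0, minus_mm=25.0, plus_mm=35.0):
    """Check the gap against the 60 mm nominal height with -25/+35 mm tolerance."""
    return nominal_mm - minus_mm <= gap_mm <= nominal_mm + plus_mm


if __name__ == "__main__":
    # 7 m two-lane carriageway with 2.5 % crossfall: the edge is 3.5 m from the crown.
    gap_at_edge = beam_gap_mm(35.0, 2.5, 3.5)
    print(gap_at_edge, within_tolerance(gap_at_edge))
```

Under these assumptions the surface drops 87.5 mm between the crown and the edge of a 7 m carriageway, which already exceeds the 60 mm width of the tolerance band; on a wider multi-lane section the drop at the outer lanes is larger still.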

At location P1, the R² value was 0.85, indicating a lower accuracy in the predicted count. It was observed that motorcycles were the vehicle category least captured by the instrument. Only 67 % of the motorcycles were recognised, whereas 92 % of the other vehicle categories were identified. Consequently, an attempt was made to build a model that takes into account the percentage of motorcycles, along with the total TIRTL count and lane flow, to predict the actual flow. However, no statistically significant relationship was found in the data sample considered.
Accordingly, it was established that the TIRTL instrument is not suitable for estimating flow data on multi-lane roads with normal crossfalls. Alternatively, one unit of the TIRTL set-up may be placed at the centre of the road to capture flow data in one direction of a multi-lane road. However, this is bound to cause disruptions to the traffic flow.

TRAZER software
As shown in Table 4, a total of 3,529 vehicles were analysed at location P1 using the TRAZER software. The analysis procedure of TRAZER has 4 main steps.
Step 1: Inputting geometric and vehicle class dimensions to the software and processing the video.
Step 2: Reviewing the automatically identified vehicles and deleting false vehicle recognitions.
Step 3: Reviewing the identified vehicles and confirming/classifying vehicles. In this step, vehicles that are identified but are in the wrong category are moved to the correct one.
Step 4: Adding the unidentified vehicles by reviewing the video manually using TRAZER software.
As seen in Table 4, the estimate provided by the TRAZER software after step 1 is incorrect by a margin of 843 vehicles. The error was calculated using equation 1, substituting the counts from steps 1 to 4 for 'x'. An error of +24 % was observed after step 1. Through further analysis, it was observed that the LMV and 2W categories were overestimated by the software, whereas the 3W and HMV categories were underestimated. Out of the 2W count of 1303, only 579 were accurate identifications.
The remaining 724 were either false positives or incorrect classifications. This is a major factor that affects the initial estimate of vehicles. It was observed that vehicle side mirrors are identified by the software as 2Ws, leading to this error. This observation is shown in Figure 5.
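The two-wheeler figures above can be restated as shares of the step 1 count. The short sketch below uses only the counts quoted in the text; the variable names are illustrative.

```python
# Two-wheeler (2W) detections reported by TRAZER after step 1 at location P1.
reported_2w = 1303
correct_2w = 579

# Detections that were false positives or belonged to another category.
spurious_2w = reported_2w - correct_2w

# Shares of the step 1 two-wheeler count, in percent.
share_correct = correct_2w / reported_2w * 100.0
share_spurious = spurious_2w / reported_2w * 100.0

if __name__ == "__main__":
    print(spurious_2w, round(share_correct, 1), round(share_spurious, 1))
```

In other words, fewer than half of the two-wheeler detections made in step 1 were correct identifications, which is why the review in steps 2 and 3 changes the estimate so markedly.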
Once steps 2 and 3 (deletion and reclassification) were completed, the total vehicle count estimated by the software was found to be 20 % less than the actual value. The HMV and 2W categories were incorrect by a margin of -48 % and -27 %, respectively. This shows that the software is less capable of identifying vehicles with irregular dimensions (large and small). This is observed within the HMV category where only 37 % of buses were identified as opposed to 67 % of trucks. The reason a higher percentage of trucks were identified is that medium sized trucks were misidentified by the software as LMVs. The estimate of LMVs was at an acceptable level of 87 %.
The final step is the manual addition of unidentified vehicles. This is a tedious and time-consuming process as the video needs to be analysed frame by frame to detect vehicles that have not been identified by the software. However, at the end of this process 100 % accuracy can be achieved.

Table 5 columns: Category; Precision (Step 1: Process; Step 2)
Where Y = no. of vehicles per min after step 1 and step 2, and N = no. of 1-minute intervals per category (N ≥ 20).
From the results in Table 5, it is observed that the precision of the software is not affected by the rate of flow.

TIRTL speed analysis
Speed data collected through the instrument were compared with speed data computed manually. The manual speed data calculation was carried out by analysing the video and calculating the time taken by vehicles to traverse a known distance. A sample of 177 vehicles with speeds ranging between 12 km/h and 81 km/h was selected for the speed survey. The mean absolute error (MAE) of the data was 3.47 and the root mean square error (RMSE) was 4.65 while the mean absolute percentage error (MAPE) was 8.9 % (< 10 %). These are acceptable values, denoting that the instrument was able to capture the speeds of individual vehicles with high accuracy. Figure 7 depicts the absolute percentage error of each data point in ascending order with MAPE drawn for reference.
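For reference, the three error measures used in the speed analyses can be computed as in the following sketch. These are the standard definitions of MAE, RMSE and MAPE; the sample speeds are hypothetical and are not data from this study.

```python
import math


def mae(measured, actual):
    """Mean absolute error between measured and actual values."""
    return sum(abs(m - a) for m, a in zip(measured, actual)) / len(actual)


def rmse(measured, actual):
    """Root mean square error between measured and actual values."""
    return math.sqrt(sum((m - a) ** 2 for m, a in zip(measured, actual)) / len(actual))


def mape(measured, actual):
    """Mean absolute percentage error, relative to the actual values."""
    return sum(abs(m - a) / a for m, a in zip(measured, actual)) / len(actual) * 100.0


if __name__ == "__main__":
    # Hypothetical speeds in km/h: control values and instrument readings.
    actual_speeds = [40.0, 50.0, 60.0]
    measured_speeds = [42.0, 49.0, 63.0]
    print(mae(measured_speeds, actual_speeds))
    print(rmse(measured_speeds, actual_speeds))
    print(mape(measured_speeds, actual_speeds))
```

MAE and RMSE carry the units of the speed data, and RMSE weights large individual errors more heavily than MAE, so a wide gap between the two points to a few large outliers.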

TRAZER speed analysis
Traffic speed data of a group of 60 vehicles were collected by analysing the captured videos using TRAZER software. The data were compared with the corresponding actual speed data to evaluate the accuracy of the outputs of TRAZER using a methodology similar to that used in the TIRTL speed analysis. The speeds of the surveyed vehicles ranged between 19 km/h and 50 km/h. The MAE was 2.57 and the RMSE was 3.31 while the MAPE was 2.6 %. According to the results of the study, it is observed that the TRAZER software predicts the speeds of vehicles at a higher accuracy than the TIRTL instrument. Figure 8 depicts the absolute percentage error of each data point in ascending order with MAPE drawn for reference.

Google Maps - distance matrix API
Google Distance Matrix API provides traffic stream speeds as opposed to the individual vehicle speeds obtained through the TRAZER software and TIRTL. One-minute stream speed data were collected through the programme and combined into five-minute average traffic stream speeds for a period of two hours, and compared with the actual average stream speeds.
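The aggregation step can be sketched as follows. The text does not state how the one-minute values were combined, so a simple arithmetic mean over consecutive, non-overlapping five-minute windows is assumed here; the sample speeds are hypothetical.

```python
def five_minute_means(one_minute_speeds, window=5):
    """Average consecutive, non-overlapping windows of one-minute speeds.

    A trailing partial window (fewer than `window` readings) is dropped.
    """
    means = []
    for start in range(0, len(one_minute_speeds) - window + 1, window):
        chunk = one_minute_speeds[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


if __name__ == "__main__":
    # Ten hypothetical one-minute stream speeds (km/h) give two 5-minute means.
    speeds = [48.0, 50.0, 52.0, 50.0, 50.0, 47.0, 49.0, 51.0, 53.0, 50.0]
    print(five_minute_means(speeds))
```

A two-hour survey yields 120 one-minute readings and therefore 24 five-minute averages to compare against the control values.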
Considering the statistical data, it was observed that the speeds predicted are of high accuracy given that the MAE was 0.87, RMSE was 0.97 and the MAPE was 1.7 %.
Individual vehicle speeds cannot be evaluated through this method because the Google Distance Matrix API provides only traffic stream speeds. Since traffic stream speed is the parameter that is predominantly used in traffic engineering applications, this method can be successfully employed for data collection. A limitation in this study is that the speed values observed had a small spread (47-53 km/h) and the flow volume of vehicles during the study was low to moderate (maximum directional flow 2470 veh/h). Nevertheless, it is understood that Google Distance Matrix API data can be used to estimate link speed at high accuracy and can be used as a substitute for conventional speed calculation methods.

Comparison of speed estimation capability of TIRTL and TRAZER methods
In the comparison of the two automated speed detection methods at location P1, it was observed that TRAZER had a comparatively smaller spread in error in speed detection. This can be seen in Figure 9. The TIRTL instrument has an RMSE of 4.65 whereas TRAZER has an RMSE of 3.31. Hence, of the two methods, the TRAZER software measures speeds with better precision. Since the Google distance matrix API speed data were collected at a different location, they were not considered in this comparison.
Comparative analysis of data collection methods
A comparison of primary traffic data that can be collected and the typical number of man-hours required for the three methods used in the study is shown in Table 6. The TIRTL instrument requires two individuals to position it and the process takes approximately 0.5 hours. A similar time period is required to set up the video camera to collect video data for the TRAZER software. Setting up the script to collect traffic data using the Google Distance Matrix API requires less than 0.5 man-hours. No post-processing is required to extract the traffic data with the TIRTL instrument and Google Traffic data, but the TRAZER software has a time-consuming analysis procedure. This is one of the major drawbacks of the software. When considering the traffic variables presented by the three methods, the TIRTL instrument gives a rich set of data including speed, flow, vehicle classification and headway. The TRAZER software provides speed, flow and vehicle classification, whereas Google Traffic data provides only the traffic stream speed. If Google Traffic data is to be used to collect fundamental traffic data, manual flow counts will have to be carried out along with the speed survey (approximately 60 man-hours: 5 enumerators). When comparing this with the other two methods, it is observed that using Google Traffic data requires a higher investment of labour in cases where all traffic data types are required.

CONCLUSIONS
This study incorporates a few of the latest technologies available in the field of traffic data collection in order to test their applicability to the heterogeneous traffic conditions observed in Sri Lanka. It was observed that these technologies provided satisfactory results with a few exceptions. The TIRTL instrument provides the user with a comprehensive set of data with more vehicle classification, as well as macroscopic data such as headway spacing and lateral placement. The instrument can be set up in road sections where the road camber does not interfere with the cross beams of the instrument. The TRAZER software on the other hand provides flow and speed data. A disadvantage of using TRAZER software is that it requires the video to be captured from a high elevation parallel to the roadway for processing purposes. This limits the software's usability as it is typically inconvenient to set up the camera at higher elevations at the centre of the road. Further, processing with the software is a time-consuming venture, albeit producing a result of 100 % accuracy in terms of the vehicle count.
In conclusion, the TIRTL instrument was found to be more practical to use and provides the user with a more comprehensive set of data.
Google Distance Matrix API is a convenient way of collecting traffic data. Data collection can be automated, and it is suitable for studies that require collection of data with a high frequency at multiple locations. Google travel times are estimated with both historical and real-time data, and therefore, the accuracy is high as observed in this study. When compared with manually collected data, it showed an acceptable level of compliance, which suggests that this method is suitable to be adopted in transportation engineering related research. Further studies should be carried out by changing traffic, geometric and other variables such as composition, flow, weather conditions, road conditions, network coverage and driver behaviour to evaluate the accuracy and reliability of the Google Distance Matrix API for mining data for transportation engineering related studies.
Overall, this study shows that technologies such as TIRTL, TRAZER and the Google Distance Matrix API, when applied appropriately, can be used to reliably collect traffic data in Sri Lankan conditions.