Predictive Analytics for Electric Smart Meter Data
Our goal was to create a data integration process and model that can accurately predict kWh usage at scale.
The data comes from a study conducted by the University of Massachusetts (UMass) and includes minute-by-minute kWh usage readings from smart meters in 114 apartments. Each smart meter produces approximately 500,000 readings per year, for roughly 65 million readings in total. Weather readings for the same time period were also provided.
After some experimentation we determined that predicting kWh usage apartment by apartment wasn’t very accurate with the data provided. However, we expected there would be clusters of electric usage behavior (e.g. setting the thermostat to 65 versus 70 in the winter). As a result, our plan was to assign each apartment to one of six clusters.
We created IQR statistics for the kWh and outside temperature readings for each apartment over the entire period of the data. Because the apartments were in the same vicinity, the outside temperature readings were identical for every apartment. We also used a “Feels Like” temperature to align more closely with usage behavior. From the clustered IQR statistics, we created a lookup (enrichment) file containing each apartment number and its usage cluster number.
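The per-apartment IQR statistics can be sketched in pandas. This is a minimal illustration, not the original implementation: the column names (`apartment`, `kwh`) and the use of standard 1.5 × IQR Tukey whiskers are assumptions, since the source does not specify them.

```python
import pandas as pd

def iqr_stats(df, value_col, group_col="apartment"):
    """Min, max, median, p25, p75, IQR, and whiskers per apartment."""
    g = df.groupby(group_col)[value_col]
    stats = g.agg(min="min", max="max", median="median",
                  p25=lambda s: s.quantile(0.25),
                  p75=lambda s: s.quantile(0.75))
    stats["iqr"] = stats["p75"] - stats["p25"]
    # Assumed convention: Tukey whiskers at 1.5 * IQR beyond the quartiles.
    stats["lower_whisker"] = stats["p25"] - 1.5 * stats["iqr"]
    stats["upper_whisker"] = stats["p75"] + 1.5 * stats["iqr"]
    return stats.add_prefix(value_col + "_")

# Tiny synthetic example: two apartments, four readings each.
readings = pd.DataFrame({
    "apartment": [1, 1, 1, 1, 2, 2, 2, 2],
    "kwh":       [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6],
})
kwh_stats = iqr_stats(readings, "kwh")
```

The same function would be called a second time for the “Feels Like” temperature column, and the two result frames joined on the apartment number before clustering.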
We then enriched the 65 million records with the usage cluster data and calculated average kWh and average temperature by week by usage cluster. Finally, we fed the usage cluster statistics into a variety of algorithms. The Random Forest Regressor and Decision Tree Regressor machine learning algorithms yielded nearly perfect results. Using this information, an electricity provider could predict kWh usage from projected temperatures.
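The per-cluster model fitting can be sketched with scikit-learn. Everything here is illustrative: the column names (`usage_cluster`, `avg_temp`, `avg_kwh`), the synthetic data, and the hyperparameters are assumptions standing in for the real weekly averages described above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the weekly cluster averages (52 weeks, 2 clusters).
weekly = pd.DataFrame({
    "usage_cluster": np.repeat([0, 1], 52),
    "avg_temp": np.tile(np.linspace(10, 90, 52), 2),
})
# Fabricated relationship for the sketch: usage rises as temperature
# departs from 65F, with cluster 1 using more than cluster 0.
weekly["avg_kwh"] = (abs(weekly["avg_temp"] - 65) / 10
                     * (1 + weekly["usage_cluster"]))

# One model per cluster: predict average kWh from projected temperature.
models = {}
for cluster, grp in weekly.groupby("usage_cluster"):
    m = RandomForestRegressor(n_estimators=50, random_state=0)
    m.fit(grp[["avg_temp"]], grp["avg_kwh"])
    models[cluster] = m

# A provider would query the model for its cluster with a forecast temperature.
pred = models[1].predict(pd.DataFrame({"avg_temp": [30.0]}))[0]
```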
Practically speaking, when predicting usage for larger groups such as an entire metropolitan area, more clusters would be needed to represent the different usage behaviors. A production implementation would also require each part of the prediction process to run independently. We built our solution with that in mind, so it could scale to a very large, real-time production deployment.
1. To begin, we cleaned the data and created a single file containing all the data for all the apartments over the study period. We then enriched the base kWh dataset by adding weather data to each record.
2. We then computed the average weekly kWh and average weekly “Feels Like” temperature by apartment across all 65 million records, and exported the result to an intermediate CSV file for later use.
3. We then turned to clustering. We began by creating detailed IQR (interquartile range) statistics (min, max, median, p25, p75, IQR, lower whisker, and upper whisker; see below) for both kWh and “Feels Like” temperature by apartment, and wrote the results to a CSV file for later use.
4. Using the CSV file from step 3, we fit a KMeans model on all the IQR statistics for both kWh and temperature. We requested six clusters (k=6) and stored each apartment’s cluster number in a field called Usage. Finally, we wrote the results to a lookup file with the OutputLookup command.
5. We then returned to the weekly statistics from step 2 and enriched them with the Usage cluster from step 4, so that each record carried its average kWh, average temperature, and usage cluster. We then averaged the kWh and temperature by Usage cluster and saved the results to another CSV for later use. Saving intermediate output this way allows each part of the process, in a large data environment like smart meters, to run independently and in real time, keeping the models updated.
6. With the fully enriched Usage cluster data, we fit algorithms to the data by Usage cluster: Linear Regression, Random Forest Regressor, Lasso, Kernel Ridge, ElasticNet, Ridge, and Decision Tree Regressor. This creates 42 models, seven for each of the six clusters.
7. Finally, we applied the models to the data from step 5 and created a dashboard to view all the results. You can access the live dashboard by clicking HERE; the login for the site is username: guest, password: guest.
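Step 4 above can be sketched with scikit-learn’s KMeans. This is a hypothetical sketch, not the original code: the feature column names, the 114-row synthetic stand-in for the IQR-statistics CSV, and the use of standardization before clustering are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in for the IQR-statistics CSV (one row per apartment;
# real input would carry the full set of kWh and temperature statistics).
apartments = pd.DataFrame(
    rng.normal(size=(114, 4)),
    columns=["kwh_median", "kwh_iqr", "temp_median", "temp_iqr"],
)
apartments["apartment"] = range(1, 115)

# Assumed choice: scale the features so kWh and temperature ranges
# contribute comparably to the cluster distances.
features = ["kwh_median", "kwh_iqr", "temp_median", "temp_iqr"]
X = StandardScaler().fit_transform(apartments[features])
km = KMeans(n_clusters=6, n_init=10, random_state=0)
apartments["Usage"] = km.fit_predict(X)

# The lookup (enrichment) file: apartment number plus its usage cluster.
lookup = apartments[["apartment", "Usage"]]
```

The `lookup` frame plays the role of the lookup file written by OutputLookup; joining it back to the 65 million readings on the apartment number is the enrichment step used in steps 5 and 6.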