Utilizing Python to Aggregate Enterprise Geodatabases
October 27, 2010 1 Comment
Background
The Navy’s Regional Shore Installation Management System (RSIMS) program manages approximately 35,000 feature classes spread across 12 regions and needed a way to serve up data easily with ArcGIS server. Each region manages multiple schemas within the regional SDE geodatabase representing different geographic areas within their regions. This allows for the customization of data structure and spatial reference to best fit the data for that area. The challenge was to aggregate each regional schema to a single schema for the region as well as to project the data into Web Mercator Auxiliary Sphere, which allows for integration with external web services such as ArcGIS Online.
ArcGIS has the capability to utilize database replication which replicates data from a child geodatabase to a parent geodatabase at the feature level. This solution however requires a tremendous amount of implementation planning and isn’t possible if the data is not versioned or if the version allows the edits to move to the base tables within ArcSDE. GISi has developed a solution that performs the same function as database replication, but is not dependent on versioning and utilizes only standard ArcGIS geoprocessing tools.
Setup
The first was to create the destination schemas in the publishing database along with staging file geodatabases to house the data before being published to the publishing database. The reason for staging geodatabases was that they allow the scripts to do the projection and aggregation of the data without requiring ArcGIS Server Object Manager (SOM) service to be stopped until the data was processing was complete. Because users of the Navy RSIMS web application are worldwide, it was important to reduce the down time of the services as much as possible. By staging the data first, the system could process all 35,000 feature classes and only cause about a 30 – 60 minute downtime of the web application per night.
Example Regional Database Implementation
Once the publishing database was set up for each region a batch file was created and scheduled to run nightly from Windows task scheduler on a machine that has ArcGIS desktop installed. This is to allow the python scripts to access all the required geoprocessing tools to project and aggregate the data.
Process
The first python script run within the batch file compares each schema within the editing SDE database to their respective staging file geodatabases. It does this by reading a schema text file which has all of the schemas to be checked along with the path to the .sde connection file for the editing SDE database and the path to the staging geodatabase. Looping through the text file the script connects to the SDE database and loops through the feature datasets using gp.ListDatasets. For each dataset the script executes gp.ParseTableName to parse out the owner of the dataset. If the owner matches the schema in the text file it runs gp.Exists on the dataset to see if it exists in the staging geodatabase. If it does exist the script runs gp.ListFeatureClasses on the dataset and loops through each of the feature classes. It then runs gp.Exists again using the feature class to see if it exists in the staging geodatabase. If any new feature datasets or feature classes are found the script creates a log of new feature classes, copies the new datasets or feature classes to the staging geodatabase using gp.Copy_management and emails the RSIMS support team that there were new feature classes added by the region. The email notification uses the smtplib module that comes standard with python. By notifying the support team they can make the necessary changes to the publishing SDE database to allow the data to be aggregated on future runs of the script.
Once the schemas match between the editing SDE database and the staging geodatabases, another python script is executed within the batch file that compares each feature class. This is done by looping though the schemas in the schemas text file then connecting to the editing SDE database for each schema. Once connected it uses gp.ListDatasets to loop through all of the datasets in the database. It again uses gp.ParseTableName to compare the data owner to the schema name in the text file. If they match the script uses gp.ListFeatureClasses to loop through all of the feature classes within the dataset. For each feature class, gp.FeatureCompare_management is used to compare the features in the editing SDE feature class to the features in the staging geodatabase feature class. The output of Feature Compare is a comma delimited text file which is then read by the script to see if there are any feature additions/deletions, attribute changes, or shape changes. If a feature class has been flagged as changed it is logged in a changed feature class text file update python script.
The update script runs through the list of changed feature classes and replaces the feature classes in the staging geodatabases so that the staging databases exactly match the editing SDE database. It does this by executing gp.Delete_management to delete the feature class from the staging geodatabase and then gp.Copy_management to copy the feature class from the editing SDE database to the staging geodatabase. It then compiles a regional changed feature class text file. If a feature class in one schema changes for the region all of the same feature classes need to be projected and aggregated back into the regional staging geodatabase. The script then loops through that text file and finds the schema for that region in the schemas text file. For each schema within the region, the script projects them from their native spatial reference into Web Mercator Auxiliary Sphere and loads the data into the regional staging database. The schema text file contains the geographic transformation needed to project the data from the native spatial reference to Web Mercator Auxillary Sphere. gp.Describe(dataset).SpatialReference is used to get the source projection. gp.Project_managment is then executed on the feature class using the source projection and the geographic transformation for the schema to project the feature class into Web Mercator Auxillary Sphere. The features are cleared from the regional staging geodatabase using the gp.DeleteFeatures_management tool and then the projected feature class is loaded into the staging regional database using gp.Append_management.
When the update of the regional staging databases is complete, the batch file executes the service control (sc) command on the ArcGIS Server Object Manager (SOM) service to stop the service. This unlocks all of the data in the publishing SDE database so that the data can be updated.
Once the SOM has been stopped, another python script is executed to loop through the changed feature class text file. For each changed feature class, it loops through a region text file to determine the path to the .sde connection file for the region in the publishing SDE database. Once it locates the correct connection file, it connects to the database and loops though the datasets within the region to find the correct dataset using gp.ListDatasets. Once the dataset is found, it loops through the feature classes using gp.ListFeatureClasses. When the feature class is found the script executes gp.Delete_management to delete the feature class, then runs gp.Copy_management to copy the changed feature class from the regional staging geodatabase to the publishing SDE database.
After the python script makes the updates to the publishing SDE database the batch file then executes the sc command to start the SOM service. The overall time to run the batch takes up to 6 hours to compare all of the feature classes, but depending on the number of changed feature classes, only impacts the the ArcGIS Server map services for approximately 30 minutes.
Conclusion
In summary, an overall batch file is run which executes python scripts and stops and starts the SOM. The python scripts read and write to various log files and loop through schemas, datasets, and feature classes to determine any changes and update the publishing database with those changed feature classes. Finally, most of the heavy lifting is done in staging geodatabases to reduce the amount of downtime for web application. This method of aggregating and projecting geodatabases can be scaled to meet the needs of any size enterprise GIS system.
Please feel free to discuss this topic in the comments section or contact me directly at nlebel@gisinc.com

Great article..! ESRI did great to choose python.