Processing Files In Place With Groovy

I recently needed to process a set of files and directory structures emitted by a generation process that I didn’t control. The basic needs were to delete some files, delete some directories, and modify selected content in some of the files. This is a straightforward, even trivial, task, but I wanted a solution that was both cross-platform and quick to develop. I also had to deal with a bit of XML and wanted an easy, natural way to parse and modify XML documents. This post shows how to do all of this with a Groovy script. I’ve been using Groovy for testing Java code and for other one-off tasks for a couple of years now, but it still surprises me how quickly you can accomplish certain tasks while keeping your code readable and at a relatively high level of abstraction, especially compared to shell scripts, sed, or awk.

In a nutshell, the following method will process a file in place.

def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}

If you know Groovy, this probably doesn’t require any additional explanation. But if you don’t know Groovy, don’t worry. The remainder of this post shows examples of using this method and touches on a few details regarding file and directory deletion.

Deleting Files and Directories

I hesitated to write anything about deleting files and recursively deleting directory structures because it is so trivial and the Groovy documentation on Files has a huge number of examples. That being said, I wanted the script to read as cleanly as possible for people maintaining it in the future who might not know Groovy. There are a couple of ways to recursively delete directories, and the syntax isn’t consistent with how you delete a File. So I simply created two methods to encapsulate these operations and make the script read more cleanly.

def deleteDirectory(directory){
    new AntBuilder().delete(dir: directory)
}
 
def deleteFile(file){
    file.delete()
}

Another, more Groovy-like (i.e. uses closures) option for recursively deleting directories is shown here.

Using those methods, file and directory deletion is consistent and clean.

basedir = ...
 
deleteDirectory(new File(basedir + ".settings"))
deleteFile(new File(basedir + ".project"))
deleteFile(new File(basedir + ".classpath"))
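
As an alternative to the AntBuilder-based helper, Groovy's GDK adds a deleteDir() method to java.io.File that removes a directory and everything beneath it. The sketch below is only an illustration: it builds a throwaway directory tree under the system temp directory (the directory and file names are made up) and then deletes it.

```groovy
import java.nio.file.Files

// Build a small throwaway tree: <tmp>/demo-XXXX/.settings/prefs.txt
def root = Files.createTempDirectory('demo-').toFile()
def settings = new File(root, '.settings')
settings.mkdirs()
new File(settings, 'prefs.txt').text = 'key=value'

// deleteDir() is a GDK method on File; it deletes recursively and
// returns true when the directory is gone afterwards.
def deleted = settings.deleteDir()

// Clean up the temp root as well
root.deleteDir()
```

Whether this or the AntBuilder call reads better is a matter of taste; deleteDir() avoids the dependency on Ant, which may matter if the script runs somewhere Ant is not on the classpath.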

Processing Text Files In Place

Processing text files in place can be done with a simple method that takes the file to be modified and a closure that performs the modifications. This is the method I introduced above.

def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}

The closure can be arbitrarily complex as long as it returns the String value of what you want the file contents to look like after modification. For example, you can use any of the basic Java or the expanded Groovy string methods and utilities.

projectName = 'My New Project'
overview = new File(basedir + "overview.txt")
processFileInplace(overview) { text ->
    text.replaceAll(/The Old Project/, projectName)
}
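
One caveat with replaceAll: its second argument is a regex replacement string, so `$` and `\` in it have special meaning. If the new text comes from a variable whose content you don't control, wrapping it in java.util.regex.Matcher.quoteReplacement is one safe option, and the literal String.replace is another. A minimal sketch, using a made-up project name that contains a dollar sign:

```groovy
import java.util.regex.Matcher

def projectName = 'Project $1'          // hypothetical name containing '$'
def text = 'Welcome to The Old Project.'

// quoteReplacement escapes '$' and '\' so the name is inserted literally
def viaRegex = text.replaceAll(/The Old Project/, Matcher.quoteReplacement(projectName))

// String.replace does literal (non-regex) replacement, so no quoting is needed
def viaLiteral = text.replace('The Old Project', projectName)

assert viaRegex == 'Welcome to Project $1.'
assert viaLiteral == viaRegex
```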

You are by no means limited to a single operation within the closure.

projectName = 'My New Project'
def today = Calendar.getInstance()
def todayFormatted = String.format('%tY/%<tm/%<td', today)
 
howTo = new File(basedir + "how-to.txt")
processFileInplace(howTo) { text ->
    text = text.replaceAll(/The Old Project/, projectName)
    text = text.replace(/<name>Legacy System 1982<\/name>/, '<name>New and Improved</name>')
    text.replace('<date>1982/10/12</date>', "<date>${todayFormatted}</date>")
}
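
To try the helper without touching real files, it can be pointed at a scratch file. The following is a sketch under that assumption: the temp file and its contents are invented for the demonstration, and the helper is repeated so the snippet runs on its own.

```groovy
// Same helper as in the post
def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}

// Hypothetical scratch file; removed when the JVM exits
def scratch = File.createTempFile('how-to-', '.txt')
scratch.deleteOnExit()
scratch.text = 'The Old Project\n<name>Legacy System 1982</name>\n'

processFileInplace(scratch) { text ->
    text = text.replace('The Old Project', 'My New Project')
    // The last expression is the closure's return value, i.e. the new file content
    text.replace('<name>Legacy System 1982</name>', '<name>New and Improved</name>')
}

def result = scratch.text
assert result == 'My New Project\n<name>New and Improved</name>\n'
```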

Bring on XML

Unfortunately, XML can often be much more difficult to deal with than it should be. Groovy has some great utilities for simplifying XML processing. Details on these utilities can be found on the Groovy web site. I’ll combine a couple of these in the following examples.

Example XML Document

Below is an XML document that will be used in the examples.

<?xml version="1.0" encoding="UTF-8"?>
<CustomerManagement>
   <MetaData>
   </MetaData>
   <Customers>
      <Customer name="Kermit The Frog">
         <HelpDeskCalls>
            <Call Id="1">
               <Status Id="In Progress"/>
            </Call>
            <Call Id="2">
               <Status Id="Completed"/>
            </Call>
            <Call Id="3">
               <Status Id="UnableToResolve"/>
            </Call>
         </HelpDeskCalls>
      </Customer>
      <Customer name="Fozzie Bear">
         <HelpDeskCalls>
            <Call Id="5">
               <Status Id="Completed"/>
            </Call>
         </HelpDeskCalls>
      </Customer>
   </Customers>
</CustomerManagement>

Removing Content From XML

Given the example XML document, say we wanted to remove all the HelpDeskCalls for Kermit the Frog where the Status is “UnableToResolve.”

customers = new File(basedir + "important-customers.xml")
processFileInplace(customers) { text ->
    customerManagement = new XmlSlurper().parseText(text)
    customer = customerManagement.Customers.Customer.find{ it.@name.text().contains('Kermit') }
    customer.HelpDeskCalls.Call.findAll{ it.Status.@Id.text().equals('UnableToResolve')}.replaceNode{}
    serializeXml(customerManagement)
}

So what is each of the lines in the closure doing?

  1. Parses the text in the file using an XmlSlurper.
  2. Finds the first Customer whose name contains Kermit.
  3. Removes Call elements from Kermit where the Status of the Call is equal to UnableToResolve.
  4. Calls a method that serializes the XML as a string.
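
The find and findAll steps in the list above are ordinary Groovy collection idioms, so they can be tried on plain objects without any XML involved. A small sketch, using made-up maps that mirror part of the sample document:

```groovy
// Hypothetical in-memory data shaped like the sample XML
def customers = [
    [name: 'Kermit The Frog', calls: [[id: 1, status: 'In Progress'],
                                      [id: 2, status: 'Completed'],
                                      [id: 3, status: 'UnableToResolve']]],
    [name: 'Fozzie Bear',     calls: [[id: 5, status: 'Completed']]]
]

// find returns the first match, findAll returns every match
def kermit = customers.find { it.name.contains('Kermit') }
def unresolved = kermit.calls.findAll { it.status == 'UnableToResolve' }

// The spread-dot operator (*.) reads a property from each element
def names = customers*.name

assert unresolved*.id == [3]
assert names == ['Kermit The Frog', 'Fozzie Bear']
```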

The second and third lines use GPath to query the XML. GPath provides a consistent expression language over both Groovy/Java POJOs and XML. The serializeXml() method is a short, custom method that turns the GPathResult created by the XmlSlurper back into a String. This method requires that you import a few classes.

import groovy.xml.XmlUtil
import groovy.xml.StreamingMarkupBuilder
import groovy.util.slurpersupport.GPathResult
 
// ...
 
def String serializeXml(GPathResult xml){
    XmlUtil.serialize(new StreamingMarkupBuilder().bind {
        mkp.yield xml
      } )
}

Refer to the link above on Groovy XML Processing for more details on this. It uses StreamingMarkupBuilder and XmlUtil to turn the XML back into a String.
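
The removal and serialization steps can also be exercised entirely in memory, which is handy for checking a query before pointing it at a real file. The sketch below uses a trimmed-down, made-up version of the sample document. It assumes a Groovy version where XmlSlurper is available without an import, as in the rest of this post; on newer versions the class lives in groovy.xml and would need an explicit import.

```groovy
import groovy.xml.XmlUtil
import groovy.xml.StreamingMarkupBuilder

// Assumption: XmlSlurper resolves without an import (older Groovy);
// on newer Groovy add: import groovy.xml.XmlSlurper
def xml = '''<CustomerManagement>
  <Customers>
    <Customer name="Kermit The Frog">
      <HelpDeskCalls>
        <Call Id="2"><Status Id="Completed"/></Call>
        <Call Id="3"><Status Id="UnableToResolve"/></Call>
      </HelpDeskCalls>
    </Customer>
  </Customers>
</CustomerManagement>'''

def root = new XmlSlurper().parseText(xml)
def kermit = root.Customers.Customer.find { it.@name.text().contains('Kermit') }
kermit.HelpDeskCalls.Call.findAll { it.Status.@Id.text() == 'UnableToResolve' }.replaceNode {}

// XmlSlurper applies edits lazily, so serialize and re-parse to see them
def output = XmlUtil.serialize(new StreamingMarkupBuilder().bind { mkp.yield root })
def reparsed = new XmlSlurper().parseText(output)
def remainingCalls = reparsed.Customers.Customer.HelpDeskCalls.Call.size()
```

Re-parsing the serialized text is the important step here: querying the original slurped tree after replaceNode would still show the removed Call.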

Adding Content To XML

Say you wanted to add some CustomerManagers to the MetaData section of our example XML file. You can do that again using the XmlSlurper and our processFileInplace method.

customers = new File(basedir + "important-customers.xml")
processFileInplace(customers) { text ->
    customerManagement = new XmlSlurper().parseText(text)
    customerManagement.MetaData.appendNode{
        CustomerManagers{
            Manager(Name: "Animal")
            Manager(Name: "Swedish Chef")
            Manager(Name: "Gonzo")
        }
    }
    serializeXml(customerManagement)
}

The above results in the important-customers.xml file now containing

<?xml version="1.0" encoding="UTF-8"?>
<CustomerManagement>
  <MetaData>
    <CustomerManagers>
      <Manager Name="Animal"/>
      <Manager Name="Swedish Chef"/>
      <Manager Name="Gonzo"/>
    </CustomerManagers>
  </MetaData>
  <Customers>
    <Customer name="Kermit The Frog">
      <HelpDeskCalls>
        <Call Id="1">
          <Status Id="In Progress"/>
        </Call>
        <Call Id="2">
          <Status Id="Completed"/>
        </Call>
        <Call Id="3">
          <Status Id="UnableToResolve"/>
        </Call>
      </HelpDeskCalls>
    </Customer>
    <Customer name="Fozzie Bear">
      <HelpDeskCalls>
        <Call Id="5">
          <Status Id="Completed"/>
        </Call>
      </HelpDeskCalls>
    </Customer>
  </Customers>
</CustomerManagement>

A Full Example

The above examples can be combined into a full script, which you can restructure to suit your needs. The script can be executed directly from the command line, passing the base directory as its only argument.

./processFiles.groovy <basedir>

#!/usr/bin/env groovy
 
import groovy.xml.XmlUtil
import groovy.xml.StreamingMarkupBuilder
import groovy.util.slurpersupport.GPathResult
 
if(args.length < 1){
    println "You must provide a base directory as an argument to the script."
    System.exit(1)
}
 
basedir = args[0] + "/"
println "Current working path: " + new File(".").getAbsolutePath()
 
def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}
 
def deleteDirectory(directory){
    new AntBuilder().delete(dir: directory)
}
 
def deleteFile(file){
    file.delete()
}
 
def String serializeXml(GPathResult xml){
    XmlUtil.serialize(new StreamingMarkupBuilder().bind {
        mkp.yield xml
      } )
}
 
projectName = 'My New Project'
def today = Calendar.getInstance()
def todayFormatted = String.format('%tY/%<tm/%<td', today)
 
deleteDirectory(new File(basedir + ".settings"))
deleteFile(new File(basedir + ".project"))
deleteFile(new File(basedir + ".classpath"))
 
overview = new File(basedir + "overview.txt")
processFileInplace(overview) { text ->
    text.replaceAll(/The Old Project/, projectName)
}
 
howTo = new File(basedir + "how-to.txt")
processFileInplace(howTo) { text ->
    text = text.replaceAll(/The Old Project/, projectName)
    text = text.replace(/<name>Legacy System 1982<\/name>/, '<name>New and Improved</name>')
    text.replace('<date>1982/10/12</date>', "<date>${todayFormatted}</date>")
}
 
// Delete anything that makes our call center people look bad  :-0
customers = new File(basedir + "important-customers.xml")
processFileInplace(customers) { text ->
    customerManagement = new XmlSlurper().parseText(text)
    customer = customerManagement.Customers.Customer.find{ it.@name.text().contains('Kermit') }
    customer.HelpDeskCalls.Call.findAll{ it.Status.@Id.text().equals('UnableToResolve')}.replaceNode{}
    serializeXml(customerManagement)
}
 
// Add information on the CustomerManagers
customers = new File(basedir + "important-customers.xml")
processFileInplace(customers) { text ->
    customerManagement = new XmlSlurper().parseText(text)
    customerManagement.MetaData.appendNode{
        CustomerManagers{
            Manager(Name: "Animal")
            Manager(Name: "Swedish Chef")
            Manager(Name: "Gonzo")
        }
    }
    serializeXml(customerManagement)
}

Conclusion

Processing files in place using Groovy is both easy and powerful. A single two-line method provides the basis for modifying the content of a file in any way you may need.

