DataMagic Technical Column Vol.12

<String replacement>

Introduction

Corporate business systems are often built independently, resulting in a wide variety of data. Even data with the same meaning can be interpreted as completely different data in a system if it's expressed differently. For example, in a customer master database, the same customer's name may be duplicated in the master database. A customer named "ABC Co., Ltd." (tentative name) may be registered in the master database with different names, such as "ABC Co., Ltd.", "ABC Co., Ltd.", "ABC Co., Ltd.", and "ABC Marunouchi Branch." While this type of duplicate data may be registered without issue in individual systems, as data duplication progresses, important data can become "unusable" and require reorganization to make it "usable."

DataMagic makes it easy to cleanse such duplicate data, so you can replace customer data that contains duplicates of not only company names but also phone numbers and addresses with a single piece of data.

This TIPS will introduce examples of removing spaces from name and address information, changing the format of "1-chome 2-banchi 3-go" to "1-2-3", and completing values such as "NULL". Note that in order to use this TIPS, DataMagic must be installed on your computer. For information on where to obtain DataMagic and how to install it, please refer to the separate article "Installing DataMagic".

How to operate

Step 1 - Prepare the source data

First, prepare the CSV data to be converted. Once the data is ready, save it in the following folder on the PC where DataMagic is installed.
C:\work\SAMPLE12\ (saved with the file name "in")

Source data: Name: There is sometimes a space between the first and last name, and sometimes there isn't. Address: There are different notations, such as "x-chome x-banchi x-go" and "xxx". Also, numbers are written in both full-width and half-width characters. Apartment name: I want to interpolate the missing values.

»Download the source data (Note: The sample file is in zip format. Please unzip it before use.)

Step 2 - Download and configure DataMagic script file

Download the script file below and save it in the folder where DataMagic is installed.
C:\work\SAMPLE12\ (saved as "sample12.igen")

Launch DataMagic management screen and import the downloaded script file into DataMagic. To import, click the [Import Management Information] icon under [Tools] on the start screen. Specify the downloaded file on the Import Management Information screen and click the Import button. The ID SAMPLE12 will then be registered on the Data Processing Information List screen.

»Download the script file (Note: The sample file is in zip format. Please unzip it before use.)

Step 3 - Run DataMagic

Double-click the data processing information ID SAMPLE12 registered in step 2 and check that the file names for the input and output settings are set correctly. Open the data processing information ID SAMPLE12 from the screen and click the "Execute" button at the top of the screen.

Step 4 - Check the execution results

When execution is complete in step 3, a file called "out" will be created in C:\work\SAMPLE12\, which is specified in the output settings. Check this file using an editor such as Notepad.

Execution result data Name: Remove spaces between first and last names, unify addresses to "xxx" and numbers to half-width, apartment name: Set "NULL" string to parts with no value

lastly

This time, we introduced examples of "removing spaces," "replacing block, address, and number with "XXX," and "completion of values." The REPLACE_REG function we used can use regular expressions to search for replacement characters. This allows you to set very detailed replacement conditions, making it possible to search for and replace a wide variety of strings. Please download DataMagic trial version and try out the technical column explanations.

  • The trial version is free to use for 60 days.
  • After you sign up for the trial version, you will receive 90 days of free technical support.

DataMagic Column List

Related Content

Return to column list