Test dataflows and evaluate data exchanges for Unicode-readiness

Last updated on July 9, 2024

Find guidance on how to test and evaluate data exchanges for Unicode-readiness. This is the last stage in assessing your systems for Unicode-readiness.

Investigate system data flow

The final task in confirming whether your system is ready for Unicode is to investigate how a text string that contains Unicode characters flows into, through, and beyond the system.

Consider a simple system that:

  1. Inputs a name
  2. Stores it in a database
  3. Does a query on the name to find related information
  4. Outputs this related information

You can test this system by:

  • Inputting a name containing Unicode characters
  • Then checking that the output is as expected and that no errors are generated.
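The test above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed method: it assumes an in-memory SQLite database stands in for the real data store, and the table name, column names, and sample name are all hypothetical.

```python
import sqlite3

# Illustrative test string (an assumption, not official test data):
# it mixes a precomposed letter (U+1E35), a combining mark (U+0331),
# and an accented vowel (U+00FA).
SAMPLE_NAME = "S\u1e35wx\u0331w\u00fa7mesh"


def round_trip(name: str) -> str:
    """Input a name, store it, query on it, and return the related output."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema standing in for the system's real database.
        conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, info TEXT)")
        # Steps 1 and 2: input the name and store it in the database.
        conn.execute(
            "INSERT INTO people (name, info) VALUES (?, ?)",
            (name, "info for " + name),
        )
        conn.commit()
        # Step 3: query on the name to find related information.
        row = conn.execute(
            "SELECT name, info FROM people WHERE name = ?", (name,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise LookupError("name was stored but could not be found again")
    stored_name, info = row
    if stored_name != name:
        raise ValueError("stored name differs from the name that was input")
    # Step 4: output the related information.
    return info


if __name__ == "__main__":
    print(round_trip(SAMPLE_NAME))
```

A real test would replace the in-memory database with the system's own input and output paths, then compare the output against the expected value in the same way.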

Dataflow diagrams

Dataflow diagrams can be used to model the flow of data:

  • Into and out of the system
  • Into and out of the processes within the system
  • Into and out of data stores such as files and databases

To access dataflow diagrams or resources and standards for the creation of your diagrams within B.C. government, contact the branch responsible for IM/IT system management:

  • Information Management Branch
  • Computing Services Branch

For a typical system that inputs and outputs “name” data, the dataflow diagram might look like this:

Dataflow diagrams illustrate the pathways that data can take when entering a system, traversing a system, and leaving a system. Different symbols are used to differentiate between people/processes supplying data, internal processes that act on the data, people/processes consuming data, and data stores where the data might be saved.

The system might guard against invalid input being entered, then do some processing on the accepted data. It may store the data in a queryable database, make the data available to consumers through an API, or both. A dataflow diagram captures all of the touchpoints where data must be handled in a way that supports Indigenous languages.

Dataflows are useful in assessing whether a system will properly handle specific types of data. View example dataflows


Dataflow architecture

A dataflow architecture, typically expressed in dataflow diagrams, has four types of objects:

  • Entities
  • Internal processes
  • Data stores
  • Dataflows

Entities (Actors)

Entities, also known as actors, are the users or processes that input or output data to or from a system. In a dataflow diagram these are indicated using a rectangle that includes text describing what the user or process is doing.

To assess Unicode-readiness, you must identify all the ways that strings containing Unicode characters can enter and leave the system. This includes checking that Unicode characters display correctly on output entities like:

  • PDF documents
  • Screen outputs
  • Printed reports

Review test data and guidance that can be used to test your systems for Indigenous language support. 

Processes

A process is a sequence of actions performed on a data element as it moves from its input to its destination. The end point determines whether the data will leave the system or be stored in it. Dataflow diagrams represent processes with circles.

To assess Unicode-readiness you must identify all the processes that can operate on strings containing Unicode characters.
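Once the processes are identified, each one can be checked against a sample string. The sketch below is one possible approach, not a required one. It assumes a process can be called as a function from text to text, and it treats two strings as the same text when they are equal after NFC normalization. The process names and the sample string are hypothetical.

```python
import unicodedata

# Illustrative test string (an assumption, not official test data).
SAMPLE_NAME = "S\u1e35wx\u0331w\u00fa7mesh"


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def check_process(process, sample: str) -> bool:
    """Return True if the process returns text equivalent to its input."""
    return same_text(process(sample), sample)


# Hypothetical process: changes the code points but keeps the same text.
def decompose(text: str) -> str:
    return unicodedata.normalize("NFD", text)


# Hypothetical process: silently drops every character outside ASCII.
def ascii_only(text: str) -> str:
    return text.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    for process in (decompose, ascii_only):
        result = "ok" if check_process(process, SAMPLE_NAME) else "LOSES DATA"
        print(process.__name__, result)
```

Comparing after normalization is a design choice: it avoids flagging a process that only switches between precomposed and decomposed forms, while still catching one that removes or replaces characters.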

Data stores

Data stores are the places where data gets stored within a system. Data stores include:

  • Files in a file system
  • Tables in a database

Data stores are represented by parallel horizontal lines (a box with no sides).

Dataflows

Dataflows connect entities with processes and data stores. They are represented by directed lines.

In assessing Unicode-readiness, each dataflow needs to be tested.
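To keep track of which dataflows still need testing, the objects in a dataflow diagram can be recorded as data. The following is a rough sketch only. The class names, the example nodes, and the notes are all hypothetical and do not describe any particular system.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ENTITY = "entity"
    PROCESS = "process"
    DATA_STORE = "data store"


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind


@dataclass(frozen=True)
class Dataflow:
    source: Node
    target: Node
    note: str = ""  # optional: file format or transfer protocol


def checklist(flows):
    """Return one line per dataflow, since each dataflow needs a test."""
    lines = []
    for flow in flows:
        line = f"{flow.source.name} -> {flow.target.name}"
        if flow.note:
            line += f" ({flow.note})"
        lines.append(line)
    return lines


# Hypothetical diagram for a simple system that stores and looks up names.
user = Node("User", Kind.ENTITY)
validate = Node("Validate input", Kind.PROCESS)
store = Node("Names database", Kind.DATA_STORE)
lookup = Node("Look up related information", Kind.PROCESS)
report = Node("Report", Kind.ENTITY)

EXAMPLE_FLOWS = [
    Dataflow(user, validate, "web form"),
    Dataflow(validate, store),
    Dataflow(store, lookup),
    Dataflow(lookup, report, "PDF"),
]

if __name__ == "__main__":
    for line in checklist(EXAMPLE_FLOWS):
        print("[ ]", line)
```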

File formats and transfer types

Where applicable, include notes to identify the types of files being read or written (e.g., PDF, CSV, Excel) and any file transfer protocols used (e.g., FTP, HTTPS, SFTP).
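File formats matter because text files such as CSV are written and read with a specific character encoding. As a hedged illustration, the sketch below writes a sample name to a temporary CSV file and reads it back. The sample name and the function name are assumptions, and a real test would use the files your system actually produces.

```python
import csv
import os
import tempfile

# Illustrative test string (an assumption, not official test data).
SAMPLE_NAME = "S\u1e35wx\u0331w\u00fa7mesh"


def csv_round_trip(name: str, encoding: str = "utf-8") -> str:
    """Write a name to a CSV file, read it back, and return what was read."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        # Write with an explicit encoding rather than the platform default.
        with open(path, "w", encoding=encoding, newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["name"])
            writer.writerow([name])
        # Read back with the same encoding.
        with open(path, "r", encoding=encoding, newline="") as f:
            rows = list(csv.reader(f))
    finally:
        os.remove(path)
    return rows[1][0]


if __name__ == "__main__":
    print(csv_round_trip(SAMPLE_NAME) == SAMPLE_NAME)
    try:
        csv_round_trip(SAMPLE_NAME, encoding="latin-1")
    except UnicodeEncodeError as err:
        print("latin-1 cannot store this name:", err.reason)
```

With UTF-8 the name survives the round trip. With a single-byte encoding such as Latin-1, Python raises `UnicodeEncodeError` because some characters in the sample have no representation in that encoding.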

Evaluate data exchanges

Assess the Unicode compatibility of other systems that exchange data with yours. If these systems are not compatible with Unicode, take measures to ensure that data is exchanged correctly.
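One way to see what a non-Unicode partner system would lose is to check a string against the encoding that system uses. The sketch below assumes you know the partner's encoding by name. The function name and the sample string are hypothetical.

```python
# Illustrative test string (an assumption, not official test data).
SAMPLE_NAME = "S\u1e35wx\u0331w\u00fa7mesh"


def unsupported_characters(text: str, encoding: str) -> list:
    """List the code points in text that the given encoding cannot represent."""
    unsupported = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            unsupported.append(f"U+{ord(ch):04X}")
    return unsupported


if __name__ == "__main__":
    for encoding in ("utf-8", "latin-1", "ascii"):
        print(encoding, unsupported_characters(SAMPLE_NAME, encoding))
```

An empty list means every character in the sample can be sent in that encoding. A non-empty list identifies the characters that would need special handling before the exchange.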

Assessment is the only way to know whether an existing system is Unicode-ready. Assessing your systems will also help you understand which parts, if any, are problematic. 

Update or replace your system

Summarize the findings of your assessment and find out how you can update or replace your system.