Chapter 2. Planning the directory data

The directory data can contain user names, email addresses, telephone numbers, user groups, and other information. The types of data you want to store in the directory determine the directory structure, the access given to the data, and how this access is requested and granted.

2.1. Introduction to directory data

Data that is suitable for a directory has the following characteristics:

The data is read more often than written.
The data is expressible in attribute-value format (for example, surname=jensen).
The data is useful to more than one person or group. For example, several people and applications can use an employee name or a printer location.
The data is accessed from more than one physical location.

For example, an employee's preference settings for a software application are not well suited to the directory because only a single instance of the application needs access to the information. However, if the application can read the preference settings from the directory and users want to use the application according to their preferences from different sites, then including such settings in the directory is useful.
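As a minimal sketch of the attribute-value format listed among the characteristics above, the following Python snippet groups a few `attribute=value` pairs into a multi-valued mapping. The `parse_pairs` helper and every sample value except `surname=jensen` are illustrative assumptions, not part of Directory Server.

```python
from collections import defaultdict


def parse_pairs(pairs):
    """Group 'attribute=value' strings into a dict of attribute -> list of values."""
    entry = defaultdict(list)
    for pair in pairs:
        # Split on the first '=' only, so a value may itself contain '='.
        name, _, value = pair.partition("=")
        # Attribute names are treated as case-insensitive in this sketch.
        entry[name.strip().lower()].append(value.strip())
    return dict(entry)


# Hypothetical sample data; only surname=jensen comes from the text.
entry = parse_pairs([
    "surname=jensen",
    "mail=bjensen@example.com",
    "telephoneNumber=+1 555 0100",
    "telephoneNumber=+1 555 0101",
])
print(entry["surname"])
print(entry["telephonenumber"])
```

The mapping keeps a list per attribute because one attribute, such as a telephone number, can hold several values for the same entry.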

2.1.1. Information to include in the directory

You can add useful information about a person or asset to an entry as attributes. For example:

Contact information, such as telephone numbers, physical addresses, and email addresses.
Descriptive information, such as an employee number, job title, manager or administrator identification, and job-related interests.
Organization contact information, such as a telephone number, physical address, administrator identification, and business description.
Device information, such as a printer physical location, a printer type, and the number of pages per minute that the printer can produce.
Contact and billing information for a corporation's trading partners, clients, and customers.
Contract information, such as the customer name, due dates, job description, and pricing information.
Individual software preferences or software configuration information.
Resource sites, such as pointers to web servers or the file system of a certain file or application.

Using the Directory Server for purposes beyond server administration requires planning what other types of information to store in the directory. For example, you may include the following information types:

Contract or client account details
Payroll data
Physical device information
Home contact information
Office contact information of different sites within the enterprise

2.1.2. Information to exclude from the directory

Red Hat Directory Server is well suited to managing large volumes of data that client applications read and occasionally update, but Directory Server is not designed for handling large, unstructured objects, such as images or other media. You should maintain these objects in a file system. However, the directory can store pointers to these types of objects by using FTP, HTTPS, and other URL types.

2.2. Defining directory needs

When designing the directory data, think not only about the data that is currently required but also about how the directory (and the organization) is likely to change over time. Considering the future needs of the directory during the design process influences how the data in the directory is structured and distributed.

Consider the following points:

What do you want to have in the directory today?
What immediate problem do you want to solve by deploying a directory?
What are the immediate needs of the directory-enabled applications that you use?
What do you want to add to the directory in the near future? For example, an enterprise uses an accounting package that does not currently support LDAP; however, this accounting package will be LDAP-enabled in a few months. Identify the data used by LDAP-compatible applications, and plan for the migration of the data into the directory as the technology becomes available.
What information do you want to store in the directory in the future? For example, a hosting company can have future customers with different data requirements than the current customers, such as storing images or media files. Planning this way helps you to identify data sources that you have not yet considered.

2.3. Performing a site survey

A site survey is a formal method for discovering and characterizing the directory contents. Allow ample time for performing the survey, because preparation is crucial for the directory architecture. The site survey consists of the following tasks:

Identify the applications that use the directory.
Determine the directory-enabled applications you deploy across the enterprise and their data needs.
Identify data sources.
Survey the enterprise and identify data sources, including Active Directory, other LDAP servers, PBX systems, human resources databases, and email systems.
Characterize the data the directory needs to contain.
Determine what objects should be in the directory (for example, people or groups) and what attributes of these objects to maintain in the directory (such as usernames and passwords).
Determine the level of service to provide.
Decide the availability of directory data for client applications, and design the architecture accordingly. The directory availability influences how you configure data replication and chaining policies to connect data stored on remote servers.
Identify a data supplier.
A data supplier contains the primary source for directory data. You may mirror this data to other servers for load balancing and recovery purposes. Determine the data supplier for each piece of data.
Determine data ownership.
For every piece of data, determine the person responsible for the data update.
Determine data access.
When importing data from other sources, develop a strategy for both bulk imports and incremental updates. As a part of this strategy, try to manage data in a single place, and restrict the number of applications that can change the data. Also, limit the number of people who write to any given piece of data. Smaller groups ensure data integrity while reducing the administrative overhead.
Document the site survey.

If the directory affects several organizations, consider creating a directory deployment team that includes representatives from each affected organization to conduct the site survey.

Corporations generally have a human resources department, an accounting or accounts receivable department, manufacturing organizations, sales organizations, and development organizations. Including representatives from each of these organizations can help to perform the survey process and migrate from local data stores to a centralized directory.

2.3.1. Identifying the applications that use the directory

The applications that access the directory and the data needs of these applications guide the planning of the directory contents. Common applications that use the directory include:

Directory browser applications, such as online telephone books. Decide what information users need, and include it in the directory.
Email applications, especially email servers. All email servers require some routing information to be available in the directory. However, some require more advanced information, such as the place on disk where a user's mailbox is stored, vacation notification details, and protocol information, for example, IMAP versus POP.
Directory-enabled human resources applications. These require additional personal information such as government identification numbers, home addresses, home telephone numbers, birth dates, salary, and job title.
Microsoft Active Directory. Through Windows User Sync, Windows directory services can be integrated to function together with Directory Server. Both directories can store user information and group information. Configure the Directory Server deployment after the existing Windows server deployment so that users, groups, and other directory data can synchronize.

When assessing the applications that will use the directory, consider the types of information each application uses. The following table gives an example of applications and the information that the application uses:

Table 2.1. Example Application Data Needs

Application	Class of data	Data
Phonebook	People	Name, email address, phone number, user ID, password, department number, manager, mail stop
Web server	People, groups	User ID, password, group name, group members, group owner
Calendar server	People, meeting rooms	Name, user ID, cube number, conference room name

When you identify the applications and information that each application uses, you will understand which types of data are used by more than one application. This step in planning can prevent data redundancy in the directory, and show clearly what data directory-dependent applications require.
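The overlap described above can be found mechanically. The following Python sketch, offered as one possible approach rather than a prescribed tool, transcribes the example application table and lists the data items used by more than one application. The `shared_data` helper name is an assumption made for illustration.

```python
from collections import defaultdict

# Application data needs, transcribed from the example table above.
APP_NEEDS = {
    "Phonebook": {"name", "email address", "phone number", "user ID",
                  "password", "department number", "manager", "mail stop"},
    "Web server": {"user ID", "password", "group name", "group members",
                   "group owner"},
    "Calendar server": {"name", "user ID", "cube number",
                        "conference room name"},
}


def shared_data(app_needs):
    """Return {data item: sorted applications} for items used by two or more applications."""
    users = defaultdict(list)
    for app, items in app_needs.items():
        for item in items:
            users[item].append(app)
    return {item: sorted(apps) for item, apps in users.items() if len(apps) > 1}


shared = shared_data(APP_NEEDS)
for item in sorted(shared):
    print(f"{item}: {', '.join(shared[item])}")
```

With the sample data, the shared items are the name, the password, and the user ID, so these are the attributes to store once and reuse rather than duplicate per application.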

The following factors affect the final decision about the types of data maintained in the directory and when you migrate the information to the directory:

The data required by various legacy applications and users
The ability of legacy applications to communicate with an LDAP directory

2.3.2. Identifying data sources

To determine all the data to include in the directory, perform a survey of the existing data stores. The survey should include the following:

Identify organizations that provide information.
Locate all the organizations managing crucial information, such as the information services, human resources, payroll, and accounting departments.
Identify the tools and processes that are information sources.
Common sources for information include network operating systems (such as Windows, Novell NetWare, UNIX NIS), email systems, security systems, PBX (telephone switching) systems, and human resources applications.
Determine how centralizing each piece of data affects the management of data.
Centralized data management can require new tools and new processes. In some cases, centralization might require increasing staffing in some organizations and reducing it in others.

During the survey, develop a matrix that identifies all the information sources in the enterprise as in the table below:

Table 2.2. Information sources example

Data Source	Class of Data	Data
Human resources database	People	Name, address, phone number, department number, manager
Email system	People, Groups	Name, email address, user ID, password, email preferences
Facilities system	Facilities	Building names, floor names, cube numbers, access codes
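A source matrix like this one can be cross-referenced against what an application needs. The sketch below is a hedged illustration: it transcribes the sources table, takes the phonebook requirements from the earlier application example, and reports which needed items have no known source and which are held by several sources. The `sources_for` helper is hypothetical, and plural names from the table are written in the singular so that items match.

```python
# Data held by each source, transcribed from the table above
# (plural names made singular so that items can be compared).
SOURCES = {
    "Human resources database": {"name", "address", "phone number",
                                 "department number", "manager"},
    "Email system": {"name", "email address", "user ID", "password",
                     "email preferences"},
    "Facilities system": {"building name", "floor name", "cube number",
                          "access code"},
}

# Data the phonebook application needs, from the earlier application example.
PHONEBOOK_NEEDS = {"name", "email address", "phone number", "user ID",
                   "password", "department number", "manager", "mail stop"}


def sources_for(needs, sources):
    """Map each needed item to the sorted list of sources that hold it."""
    return {
        item: sorted(src for src, held in sources.items() if item in held)
        for item in needs
    }


coverage = sources_for(PHONEBOOK_NEEDS, SOURCES)
missing = sorted(item for item, srcs in coverage.items() if not srcs)
duplicated = sorted(item for item, srcs in coverage.items() if len(srcs) > 1)
print("No known source:", missing)
print("Held by several sources:", duplicated)
```

In this sample, the mail stop has no identified source, and the name is held by both the human resources database and the email system, which makes it a candidate for choosing a single data supplier.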

2.3.3. Characterizing the directory data

Characterize the data you want to include in the directory in the following ways:

Format
Size
Number of occurrences in various applications
Data owner
Relationship to other directory data

Find common characteristics in the data you want to include in the directory. This helps save time during the schema design stage described in Designing the directory schema.

Consider the table below that characterizes the directory data:

Table 2.3. Directory data characteristics

Data	Format	Size	Owner	Related to
Employee Name	Text string	128 characters	Human resources	User entry
Fax number	Phone number	14 digits	Facilities	User entry
Email address	Text	Many characters	IS department	User entry

2.3.4. Determining level of service

The service level you provide depends on the expectations of the people who rely on directory-enabled applications. To determine the service level that each application requires, determine how and when the application is used.

As the directory evolves, the directory may need to support various service levels, from production to mission-critical level. Raising the service level after the directory deployment is difficult, so ensure the initial design meets the future needs.

For example, to eliminate the risk of total failure, use a multi-supplier configuration, where several suppliers handle the same data.

2.3.5. Considering a data supplier

A data supplier is a server that supplies the data. Storing the same information in multiple locations degrades the data integrity. A data supplier ensures that all information stored in multiple locations is consistent and accurate. The following scenarios require a data supplier:

Replication between Directory Servers
Synchronization between Directory Server and Active Directory
Independent client applications which access the Directory Server data

With multi-supplier replication, Directory Server can contain the main copy of information on multiple servers. Multiple suppliers keep changelogs and safely resolve conflicts. You can configure a limited number of supplier servers that can accept changes and replicate the data to replica or consumer servers ^[1]. Several data supplier servers provide safe failover if a server goes off-line. See Designing the replication process for more information about multi-supplier replication.

Using synchronization, you can integrate Directory Server users, groups, attributes, and passwords with Microsoft Active Directory users, groups, attributes, and passwords. If you have two directory services, decide whether they will manage the same information, what amount of that information will be shared, and which service will supply data. Preferably, select one application to manage the data and let the synchronization process add, update, or delete the entries on the other service.

Consider the supplier source of the data if you use applications that communicate indirectly with the directory. Keep the data changing processes as simple as possible. After deciding on the place for managing a piece of data, use the same place to manage all of the other data contained there. A single place simplifies troubleshooting when databases lose synchronization across the enterprise.

You can supply data in the following ways:

Managing the data in both the directory and all applications that do not use the directory.
Maintaining multiple data suppliers does not require custom scripts for transferring data. However, someone must then change the data on all the other sites to prevent desynchronization across the enterprise, which goes against the purpose of the directory.
Managing the data in a non-directory application, and writing scripts, programs, or gateways to import that data into the directory.
Managing data in non-directory applications is ideal when you already use applications to manage the data and you use the directory only for lookups, for example, for online corporate telephone books.

How you maintain the main copies of data depends on the specific directory needs. However, always keep the maintenance simple and consistent. For example, do not attempt to manage data in multiple places and then automatically exchange data between competing applications. Doing so leads to lost updates and increases the administrative overhead.

For example, the directory manages an employee's home telephone number that is stored in both the LDAP directory and a human resources database. The human resources application is LDAP-enabled and can automatically transfer data from the LDAP directory to the human resources database, and vice versa.

If you try to manage changes to that employee's telephone number in both the LDAP directory and the human resources database, then the last place where the telephone number was changed overwrites the information in the other database. This is only acceptable if the last application that wrote the data had the correct information.

If that information is outdated (for example, because the human resources data was restored from a backup), then the correct telephone number in the LDAP directory is overwritten and lost.
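The overwrite scenario above can be illustrated with a deliberately simplified model. The Python sketch below is not how Directory Server or any human resources application implements synchronization; the `Record` class, the `sync_last_writer_wins` function, the logical timestamps, and the telephone numbers are all assumptions chosen to show the effect of a "last writer wins" rule.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    modified: int  # logical timestamp of the last write


def sync_last_writer_wins(a, b):
    """Copy the most recently written record over the other one and return it."""
    winner = a if a.modified >= b.modified else b
    value, modified = winner.value, winner.modified
    a.value, a.modified = value, modified
    b.value, b.modified = value, modified
    return winner


# Hypothetical values: the directory holds the correct, current number.
ldap_record = Record(value="+1 555 0199", modified=10)

# The HR database is restored from an old backup. The restore counts as
# the latest write even though the number it contains is outdated.
hr_record = Record(value="+1 555 0100", modified=11)

sync_last_writer_wins(ldap_record, hr_record)
print(ldap_record.value)  # the outdated number has replaced the correct one
```

Because the rule only compares write times, it cannot tell a correction from a stale restore. Managing each piece of data in a single place avoids this class of problem.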

2.3.6. Determining data ownership

Data ownership refers to the person or organization responsible for making sure the data is up-to-date. During the data design phase, decide who can write data to the directory. Here are some common strategies for deciding data ownership:

Allow read-only access to the directory for everyone except a small group of directory content managers.
Allow individual users to manage a strategic subset of their own information, such as their passwords, their role within the organization, their automobile license plate number, descriptive information about themselves, and contact information such as telephone numbers or office numbers.
Allow a person's manager to write to a strategic subset of that person's information, such as contact information or job title.
Allow an organization's administrator to create and manage entries for that organization, enabling them to function as the directory content managers.
Create roles that give groups of people read or write access privileges. You can create roles for human resources, finance, or accounting. Allow each of these roles to have read access, write access, or both to the data that the group requires. This could include salary information, government identification numbers, and home phone numbers and addresses.

Multiple individuals might require write access to the same information. For example, an information systems or directory management group may require write access to employee passwords. Employees also require write access to their own passwords. While multiple people can have access to the same information, try to keep this group small and identifiable to ensure data integrity.

2.3.7. Determining data access

After determining data ownership, decide who gets access to read each piece of data. For example, employees' home phone numbers can be stored in the directory. This data may be useful for a number of users, including the employee's manager and the human resources department. Employees should be able to read this information for verification purposes. However, home contact information can be considered sensitive.

Consider the following for every piece of information stored in the directory:

Can someone read the data anonymously?
The LDAP protocol supports anonymous access and allows easy lookups for information. However, because anonymous access lets anyone read the directory, use this feature wisely.
Can someone read the data widely across the enterprise?
You can set access controls so that a client must log in to (or bind to) the directory to read specific information. Unlike anonymous access, this type of access control ensures that only members of the organization have access to directory information. In addition, the Directory Server access log contains a record of who accessed the information.
For more information about access controls, see Designing access control.
Is there an identifiable group of people or applications that must access the data?
Anyone who has write privileges to the data also needs read access (with the exception of write access to passwords). The directory can also contain data specific to a particular organization or project group. Identifying these access needs helps determine what groups, roles, and access controls the directory needs.
For information about groups and roles, see Designing the directory tree. For information about access controls, see Designing access control.

Making these decisions for each piece of directory data defines a security policy for the directory. These decisions depend upon the nature of the site and the security already available at the site. For example, having a firewall or no direct access to the Internet means it is safer to support anonymous access than if the directory is placed directly on the Internet. Additionally, some information may only need access controls and authentication measures to restrict access adequately. Other sensitive information may need to be encrypted within the database as it is stored.

Data protection laws in most countries govern how enterprises maintain and access personal information. For example, the laws may prohibit anonymous access to information or require users to have the ability to view and edit information in entries that represent them. Check with the organization's legal department to ensure that the directory deployment complies with data protection laws in countries where the enterprise operates.

The creation of a security policy and the way it is implemented is described in detail in Designing a secure directory.

2.4. Documenting the site survey

Due to the complexity of data design, document the results of the site surveys. Every step of the site survey can use simple tables to track data. You can build a supplier table that outlines the decisions and outstanding concerns. Preferably, use a spreadsheet where you can easily sort and search the content.

The table below identifies data ownership and data access for each piece of data identified by the site survey.

Table 2.4. Example: Tabulating data ownership and access
Data Name	Owner	Supplier Server/Application	Self Read/Write	Global Read	HR Writable	IS Writable
Employee name	HR	PeopleSoft	Read-only	Yes (anonymous)	Yes	Yes
User password	IS	Directory US-1	Read/Write	No	No	Yes
Home phone number	HR	PeopleSoft	Read/Write	No	Yes	No
Employee location	IS	Directory US-1	Read-only	Yes (must log in)	No	Yes
Office phone number	Facilities	Phone switch	Read-only	Yes (anonymous)	No	No

Each row in the table indicates the type of information being assessed, the departments that have an interest in it, and how the information is used and accessed. For example, on the first row, the employee name data has the following management considerations:

Owner. Human Resources owns this information and therefore is responsible for its updates and changes.
Supplier Server/Application. The PeopleSoft application manages employee name information.
Self Read/Write. Users can read their own names but cannot write (or change) them.
Global Read. Employee names can be read anonymously by everyone who has access to the directory.
HR Writable. Human resources group members can change, add, and delete employee names in the directory.
IS Writable. Information services (IS) group members can change, add, and delete employee names in the directory.
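Because the text recommends a form that is easy to sort and search, the ownership and access table can also be kept as structured records. The following Python sketch transcribes the rows of the table above and answers two survey questions: which data is readable anonymously, and which data has more than one party able to write it. The `DataPolicy` class, its field names, and the `writers` helper are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    name: str
    owner: str
    supplier: str
    self_access: str   # "Read-only" or "Read/Write"
    global_read: str   # "anonymous", "login", or "no"
    hr_writable: bool
    is_writable: bool


# Rows transcribed from the table above.
POLICIES = [
    DataPolicy("Employee name", "HR", "PeopleSoft", "Read-only", "anonymous", True, True),
    DataPolicy("User password", "IS", "Directory US-1", "Read/Write", "no", False, True),
    DataPolicy("Home phone number", "HR", "PeopleSoft", "Read/Write", "no", True, False),
    DataPolicy("Employee location", "IS", "Directory US-1", "Read-only", "login", False, True),
    DataPolicy("Office phone number", "Facilities", "Phone switch", "Read-only", "anonymous", False, False),
]


def writers(policy):
    """List the parties that may write this piece of data in the directory."""
    parties = []
    if policy.self_access == "Read/Write":
        parties.append("self")
    if policy.hr_writable:
        parties.append("HR")
    if policy.is_writable:
        parties.append("IS")
    return parties


anonymous = [p.name for p in POLICIES if p.global_read == "anonymous"]
multi_writer = [p.name for p in POLICIES if len(writers(p)) > 1]
print("Readable anonymously:", anonymous)
print("More than one writer:", multi_writer)
```

Rows with several writers are the ones where the guidance to keep the writing group small and identifiable matters most.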

2.5. Repeating the site survey

You might need more than one site survey, particularly if an enterprise has offices in multiple cities or countries. The informational needs might be so complex that several different organizations have to keep information at their local offices rather than at a single, centralized site.

In this case, each office that keeps a main copy of information should perform its own site survey. After the completion of the site survey, the results of each survey should be returned to a central team (probably consisting of representatives from each office) for use in the design of the enterprise-wide data schema model and directory tree.

^[1] In replication, a consumer server, or replica server, receives updates from a supplier server or hub server.
