Content Model (data)

What is CM?

Q: What does CM stand for?

A: CM stands for Content Model, a data protocol optimized for quantitative modeling. Each CM is a 2D pandas DataFrame whose index is generally a time series and whose columns are security IDs. All data follows this layout; when more dimensions are needed, the cell values hold matrices.

Example of a simple CM:

Date          Security ID 1   Security ID 2   Security ID 3
2020-01-01    100.5           101.2           102.3
2020-01-02    101.0           102.1           103.5
2020-01-03    102.0           103.0           104.0

This table represents a simple Content Model with a time-series index (Date) and security IDs as columns with their corresponding values.
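For readers who want to see the same layout in code, the sketch below rebuilds the example as a pandas DataFrame. The column labels are the placeholder names from the example, not real security IDs, and building the frame by hand is only for illustration; it is not how a CM is normally obtained.

```python
import pandas as pd

# Column labels are placeholders from the example, not real security IDs.
cm = pd.DataFrame(
    {
        "Security ID 1": [100.5, 101.0, 102.0],
        "Security ID 2": [101.2, 102.1, 103.0],
        "Security ID 3": [102.3, 103.5, 104.0],
    },
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
cm.index.name = "Date"

# Rows are dates, columns are securities.
print(cm)

# Look up one value by date (row) and security ID (column).
value = cm.loc[pd.Timestamp("2020-01-02"), "Security ID 2"]
print(value)
```

Each row is one date and each column is one security, so a single value is addressed by a (date, security ID) pair.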

CM (Content Model) Loading Issues

Q: Why can't I load the CM (Content Model)?

A: First, check if there is a typo in the identity name. Typos can prevent the CM from being recognized and loaded correctly.

Additional Information:

  • If the data was created recently, it might not have been synchronized with the cache yet.

  • When the environment is set to development, a new staging database is used. If the new identity has not been synchronized with this database, it may not be available yet.
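One quick way to spot a typo is to compare the requested name against names you know exist. The sketch below uses Python's standard difflib module for that. The helper suggest_identity_names and the names in the list are hypothetical and are not part of the CM tooling; they only illustrate the check.

```python
import difflib


def suggest_identity_names(requested, known_names, max_suggestions=3):
    """Return known names that closely resemble `requested` (best match first)."""
    if requested in known_names:
        return [requested]
    return difflib.get_close_matches(
        requested, known_names, n=max_suggestions, cutoff=0.6
    )


# Hypothetical identity names, used only for illustration.
known = ["price_close", "price_open", "volume_daily"]

# "price_clsoe" has two letters swapped; the closest known name comes first.
suggestions = suggest_identity_names("price_clsoe", known)
print(suggestions)
```

If the closest suggestion differs from what you typed, correct the name first. If the name is spelled correctly and loading still fails, the synchronization causes listed above are the more likely explanation.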

Data Accessibility Limitations

Q: Why is data from the most recent two years not accessible?

A: Access to the most recent two years of data is restricted to prevent overfitting during modeling and to maintain data integrity. This policy lets you analyze historical data without the risk of in-sample bias. Data older than two years is available in the research phase, before model submission.
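To estimate where the accessible range ends, you can shift a reference date back by two years. The sketch below does this with the standard library. It assumes the restriction is measured from the current date and that the boundary falls on the same calendar day two years earlier; the FAQ does not state the exact rule, and latest_accessible_date is a hypothetical helper.

```python
from datetime import date


def latest_accessible_date(today, years_restricted=2):
    """Return `today` shifted back by `years_restricted` calendar years."""
    try:
        return today.replace(year=today.year - years_restricted)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return today.replace(year=today.year - years_restricted, day=28)


# Assumption: the two-year window is counted back from this reference date.
cutoff = latest_accessible_date(date(2024, 6, 15))

# Express the cutoff as an 8-digit integer, e.g. for use as an end date.
cutoff_as_int = int(cutoff.strftime("%Y%m%d"))
print(cutoff, cutoff_as_int)
```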

CM Loader Options

Q: What options are available for the loader in CM?

A: The loader accepts start and end, each given as an 8-digit integer date (e.g., 20230101), as well as fill_nan. The loader itself is passed as an option to the get_df() function. The fill_nan option is used to prevent in-sample bias in the data. In general, only start and end are used.
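Because start and end are plain integers, a malformed value such as 2023011 or 20231301 is easy to pass by accident. The sketch below checks both values before they reach the loader. It assumes the 8 digits follow the YYYYMMDD pattern suggested by the example 20230101. The helpers parse_yyyymmdd and validate_range are hypothetical and are not part of the CM API; the exact get_df() call signature is not shown here because this FAQ does not specify it.

```python
from datetime import date, datetime


def parse_yyyymmdd(value):
    """Convert an 8-digit integer such as 20230101 into a date."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected an integer like 20230101")
    text = str(value)
    if len(text) != 8:
        raise ValueError("expected exactly 8 digits (YYYYMMDD)")
    # strptime rejects impossible dates such as month 13 or day 32.
    return datetime.strptime(text, "%Y%m%d").date()


def validate_range(start, end):
    """Parse both bounds and check that start is not later than end."""
    start_date, end_date = parse_yyyymmdd(start), parse_yyyymmdd(end)
    if start_date > end_date:
        raise ValueError("start must not be later than end")
    return start_date, end_date


# Valid bounds; the same integers would then be given to the loader.
start_date, end_date = validate_range(20230101, 20231231)
print(start_date, end_date)
```

Validating up front turns a silent empty or mis-sliced result into an immediate, readable error.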
