City Vs. Rural Population: A Regional Analysis
Understanding population distribution is vital for policymakers, researchers, and anyone interested in global trends. This analysis delves into the distribution of population between urban and rural areas across different regions. We'll explore how to access data, interpret key metrics, and understand the factors driving urbanization. So, if you've ever wondered about the balance between city dwellers and rural residents, you're in the right place!
Understanding the Data Requirements (R24)
This analysis stems from a specific requirement (R24) within a larger project focused on urbanization. The goal is to provide a REST endpoint that delivers crucial population data for selected regions. This data will empower data analysts, policymakers, researchers, API consumers, and data journalists to analyze urbanization trends effectively. Here’s a breakdown of the requirement:
- Epic: Urbanisation (% in cities)
- As a: data analyst / policy maker / researcher / API consumer / data journalist
- I want: a REST endpoint that returns, for a selected region, the total population, the population living in cities, the population not living in cities, and the corresponding percentages
- So that: I can analyse urbanisation within a region and the assessor can verify SET09803 requirement R24
- Dataset scope: region
To fulfill this requirement, we need a system that can efficiently retrieve and present data on total population, urban population, rural population, and their respective percentages for any given region. This involves designing an API endpoint and ensuring accurate data computation.
Data Points and Output
The system should provide a single-row output with the following fields:
- name (STRING): Region name
- total_population (INT): Total population in the region
- in_cities_population (INT): Population living in cities within the region
- in_cities_percent (DECIMAL, 0.0–100.0): Percentage of the population living in cities
- not_in_cities_population (INT): Population not living in cities within the region
- not_in_cities_percent (DECIMAL, 0.0–100.0): Percentage of the population not living in cities
This comprehensive data set will allow for a detailed understanding of urbanization levels across different regions.
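As a rough illustration of that single-row shape, here is a minimal Python sketch. The class name `RegionUrbanisation`, the helper `build_row`, and the sample numbers are assumptions made for this example; only the six field names come from the requirement.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class RegionUrbanisation:
    """One-row result for a region; field names follow the requirement."""
    name: str
    total_population: int
    in_cities_population: int
    in_cities_percent: float
    not_in_cities_population: int
    not_in_cities_percent: float


def build_row(name: str, total: int, in_cities: int) -> RegionUrbanisation:
    """Derive the remaining fields from the two aggregated totals."""
    if total <= 0:
        # Assumption: the zero-population case is answered elsewhere (204).
        raise ValueError("total must be positive")
    not_in_cities = total - in_cities
    return RegionUrbanisation(
        name=name,
        total_population=total,
        in_cities_population=in_cities,
        in_cities_percent=round(100.0 * in_cities / total, 1),
        not_in_cities_population=not_in_cities,
        not_in_cities_percent=round(100.0 * not_in_cities / total, 1),
    )


# Illustrative numbers only, not real population data.
row = build_row("Exampleland", 1000, 250)
print(json.dumps(asdict(row)))
```

Serialising the dataclass with `asdict` keeps the JSON keys identical to the field names, which makes the response easy to compare against the acceptance criteria.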
API Endpoint and Parameters
To access this data, we'll use the following API endpoint:
GET /api/v1/population/region/{region}/cities-vs-noncities
- Parameters
  - region (STRING, case-insensitive): A valid region name from the country.Region dataset. The system should handle case-insensitive inputs for ease of use.
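Region names often contain spaces, so a client needs to percent-encode the path segment before calling the endpoint. The sketch below shows one way to do that with Python's standard library; the helper name `region_path` is an assumption for this example, and only the path template comes from the requirement.

```python
from urllib.parse import quote

ENDPOINT_TEMPLATE = "/api/v1/population/region/{region}/cities-vs-noncities"


def region_path(region: str) -> str:
    """Trim the input and percent-encode it for use as a single path segment."""
    # safe="" also encodes "/" so a region name can never add path segments.
    return ENDPOINT_TEMPLATE.format(region=quote(region.strip(), safe=""))


print(region_path("Southeast Asia"))
# prints /api/v1/population/region/Southeast%20Asia/cities-vs-noncities
```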
Computation Logic (SQL Reference)
The data is computed using the following SQL queries as a reference:
- Total population in the region:
  SELECT SUM(co.Population) AS total_pop FROM country co WHERE co.Region = :region;
- Population living in cities in that region:
  SELECT SUM(ci.Population) AS city_pop FROM country co JOIN city ci ON ci.CountryCode = co.Code WHERE co.Region = :region;
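These queries aggregate population data into the two totals from which every other field is derived: not_in_cities_population is total_pop minus city_pop, and each percentage is the corresponding population divided by total_pop, multiplied by 100. Understanding the SQL logic helps in validating the accuracy of the API responses.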
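To see the two reference queries in action, the following sketch runs them against an in-memory SQLite database. SQLite, the tiny made-up rows, and the function name `region_totals` are all assumptions for illustration; the `COALESCE` wrapper is also an addition, so that a region with no matching rows yields 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (Code TEXT PRIMARY KEY, Region TEXT, Population INTEGER);
    CREATE TABLE city (Name TEXT, CountryCode TEXT, Population INTEGER);
    -- Made-up rows for illustration only.
    INSERT INTO country VALUES
        ('AAA', 'Testregion', 600), ('BBB', 'Testregion', 400), ('CCC', 'Elsewhere', 900);
    INSERT INTO city VALUES
        ('A1', 'AAA', 200), ('A2', 'AAA', 100), ('B1', 'BBB', 50), ('C1', 'CCC', 700);
""")


def region_totals(conn, region):
    """Return (total_pop, city_pop) for a region using named parameters."""
    params = {"region": region}
    total_pop = conn.execute(
        "SELECT COALESCE(SUM(co.Population), 0) FROM country co "
        "WHERE co.Region = :region",
        params,
    ).fetchone()[0]
    city_pop = conn.execute(
        "SELECT COALESCE(SUM(ci.Population), 0) FROM country co "
        "JOIN city ci ON ci.CountryCode = co.Code "
        "WHERE co.Region = :region",
        params,
    ).fetchone()[0]
    return total_pop, city_pop


total_pop, city_pop = region_totals(conn, "Testregion")
print(total_pop, city_pop, total_pop - city_pop)  # prints 1000 350 650
```

Note that SQLite compares text case-sensitively by default, so in this sketch the region value would need to be resolved to its canonical spelling before the queries run.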
Rules and Assumptions
To ensure the data is consistent and reliable, several assumptions and rules are in place:
- Inputs are trimmed, and comparisons are case-insensitive, ensuring user-friendly input handling.
- The system validates the region input against known values and rejects invalid inputs.
- If the total_pop is 0, the API responds with a 204 status code (empty but valid).
- Percentages are rounded to 1 decimal place and must sum to 100.0 ± 0.1, ensuring accuracy in calculations.
- Parameterized queries are used to prevent SQL injection vulnerabilities.
- Appropriate status codes are returned for different scenarios:
- Invalid input: 400
- Unknown item: 404
- Empty but valid: 204
These rules are critical for maintaining data integrity and providing a robust API.
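One possible reading of these rules is sketched below. The lookup table `REGION_TOTALS`, its values, and the function `resolve_region` are hypothetical. The split between 400 and 404 is also an assumption: blank input is treated as invalid, while a well-formed name that matches no region is treated as unknown.

```python
from typing import Optional, Tuple

# Hypothetical lookup: canonical region name -> (total_pop, city_pop).
# Values are made up for illustration.
REGION_TOTALS = {
    "Western Europe": (1000, 400),
    "Antarctica": (0, 0),
}


def resolve_region(raw: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (status_code, canonical_name) for a raw path parameter."""
    if raw is None or not raw.strip():
        return 400, None  # invalid input
    wanted = raw.strip().casefold()  # trimmed, case-insensitive comparison
    for name, (total_pop, _city_pop) in REGION_TOTALS.items():
        if name.casefold() == wanted:
            # Known region: 204 when there is nothing to report, else 200.
            return (204 if total_pop == 0 else 200), name
    return 404, None  # unknown item


print(resolve_region("  wEsTeRn EuRoPe "))  # prints (200, 'Western Europe')
```

Returning the canonical name alongside the status lets the caller pass an exactly spelled value to the parameterized queries, whatever casing the client used.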
Acceptance Criteria
The acceptance criteria, defined using Gherkin syntax, outline the expected behavior of the API.
- Scenario: Region urbanisation summary
- Given the world DB is loaded
- When I GET /api/v1/population/region/Southeast%20Asia/cities-vs-noncities
- Then the status is 200
- And the JSON has fields name,total_population,in_cities_population,in_cities_percent,not_in_cities_population,not_in_cities_percent
- And in_cities_population + not_in_cities_population = total_population
- And in_cities_percent + not_in_cities_percent is within 0.1 of 100.0
This scenario checks the basic functionality of the API and the correctness of the returned data.
- Scenario: Case-insensitive region
- When I GET /api/v1/population/region/wEsTeRn%20EuRoPe/cities-vs-noncities
- Then the status is 200
This ensures the API correctly handles case-insensitive region names.
- Scenario: Unknown region
- When I GET /api/v1/population/region/Unknownland/cities-vs-noncities
- Then the status is 404
This validates the API's ability to return a 404 status code for unknown regions.
- Scenario: Empty but valid region
- Given a region with total population = 0
- When I GET /api/v1/population/region/Antarctica/cities-vs-noncities
- Then the status is 204
This confirms the API's behavior when a region has a total population of 0.
Together, these scenarios cover the success path, case-insensitive matching, the unknown-region error, and the empty-but-valid case. Rigorous testing is key to a reliable API.
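The data checks in the first scenario can be expressed as a small validation helper. This is a sketch only: the function name `check_summary` and the sample body are assumptions, and the small epsilon added to the 0.1 tolerance is there to absorb floating-point noise.

```python
EXPECTED_FIELDS = {
    "name",
    "total_population",
    "in_cities_population",
    "in_cities_percent",
    "not_in_cities_population",
    "not_in_cities_percent",
}


def check_summary(body: dict) -> list:
    """Return a list of violations; an empty list means the body passes."""
    missing = EXPECTED_FIELDS - set(body)
    if missing:
        return [f"missing fields: {sorted(missing)}"]
    problems = []
    parts = body["in_cities_population"] + body["not_in_cities_population"]
    if parts != body["total_population"]:
        problems.append("populations do not add up to total_population")
    pct_sum = body["in_cities_percent"] + body["not_in_cities_percent"]
    if abs(pct_sum - 100.0) > 0.1 + 1e-9:
        problems.append("percentages are not within 0.1 of 100.0")
    return problems


# Made-up response body for illustration.
sample = {
    "name": "Exampleland",
    "total_population": 1000,
    "in_cities_population": 333,
    "in_cities_percent": 33.3,
    "not_in_cities_population": 667,
    "not_in_cities_percent": 66.7,
}
print(check_summary(sample))  # prints []
```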
Non-Functional Constraints
Non-functional constraints are crucial for ensuring the performance and reliability of the API.
- Completes ≤ 1.5s on a typical laptop/CI runner
- Memory ≤ 256 MB; UTF-8 JSON responses
- Checkstyle/PMD/CodeQL: no critical findings
- Endpoint + example documented in README
These constraints keep the API fast, light on memory, and consistent with the project's coding standards. A fast and efficient API is crucial for a good user experience.
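The response-time limit can be checked with a simple timing wrapper. The sketch below is an assumption about how such a check might look: the helper `timed` is hypothetical, and the workload is a stand-in for a call to the real endpoint handler.

```python
import time

BUDGET_SECONDS = 1.5  # the response-time limit from the constraints


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; a real check would invoke the endpoint instead.
result, elapsed = timed(sum, range(1000))
print(elapsed <= BUDGET_SECONDS)
```

A single measurement is noisy, so a CI check would more plausibly repeat the call a few times and compare the slowest run against the budget.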
INVEST Self-Check
The INVEST criteria are used to ensure the task is well-defined and manageable:
- [x] Independent
- [x] Negotiable
- [x] Valuable
- [x] Estimable
- [x] Small
- [x] Testable
This self-check confirms that the task meets the necessary criteria for effective development and implementation. Always remember to INVEST in your tasks for better outcomes!
Priority and Estimate
The priority for this task is P0 - Critical, indicating its importance. The estimated effort is 2 story points, suggesting a relatively small but crucial piece of work. Prioritizing tasks helps in effective project management.
Definition of Done (DoD)
The Definition of Done (DoD) outlines the criteria that must be met before the task is considered complete:
- [x] Unit tests added/updated
- [x] Integration tests added/updated
- [x] CI green (build, tests, coverage)
- [x] CodeQL/PMD no critical findings
- [x] README endpoints and example updated
- [x] Evidence table row + screenshot added
- [x] PR created (base: develop) and linked to this story
- [x] Linked to parent Epic / Requirement label
- [x] Reviewer approval obtained
The DoD ensures that all aspects of the task are completed to a high standard before it is considered finished. Quality is key! A clear DoD helps ensure a high-quality deliverable.
Useful Links
- PR: # (add when opened)
- Endpoint: GET /api/v1/population/region/{region}/cities-vs-noncities
- Evidence screenshot: docs/evidence/r24_population_region_cities_vs_noncities.png
Conclusion
Analyzing the distribution of population between urban and rural areas is crucial for understanding regional dynamics and urbanization trends. This detailed exploration of the requirements, API design, and testing criteria provides a solid foundation for developing a reliable and valuable tool. By focusing on data accuracy, performance, and usability, we can empower policymakers and researchers to make informed decisions based on population data. Remember, understanding urbanization is key to planning for the future.