Individual and Group Assignment: BUSS6002 Data Science in Business

Individual and Group Assignment Learning Outcomes Assessed
1. Identify types and sources of data, data quality issues and interact with data storage systems
2. Explain and apply foundational techniques of data analysis to business problems
3. Categorise business problems in order to select appropriate data analysis techniques and tools
4. Interpret and evaluate the outputs of data analysis techniques and tools
5. Evaluate data science capabilities of businesses and apply data science process models
7. Communicate effectively with technical and non-technical audiences
General Notices
1. All plots, analyses and technical work must be completed using Python
2. The late penalty is 5% of the assignment mark per day starting at the due date.
3. The assignment is marked anonymously.
4. Collusion and plagiarism are obvious to markers and will not be tolerated.
Academic Integrity
Please be aware of the University’s academic integrity policies. Issues of academic integrity are
taken seriously by the University and the BUSS6002 team. If you are suspected of dishonest
behaviour you will be referred to the Academic Integrity Office who will process your case. This
may result in delayed results, mark reduction, failure of the unit or expulsion.
Dishonest behaviour includes but is not limited to:
- using contract cheating services
- plagiarism such as copying phrases, paragraphs, etc.
- not appropriately referencing
It is unfair to leave your group members feeling like they need to complete work you were
supposed to complete. It is even more unfair (and dishonest) to accept marks for group work if
you have let others do your work for you.
You are encouraged to refer to the full policy and guidelines on the University of Sydney website
https://www.sydney.edu.au/students/academic-integrity.html
You can access the Academic Honesty module on Canvas at any time:
https://canvas.sydney.edu.au/courses/29833
Group Work
This assignment requires you to work with others in a group. You are expected to contribute
equally.
2021/9/20 7:04 PM Individual and Group Assignment: BUSS6002 Data Science in Business
https://canvas.sydney.edu.au/courses/36716/pages/individual-and-group-assignment
From the University’s website:
Acting with academic honesty in group work also means that you have to commit fully to
participating in group discussions and meeting agreed deadlines. It is unfair to leave your group
members feeling like they need to complete work you were supposed to complete. It is even more
unfair (and dishonest) to accept marks for group work if you have let others do your work for you.
Each group will be awarded a group mark per the marking criteria. Individual adjustments to
grades may be made if there is a dispute in a group or if the quality or quantity of contributions
made by individuals differs significantly. In such a case, the unit coordinator will seek meeting
minutes and peer review reports from individuals within a group to decide on individual marks.
Our recommendations for effective group work:
1. Discuss problems with your group members as soon as you can
2. Immediately notify the unit coordinator (buss6002.admin@sydney.edu.au) if you have
problems that cannot be resolved – do not wait until after the submission date
3. Work collaboratively on each section of the final report.
   1. Do not divide the group report into sections for each student to complete.
   2. Doing so might be disastrous if one group member does not pull their weight or is unable
      to complete the work due to changes in personal circumstance.
4. Set deadlines well ahead of the due date
   1. Do not wait until the day before to finalise the report
   2. This will lead to a poor-quality report or in the worst case an incomplete submission
5. Keep accurate meeting minutes
   1. In case of a dispute between group members we will require some evidence of the issue
   2. If you do not record meeting minutes or collect other forms of evidence, it will be difficult for
      us to take any action
   3. We also suggest that you communicate with your group members in English
Group Size and Forming Groups
Groups have been formed. Refer to this announcement for details
https://canvas.sydney.edu.au/courses/36716/discussion_topics/702139
Meeting Minutes
Each group is required to submit at least 3 sets of meeting minutes as an appendix to the final
report. A template will be provided for preparing meeting minutes. You may use the template
provided or a template of your choice.
Peer Feedback
We may ask for peer review from each student within a group. The instructions about how to do
this will be released later.
Allowed Packages
For the EDA component of this task you must use only the Python packages covered in tutorials
e.g. Pandas, NumPy, SciPy, Matplotlib, sqlite3, scikit-learn or statsmodels. Use of automatic EDA
packages will result in a Fail grade for this component.
For modelling you may use more advanced models or feature engineering and packages that
provide such functionality. However, you are required to provide a technical explanation of the
model and feature engineering. If you are not confident in your understanding or explanation, then
we recommend that you use simpler techniques.
Description
Please note that the scenario for this assignment is fictional.
Modelling and forecasting the volatility of financial asset returns is of great importance to financial
institutions worldwide. Volatility forecasts are used in risk management, derivative pricing and
hedging, market making, market timing, portfolio selection and many other financial activities
(Engle and Patton, 2000)
(https://sydney.primo.exlibrisgroup.com/permalink/61USYD_INST/2rsddf/cdi_repec_primary_tafquantf_v_3a1_3ay_3a2001_3ai_3a2_3ap_3a237_245_htm). Volatility is usually interpreted as a measure
of risk, i.e., an increase in volatility points to an increase in the dispersion of returns, which then
leads to an increase in the investment risk of the underlying asset. Recent research has shown
that "realised variance (RV)" is an efficient measure of volatility for daily returns of a financial
asset. The RV of day $t$ is given by the sum of squared intra-daily returns:
$$RV_t = \sum_{j=1}^{m} r_{t,j}^2$$
where $r_{t,j}$ is the $j$-th intra-daily return (typically calculated using prices sampled at 5-minute
intervals). Because $RV_t$ is always positive, it is often more convenient to model its natural
logarithm $\log RV_t$. You are hired by a Sydney-based hedge fund, Alpha Capitals, to develop a
predictive model for the log RV of daily stock returns of the Commonwealth Bank of Australia
(CBA).
The management team at Alpha Capitals has provided you with a dataset containing the
observed values of log RV and a set of potential features constructed using intra-daily returns
(based on prices sampled at 5-minute intervals) of CBA from January 7, 2003 to August 20, 2021.
Note that you are not required to calculate the values of log RV yourself; they are provided to you
in the dataset. All the features are constructed using only past information to ensure that they are
free of look-ahead bias. For example, a feature for 2021-08-20 uses only information up to and
including 2021-08-19, and this is true for all of the provided features. Please pay attention to the
time indices inside "[ ]" after each variable description in the data dictionary file; they tell you
exactly which days are used in the calculation of the feature. The features are organised into
three groups:
1. Log RV based: features constructed using past values of the log RV itself. These features are
designed to capture any serial dependencies of log RV, i.e., past values of log RV might be
predictive of future values of log RV.
2. Range based: features constructed using past values of the range of intra-daily returns, i.e.,
calculated by taking the difference between the highest and the lowest returns within a day.
The range based features provide a measure of dispersion or scale of the past intra-daily
returns. That is, the higher the range, the more dispersed the returns are within a day.
3. Quantile range based: features constructed using past values of the quantile range of
intra-daily returns. The quantile range is the difference between the 95th percentile and the 5th
percentile of the returns within a day. The quantile range can be interpreted as an
outlier-robust version of the range, since it only considers the middle 90% of the observations.
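To make the three daily measures concrete, here is a minimal sketch on simulated data. It is not the procedure used to build the provided dataset. The array `intraday`, the number of days and the choice of 72 returns per day are illustrative assumptions. The sketch computes RV as the sum of squared returns, the range, and the 5%-95% quantile range for each day, then shifts each measure by one day so that the feature for a given day uses only information up to the previous day.

```python
import numpy as np
import pandas as pd

# Simulated intra-daily returns: 5 days x 72 five-minute returns per day.
# Both numbers are illustrative assumptions, not properties of the CBA data.
rng = np.random.default_rng(0)
intraday = rng.normal(loc=0.0, scale=0.001, size=(5, 72))

# Realised variance: sum of squared intra-daily returns for each day.
rv = (intraday ** 2).sum(axis=1)
log_rv = np.log(rv)

# Range: highest minus lowest return within each day.
day_range = intraday.max(axis=1) - intraday.min(axis=1)

# Quantile range: 95th percentile minus 5th percentile within each day.
q95, q05 = np.percentile(intraday, [95, 5], axis=1)
quantile_range = q95 - q05

daily = pd.DataFrame(
    {"log_rv": log_rv, "range": day_range, "quantile_range": quantile_range}
)

# Shift by one day so each feature row only uses information up to day t-1.
features = daily.shift(1).add_suffix("_lag1")
dataset = pd.concat([daily["log_rv"], features], axis=1).dropna()
print(dataset)
```

The first day is dropped because it has no previous day to draw features from. The features in the provided dataset may use longer windows of past days, as indicated by the time indices in the data dictionary.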
You are hired as a data analytics professional to help Alpha Capitals address two key questions:
1. Which features are predictive of future log RV of CBA at least one-day-ahead?
2. Can a model be built to forecast one-day-ahead the log RV of CBA?
Dataset
The management team of Alpha Capitals has provided you with two files:
1. cba_log_rv.sqlite (https://canvas.sydney.edu.au/courses/36716/files/19476840/download)
2. data_dictionary.txt (https://canvas.sydney.edu.au/courses/36716/files/19476846/download)
The first file is an SQLite file containing the daily observations of log RV and the features; the
second file contains descriptions of the variables and the database structure.
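Because the table and column names are documented only in data_dictionary.txt, a natural first step is to inspect the database structure. The sketch below uses the standard-library sqlite3 module. The helper name `describe_database` is hypothetical, and the demonstration runs against a small in-memory stand-in database with made-up table and column names rather than the real cba_log_rv.sqlite file.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    structure = {}
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        structure[table] = [col[1] for col in columns]
    return structure


# Stand-in database; the table and column names here are made up for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_target (date TEXT, log_rv REAL)")
conn.execute("INSERT INTO demo_target VALUES ('2021-08-20', -9.5)")
structure = describe_database(conn)
print(structure)
conn.close()
```

For the real file, one would pass its path to `sqlite3.connect` in place of `":memory:"`. Each table can then be loaded into a DataFrame with `pandas.read_sql_query`.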
Individual Task
Due Date: 23:59 September 27, 2021
Weight: 15%
Max Length: 500 words +/- 10% (excluding code)
Type: Individual Submission
Submission Type: Jupyter Notebook (.ipynb) via Canvas
Submit Here (https://canvas.sydney.edu.au/courses/36716/assignments/328384)
In this task you are required to produce a short EDA vignette that explores how the features from
one of the following sets are predictive of future log RV at least one-day-ahead.
1. Log RV based features.
2. Range based features.
3. Quantile range based features.
Your vignette will consist of a Jupyter Notebook, in which you will use Markdown cells to provide
commentary on your EDA process.
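As one possible starting point for such a vignette, the sketch below ranks candidate features by their correlation with the target. Everything in it is an assumption for illustration: the data are simulated, and the column names `target`, `feature_a` and `feature_b` are hypothetical stand-ins for the variables named in the data dictionary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500

# Simulated stand-in data: feature_a drives the target, feature_b is pure noise.
feature_a = rng.normal(size=n)
feature_b = rng.normal(size=n)
target = 0.8 * feature_a + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({"target": target, "feature_a": feature_a, "feature_b": feature_b})

# Non-graphical EDA: summary statistics and correlations with the target.
summary = df.describe()
pearson = df.corr(method="pearson")["target"].drop("target")
spearman = df.corr(method="spearman")["target"].drop("target")

# Rank features by the absolute size of their Pearson correlation.
ranking = (
    pd.DataFrame({"pearson": pearson, "spearman": spearman})
    .sort_values("pearson", key=lambda s: s.abs(), ascending=False)
)
print(ranking)
```

A scatter plot of each feature against the target, drawn with Matplotlib, would be the graphical counterpart to this table.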
Due to the small word limit we do not expect a large amount of detail. This vignette should be
used as a way to share your preliminary findings with your group, to be expanded on in
the group report.
As a team you will need to assign each team member to one of the three feature sets. Each team
member is expected to work independently as you will be marked separately for this component
of the assignment. However, you are allowed to discuss your hypotheses and approaches, and to
seek suggestions or feedback from your team members.
Suggested Structure
1. Heading – state which feature set you are exploring
2. Main body (approx. 400 words) – this should contain the plots, tables, corresponding
commentary and your Python code that make up your exploratory analysis
3. Further work (approx. 100 words) – this is a small note to your teammates about how your
analysis could be further refined and used in your final report
Submission Items
These items are to be submitted via Canvas individually:
1. A Jupyter Notebook
   1. Filename format: BUSS6002_FEATURESETX_SID.ipynb
   2. If you chose feature-set 2 (range based features) and your SID is 55554444 then your
      filename would be BUSS6002_FEATURESET2_55554444.ipynb
Marking Criteria
View on Google Docs
(https://docs.google.com/document/d/1G6G8AL80ujChRejJwN0n23N6OVrj6OkquyiCVoJXR44/edit?usp=sharing)
Group Task
Due Date: 23:59 November 5, 2021
Weight: 25%
Max Length: 15 pages, excluding references, meeting minutes and appendices
Type: Group Submission
Submission Type: PDF and Jupyter Notebook (.ipynb) via Canvas
… The context has not clearly informed the analysis.
… has informed the analysis.
… The context is carefully considered and has clearly informed the analysis.
Results Presentation (4 marks; LO2, LO6)
FA: The figures or tables produced are inappropriate or do not support the analysis. The figures
produced are of substandard quality. Grammar and spelling pose significant barriers to the
reader’s comprehension.
PS: The figures or tables produced are mostly appropriate and support the analysis. The figures
produced are of poor visual quality. Written language presents some barrier to the reader’s
comprehension.
CR: The figures or tables produced are mostly appropriate and support the analysis. The figures
produced are of average visual quality. The written analysis may contain some grammatical or
spelling errors, but none that pose any significant barrier to reader comprehension.
DI: The figures or tables produced are appropriate and support the analysis. The figures
produced are of good visual quality and easy to read. Written language demonstrates outstanding
precision, clarity, and concision.
HD: The figures or tables produced are appropriate and support the analysis. The figures
produced are of high visual quality, are well appointed and easy to read. Written language
demonstrates outstanding precision, clarity, and concision.
Notebook Presentation (3 marks)
FA: Features of the Jupyter Notebook are not used appropriately. The notebook is …
PS: Features of the Jupyter Notebook are used mostly appropriately. The notebook is …
CR: Features of the Jupyter Notebook have been used appropriately. The notebook is …
DI: Features of the Jupyter Notebook have been used appropriately …
HD: Features of the Jupyter Notebook have been used to pleasing effect. The notebook …
Submit Report
(%24CANVAS_OBJECT_REFERENCE%24/assignments/ge694994258caeb743b4361359ce34cb0)
Submit Notebook
(%24CANVAS_OBJECT_REFERENCE%24/assignments/g02e458d095b051583d2ab6d25b1af17b)
This task requires you to produce a final report which is the deliverable for the Alpha Capitals
project. This report must address the two key questions.
To increase the integrity of your findings, others must be able to replicate your work. Therefore,
the management team of Alpha Capitals has asked that you provide all code used to produce the
report. This code should be runnable without error, neat and documented.
Q1: Which features are predictive of future log RV of CBA at least one-day-ahead?
To answer this question, you should provide a cohesive EDA that combines and extends your
individual analyses. You may use any combination of EDA techniques and traditional statistical
testing. This might include non-graphical or graphical EDA, clustering or preliminary statistical
testing. You must provide accompanying written explanations of your intentions and subsequent
findings. Finally, you should provide a summary statement.
It is up to you how you choose to answer this question. You will be assessed on the depth, quality
of written analysis/justification and the clarity/presentation of results.
We have not covered how to do statistical testing in tutorials. However, we have equipped you
with the skills and knowledge to research this on your own should you feel the need to take this
approach.
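For groups that do choose to research statistical testing, the following is a minimal sketch of one preliminary test, using SciPy on simulated data. The variable names and the simulated relationship are assumptions for illustration. Note that these tests assume independent observations, which serially dependent data such as log RV may violate, so the p-values should be read with caution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 300

# Simulated stand-ins for a lagged feature and the next-day target.
lagged_feature = rng.normal(size=n)
target = 0.5 * lagged_feature + rng.normal(size=n)

# Pearson correlation with a two-sided test of H0: correlation equals zero.
r, p_value = stats.pearsonr(lagged_feature, target)

# Spearman rank correlation as a check that is less sensitive to outliers.
rho, p_value_rank = stats.spearmanr(lagged_feature, target)

print(f"Pearson r = {r:.3f}, p = {p_value:.3g}")
print(f"Spearman rho = {rho:.3f}, p = {p_value_rank:.3g}")
```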
Q2: Can a model be built to forecast one-day-ahead the log RV of CBA?
This question will require you to carefully consider the goal of the problem and the available data.
You must provide a description of how one could build a model to answer question 2 including:
- technical explanation of the model
- discussion of how such a model could be evaluated
- a thorough discussion of assumptions and shortcomings of said model
- a discussion of how to interpret the model generally and an example interpretation of the
  prototype model
It is expected that you will have to make simplifications or assumptions to make the model work. It
is important that you clearly explain your steps and provide justification for your work.
You must include a proof of concept in your Jupyter Notebook. The prototype only needs to be a
close facsimile of the model you describe in your report that implements the core components. In
your report you may describe more advanced techniques which could be incorporated later.
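A proof of concept along these lines might look like the sketch below. It is only one possible choice, not a prescribed model. It fits an ordinary least squares regression of log RV on its own one-day lag, uses a chronological train/test split, and compares the out-of-sample RMSE with a naive forecast that sets tomorrow's value equal to today's. The series is simulated from an AR(1) process with made-up parameters, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Simulate a persistent series as a stand-in for log RV (AR(1), made-up parameters).
log_rv = np.empty(n)
log_rv[0] = -9.0
for t in range(1, n):
    log_rv[t] = -9.0 + 0.6 * (log_rv[t - 1] + 9.0) + rng.normal(scale=0.5)

# Feature: yesterday's value. Target: today's value (no look-ahead).
x = log_rv[:-1]
y = log_rv[1:]

# Chronological split: fit on the first 80%, evaluate on the final 20%.
split = int(0.8 * len(y))
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]

# Ordinary least squares with an intercept, solved via least squares.
design_train = np.column_stack([np.ones_like(x_train), x_train])
coef, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
intercept, slope = coef

# Out-of-sample forecasts and a naive benchmark (forecast = yesterday's value).
pred = intercept + slope * x_test
rmse_model = np.sqrt(np.mean((y_test - pred) ** 2))
rmse_naive = np.sqrt(np.mean((y_test - x_test) ** 2))

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"RMSE model = {rmse_model:.3f}, RMSE naive = {rmse_naive:.3f}")
```

The split is chronological rather than random so that the model is never evaluated on days that precede its training data. The slope gives an example interpretation: it estimates how much of yesterday's deviation in log RV carries over to today.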
Reflection and Future Work
Provide a reflection on how you have or have not followed the CRISP-DM process model during
this project and which types of analytical capabilities you exercised. You must:
- explain how the work you have done in each stage or question of the project aligns with the
  different phases of the process model
- identify where and when you were exercising specific analytic capabilities
- provide references where appropriate to justify your reflection
Suggested Structure
1. Q1
2. Q2
3. Reflection
4. References
5. Appendices
6. Meeting Minutes (DOWNLOAD TEMPLATE:
https://canvas.sydney.edu.au/courses/36716/files/18423711/download?download_frd=1)
Submission Items
These items are to be submitted via Canvas. Only one group member needs to submit:
1. A report in PDF format
   1. Filename format: BUSS6002_REPORT_GROUPX.pdf
   2. If you are in Group 10 then your filename will be BUSS6002_REPORT_GROUP10.pdf
2. A Jupyter Notebook that contains all the code used in the development of your report.
   1. The notebook will be used to assist our marking process and as proof of work.
   2. Filename format: BUSS6002_GROUPX.ipynb
   3. If you are in Group 42 then your filename will be BUSS6002_GROUP42.ipynb
Marking Criteria
View on Google Docs
(https://docs.google.com/document/d/1_RryxHPTwuFP-MltwGqvekirs7AkEB4fg6mLG0JWW5w/edit?usp=sharing)